Training: 2022-04-10 23:13:41,044-rank_id: 0
Training: 2022-04-10 23:14:09,028-: margin_list [1.0, 0.0, 0.4]
Training: 2022-04-10 23:14:09,029-: network r50
Training: 2022-04-10 23:14:09,029-: resume False
Training: 2022-04-10 23:14:09,029-: output work_dirs/glint360k_r50
Training: 2022-04-10 23:14:09,029-: embedding_size 512
Training: 2022-04-10 23:14:09,029-: sample_rate 1.0
Training: 2022-04-10 23:14:09,029-: interclass_filtering_threshold0
Training: 2022-04-10 23:14:09,029-: fp16 True
Training: 2022-04-10 23:14:09,029-: batch_size 128
Training: 2022-04-10 23:14:09,029-: optimizer sgd
Training: 2022-04-10 23:14:09,029-: lr 0.1
Training: 2022-04-10 23:14:09,029-: momentum 0.9
Training: 2022-04-10 23:14:09,029-: weight_decay 0.0001
Training: 2022-04-10 23:14:09,029-: verbose 2000
Training: 2022-04-10 23:14:09,029-: frequent 10
Training: 2022-04-10 23:14:09,029-: dali False
Training: 2022-04-10 23:14:09,029-: rec /train_tmp/glint360k
Training: 2022-04-10 23:14:09,029-: num_classes 360232
Training: 2022-04-10 23:14:09,029-: num_image 17091657
Training: 2022-04-10 23:14:09,029-: num_epoch 20
Training: 2022-04-10 23:14:09,030-: warmup_epoch 0
Training: 2022-04-10 23:14:09,030-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-04-10 23:14:09,030-: total_batch_size 1024
Training: 2022-04-10 23:14:09,030-: warmup_step 0
Training: 2022-04-10 23:14:09,030-: total_step 333820
Training: 2022-04-10 23:15:32,724-Reducer buckets have been rebuilt in this iteration.
Training: 2022-04-10 23:15:36,349-Speed 5112.66 samples/sec Loss 42.4827 LearningRate 0.1000 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 16384 Required: 132 hours
Training: 2022-04-10 23:15:38,338-Speed 5150.16 samples/sec Loss 43.2535 LearningRate 0.1000 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 16384 Required: 95 hours
Training: 2022-04-10 23:15:40,330-Speed 5142.42 samples/sec Loss 43.2435 LearningRate 0.1000 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-04-10 23:15:42,309-Speed 5177.45 samples/sec Loss 43.6868 LearningRate 0.1000 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-10 23:15:44,290-Speed 5171.92 samples/sec Loss 43.2114 LearningRate 0.1000 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-10 23:15:46,242-Speed 5245.15 samples/sec Loss 43.2619 LearningRate 0.1000 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-10 23:15:48,202-Speed 5225.78 samples/sec Loss 42.9441 LearningRate 0.1000 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 16384 Required: 48 hours
Training: 2022-04-10 23:15:50,173-Speed 5197.63 samples/sec Loss 42.7735 LearningRate 0.0999 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 16384 Required: 45 hours
Training: 2022-04-10 23:15:52,131-Speed 5231.88 samples/sec Loss 42.7227 LearningRate 0.0999 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 16384 Required: 42 hours
Training: 2022-04-10 23:15:54,090-Speed 5231.75 samples/sec Loss 42.6271 LearningRate 0.0999 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-10 23:15:56,051-Speed 5222.87 samples/sec Loss 42.7096 LearningRate 0.0999 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-10 23:15:58,010-Speed 5230.02 samples/sec Loss 42.4253 LearningRate 0.0999 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-10 23:15:59,971-Speed 5222.83 samples/sec Loss 42.1690 LearningRate 0.0999 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-10 23:16:01,931-Speed 5226.77 samples/sec Loss 42.0998 LearningRate 0.0999 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-10 23:16:03,886-Speed 5238.11 samples/sec Loss 42.0148 LearningRate 0.0999 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-10 23:16:05,976-Speed 4902.83 samples/sec Loss 41.8704 LearningRate 0.0999 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-10 23:16:07,932-Speed 5235.70 samples/sec Loss 41.7840 LearningRate 0.0999 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-10 23:16:09,892-Speed 5226.64 samples/sec Loss 41.6874 LearningRate 0.0999 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-10 23:16:11,884-Speed 5143.83 samples/sec Loss 41.4887 LearningRate 0.0999 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-10 23:16:13,864-Speed 5172.96 samples/sec Loss 41.2986 LearningRate 0.0999 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-10 23:16:15,824-Speed 5225.04 samples/sec Loss 41.2603 LearningRate 0.0999 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 65536 Required: 29 hours
Training: 2022-04-10 23:16:17,786-Speed 5222.26 samples/sec Loss 41.1316 LearningRate 0.0999 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 65536 Required: 29 hours
Training: 2022-04-10 23:16:19,756-Speed 5198.00 samples/sec Loss 40.8875 LearningRate 0.0999 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 65536 Required: 28 hours
Training: 2022-04-10 23:16:21,734-Speed 5180.25 samples/sec Loss 40.9604 LearningRate 0.0999 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 65536 Required: 28 hours
Training: 2022-04-10 23:16:23,709-Speed 5184.13 samples/sec Loss 40.9148 LearningRate 0.0998 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 65536 Required: 27 hours
Training: 2022-04-10 23:16:25,676-Speed 5208.14 samples/sec Loss 40.6732 LearningRate 0.0998 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 65536 Required: 27 hours
Training: 2022-04-10 23:16:27,634-Speed 5231.81 samples/sec Loss 40.5303 LearningRate 0.0998 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 65536 Required: 27 hours
Training: 2022-04-10 23:16:29,598-Speed 5218.10 samples/sec Loss 40.3651 LearningRate 0.0998 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 65536 Required: 26 hours
Training: 2022-04-10 23:16:31,555-Speed 5231.88 samples/sec Loss 40.3107 LearningRate 0.0998 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 65536 Required: 26 hours
Training: 2022-04-10 23:16:33,517-Speed 5220.73 samples/sec Loss 40.2087 LearningRate 0.0998 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 131072 Required: 26 hours
Training: 2022-04-10 23:16:35,478-Speed 5223.34 samples/sec Loss 40.0549 LearningRate 0.0998 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 131072 Required: 26 hours
Training: 2022-04-10 23:16:37,462-Speed 5163.63 samples/sec Loss 39.8826 LearningRate 0.0998 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-10 23:16:39,441-Speed 5176.14 samples/sec Loss 39.7984 LearningRate 0.0998 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-10 23:16:41,412-Speed 5198.52 samples/sec Loss 39.7149 LearningRate 0.0998 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-10 23:16:43,374-Speed 5220.03 samples/sec Loss 39.5309 LearningRate 0.0998 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-10 23:16:45,339-Speed 5214.36 samples/sec Loss 39.5198 LearningRate 0.0998 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-10 23:16:47,318-Speed 5175.91 samples/sec Loss 39.2914 LearningRate 0.0998 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-10 23:16:49,285-Speed 5205.81 samples/sec Loss 39.2925 LearningRate 0.0998 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-10 23:16:51,257-Speed 5194.27 samples/sec Loss 39.1216 LearningRate 0.0998 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-10 23:16:53,216-Speed 5229.09 samples/sec Loss 39.0448 LearningRate 0.0998 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-10 23:16:55,176-Speed 5225.43 samples/sec Loss 38.8242 LearningRate 0.0997 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-10 23:16:57,132-Speed 5237.45 samples/sec Loss 38.8880 LearningRate 0.0997 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 65536 Required: 24 hours
Training: 2022-04-10 23:16:59,099-Speed 5208.04 samples/sec Loss 38.7533 LearningRate 0.0997 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 65536 Required: 24 hours
Training: 2022-04-10 23:17:01,063-Speed 5217.37 samples/sec Loss 38.6109 LearningRate 0.0997 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 65536 Required: 24 hours
Training: 2022-04-10 23:17:03,031-Speed 5202.93 samples/sec Loss 38.5082 LearningRate 0.0997 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-04-10 23:17:04,996-Speed 5212.79 samples/sec Loss 38.3378 LearningRate 0.0997 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-04-10 23:17:06,961-Speed 5213.05 samples/sec Loss 38.3194 LearningRate 0.0997 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-04-10 23:17:08,931-Speed 5202.45 samples/sec Loss 38.2442 LearningRate 0.0997 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-04-10 23:17:10,892-Speed 5223.43 samples/sec Loss 38.0496 LearningRate 0.0997 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-04-10 23:17:12,857-Speed 5213.10 samples/sec Loss 37.9200 LearningRate 0.0997 Epoch: 0 Global Step: 510 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-04-10 23:17:14,833-Speed 5183.34 samples/sec Loss 37.8118 LearningRate 0.0997 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 65536 Required: 23 hours
Training: 2022-04-10 23:17:16,797-Speed 5214.27 samples/sec Loss 37.7435 LearningRate 0.0997 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 23:17:18,771-Speed 5189.03 samples/sec Loss 37.6042 LearningRate 0.0997 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 23:17:20,739-Speed 5204.98 samples/sec Loss 37.5399 LearningRate 0.0997 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 23:17:22,700-Speed 5225.00 samples/sec Loss 37.3407 LearningRate 0.0997 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-04-10 23:17:24,661-Speed 5221.54 samples/sec Loss 37.3164 LearningRate 0.0997 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-04-10 23:17:26,635-Speed 5189.84 samples/sec Loss 37.1853 LearningRate 0.0997 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-04-10 23:17:28,605-Speed 5200.72 samples/sec Loss 37.1955 LearningRate 0.0996 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-04-10 23:17:30,581-Speed 5197.23 samples/sec Loss 37.0661 LearningRate 0.0996 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-04-10 23:17:32,543-Speed 5220.37 samples/sec Loss 36.8319 LearningRate 0.0996 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-04-10 23:17:34,506-Speed 5217.63 samples/sec Loss 36.7516 LearningRate 0.0996 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-04-10 23:17:36,479-Speed 5191.60 samples/sec Loss 36.7559 LearningRate 0.0996 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 262144 Required: 22 hours
Training: 2022-04-10 23:17:38,435-Speed 5237.12 samples/sec Loss 36.5742 LearningRate 0.0996 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-04-10 23:17:40,407-Speed 5194.34 samples/sec Loss 36.5151 LearningRate 0.0996 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-04-10 23:17:42,376-Speed 5202.68 samples/sec Loss 36.5342 LearningRate 0.0996 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-04-10 23:17:44,337-Speed 5222.11 samples/sec Loss 36.2996 LearningRate 0.0996 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-04-10 23:17:46,329-Speed 5142.49 samples/sec Loss 36.2128 LearningRate 0.0996 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-04-10 23:17:48,294-Speed 5213.10 samples/sec Loss 36.1964 LearningRate 0.0996 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-04-10 23:17:50,258-Speed 5216.99 samples/sec Loss 35.9569 LearningRate 0.0996 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-04-10 23:17:52,223-Speed 5214.32 samples/sec Loss 35.9454 LearningRate 0.0996 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-04-10 23:17:54,186-Speed 5217.66 samples/sec Loss 35.8650 LearningRate 0.0996 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 131072 Required: 22 hours
Training: 2022-04-10 23:17:56,161-Speed 5184.53 samples/sec Loss 35.7035 LearningRate 0.0996 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:17:58,122-Speed 5223.46 samples/sec Loss 35.6771 LearningRate 0.0996 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:00,105-Speed 5167.72 samples/sec Loss 35.4996 LearningRate 0.0996 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:02,070-Speed 5212.30 samples/sec Loss 35.4283 LearningRate 0.0995 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:04,031-Speed 5223.59 samples/sec Loss 35.4173 LearningRate 0.0995 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:06,008-Speed 5179.55 samples/sec Loss 35.2664 LearningRate 0.0995 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:07,984-Speed 5186.46 samples/sec Loss 35.0740 LearningRate 0.0995 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:09,951-Speed 5205.90 samples/sec Loss 35.0965 LearningRate 0.0995 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:11,927-Speed 5185.99 samples/sec Loss 35.0683 LearningRate 0.0995 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:13,893-Speed 5209.78 samples/sec Loss 35.0358 LearningRate 0.0995 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:15,861-Speed 5204.43 samples/sec Loss 34.7909 LearningRate 0.0995 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:17,825-Speed 5215.73 samples/sec Loss 34.7092 LearningRate 0.0995 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 262144 Required: 21 hours
Training: 2022-04-10 23:18:19,782-Speed 5234.89 samples/sec Loss 34.6423 LearningRate 0.0995 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:21,749-Speed 5207.99 samples/sec Loss 34.6878 LearningRate 0.0995 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:23,715-Speed 5208.33 samples/sec Loss 34.4365 LearningRate 0.0995 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:25,695-Speed 5175.06 samples/sec Loss 34.3589 LearningRate 0.0995 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:27,657-Speed 5220.30 samples/sec Loss 34.4126 LearningRate 0.0995 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:29,632-Speed 5188.27 samples/sec Loss 34.2438 LearningRate 0.0995 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:31,593-Speed 5221.82 samples/sec Loss 34.0000 LearningRate 0.0995 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:33,571-Speed 5178.18 samples/sec Loss 33.9169 LearningRate 0.0994 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:35,549-Speed 5178.67 samples/sec Loss 33.9279 LearningRate 0.0994 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:37,525-Speed 5184.48 samples/sec Loss 33.8329 LearningRate 0.0994 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:39,500-Speed 5221.45 samples/sec Loss 33.8264 LearningRate 0.0994 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:41,463-Speed 5217.77 samples/sec Loss 33.5271 LearningRate 0.0994 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:43,432-Speed 5202.24 samples/sec Loss 33.5840 LearningRate 0.0994 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:45,398-Speed 5210.52 samples/sec Loss 33.5065 LearningRate 0.0994 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:47,371-Speed 5190.73 samples/sec Loss 33.3400 LearningRate 0.0994 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:49,334-Speed 5219.70 samples/sec Loss 33.1799 LearningRate 0.0994 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:51,310-Speed 5184.40 samples/sec Loss 33.1923 LearningRate 0.0994 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:53,275-Speed 5212.99 samples/sec Loss 33.1893 LearningRate 0.0994 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:55,236-Speed 5223.27 samples/sec Loss 33.1714 LearningRate 0.0994 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 131072 Required: 21 hours
Training: 2022-04-10 23:18:57,207-Speed 5195.11 samples/sec Loss 32.9294 LearningRate 0.0994 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:18:59,169-Speed 5222.77 samples/sec Loss 33.0238 LearningRate 0.0994 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 262144 Required: 20 hours
Training: 2022-04-10 23:19:01,128-Speed 5228.99 samples/sec Loss 32.7337 LearningRate 0.0994 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:03,091-Speed 5216.42 samples/sec Loss 32.6986 LearningRate 0.0994 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:05,060-Speed 5202.86 samples/sec Loss 32.4937 LearningRate 0.0994 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:07,031-Speed 5197.35 samples/sec Loss 32.6216 LearningRate 0.0993 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:08,992-Speed 5223.72 samples/sec Loss 32.3736 LearningRate 0.0993 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:10,957-Speed 5213.56 samples/sec Loss 32.2388 LearningRate 0.0993 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:12,922-Speed 5213.53 samples/sec Loss 32.0707 LearningRate 0.0993 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:14,889-Speed 5207.87 samples/sec Loss 32.1868 LearningRate 0.0993 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:16,874-Speed 5161.48 samples/sec Loss 32.1284 LearningRate 0.0993 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:18,849-Speed 5185.25 samples/sec Loss 31.9552 LearningRate 0.0993 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:20,807-Speed 5230.93 samples/sec Loss 31.9663 LearningRate 0.0993 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:22,775-Speed 5205.78 samples/sec Loss 31.7581 LearningRate 0.0993 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:24,741-Speed 5210.21 samples/sec Loss 31.7855 LearningRate 0.0993 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:26,706-Speed 5212.42 samples/sec Loss 31.6624 LearningRate 0.0993 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:28,673-Speed 5206.12 samples/sec Loss 31.5731 LearningRate 0.0993 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:30,653-Speed 5174.71 samples/sec Loss 31.3426 LearningRate 0.0993 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:32,625-Speed 5195.96 samples/sec Loss 31.4611 LearningRate 0.0993 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:34,590-Speed 5211.79 samples/sec Loss 31.2412 LearningRate 0.0993 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:36,554-Speed 5214.21 samples/sec Loss 31.2545 LearningRate 0.0993 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:38,544-Speed 5149.38 samples/sec Loss 31.1401 LearningRate 0.0993 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:40,520-Speed 5182.37 samples/sec Loss 31.0197 LearningRate 0.0992 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:42,492-Speed 5195.45 samples/sec Loss 31.1234 LearningRate 0.0992 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:44,456-Speed 5214.97 samples/sec Loss 30.9283 LearningRate 0.0992 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:46,433-Speed 5182.71 samples/sec Loss 30.8641 LearningRate 0.0992 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:48,407-Speed 5188.15 samples/sec Loss 30.8360 LearningRate 0.0992 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:50,369-Speed 5220.28 samples/sec Loss 30.6178 LearningRate 0.0992 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:52,336-Speed 5207.04 samples/sec Loss 30.6424 LearningRate 0.0992 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:54,309-Speed 5192.08 samples/sec Loss 30.5970 LearningRate 0.0992 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:56,275-Speed 5211.78 samples/sec Loss 30.5790 LearningRate 0.0992 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:19:58,241-Speed 5210.48 samples/sec Loss 30.3145 LearningRate 0.0992 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:00,204-Speed 5218.66 samples/sec Loss 30.1407 LearningRate 0.0992 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:02,175-Speed 5197.25 samples/sec Loss 30.2517 LearningRate 0.0992 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:04,144-Speed 5202.39 samples/sec Loss 30.1807 LearningRate 0.0992 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:06,108-Speed 5215.62 samples/sec Loss 29.9508 LearningRate 0.0992 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:08,086-Speed 5177.66 samples/sec Loss 30.1546 LearningRate 0.0992 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:10,058-Speed 5195.44 samples/sec Loss 29.9935 LearningRate 0.0992 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:12,030-Speed 5194.18 samples/sec Loss 29.8499 LearningRate 0.0992 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:14,007-Speed 5178.86 samples/sec Loss 29.7210 LearningRate 0.0991 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:15,978-Speed 5198.42 samples/sec Loss 29.5144 LearningRate 0.0991 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:17,948-Speed 5201.47 samples/sec Loss 29.6384 LearningRate 0.0991 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:19,909-Speed 5221.62 samples/sec Loss 29.5790 LearningRate 0.0991 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:21,876-Speed 5209.25 samples/sec Loss 29.0083 LearningRate 0.0991 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:23,860-Speed 5161.66 samples/sec Loss 29.2839 LearningRate 0.0991 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:25,832-Speed 5194.76 samples/sec Loss 29.0904 LearningRate 0.0991 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:27,798-Speed 5210.27 samples/sec Loss 29.2346 LearningRate 0.0991 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:29,784-Speed 5158.21 samples/sec Loss 29.1180 LearningRate 0.0991 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:31,757-Speed 5190.64 samples/sec Loss 28.9953 LearningRate 0.0991 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:33,730-Speed 5195.20 samples/sec Loss 28.9010 LearningRate 0.0991 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:35,695-Speed 5212.37 samples/sec Loss 28.8552 LearningRate 0.0991 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:37,662-Speed 5207.74 samples/sec Loss 28.8907 LearningRate 0.0991 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:39,636-Speed 5190.02 samples/sec Loss 28.7415 LearningRate 0.0991 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 262144 Required: 20 hours
Training: 2022-04-10 23:20:41,612-Speed 5184.34 samples/sec Loss 28.6750 LearningRate 0.0991 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:43,576-Speed 5216.14 samples/sec Loss 28.4757 LearningRate 0.0991 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:45,540-Speed 5214.81 samples/sec Loss 28.6029 LearningRate 0.0990 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:47,514-Speed 5189.03 samples/sec Loss 28.4212 LearningRate 0.0990 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:49,477-Speed 5218.85 samples/sec Loss 28.3676 LearningRate 0.0990 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:51,441-Speed 5215.58 samples/sec Loss 28.0982 LearningRate 0.0990 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:53,423-Speed 5167.82 samples/sec Loss 28.1865 LearningRate 0.0990 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:55,401-Speed 5178.12 samples/sec Loss 28.0562 LearningRate 0.0990 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:57,373-Speed 5194.41 samples/sec Loss 28.0380 LearningRate 0.0990 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:20:59,339-Speed 5210.60 samples/sec Loss 27.8931 LearningRate 0.0990 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:21:01,306-Speed 5209.11 samples/sec Loss 27.7888 LearningRate 0.0990 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:21:03,274-Speed 5204.52 samples/sec Loss 27.6188 LearningRate 0.0990 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:21:05,243-Speed 5202.32 samples/sec Loss 27.6391 LearningRate 0.0990 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:21:07,212-Speed 5201.42 samples/sec Loss 27.7151 LearningRate 0.0990 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:21:09,177-Speed 5214.17 samples/sec Loss 27.5414 LearningRate 0.0990 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:21:11,145-Speed 5203.77 samples/sec Loss 27.5800 LearningRate 0.0990 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:21:13,119-Speed 5190.00 samples/sec Loss 27.3103 LearningRate 0.0990 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:21:15,080-Speed 5222.15 samples/sec Loss 27.3173 LearningRate 0.0990 Epoch: 0 Global Step: 1740 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:21:17,067-Speed 5156.76 samples/sec Loss 27.1416 LearningRate 0.0990 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:21:19,033-Speed 5210.66 samples/sec Loss 27.1545 LearningRate 0.0989 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:21:20,992-Speed 5229.49 samples/sec Loss 27.1447 LearningRate 0.0989 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 131072 Required: 20 hours
Training: 2022-04-10 23:21:22,957-Speed 5210.66 samples/sec Loss 26.9864 LearningRate 0.0989 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:21:24,927-Speed 5201.24 samples/sec Loss 27.1017 LearningRate 0.0989 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:21:26,916-Speed 5150.53 samples/sec Loss 26.9664 LearningRate 0.0989 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:21:28,884-Speed 5202.34 samples/sec Loss 26.8466 LearningRate 0.0989 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:21:30,858-Speed 5189.73 samples/sec Loss 26.6870 LearningRate 0.0989 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:21:32,831-Speed 5193.77 samples/sec Loss 26.7517 LearningRate 0.0989 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:21:34,809-Speed 5176.51 samples/sec Loss 26.6316 LearningRate 0.0989 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:21:36,792-Speed 5165.72 samples/sec Loss 26.7805 LearningRate 0.0989 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:21:38,776-Speed 5165.14 samples/sec Loss 26.5113 LearningRate 0.0989 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:21:40,740-Speed 5216.64 samples/sec Loss 26.4059 LearningRate 0.0989 Epoch: 0 Global Step: 1870 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:21:42,706-Speed 5209.06 samples/sec Loss 26.3561 LearningRate 0.0989 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:21:44,684-Speed 5179.90 samples/sec Loss 26.3634 LearningRate 0.0989 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:21:46,658-Speed 5188.34 samples/sec Loss 26.3136 LearningRate 0.0989 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:21:48,637-Speed 5176.41 samples/sec Loss 26.1595 LearningRate 0.0989 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:21:50,614-Speed 5179.75 samples/sec Loss 26.2491 LearningRate 0.0989 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:21:52,579-Speed 5212.88 samples/sec Loss 25.9476 LearningRate 0.0988 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:21:54,549-Speed 5199.43 samples/sec Loss 25.9246 LearningRate 0.0988 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:21:56,513-Speed 5214.78 samples/sec Loss 25.9450 LearningRate 0.0988 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:21:58,501-Speed 5154.39 samples/sec Loss 25.8793 LearningRate 0.0988 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:22:00,467-Speed 5210.71 samples/sec Loss 25.7047 LearningRate 0.0988 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 262144 Required: 19 hours
Training: 2022-04-10 23:22:02,435-Speed 5205.18 samples/sec Loss 25.5736 LearningRate 0.0988 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:22:04,401-Speed 5209.25 samples/sec Loss 25.6557 LearningRate 0.0988 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:22:06,372-Speed 5198.91 samples/sec Loss 25.7822 LearningRate 0.0988 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-10 23:22:33,225-[lfw][2000]XNorm: 21.536330
Training: 2022-04-10 23:22:33,225-[lfw][2000]Accuracy-Flip: 0.97117+-0.00919
Training: 2022-04-10 23:22:33,226-[lfw][2000]Accuracy-Highest: 0.97117
Training: 2022-04-10 23:23:04,059-[cfp_fp][2000]XNorm: 19.435067
Training: 2022-04-10 23:23:04,060-[cfp_fp][2000]Accuracy-Flip: 0.74843+-0.01435
Training: 2022-04-10 23:23:04,060-[cfp_fp][2000]Accuracy-Highest: 0.74843
Training: 2022-04-10 23:23:30,766-[agedb_30][2000]XNorm: 20.274658
Training: 2022-04-10 23:23:30,767-[agedb_30][2000]Accuracy-Flip: 0.81767+-0.02470
Training: 2022-04-10 23:23:30,767-[agedb_30][2000]Accuracy-Highest: 0.81767
Training: 2022-04-10 23:23:32,778-Speed 118.51 samples/sec Loss 25.6232 LearningRate 0.0988 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 23:23:34,735-Speed 5233.07 samples/sec Loss 25.3817 LearningRate 0.0988 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 23:23:36,692-Speed 5234.70 samples/sec Loss 25.3165 LearningRate 0.0988 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 23:23:38,652-Speed 5226.30 samples/sec Loss 25.4058 LearningRate 0.0988 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 23:23:40,624-Speed 5195.02 samples/sec Loss 25.4044 LearningRate 0.0988 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 23:23:42,586-Speed 5220.76 samples/sec Loss 25.1061 LearningRate 0.0988 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 23:23:44,548-Speed 5219.51 samples/sec Loss 25.0818 LearningRate 0.0988 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 23:23:46,509-Speed 5223.58 samples/sec Loss 25.1871 LearningRate 0.0988 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 262144 Required: 23 hours
Training: 2022-04-10 23:23:48,474-Speed 5211.86 samples/sec Loss 25.0573 LearningRate 0.0988 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 262144 Required: 23 hours
Training: 2022-04-10 23:23:50,434-Speed 5227.53 samples/sec Loss 24.8806 LearningRate 0.0987 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 23:23:52,404-Speed 5199.06 samples/sec Loss 25.0011 LearningRate 0.0987 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 23:23:54,368-Speed 5217.18 samples/sec Loss 24.8857 LearningRate 0.0987 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 23:23:56,332-Speed 5216.20 samples/sec Loss 24.9270 LearningRate 0.0987 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 23:23:58,299-Speed 5206.66 samples/sec Loss 24.6795 LearningRate 0.0987 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 23:24:00,265-Speed 5209.30 samples/sec Loss 24.6780 LearningRate 0.0987 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 23:24:02,231-Speed 5210.98 samples/sec Loss 24.7557 LearningRate 0.0987 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 23:24:04,207-Speed 5183.18 samples/sec Loss 24.4479 LearningRate 0.0987 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 23:24:06,172-Speed 5211.83 samples/sec Loss 24.4910 LearningRate 0.0987 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 131072 Required: 23 hours
Training: 2022-04-10 
23:24:08,137-Speed 5214.82 samples/sec Loss 24.6748 LearningRate 0.0987 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-10 23:24:10,110-Speed 5190.91 samples/sec Loss 24.2253 LearningRate 0.0987 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-04-10 23:24:12,090-Speed 5174.28 samples/sec Loss 24.3807 LearningRate 0.0987 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-10 23:24:14,069-Speed 5176.29 samples/sec Loss 24.4167 LearningRate 0.0987 Epoch: 0 Global Step: 2220 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-10 23:24:16,042-Speed 5190.99 samples/sec Loss 24.1232 LearningRate 0.0987 Epoch: 0 Global Step: 2230 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-10 23:24:18,010-Speed 5206.85 samples/sec Loss 23.9856 LearningRate 0.0987 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-10 23:24:19,983-Speed 5190.29 samples/sec Loss 24.0381 LearningRate 0.0987 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-10 23:24:21,952-Speed 5203.04 samples/sec Loss 23.9839 LearningRate 0.0987 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-10 23:24:23,920-Speed 5204.57 samples/sec Loss 23.8930 LearningRate 0.0986 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-10 23:24:25,885-Speed 5214.16 samples/sec Loss 23.9383 LearningRate 0.0986 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-10 23:24:27,852-Speed 5205.33 samples/sec Loss 23.7199 LearningRate 0.0986 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-10 23:24:29,819-Speed 5209.16 samples/sec Loss 23.8503 LearningRate 0.0986 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-10 23:24:31,784-Speed 5211.86 samples/sec Loss 
23.6092 LearningRate 0.0986 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-04-10 23:24:33,753-Speed 5202.71 samples/sec Loss 23.6710 LearningRate 0.0986 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-04-10 23:24:35,733-Speed 5174.83 samples/sec Loss 23.6662 LearningRate 0.0986 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-10 23:24:37,709-Speed 5184.51 samples/sec Loss 23.4788 LearningRate 0.0986 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:24:39,677-Speed 5204.77 samples/sec Loss 23.5791 LearningRate 0.0986 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:24:41,644-Speed 5205.57 samples/sec Loss 23.2693 LearningRate 0.0986 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:24:43,610-Speed 5211.44 samples/sec Loss 23.4811 LearningRate 0.0986 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:24:45,576-Speed 5209.00 samples/sec Loss 23.0131 LearningRate 0.0986 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:24:47,553-Speed 5183.42 samples/sec Loss 23.2999 LearningRate 0.0986 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:24:49,525-Speed 5193.23 samples/sec Loss 23.3251 LearningRate 0.0986 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:24:51,501-Speed 5184.65 samples/sec Loss 23.0868 LearningRate 0.0986 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:24:53,476-Speed 5185.18 samples/sec Loss 23.2158 LearningRate 0.0986 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:24:55,443-Speed 5209.94 samples/sec Loss 23.2469 LearningRate 0.0985 Epoch: 0 Global 
Step: 2430 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:24:57,433-Speed 5146.44 samples/sec Loss 22.9402 LearningRate 0.0985 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-10 23:24:59,401-Speed 5205.63 samples/sec Loss 22.8822 LearningRate 0.0985 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:01,364-Speed 5218.75 samples/sec Loss 23.0413 LearningRate 0.0985 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:03,333-Speed 5200.97 samples/sec Loss 22.7176 LearningRate 0.0985 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:05,309-Speed 5184.66 samples/sec Loss 22.8644 LearningRate 0.0985 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:07,273-Speed 5214.31 samples/sec Loss 22.7941 LearningRate 0.0985 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:09,238-Speed 5213.66 samples/sec Loss 22.7539 LearningRate 0.0985 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:11,213-Speed 5185.82 samples/sec Loss 22.6121 LearningRate 0.0985 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:13,187-Speed 5190.82 samples/sec Loss 22.7024 LearningRate 0.0985 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:15,148-Speed 5222.36 samples/sec Loss 22.4012 LearningRate 0.0985 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:17,115-Speed 5209.84 samples/sec Loss 22.5427 LearningRate 0.0985 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:19,077-Speed 5219.70 samples/sec Loss 22.4180 LearningRate 0.0985 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 262144 
Required: 22 hours Training: 2022-04-10 23:25:21,039-Speed 5220.44 samples/sec Loss 22.3269 LearningRate 0.0985 Epoch: 0 Global Step: 2560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:23,011-Speed 5195.25 samples/sec Loss 22.4276 LearningRate 0.0985 Epoch: 0 Global Step: 2570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:24,995-Speed 5162.78 samples/sec Loss 22.5226 LearningRate 0.0985 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:26,969-Speed 5189.20 samples/sec Loss 22.4173 LearningRate 0.0985 Epoch: 0 Global Step: 2590 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:28,933-Speed 5213.34 samples/sec Loss 22.3770 LearningRate 0.0984 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:30,909-Speed 5184.54 samples/sec Loss 22.1089 LearningRate 0.0984 Epoch: 0 Global Step: 2610 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:32,889-Speed 5174.05 samples/sec Loss 22.0150 LearningRate 0.0984 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:34,866-Speed 5182.34 samples/sec Loss 22.0886 LearningRate 0.0984 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:36,830-Speed 5215.65 samples/sec Loss 21.8769 LearningRate 0.0984 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:38,793-Speed 5217.25 samples/sec Loss 21.9094 LearningRate 0.0984 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:40,758-Speed 5214.56 samples/sec Loss 21.9715 LearningRate 0.0984 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:42,720-Speed 5219.30 samples/sec Loss 21.7504 LearningRate 0.0984 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 
23:25:44,688-Speed 5205.53 samples/sec Loss 21.8491 LearningRate 0.0984 Epoch: 0 Global Step: 2680 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:46,660-Speed 5194.28 samples/sec Loss 21.9082 LearningRate 0.0984 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:48,649-Speed 5149.90 samples/sec Loss 21.8227 LearningRate 0.0984 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:50,613-Speed 5216.61 samples/sec Loss 21.7487 LearningRate 0.0984 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:52,578-Speed 5211.42 samples/sec Loss 21.7443 LearningRate 0.0984 Epoch: 0 Global Step: 2720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:54,549-Speed 5196.26 samples/sec Loss 21.5607 LearningRate 0.0984 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:56,522-Speed 5193.91 samples/sec Loss 21.5670 LearningRate 0.0984 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:25:58,501-Speed 5173.80 samples/sec Loss 21.4393 LearningRate 0.0984 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:00,470-Speed 5203.58 samples/sec Loss 21.4493 LearningRate 0.0984 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:02,440-Speed 5201.43 samples/sec Loss 21.4151 LearningRate 0.0983 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:04,417-Speed 5181.23 samples/sec Loss 21.4010 LearningRate 0.0983 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:06,382-Speed 5212.30 samples/sec Loss 21.4260 LearningRate 0.0983 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:08,347-Speed 5211.59 samples/sec Loss 
21.3921 LearningRate 0.0983 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:10,312-Speed 5213.85 samples/sec Loss 21.1279 LearningRate 0.0983 Epoch: 0 Global Step: 2810 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:12,278-Speed 5210.42 samples/sec Loss 21.2552 LearningRate 0.0983 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:14,244-Speed 5209.46 samples/sec Loss 21.1991 LearningRate 0.0983 Epoch: 0 Global Step: 2830 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:16,214-Speed 5201.65 samples/sec Loss 21.0406 LearningRate 0.0983 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:18,201-Speed 5154.04 samples/sec Loss 21.0445 LearningRate 0.0983 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:20,168-Speed 5209.47 samples/sec Loss 20.9895 LearningRate 0.0983 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-10 23:26:22,129-Speed 5221.74 samples/sec Loss 21.0974 LearningRate 0.0983 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:24,101-Speed 5194.64 samples/sec Loss 20.8325 LearningRate 0.0983 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:26,067-Speed 5210.80 samples/sec Loss 20.8508 LearningRate 0.0983 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:28,031-Speed 5216.15 samples/sec Loss 20.7732 LearningRate 0.0983 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:29,999-Speed 5204.66 samples/sec Loss 20.8203 LearningRate 0.0983 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:31,964-Speed 5213.20 samples/sec Loss 20.5960 LearningRate 0.0983 Epoch: 0 Global 
Step: 2920 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:33,930-Speed 5209.15 samples/sec Loss 20.4990 LearningRate 0.0983 Epoch: 0 Global Step: 2930 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:35,915-Speed 5161.47 samples/sec Loss 20.6097 LearningRate 0.0982 Epoch: 0 Global Step: 2940 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:37,893-Speed 5178.06 samples/sec Loss 20.5773 LearningRate 0.0982 Epoch: 0 Global Step: 2950 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:39,881-Speed 5154.30 samples/sec Loss 20.4136 LearningRate 0.0982 Epoch: 0 Global Step: 2960 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:26:41,852-Speed 5196.01 samples/sec Loss 20.4184 LearningRate 0.0982 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-10 23:26:43,831-Speed 5177.19 samples/sec Loss 20.3272 LearningRate 0.0982 Epoch: 0 Global Step: 2980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:26:45,799-Speed 5204.38 samples/sec Loss 20.3201 LearningRate 0.0982 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:26:47,776-Speed 5181.32 samples/sec Loss 20.3939 LearningRate 0.0982 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:26:49,746-Speed 5198.90 samples/sec Loss 20.2683 LearningRate 0.0982 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:26:51,720-Speed 5189.72 samples/sec Loss 20.3534 LearningRate 0.0982 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:26:53,689-Speed 5201.34 samples/sec Loss 20.2845 LearningRate 0.0982 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:26:55,668-Speed 5177.25 samples/sec Loss 20.2613 LearningRate 0.0982 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 131072 
Required: 21 hours Training: 2022-04-10 23:26:57,634-Speed 5208.90 samples/sec Loss 20.0900 LearningRate 0.0982 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:26:59,602-Speed 5206.47 samples/sec Loss 19.8915 LearningRate 0.0982 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:01,571-Speed 5203.37 samples/sec Loss 20.0628 LearningRate 0.0982 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:03,542-Speed 5195.53 samples/sec Loss 20.0556 LearningRate 0.0982 Epoch: 0 Global Step: 3080 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-10 23:27:05,506-Speed 5216.45 samples/sec Loss 20.0855 LearningRate 0.0982 Epoch: 0 Global Step: 3090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:07,474-Speed 5205.60 samples/sec Loss 20.0034 LearningRate 0.0982 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:09,439-Speed 5211.00 samples/sec Loss 19.9307 LearningRate 0.0981 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:11,409-Speed 5199.78 samples/sec Loss 19.7413 LearningRate 0.0981 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:13,380-Speed 5198.74 samples/sec Loss 19.8866 LearningRate 0.0981 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:15,347-Speed 5207.21 samples/sec Loss 19.7998 LearningRate 0.0981 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:17,317-Speed 5200.00 samples/sec Loss 19.8291 LearningRate 0.0981 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:19,281-Speed 5215.34 samples/sec Loss 19.5915 LearningRate 0.0981 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 
23:27:21,250-Speed 5201.67 samples/sec Loss 19.6703 LearningRate 0.0981 Epoch: 0 Global Step: 3170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:23,218-Speed 5205.40 samples/sec Loss 19.6844 LearningRate 0.0981 Epoch: 0 Global Step: 3180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:25,189-Speed 5197.07 samples/sec Loss 19.6683 LearningRate 0.0981 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:27,159-Speed 5198.98 samples/sec Loss 19.5349 LearningRate 0.0981 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:29,130-Speed 5196.87 samples/sec Loss 19.6574 LearningRate 0.0981 Epoch: 0 Global Step: 3210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:31,099-Speed 5203.01 samples/sec Loss 19.4470 LearningRate 0.0981 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:33,064-Speed 5212.31 samples/sec Loss 19.6160 LearningRate 0.0981 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:35,038-Speed 5188.95 samples/sec Loss 19.4598 LearningRate 0.0981 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:37,005-Speed 5209.79 samples/sec Loss 19.4792 LearningRate 0.0981 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:38,981-Speed 5182.38 samples/sec Loss 19.3973 LearningRate 0.0981 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:40,949-Speed 5206.31 samples/sec Loss 19.4512 LearningRate 0.0981 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:42,934-Speed 5159.39 samples/sec Loss 19.1427 LearningRate 0.0980 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:44,904-Speed 5200.50 samples/sec Loss 
19.2809 LearningRate 0.0980 Epoch: 0 Global Step: 3290 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-10 23:27:46,884-Speed 5173.84 samples/sec Loss 19.2917 LearningRate 0.0980 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:48,850-Speed 5208.95 samples/sec Loss 19.1244 LearningRate 0.0980 Epoch: 0 Global Step: 3310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:50,818-Speed 5205.70 samples/sec Loss 19.3492 LearningRate 0.0980 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:52,786-Speed 5204.63 samples/sec Loss 18.8661 LearningRate 0.0980 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:54,753-Speed 5208.21 samples/sec Loss 19.0116 LearningRate 0.0980 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:56,721-Speed 5204.33 samples/sec Loss 19.0592 LearningRate 0.0980 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:27:58,692-Speed 5195.53 samples/sec Loss 18.9290 LearningRate 0.0980 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:00,662-Speed 5201.59 samples/sec Loss 18.9033 LearningRate 0.0980 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:02,648-Speed 5157.09 samples/sec Loss 18.9075 LearningRate 0.0980 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:04,625-Speed 5180.75 samples/sec Loss 18.8826 LearningRate 0.0980 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:06,607-Speed 5167.92 samples/sec Loss 19.0148 LearningRate 0.0980 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-10 23:28:08,570-Speed 5219.45 samples/sec Loss 18.5854 LearningRate 0.0980 Epoch: 0 Global 
Step: 3410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:10,541-Speed 5198.56 samples/sec Loss 18.6278 LearningRate 0.0980 Epoch: 0 Global Step: 3420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:12,525-Speed 5161.42 samples/sec Loss 18.7445 LearningRate 0.0980 Epoch: 0 Global Step: 3430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:14,493-Speed 5205.73 samples/sec Loss 18.5936 LearningRate 0.0979 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:16,463-Speed 5198.81 samples/sec Loss 18.6879 LearningRate 0.0979 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:18,437-Speed 5190.46 samples/sec Loss 18.5875 LearningRate 0.0979 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:20,406-Speed 5202.23 samples/sec Loss 18.4774 LearningRate 0.0979 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:22,378-Speed 5194.44 samples/sec Loss 18.5976 LearningRate 0.0979 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:24,342-Speed 5213.89 samples/sec Loss 18.5307 LearningRate 0.0979 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:26,313-Speed 5199.15 samples/sec Loss 18.6159 LearningRate 0.0979 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:28,292-Speed 5174.14 samples/sec Loss 18.5158 LearningRate 0.0979 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-10 23:28:30,258-Speed 5211.85 samples/sec Loss 18.5346 LearningRate 0.0979 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:32,224-Speed 5210.34 samples/sec Loss 18.3977 LearningRate 0.0979 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 131072 
Required: 21 hours Training: 2022-04-10 23:28:34,198-Speed 5190.62 samples/sec Loss 18.2748 LearningRate 0.0979 Epoch: 0 Global Step: 3540 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:36,164-Speed 5207.89 samples/sec Loss 18.2032 LearningRate 0.0979 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:38,131-Speed 5208.99 samples/sec Loss 18.1598 LearningRate 0.0979 Epoch: 0 Global Step: 3560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:40,102-Speed 5197.18 samples/sec Loss 18.2083 LearningRate 0.0979 Epoch: 0 Global Step: 3570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:42,083-Speed 5170.53 samples/sec Loss 18.2340 LearningRate 0.0979 Epoch: 0 Global Step: 3580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:44,048-Speed 5211.44 samples/sec Loss 18.0749 LearningRate 0.0979 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:46,041-Speed 5141.17 samples/sec Loss 18.3144 LearningRate 0.0979 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:48,023-Speed 5167.62 samples/sec Loss 18.1613 LearningRate 0.0978 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:49,995-Speed 5195.21 samples/sec Loss 18.0822 LearningRate 0.0978 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-10 23:28:51,959-Speed 5213.94 samples/sec Loss 18.1036 LearningRate 0.0978 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:53,957-Speed 5127.23 samples/sec Loss 17.9977 LearningRate 0.0978 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:55,924-Speed 5208.83 samples/sec Loss 18.0010 LearningRate 0.0978 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 
23:28:57,898-Speed 5188.32 samples/sec Loss 17.9190 LearningRate 0.0978 Epoch: 0 Global Step: 3660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:28:59,865-Speed 5207.58 samples/sec Loss 17.8657 LearningRate 0.0978 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:01,837-Speed 5196.07 samples/sec Loss 17.8456 LearningRate 0.0978 Epoch: 0 Global Step: 3680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:03,849-Speed 5090.81 samples/sec Loss 18.0239 LearningRate 0.0978 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:05,827-Speed 5178.65 samples/sec Loss 17.9855 LearningRate 0.0978 Epoch: 0 Global Step: 3700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:07,794-Speed 5206.77 samples/sec Loss 17.8560 LearningRate 0.0978 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:09,765-Speed 5197.73 samples/sec Loss 17.8949 LearningRate 0.0978 Epoch: 0 Global Step: 3720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:11,736-Speed 5195.77 samples/sec Loss 17.8250 LearningRate 0.0978 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-10 23:29:13,703-Speed 5208.05 samples/sec Loss 17.7673 LearningRate 0.0978 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-10 23:29:15,665-Speed 5221.31 samples/sec Loss 17.6920 LearningRate 0.0978 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:17,634-Speed 5203.05 samples/sec Loss 17.8196 LearningRate 0.0978 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:19,604-Speed 5201.25 samples/sec Loss 17.8798 LearningRate 0.0978 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:21,572-Speed 5202.71 samples/sec Loss 
17.5899 LearningRate 0.0977 Epoch: 0 Global Step: 3780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:23,538-Speed 5211.27 samples/sec Loss 17.5464 LearningRate 0.0977 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:25,520-Speed 5167.60 samples/sec Loss 17.5493 LearningRate 0.0977 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:27,503-Speed 5166.80 samples/sec Loss 17.5182 LearningRate 0.0977 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:29,480-Speed 5180.36 samples/sec Loss 17.5327 LearningRate 0.0977 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:31,452-Speed 5194.51 samples/sec Loss 17.5183 LearningRate 0.0977 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:33,418-Speed 5208.88 samples/sec Loss 17.5988 LearningRate 0.0977 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:35,383-Speed 5214.33 samples/sec Loss 17.5215 LearningRate 0.0977 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:37,366-Speed 5166.82 samples/sec Loss 17.5387 LearningRate 0.0977 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:39,335-Speed 5200.80 samples/sec Loss 17.3302 LearningRate 0.0977 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:41,313-Speed 5180.75 samples/sec Loss 17.3093 LearningRate 0.0977 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:43,278-Speed 5210.68 samples/sec Loss 17.3333 LearningRate 0.0977 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:45,245-Speed 5207.30 samples/sec Loss 17.2924 LearningRate 0.0977 Epoch: 0 Global 
Step: 3900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:47,218-Speed 5192.96 samples/sec Loss 17.1734 LearningRate 0.0977 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:49,191-Speed 5190.86 samples/sec Loss 17.3778 LearningRate 0.0977 Epoch: 0 Global Step: 3920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:51,160-Speed 5202.56 samples/sec Loss 17.0060 LearningRate 0.0977 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:53,138-Speed 5180.02 samples/sec Loss 17.1570 LearningRate 0.0977 Epoch: 0 Global Step: 3940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:55,111-Speed 5191.71 samples/sec Loss 17.0622 LearningRate 0.0976 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-10 23:29:57,076-Speed 5212.30 samples/sec Loss 17.1056 LearningRate 0.0976 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:29:59,053-Speed 5179.88 samples/sec Loss 17.1201 LearningRate 0.0976 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:30:01,033-Speed 5175.15 samples/sec Loss 17.0417 LearningRate 0.0976 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:30:03,005-Speed 5195.69 samples/sec Loss 17.0662 LearningRate 0.0976 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:30:04,972-Speed 5206.38 samples/sec Loss 16.9046 LearningRate 0.0976 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:30:31,539-[lfw][4000]XNorm: 23.089236 Training: 2022-04-10 23:30:31,540-[lfw][4000]Accuracy-Flip: 0.98550+-0.00553 Training: 2022-04-10 23:30:31,540-[lfw][4000]Accuracy-Highest: 0.98550 Training: 2022-04-10 23:31:02,325-[cfp_fp][4000]XNorm: 20.256222 Training: 2022-04-10 
23:31:02,325-[cfp_fp][4000]Accuracy-Flip: 0.84714+-0.01197 Training: 2022-04-10 23:31:02,326-[cfp_fp][4000]Accuracy-Highest: 0.84714 Training: 2022-04-10 23:31:28,867-[agedb_30][4000]XNorm: 22.060396 Training: 2022-04-10 23:31:28,867-[agedb_30][4000]Accuracy-Flip: 0.90050+-0.02111 Training: 2022-04-10 23:31:28,868-[agedb_30][4000]Accuracy-Highest: 0.90050 Training: 2022-04-10 23:31:30,846-Speed 119.24 samples/sec Loss 16.8107 LearningRate 0.0976 Epoch: 0 Global Step: 4010 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:31:32,814-Speed 5205.84 samples/sec Loss 16.8850 LearningRate 0.0976 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:31:34,776-Speed 5220.50 samples/sec Loss 16.9643 LearningRate 0.0976 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:31:36,735-Speed 5228.33 samples/sec Loss 16.8091 LearningRate 0.0976 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:31:38,708-Speed 5191.75 samples/sec Loss 17.0338 LearningRate 0.0976 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:31:40,664-Speed 5239.07 samples/sec Loss 16.6382 LearningRate 0.0976 Epoch: 0 Global Step: 4060 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:31:42,628-Speed 5214.82 samples/sec Loss 16.5477 LearningRate 0.0976 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:31:44,597-Speed 5200.78 samples/sec Loss 16.6042 LearningRate 0.0976 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:31:46,567-Speed 5199.97 samples/sec Loss 16.5635 LearningRate 0.0976 Epoch: 0 Global Step: 4090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:31:48,543-Speed 5185.61 samples/sec Loss 16.8005 LearningRate 0.0976 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 
2022-04-10 23:31:50,512-Speed 5202.24 samples/sec Loss 16.6280 LearningRate 0.0976 Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:31:52,481-Speed 5201.98 samples/sec Loss 16.5835 LearningRate 0.0975 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:31:54,459-Speed 5179.63 samples/sec Loss 16.5015 LearningRate 0.0975 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:31:56,425-Speed 5209.09 samples/sec Loss 16.5584 LearningRate 0.0975 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:31:58,404-Speed 5176.04 samples/sec Loss 16.5731 LearningRate 0.0975 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:00,379-Speed 5187.10 samples/sec Loss 16.3579 LearningRate 0.0975 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-10 23:32:02,341-Speed 5220.67 samples/sec Loss 16.4213 LearningRate 0.0975 Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:04,324-Speed 5166.42 samples/sec Loss 16.6196 LearningRate 0.0975 Epoch: 0 Global Step: 4180 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:06,293-Speed 5200.14 samples/sec Loss 16.5377 LearningRate 0.0975 Epoch: 0 Global Step: 4190 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:08,274-Speed 5172.64 samples/sec Loss 16.6725 LearningRate 0.0975 Epoch: 0 Global Step: 4200 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:10,259-Speed 5160.64 samples/sec Loss 16.5143 LearningRate 0.0975 Epoch: 0 Global Step: 4210 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:12,228-Speed 5200.27 samples/sec Loss 16.5181 LearningRate 0.0975 Epoch: 0 Global Step: 4220 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:14,213-Speed 5161.44 
samples/sec Loss 16.3874 LearningRate 0.0975 Epoch: 0 Global Step: 4230 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:16,187-Speed 5191.12 samples/sec Loss 16.3127 LearningRate 0.0975 Epoch: 0 Global Step: 4240 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:18,160-Speed 5189.83 samples/sec Loss 16.2619 LearningRate 0.0975 Epoch: 0 Global Step: 4250 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:20,137-Speed 5182.72 samples/sec Loss 16.2860 LearningRate 0.0975 Epoch: 0 Global Step: 4260 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:22,118-Speed 5168.96 samples/sec Loss 16.2635 LearningRate 0.0975 Epoch: 0 Global Step: 4270 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:24,090-Speed 5196.20 samples/sec Loss 16.3279 LearningRate 0.0975 Epoch: 0 Global Step: 4280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:26,071-Speed 5168.34 samples/sec Loss 16.5008 LearningRate 0.0974 Epoch: 0 Global Step: 4290 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:28,045-Speed 5191.15 samples/sec Loss 16.0883 LearningRate 0.0974 Epoch: 0 Global Step: 4300 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:30,029-Speed 5163.28 samples/sec Loss 16.1627 LearningRate 0.0974 Epoch: 0 Global Step: 4310 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:31,997-Speed 5204.95 samples/sec Loss 16.1428 LearningRate 0.0974 Epoch: 0 Global Step: 4320 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:33,972-Speed 5184.82 samples/sec Loss 16.0208 LearningRate 0.0974 Epoch: 0 Global Step: 4330 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:35,943-Speed 5198.66 samples/sec Loss 16.0669 LearningRate 0.0974 Epoch: 0 Global Step: 4340 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:37,925-Speed 5167.48 samples/sec Loss 15.9485 LearningRate 
0.0974 Epoch: 0 Global Step: 4350 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:39,897-Speed 5195.07 samples/sec Loss 16.1916 LearningRate 0.0974 Epoch: 0 Global Step: 4360 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:41,858-Speed 5224.46 samples/sec Loss 15.9525 LearningRate 0.0974 Epoch: 0 Global Step: 4370 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:43,825-Speed 5207.65 samples/sec Loss 15.9958 LearningRate 0.0974 Epoch: 0 Global Step: 4380 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:45,791-Speed 5208.43 samples/sec Loss 15.7991 LearningRate 0.0974 Epoch: 0 Global Step: 4390 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:47,764-Speed 5193.34 samples/sec Loss 15.9023 LearningRate 0.0974 Epoch: 0 Global Step: 4400 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:49,726-Speed 5221.06 samples/sec Loss 16.0143 LearningRate 0.0974 Epoch: 0 Global Step: 4410 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:51,700-Speed 5186.91 samples/sec Loss 15.8370 LearningRate 0.0974 Epoch: 0 Global Step: 4420 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:53,674-Speed 5190.74 samples/sec Loss 15.8690 LearningRate 0.0974 Epoch: 0 Global Step: 4430 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:55,652-Speed 5178.91 samples/sec Loss 15.8401 LearningRate 0.0974 Epoch: 0 Global Step: 4440 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:57,625-Speed 5191.27 samples/sec Loss 15.9229 LearningRate 0.0974 Epoch: 0 Global Step: 4450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:32:59,594-Speed 5204.59 samples/sec Loss 15.9359 LearningRate 0.0973 Epoch: 0 Global Step: 4460 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:01,556-Speed 5220.22 samples/sec Loss 15.7350 LearningRate 0.0973 Epoch: 0 Global Step: 4470 Fp16 Grad 
Scale: 262144 Required: 22 hours Training: 2022-04-10 23:33:03,523-Speed 5207.34 samples/sec Loss 15.5878 LearningRate 0.0973 Epoch: 0 Global Step: 4480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:05,497-Speed 5188.81 samples/sec Loss 15.7661 LearningRate 0.0973 Epoch: 0 Global Step: 4490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:07,462-Speed 5213.42 samples/sec Loss 15.8331 LearningRate 0.0973 Epoch: 0 Global Step: 4500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:09,437-Speed 5184.93 samples/sec Loss 15.7773 LearningRate 0.0973 Epoch: 0 Global Step: 4510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:11,413-Speed 5185.01 samples/sec Loss 15.6785 LearningRate 0.0973 Epoch: 0 Global Step: 4520 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:13,375-Speed 5219.61 samples/sec Loss 15.7624 LearningRate 0.0973 Epoch: 0 Global Step: 4530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:15,340-Speed 5214.31 samples/sec Loss 15.4155 LearningRate 0.0973 Epoch: 0 Global Step: 4540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:17,317-Speed 5182.35 samples/sec Loss 15.7209 LearningRate 0.0973 Epoch: 0 Global Step: 4550 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:19,280-Speed 5217.30 samples/sec Loss 15.4555 LearningRate 0.0973 Epoch: 0 Global Step: 4560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:21,241-Speed 5222.59 samples/sec Loss 15.4792 LearningRate 0.0973 Epoch: 0 Global Step: 4570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:23,210-Speed 5204.06 samples/sec Loss 15.4301 LearningRate 0.0973 Epoch: 0 Global Step: 4580 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:25,184-Speed 5188.16 samples/sec Loss 15.4305 LearningRate 0.0973 Epoch: 0 Global Step: 4590 Fp16 Grad Scale: 131072 Required: 22 hours Training: 
2022-04-10 23:33:27,148-Speed 5216.89 samples/sec Loss 15.5246 LearningRate 0.0973 Epoch: 0 Global Step: 4600 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:29,111-Speed 5217.99 samples/sec Loss 15.3661 LearningRate 0.0973 Epoch: 0 Global Step: 4610 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:31,072-Speed 5221.36 samples/sec Loss 15.2834 LearningRate 0.0973 Epoch: 0 Global Step: 4620 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:33,037-Speed 5213.92 samples/sec Loss 15.3603 LearningRate 0.0972 Epoch: 0 Global Step: 4630 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:35,002-Speed 5212.67 samples/sec Loss 15.4198 LearningRate 0.0972 Epoch: 0 Global Step: 4640 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:36,987-Speed 5161.42 samples/sec Loss 15.3442 LearningRate 0.0972 Epoch: 0 Global Step: 4650 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:38,951-Speed 5214.73 samples/sec Loss 15.4019 LearningRate 0.0972 Epoch: 0 Global Step: 4660 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:40,921-Speed 5200.65 samples/sec Loss 15.3864 LearningRate 0.0972 Epoch: 0 Global Step: 4670 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:42,877-Speed 5237.05 samples/sec Loss 15.3309 LearningRate 0.0972 Epoch: 0 Global Step: 4680 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:44,839-Speed 5221.66 samples/sec Loss 15.3021 LearningRate 0.0972 Epoch: 0 Global Step: 4690 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:46,807-Speed 5203.75 samples/sec Loss 15.3320 LearningRate 0.0972 Epoch: 0 Global Step: 4700 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:48,773-Speed 5211.80 samples/sec Loss 15.4559 LearningRate 0.0972 Epoch: 0 Global Step: 4710 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:50,742-Speed 5201.94 
samples/sec Loss 15.2118 LearningRate 0.0972 Epoch: 0 Global Step: 4720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:52,705-Speed 5216.84 samples/sec Loss 15.1208 LearningRate 0.0972 Epoch: 0 Global Step: 4730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:54,680-Speed 5187.13 samples/sec Loss 15.2101 LearningRate 0.0972 Epoch: 0 Global Step: 4740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:56,644-Speed 5214.86 samples/sec Loss 15.0802 LearningRate 0.0972 Epoch: 0 Global Step: 4750 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:33:58,609-Speed 5212.09 samples/sec Loss 15.1627 LearningRate 0.0972 Epoch: 0 Global Step: 4760 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:34:00,577-Speed 5205.35 samples/sec Loss 15.0160 LearningRate 0.0972 Epoch: 0 Global Step: 4770 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:34:02,548-Speed 5198.84 samples/sec Loss 15.2001 LearningRate 0.0972 Epoch: 0 Global Step: 4780 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:34:04,524-Speed 5184.61 samples/sec Loss 15.0811 LearningRate 0.0972 Epoch: 0 Global Step: 4790 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:34:06,488-Speed 5215.51 samples/sec Loss 14.9064 LearningRate 0.0971 Epoch: 0 Global Step: 4800 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:34:08,450-Speed 5219.03 samples/sec Loss 15.1112 LearningRate 0.0971 Epoch: 0 Global Step: 4810 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:34:10,416-Speed 5212.04 samples/sec Loss 14.8449 LearningRate 0.0971 Epoch: 0 Global Step: 4820 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:34:12,381-Speed 5212.36 samples/sec Loss 14.9346 LearningRate 0.0971 Epoch: 0 Global Step: 4830 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:34:14,361-Speed 5172.92 samples/sec Loss 14.9687 LearningRate 0.0971 
Epoch: 0 Global Step: 4840 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:34:16,335-Speed 5189.87 samples/sec Loss 14.9376 LearningRate 0.0971 Epoch: 0 Global Step: 4850 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:34:18,297-Speed 5221.43 samples/sec Loss 15.0476 LearningRate 0.0971 Epoch: 0 Global Step: 4860 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:34:20,261-Speed 5213.57 samples/sec Loss 14.8509 LearningRate 0.0971 Epoch: 0 Global Step: 4870 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:34:22,252-Speed 5146.12 samples/sec Loss 14.7685 LearningRate 0.0971 Epoch: 0 Global Step: 4880 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:34:24,239-Speed 5154.61 samples/sec Loss 15.0382 LearningRate 0.0971 Epoch: 0 Global Step: 4890 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:34:26,207-Speed 5204.57 samples/sec Loss 14.9368 LearningRate 0.0971 Epoch: 0 Global Step: 4900 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:34:28,178-Speed 5197.82 samples/sec Loss 14.9747 LearningRate 0.0971 Epoch: 0 Global Step: 4910 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:34:30,143-Speed 5213.46 samples/sec Loss 14.7955 LearningRate 0.0971 Epoch: 0 Global Step: 4920 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:34:32,101-Speed 5230.18 samples/sec Loss 14.8457 LearningRate 0.0971 Epoch: 0 Global Step: 4930 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:34:34,070-Speed 5201.78 samples/sec Loss 14.8298 LearningRate 0.0971 Epoch: 0 Global Step: 4940 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-10 23:34:36,040-Speed 5200.82 samples/sec Loss 14.9018 LearningRate 0.0971 Epoch: 0 Global Step: 4950 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-10 23:34:38,022-Speed 5169.67 samples/sec Loss 14.8069 LearningRate 0.0971 Epoch: 0 Global Step: 4960 Fp16 Grad Scale: 32768 
Required: 22 hours Training: 2022-04-10 23:34:39,996-Speed 5188.23 samples/sec Loss 14.7943 LearningRate 0.0970 Epoch: 0 Global Step: 4970 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-10 23:34:41,977-Speed 5171.14 samples/sec Loss 14.6602 LearningRate 0.0970 Epoch: 0 Global Step: 4980 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-10 23:34:43,944-Speed 5207.20 samples/sec Loss 14.5891 LearningRate 0.0970 Epoch: 0 Global Step: 4990 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-10 23:34:45,934-Speed 5147.90 samples/sec Loss 14.6627 LearningRate 0.0970 Epoch: 0 Global Step: 5000 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-10 23:34:47,915-Speed 5172.46 samples/sec Loss 14.6992 LearningRate 0.0970 Epoch: 0 Global Step: 5010 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-10 23:34:49,879-Speed 5213.70 samples/sec Loss 14.5601 LearningRate 0.0970 Epoch: 0 Global Step: 5020 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-10 23:34:51,863-Speed 5163.98 samples/sec Loss 14.7013 LearningRate 0.0970 Epoch: 0 Global Step: 5030 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-10 23:34:53,831-Speed 5202.72 samples/sec Loss 14.6380 LearningRate 0.0970 Epoch: 0 Global Step: 5040 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:34:55,810-Speed 5178.27 samples/sec Loss 14.6246 LearningRate 0.0970 Epoch: 0 Global Step: 5050 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:34:57,774-Speed 5213.88 samples/sec Loss 14.5476 LearningRate 0.0970 Epoch: 0 Global Step: 5060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:34:59,744-Speed 5199.16 samples/sec Loss 14.4136 LearningRate 0.0970 Epoch: 0 Global Step: 5070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:35:01,717-Speed 5193.80 samples/sec Loss 14.5267 LearningRate 0.0970 Epoch: 0 Global Step: 5080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 
23:35:03,697-Speed 5173.05 samples/sec Loss 14.6102 LearningRate 0.0970 Epoch: 0 Global Step: 5090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:35:05,662-Speed 5213.94 samples/sec Loss 14.6588 LearningRate 0.0970 Epoch: 0 Global Step: 5100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:35:07,628-Speed 5210.06 samples/sec Loss 14.5198 LearningRate 0.0970 Epoch: 0 Global Step: 5110 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:35:09,594-Speed 5211.22 samples/sec Loss 14.4835 LearningRate 0.0970 Epoch: 0 Global Step: 5120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:35:11,563-Speed 5201.26 samples/sec Loss 14.3863 LearningRate 0.0970 Epoch: 0 Global Step: 5130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:35:13,554-Speed 5144.94 samples/sec Loss 14.3628 LearningRate 0.0969 Epoch: 0 Global Step: 5140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:15,523-Speed 5202.61 samples/sec Loss 14.4774 LearningRate 0.0969 Epoch: 0 Global Step: 5150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:17,490-Speed 5207.37 samples/sec Loss 14.4593 LearningRate 0.0969 Epoch: 0 Global Step: 5160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:19,464-Speed 5190.34 samples/sec Loss 14.3958 LearningRate 0.0969 Epoch: 0 Global Step: 5170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:21,442-Speed 5178.04 samples/sec Loss 14.5278 LearningRate 0.0969 Epoch: 0 Global Step: 5180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:23,407-Speed 5211.81 samples/sec Loss 14.1991 LearningRate 0.0969 Epoch: 0 Global Step: 5190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:25,384-Speed 5182.68 samples/sec Loss 14.3725 LearningRate 0.0969 Epoch: 0 Global Step: 5200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:27,348-Speed 5213.79 samples/sec Loss 
14.2451 LearningRate 0.0969 Epoch: 0 Global Step: 5210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:29,316-Speed 5204.97 samples/sec Loss 14.2814 LearningRate 0.0969 Epoch: 0 Global Step: 5220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:31,284-Speed 5205.60 samples/sec Loss 14.2880 LearningRate 0.0969 Epoch: 0 Global Step: 5230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:33,246-Speed 5221.84 samples/sec Loss 14.2664 LearningRate 0.0969 Epoch: 0 Global Step: 5240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:35,216-Speed 5199.77 samples/sec Loss 14.1602 LearningRate 0.0969 Epoch: 0 Global Step: 5250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:37,188-Speed 5194.30 samples/sec Loss 14.1455 LearningRate 0.0969 Epoch: 0 Global Step: 5260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:39,157-Speed 5201.83 samples/sec Loss 14.0438 LearningRate 0.0969 Epoch: 0 Global Step: 5270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:41,135-Speed 5178.41 samples/sec Loss 14.3817 LearningRate 0.0969 Epoch: 0 Global Step: 5280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:43,111-Speed 5183.26 samples/sec Loss 14.1583 LearningRate 0.0969 Epoch: 0 Global Step: 5290 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:45,086-Speed 5185.98 samples/sec Loss 14.0795 LearningRate 0.0968 Epoch: 0 Global Step: 5300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:47,076-Speed 5150.64 samples/sec Loss 14.0891 LearningRate 0.0968 Epoch: 0 Global Step: 5310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:49,052-Speed 5182.65 samples/sec Loss 14.0896 LearningRate 0.0968 Epoch: 0 Global Step: 5320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:51,021-Speed 5202.16 samples/sec Loss 13.8735 LearningRate 0.0968 Epoch: 0 Global 
Step: 5330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:52,985-Speed 5215.92 samples/sec Loss 13.9887 LearningRate 0.0968 Epoch: 0 Global Step: 5340 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-10 23:35:54,947-Speed 5220.52 samples/sec Loss 14.1025 LearningRate 0.0968 Epoch: 0 Global Step: 5350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:56,914-Speed 5208.26 samples/sec Loss 14.0018 LearningRate 0.0968 Epoch: 0 Global Step: 5360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:35:58,903-Speed 5150.31 samples/sec Loss 14.0199 LearningRate 0.0968 Epoch: 0 Global Step: 5370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:00,876-Speed 5190.54 samples/sec Loss 13.9623 LearningRate 0.0968 Epoch: 0 Global Step: 5380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:02,860-Speed 5162.50 samples/sec Loss 13.9484 LearningRate 0.0968 Epoch: 0 Global Step: 5390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:04,824-Speed 5215.24 samples/sec Loss 13.8479 LearningRate 0.0968 Epoch: 0 Global Step: 5400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:06,812-Speed 5155.65 samples/sec Loss 13.9839 LearningRate 0.0968 Epoch: 0 Global Step: 5410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:08,790-Speed 5178.00 samples/sec Loss 13.9218 LearningRate 0.0968 Epoch: 0 Global Step: 5420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:10,762-Speed 5193.30 samples/sec Loss 13.8789 LearningRate 0.0968 Epoch: 0 Global Step: 5430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:12,731-Speed 5203.72 samples/sec Loss 13.9477 LearningRate 0.0968 Epoch: 0 Global Step: 5440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:14,691-Speed 5225.48 samples/sec Loss 13.6694 LearningRate 0.0968 Epoch: 0 Global Step: 5450 Fp16 Grad Scale: 131072 
Required: 21 hours Training: 2022-04-10 23:36:16,659-Speed 5205.55 samples/sec Loss 13.9811 LearningRate 0.0968 Epoch: 0 Global Step: 5460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:18,635-Speed 5184.05 samples/sec Loss 13.8613 LearningRate 0.0967 Epoch: 0 Global Step: 5470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:20,602-Speed 5206.45 samples/sec Loss 13.7701 LearningRate 0.0967 Epoch: 0 Global Step: 5480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:22,566-Speed 5216.92 samples/sec Loss 13.8518 LearningRate 0.0967 Epoch: 0 Global Step: 5490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:24,534-Speed 5203.52 samples/sec Loss 13.6369 LearningRate 0.0967 Epoch: 0 Global Step: 5500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:26,499-Speed 5213.74 samples/sec Loss 13.9031 LearningRate 0.0967 Epoch: 0 Global Step: 5510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:28,467-Speed 5205.44 samples/sec Loss 13.9353 LearningRate 0.0967 Epoch: 0 Global Step: 5520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:30,442-Speed 5187.72 samples/sec Loss 13.7969 LearningRate 0.0967 Epoch: 0 Global Step: 5530 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:32,413-Speed 5196.36 samples/sec Loss 13.6460 LearningRate 0.0967 Epoch: 0 Global Step: 5540 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:34,392-Speed 5174.48 samples/sec Loss 13.7291 LearningRate 0.0967 Epoch: 0 Global Step: 5550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:36,374-Speed 5168.82 samples/sec Loss 13.7462 LearningRate 0.0967 Epoch: 0 Global Step: 5560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:38,346-Speed 5196.16 samples/sec Loss 13.7402 LearningRate 0.0967 Epoch: 0 Global Step: 5570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 
23:36:40,331-Speed 5158.95 samples/sec Loss 13.7081 LearningRate 0.0967 Epoch: 0 Global Step: 5580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:42,298-Speed 5208.17 samples/sec Loss 13.5473 LearningRate 0.0967 Epoch: 0 Global Step: 5590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:44,265-Speed 5206.27 samples/sec Loss 13.5375 LearningRate 0.0967 Epoch: 0 Global Step: 5600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:46,258-Speed 5140.07 samples/sec Loss 13.7245 LearningRate 0.0967 Epoch: 0 Global Step: 5610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:48,244-Speed 5159.33 samples/sec Loss 13.6685 LearningRate 0.0967 Epoch: 0 Global Step: 5620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:50,220-Speed 5184.08 samples/sec Loss 13.4218 LearningRate 0.0967 Epoch: 0 Global Step: 5630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:52,188-Speed 5203.86 samples/sec Loss 13.4701 LearningRate 0.0966 Epoch: 0 Global Step: 5640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:54,155-Speed 5209.51 samples/sec Loss 13.4430 LearningRate 0.0966 Epoch: 0 Global Step: 5650 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-10 23:36:56,114-Speed 5227.12 samples/sec Loss 13.7220 LearningRate 0.0966 Epoch: 0 Global Step: 5660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:36:58,083-Speed 5203.23 samples/sec Loss 13.7582 LearningRate 0.0966 Epoch: 0 Global Step: 5670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:00,051-Speed 5206.08 samples/sec Loss 13.4418 LearningRate 0.0966 Epoch: 0 Global Step: 5680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:02,024-Speed 5190.80 samples/sec Loss 13.4739 LearningRate 0.0966 Epoch: 0 Global Step: 5690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:03,992-Speed 5204.80 samples/sec Loss 
13.5560 LearningRate 0.0966 Epoch: 0 Global Step: 5700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:05,978-Speed 5157.75 samples/sec Loss 13.4428 LearningRate 0.0966 Epoch: 0 Global Step: 5710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:07,955-Speed 5182.65 samples/sec Loss 13.1197 LearningRate 0.0966 Epoch: 0 Global Step: 5720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:09,936-Speed 5169.90 samples/sec Loss 13.3775 LearningRate 0.0966 Epoch: 0 Global Step: 5730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:11,908-Speed 5195.84 samples/sec Loss 13.4458 LearningRate 0.0966 Epoch: 0 Global Step: 5740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:13,883-Speed 5184.47 samples/sec Loss 13.3243 LearningRate 0.0966 Epoch: 0 Global Step: 5750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:15,850-Speed 5209.37 samples/sec Loss 13.4312 LearningRate 0.0966 Epoch: 0 Global Step: 5760 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-10 23:37:17,809-Speed 5228.09 samples/sec Loss 13.4307 LearningRate 0.0966 Epoch: 0 Global Step: 5770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:19,776-Speed 5208.19 samples/sec Loss 13.4157 LearningRate 0.0966 Epoch: 0 Global Step: 5780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:21,746-Speed 5198.31 samples/sec Loss 13.3033 LearningRate 0.0966 Epoch: 0 Global Step: 5790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:23,719-Speed 5193.12 samples/sec Loss 13.2956 LearningRate 0.0966 Epoch: 0 Global Step: 5800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:25,684-Speed 5213.00 samples/sec Loss 13.5222 LearningRate 0.0965 Epoch: 0 Global Step: 5810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:27,653-Speed 5200.67 samples/sec Loss 13.2781 LearningRate 0.0965 Epoch: 0 Global 
Step: 5820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:29,621-Speed 5207.58 samples/sec Loss 13.3452 LearningRate 0.0965 Epoch: 0 Global Step: 5830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:31,587-Speed 5210.17 samples/sec Loss 13.2310 LearningRate 0.0965 Epoch: 0 Global Step: 5840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:33,558-Speed 5198.58 samples/sec Loss 13.2686 LearningRate 0.0965 Epoch: 0 Global Step: 5850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:35,522-Speed 5215.27 samples/sec Loss 13.3221 LearningRate 0.0965 Epoch: 0 Global Step: 5860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:37,489-Speed 5206.35 samples/sec Loss 13.2379 LearningRate 0.0965 Epoch: 0 Global Step: 5870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:39,456-Speed 5209.46 samples/sec Loss 13.1990 LearningRate 0.0965 Epoch: 0 Global Step: 5880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:41,420-Speed 5213.95 samples/sec Loss 13.3768 LearningRate 0.0965 Epoch: 0 Global Step: 5890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:43,386-Speed 5211.06 samples/sec Loss 13.4074 LearningRate 0.0965 Epoch: 0 Global Step: 5900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:45,365-Speed 5175.49 samples/sec Loss 13.2463 LearningRate 0.0965 Epoch: 0 Global Step: 5910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:47,329-Speed 5215.84 samples/sec Loss 13.0493 LearningRate 0.0965 Epoch: 0 Global Step: 5920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:49,300-Speed 5197.31 samples/sec Loss 13.0308 LearningRate 0.0965 Epoch: 0 Global Step: 5930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:37:51,264-Speed 5215.17 samples/sec Loss 13.0567 LearningRate 0.0965 Epoch: 0 Global Step: 5940 Fp16 Grad Scale: 65536 Required: 
21 hours Training: 2022-04-10 23:37:53,234-Speed 5199.83 samples/sec Loss 13.1590 LearningRate 0.0965 Epoch: 0 Global Step: 5950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:37:55,204-Speed 5199.64 samples/sec Loss 13.1775 LearningRate 0.0965 Epoch: 0 Global Step: 5960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:37:57,171-Speed 5207.79 samples/sec Loss 13.1901 LearningRate 0.0965 Epoch: 0 Global Step: 5970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:37:59,136-Speed 5212.27 samples/sec Loss 12.9963 LearningRate 0.0964 Epoch: 0 Global Step: 5980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:38:01,110-Speed 5190.73 samples/sec Loss 12.9922 LearningRate 0.0964 Epoch: 0 Global Step: 5990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:38:03,086-Speed 5182.89 samples/sec Loss 13.0243 LearningRate 0.0964 Epoch: 0 Global Step: 6000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:38:29,797-[lfw][6000]XNorm: 23.410911 Training: 2022-04-10 23:38:29,798-[lfw][6000]Accuracy-Flip: 0.99033+-0.00407 Training: 2022-04-10 23:38:29,798-[lfw][6000]Accuracy-Highest: 0.99033 Training: 2022-04-10 23:39:00,663-[cfp_fp][6000]XNorm: 21.270596 Training: 2022-04-10 23:39:00,663-[cfp_fp][6000]Accuracy-Flip: 0.90814+-0.01259 Training: 2022-04-10 23:39:00,664-[cfp_fp][6000]Accuracy-Highest: 0.90814 Training: 2022-04-10 23:39:27,279-[agedb_30][6000]XNorm: 23.067226 Training: 2022-04-10 23:39:27,279-[agedb_30][6000]Accuracy-Flip: 0.93117+-0.01660 Training: 2022-04-10 23:39:27,280-[agedb_30][6000]Accuracy-Highest: 0.93117 Training: 2022-04-10 23:39:29,245-Speed 118.85 samples/sec Loss 12.8775 LearningRate 0.0964 Epoch: 0 Global Step: 6010 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:39:31,209-Speed 5215.00 samples/sec Loss 13.0412 LearningRate 0.0964 Epoch: 0 Global Step: 6020 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:39:33,168-Speed 
5228.48 samples/sec Loss 13.1404 LearningRate 0.0964 Epoch: 0 Global Step: 6030 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:39:35,129-Speed 5222.77 samples/sec Loss 12.9736 LearningRate 0.0964 Epoch: 0 Global Step: 6040 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:39:37,103-Speed 5189.63 samples/sec Loss 12.9769 LearningRate 0.0964 Epoch: 0 Global Step: 6050 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:39:39,068-Speed 5213.87 samples/sec Loss 12.9774 LearningRate 0.0964 Epoch: 0 Global Step: 6060 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:39:41,027-Speed 5228.49 samples/sec Loss 13.0093 LearningRate 0.0964 Epoch: 0 Global Step: 6070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:39:42,997-Speed 5201.12 samples/sec Loss 12.8884 LearningRate 0.0964 Epoch: 0 Global Step: 6080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:39:44,954-Speed 5234.78 samples/sec Loss 12.9650 LearningRate 0.0964 Epoch: 0 Global Step: 6090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:39:46,923-Speed 5201.36 samples/sec Loss 12.9092 LearningRate 0.0964 Epoch: 0 Global Step: 6100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:39:48,882-Speed 5228.43 samples/sec Loss 12.9345 LearningRate 0.0964 Epoch: 0 Global Step: 6110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:39:50,841-Speed 5228.12 samples/sec Loss 12.7798 LearningRate 0.0964 Epoch: 0 Global Step: 6120 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:39:52,800-Speed 5228.36 samples/sec Loss 12.9765 LearningRate 0.0964 Epoch: 0 Global Step: 6130 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:39:54,755-Speed 5241.93 samples/sec Loss 12.8085 LearningRate 0.0964 Epoch: 0 Global Step: 6140 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:39:56,723-Speed 5202.79 samples/sec Loss 12.8131 
LearningRate 0.0963 Epoch: 0 Global Step: 6150 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:39:58,698-Speed 5186.71 samples/sec Loss 12.7032 LearningRate 0.0963 Epoch: 0 Global Step: 6160 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:00,671-Speed 5192.53 samples/sec Loss 12.7326 LearningRate 0.0963 Epoch: 0 Global Step: 6170 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:02,632-Speed 5225.83 samples/sec Loss 12.8098 LearningRate 0.0963 Epoch: 0 Global Step: 6180 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:04,598-Speed 5208.71 samples/sec Loss 12.6682 LearningRate 0.0963 Epoch: 0 Global Step: 6190 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:06,558-Speed 5225.40 samples/sec Loss 12.8842 LearningRate 0.0963 Epoch: 0 Global Step: 6200 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:08,518-Speed 5226.63 samples/sec Loss 12.8774 LearningRate 0.0963 Epoch: 0 Global Step: 6210 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:10,483-Speed 5212.95 samples/sec Loss 12.8631 LearningRate 0.0963 Epoch: 0 Global Step: 6220 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:12,448-Speed 5212.72 samples/sec Loss 12.7618 LearningRate 0.0963 Epoch: 0 Global Step: 6230 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:14,409-Speed 5224.03 samples/sec Loss 12.5822 LearningRate 0.0963 Epoch: 0 Global Step: 6240 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-10 23:40:16,364-Speed 5240.83 samples/sec Loss 12.8060 LearningRate 0.0963 Epoch: 0 Global Step: 6250 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:18,345-Speed 5169.04 samples/sec Loss 12.5896 LearningRate 0.0963 Epoch: 0 Global Step: 6260 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:20,318-Speed 5193.81 samples/sec Loss 12.5905 LearningRate 0.0963 Epoch: 0 Global Step: 
6270 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:22,281-Speed 5216.62 samples/sec Loss 12.6170 LearningRate 0.0963 Epoch: 0 Global Step: 6280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:24,261-Speed 5174.41 samples/sec Loss 12.7838 LearningRate 0.0963 Epoch: 0 Global Step: 6290 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:26,239-Speed 5178.44 samples/sec Loss 12.5678 LearningRate 0.0963 Epoch: 0 Global Step: 6300 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:28,208-Speed 5203.05 samples/sec Loss 12.5567 LearningRate 0.0963 Epoch: 0 Global Step: 6310 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:30,170-Speed 5219.50 samples/sec Loss 12.5296 LearningRate 0.0962 Epoch: 0 Global Step: 6320 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:32,134-Speed 5215.12 samples/sec Loss 12.7602 LearningRate 0.0962 Epoch: 0 Global Step: 6330 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:34,116-Speed 5169.29 samples/sec Loss 12.6604 LearningRate 0.0962 Epoch: 0 Global Step: 6340 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:36,084-Speed 5205.25 samples/sec Loss 12.5127 LearningRate 0.0962 Epoch: 0 Global Step: 6350 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-10 23:40:38,044-Speed 5226.46 samples/sec Loss 12.4760 LearningRate 0.0962 Epoch: 0 Global Step: 6360 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:40,020-Speed 5183.04 samples/sec Loss 12.6629 LearningRate 0.0962 Epoch: 0 Global Step: 6370 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:41,984-Speed 5217.95 samples/sec Loss 12.5634 LearningRate 0.0962 Epoch: 0 Global Step: 6380 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:43,948-Speed 5214.54 samples/sec Loss 12.5923 LearningRate 0.0962 Epoch: 0 Global Step: 6390 Fp16 Grad Scale: 131072 Required: 22 
hours Training: 2022-04-10 23:40:45,908-Speed 5225.39 samples/sec Loss 12.5529 LearningRate 0.0962 Epoch: 0 Global Step: 6400 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:40:47,866-Speed 5232.08 samples/sec Loss 12.3531 LearningRate 0.0962 Epoch: 0 Global Step: 6410 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:40:49,832-Speed 5211.27 samples/sec Loss 12.5694 LearningRate 0.0962 Epoch: 0 Global Step: 6420 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:40:51,795-Speed 5217.65 samples/sec Loss 12.5490 LearningRate 0.0962 Epoch: 0 Global Step: 6430 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:40:53,758-Speed 5217.62 samples/sec Loss 12.5625 LearningRate 0.0962 Epoch: 0 Global Step: 6440 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:40:55,718-Speed 5226.41 samples/sec Loss 12.3563 LearningRate 0.0962 Epoch: 0 Global Step: 6450 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:40:57,687-Speed 5203.19 samples/sec Loss 12.5486 LearningRate 0.0962 Epoch: 0 Global Step: 6460 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:40:59,648-Speed 5222.96 samples/sec Loss 12.4663 LearningRate 0.0962 Epoch: 0 Global Step: 6470 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:01,620-Speed 5195.94 samples/sec Loss 12.5435 LearningRate 0.0962 Epoch: 0 Global Step: 6480 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:03,580-Speed 5225.63 samples/sec Loss 12.3980 LearningRate 0.0961 Epoch: 0 Global Step: 6490 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:05,547-Speed 5206.50 samples/sec Loss 12.2810 LearningRate 0.0961 Epoch: 0 Global Step: 6500 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:07,502-Speed 5240.58 samples/sec Loss 12.4092 LearningRate 0.0961 Epoch: 0 Global Step: 6510 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:09,459-Speed 5233.58 
samples/sec Loss 12.4215 LearningRate 0.0961 Epoch: 0 Global Step: 6520 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:11,420-Speed 5223.82 samples/sec Loss 12.3233 LearningRate 0.0961 Epoch: 0 Global Step: 6530 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:13,380-Speed 5226.19 samples/sec Loss 12.1970 LearningRate 0.0961 Epoch: 0 Global Step: 6540 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:15,339-Speed 5228.16 samples/sec Loss 12.4516 LearningRate 0.0961 Epoch: 0 Global Step: 6550 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:17,299-Speed 5226.03 samples/sec Loss 12.3725 LearningRate 0.0961 Epoch: 0 Global Step: 6560 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:19,260-Speed 5225.07 samples/sec Loss 12.3089 LearningRate 0.0961 Epoch: 0 Global Step: 6570 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:21,235-Speed 5184.86 samples/sec Loss 12.4632 LearningRate 0.0961 Epoch: 0 Global Step: 6580 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:23,203-Speed 5205.95 samples/sec Loss 12.2619 LearningRate 0.0961 Epoch: 0 Global Step: 6590 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:25,163-Speed 5225.99 samples/sec Loss 12.3370 LearningRate 0.0961 Epoch: 0 Global Step: 6600 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:27,122-Speed 5229.54 samples/sec Loss 12.3766 LearningRate 0.0961 Epoch: 0 Global Step: 6610 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:41:29,082-Speed 5225.74 samples/sec Loss 12.2744 LearningRate 0.0961 Epoch: 0 Global Step: 6620 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:41:31,043-Speed 5224.49 samples/sec Loss 12.2685 LearningRate 0.0961 Epoch: 0 Global Step: 6630 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:41:33,006-Speed 5217.08 samples/sec Loss 12.2600 LearningRate 0.0961 Epoch: 
0 Global Step: 6640 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:41:34,967-Speed 5223.35 samples/sec Loss 12.2364 LearningRate 0.0961 Epoch: 0 Global Step: 6650 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:41:36,938-Speed 5197.00 samples/sec Loss 12.2371 LearningRate 0.0960 Epoch: 0 Global Step: 6660 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:41:38,909-Speed 5198.18 samples/sec Loss 12.3036 LearningRate 0.0960 Epoch: 0 Global Step: 6670 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:40,868-Speed 5227.65 samples/sec Loss 12.2793 LearningRate 0.0960 Epoch: 0 Global Step: 6680 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:42,826-Speed 5232.41 samples/sec Loss 12.2492 LearningRate 0.0960 Epoch: 0 Global Step: 6690 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:44,796-Speed 5200.33 samples/sec Loss 12.4009 LearningRate 0.0960 Epoch: 0 Global Step: 6700 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:46,757-Speed 5224.50 samples/sec Loss 12.2752 LearningRate 0.0960 Epoch: 0 Global Step: 6710 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:48,746-Speed 5150.16 samples/sec Loss 12.0802 LearningRate 0.0960 Epoch: 0 Global Step: 6720 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:50,704-Speed 5230.98 samples/sec Loss 12.1293 LearningRate 0.0960 Epoch: 0 Global Step: 6730 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:52,661-Speed 5232.69 samples/sec Loss 12.0660 LearningRate 0.0960 Epoch: 0 Global Step: 6740 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:54,624-Speed 5220.06 samples/sec Loss 12.1408 LearningRate 0.0960 Epoch: 0 Global Step: 6750 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:41:56,580-Speed 5235.93 samples/sec Loss 12.0237 LearningRate 0.0960 Epoch: 0 Global Step: 6760 Fp16 Grad Scale: 65536 Required: 
22 hours Training: 2022-04-10 23:41:58,557-Speed 5181.73 samples/sec Loss 12.1193 LearningRate 0.0960 Epoch: 0 Global Step: 6770 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:42:00,518-Speed 5221.84 samples/sec Loss 11.9454 LearningRate 0.0960 Epoch: 0 Global Step: 6780 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:42:02,492-Speed 5190.81 samples/sec Loss 11.9954 LearningRate 0.0960 Epoch: 0 Global Step: 6790 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:42:04,464-Speed 5194.50 samples/sec Loss 11.9704 LearningRate 0.0960 Epoch: 0 Global Step: 6800 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:42:06,425-Speed 5223.89 samples/sec Loss 12.1111 LearningRate 0.0960 Epoch: 0 Global Step: 6810 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:42:08,376-Speed 5250.31 samples/sec Loss 12.0542 LearningRate 0.0960 Epoch: 0 Global Step: 6820 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:42:10,340-Speed 5216.24 samples/sec Loss 12.0898 LearningRate 0.0959 Epoch: 0 Global Step: 6830 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:42:12,297-Speed 5234.53 samples/sec Loss 11.9146 LearningRate 0.0959 Epoch: 0 Global Step: 6840 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:42:14,259-Speed 5219.68 samples/sec Loss 12.0266 LearningRate 0.0959 Epoch: 0 Global Step: 6850 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:42:16,228-Speed 5202.75 samples/sec Loss 12.1546 LearningRate 0.0959 Epoch: 0 Global Step: 6860 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:42:18,188-Speed 5226.63 samples/sec Loss 11.8351 LearningRate 0.0959 Epoch: 0 Global Step: 6870 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:42:20,146-Speed 5230.30 samples/sec Loss 11.9893 LearningRate 0.0959 Epoch: 0 Global Step: 6880 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:42:22,119-Speed 
5192.89 samples/sec Loss 11.9445 LearningRate 0.0959 Epoch: 0 Global Step: 6890 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:42:24,087-Speed 5205.81 samples/sec Loss 12.1166 LearningRate 0.0959 Epoch: 0 Global Step: 6900 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:42:26,052-Speed 5213.72 samples/sec Loss 11.9681 LearningRate 0.0959 Epoch: 0 Global Step: 6910 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:42:28,024-Speed 5194.59 samples/sec Loss 12.0647 LearningRate 0.0959 Epoch: 0 Global Step: 6920 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:42:29,981-Speed 5232.87 samples/sec Loss 11.7327 LearningRate 0.0959 Epoch: 0 Global Step: 6930 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:42:31,950-Speed 5201.95 samples/sec Loss 11.9014 LearningRate 0.0959 Epoch: 0 Global Step: 6940 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:42:33,913-Speed 5217.57 samples/sec Loss 11.7730 LearningRate 0.0959 Epoch: 0 Global Step: 6950 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:42:35,888-Speed 5186.28 samples/sec Loss 12.0480 LearningRate 0.0959 Epoch: 0 Global Step: 6960 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:42:37,889-Speed 5119.37 samples/sec Loss 11.9679 LearningRate 0.0959 Epoch: 0 Global Step: 6970 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:42:39,855-Speed 5212.44 samples/sec Loss 12.0175 LearningRate 0.0959 Epoch: 0 Global Step: 6980 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:42:41,835-Speed 5173.05 samples/sec Loss 11.9614 LearningRate 0.0959 Epoch: 0 Global Step: 6990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:42:43,808-Speed 5193.02 samples/sec Loss 11.9888 LearningRate 0.0959 Epoch: 0 Global Step: 7000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:42:45,786-Speed 5178.52 samples/sec Loss 11.8148 LearningRate 
0.0958 Epoch: 0 Global Step: 7010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:42:47,752-Speed 5211.28 samples/sec Loss 11.8274 LearningRate 0.0958 Epoch: 0 Global Step: 7020 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-10 23:42:49,708-Speed 5235.69 samples/sec Loss 11.8861 LearningRate 0.0958 Epoch: 0 Global Step: 7030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:42:51,678-Speed 5199.55 samples/sec Loss 11.8359 LearningRate 0.0958 Epoch: 0 Global Step: 7040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:42:53,640-Speed 5219.96 samples/sec Loss 11.8969 LearningRate 0.0958 Epoch: 0 Global Step: 7050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:42:55,600-Speed 5226.29 samples/sec Loss 11.9559 LearningRate 0.0958 Epoch: 0 Global Step: 7060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:42:57,565-Speed 5212.67 samples/sec Loss 11.7727 LearningRate 0.0958 Epoch: 0 Global Step: 7070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:42:59,526-Speed 5224.32 samples/sec Loss 11.7235 LearningRate 0.0958 Epoch: 0 Global Step: 7080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:43:01,485-Speed 5229.81 samples/sec Loss 11.8597 LearningRate 0.0958 Epoch: 0 Global Step: 7090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:43:03,463-Speed 5178.16 samples/sec Loss 11.8859 LearningRate 0.0958 Epoch: 0 Global Step: 7100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:43:05,455-Speed 5141.43 samples/sec Loss 11.7212 LearningRate 0.0958 Epoch: 0 Global Step: 7110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:43:07,418-Speed 5220.29 samples/sec Loss 11.8156 LearningRate 0.0958 Epoch: 0 Global Step: 7120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:43:09,375-Speed 5233.76 samples/sec Loss 11.8026 LearningRate 0.0958 Epoch: 0 Global Step: 7130 Fp16 Grad 
Scale: 65536 Required: 21 hours Training: 2022-04-10 23:43:11,335-Speed 5227.61 samples/sec Loss 11.8159 LearningRate 0.0958 Epoch: 0 Global Step: 7140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:43:13,292-Speed 5232.64 samples/sec Loss 11.7189 LearningRate 0.0958 Epoch: 0 Global Step: 7150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:43:15,252-Speed 5227.62 samples/sec Loss 11.6459 LearningRate 0.0958 Epoch: 0 Global Step: 7160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:43:17,212-Speed 5226.34 samples/sec Loss 11.6734 LearningRate 0.0958 Epoch: 0 Global Step: 7170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:43:19,171-Speed 5227.98 samples/sec Loss 11.6816 LearningRate 0.0957 Epoch: 0 Global Step: 7180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:43:21,131-Speed 5226.71 samples/sec Loss 11.5711 LearningRate 0.0957 Epoch: 0 Global Step: 7190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:43:23,101-Speed 5199.84 samples/sec Loss 11.7072 LearningRate 0.0957 Epoch: 0 Global Step: 7200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:43:25,067-Speed 5209.20 samples/sec Loss 11.6939 LearningRate 0.0957 Epoch: 0 Global Step: 7210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:43:27,032-Speed 5214.03 samples/sec Loss 11.6956 LearningRate 0.0957 Epoch: 0 Global Step: 7220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:43:29,004-Speed 5195.15 samples/sec Loss 11.5275 LearningRate 0.0957 Epoch: 0 Global Step: 7230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:43:30,969-Speed 5213.99 samples/sec Loss 11.5773 LearningRate 0.0957 Epoch: 0 Global Step: 7240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:43:32,928-Speed 5228.99 samples/sec Loss 11.5565 LearningRate 0.0957 Epoch: 0 Global Step: 7250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 
2022-04-10 23:43:34,902-Speed 5187.22 samples/sec Loss 11.5965 LearningRate 0.0957 Epoch: 0 Global Step: 7260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:43:36,867-Speed 5212.31 samples/sec Loss 11.5194 LearningRate 0.0957 Epoch: 0 Global Step: 7270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:43:38,828-Speed 5223.85 samples/sec Loss 11.5081 LearningRate 0.0957 Epoch: 0 Global Step: 7280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:43:40,796-Speed 5206.72 samples/sec Loss 11.6945 LearningRate 0.0957 Epoch: 0 Global Step: 7290 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:43:42,769-Speed 5189.25 samples/sec Loss 11.5025 LearningRate 0.0957 Epoch: 0 Global Step: 7300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:43:44,738-Speed 5203.95 samples/sec Loss 11.5115 LearningRate 0.0957 Epoch: 0 Global Step: 7310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:43:46,738-Speed 5122.69 samples/sec Loss 11.6881 LearningRate 0.0957 Epoch: 0 Global Step: 7320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:43:48,709-Speed 5195.98 samples/sec Loss 11.5881 LearningRate 0.0957 Epoch: 0 Global Step: 7330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:43:50,711-Speed 5118.86 samples/sec Loss 11.6086 LearningRate 0.0957 Epoch: 0 Global Step: 7340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:43:52,693-Speed 5165.93 samples/sec Loss 11.5370 LearningRate 0.0956 Epoch: 0 Global Step: 7350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:43:54,654-Speed 5223.83 samples/sec Loss 11.5159 LearningRate 0.0956 Epoch: 0 Global Step: 7360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:43:56,621-Speed 5208.08 samples/sec Loss 11.5774 LearningRate 0.0956 Epoch: 0 Global Step: 7370 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:43:58,586-Speed 5213.33 
samples/sec Loss 11.5197 LearningRate 0.0956 Epoch: 0 Global Step: 7380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:44:00,557-Speed 5196.07 samples/sec Loss 11.4808 LearningRate 0.0956 Epoch: 0 Global Step: 7390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:44:02,528-Speed 5198.01 samples/sec Loss 11.4130 LearningRate 0.0956 Epoch: 0 Global Step: 7400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:44:04,498-Speed 5199.36 samples/sec Loss 11.4454 LearningRate 0.0956 Epoch: 0 Global Step: 7410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:44:06,459-Speed 5222.38 samples/sec Loss 11.3402 LearningRate 0.0956 Epoch: 0 Global Step: 7420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:44:08,432-Speed 5192.43 samples/sec Loss 11.6086 LearningRate 0.0956 Epoch: 0 Global Step: 7430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:44:10,415-Speed 5166.08 samples/sec Loss 11.5857 LearningRate 0.0956 Epoch: 0 Global Step: 7440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:44:12,380-Speed 5213.36 samples/sec Loss 11.2737 LearningRate 0.0956 Epoch: 0 Global Step: 7450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:44:14,339-Speed 5228.86 samples/sec Loss 11.4457 LearningRate 0.0956 Epoch: 0 Global Step: 7460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:44:16,304-Speed 5212.80 samples/sec Loss 11.4273 LearningRate 0.0956 Epoch: 0 Global Step: 7470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:44:18,266-Speed 5220.35 samples/sec Loss 11.5519 LearningRate 0.0956 Epoch: 0 Global Step: 7480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:44:20,227-Speed 5225.86 samples/sec Loss 11.4591 LearningRate 0.0956 Epoch: 0 Global Step: 7490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:44:22,191-Speed 5214.43 samples/sec Loss 11.3257 LearningRate 0.0956 Epoch: 0 
Global Step: 7500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:44:24,154-Speed 5217.66 samples/sec Loss 11.3716 LearningRate 0.0956 Epoch: 0 Global Step: 7510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:44:26,116-Speed 5219.89 samples/sec Loss 11.3121 LearningRate 0.0955 Epoch: 0 Global Step: 7520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:44:28,082-Speed 5210.85 samples/sec Loss 11.4180 LearningRate 0.0955 Epoch: 0 Global Step: 7530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:44:30,059-Speed 5182.07 samples/sec Loss 11.4536 LearningRate 0.0955 Epoch: 0 Global Step: 7540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:44:32,022-Speed 5220.52 samples/sec Loss 11.4172 LearningRate 0.0955 Epoch: 0 Global Step: 7550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:44:33,998-Speed 5183.43 samples/sec Loss 11.4480 LearningRate 0.0955 Epoch: 0 Global Step: 7560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:44:35,961-Speed 5216.72 samples/sec Loss 11.4788 LearningRate 0.0955 Epoch: 0 Global Step: 7570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:44:37,923-Speed 5220.77 samples/sec Loss 11.3584 LearningRate 0.0955 Epoch: 0 Global Step: 7580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:44:39,916-Speed 5140.07 samples/sec Loss 11.3012 LearningRate 0.0955 Epoch: 0 Global Step: 7590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:44:41,901-Speed 5159.79 samples/sec Loss 11.3105 LearningRate 0.0955 Epoch: 0 Global Step: 7600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:44:43,868-Speed 5207.96 samples/sec Loss 11.2751 LearningRate 0.0955 Epoch: 0 Global Step: 7610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:44:45,839-Speed 5197.68 samples/sec Loss 11.2279 LearningRate 0.0955 Epoch: 0 Global Step: 7620 Fp16 Grad Scale: 131072 
Required: 21 hours Training: 2022-04-10 23:44:47,799-Speed 5225.53 samples/sec Loss 11.2060 LearningRate 0.0955 Epoch: 0 Global Step: 7630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:44:49,777-Speed 5180.60 samples/sec Loss 11.2399 LearningRate 0.0955 Epoch: 0 Global Step: 7640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:44:51,746-Speed 5202.92 samples/sec Loss 11.1777 LearningRate 0.0955 Epoch: 0 Global Step: 7650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:44:53,720-Speed 5188.06 samples/sec Loss 11.1529 LearningRate 0.0955 Epoch: 0 Global Step: 7660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:44:55,683-Speed 5219.06 samples/sec Loss 11.4119 LearningRate 0.0955 Epoch: 0 Global Step: 7670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:44:57,655-Speed 5192.90 samples/sec Loss 11.2805 LearningRate 0.0955 Epoch: 0 Global Step: 7680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:44:59,619-Speed 5216.92 samples/sec Loss 11.2718 LearningRate 0.0954 Epoch: 0 Global Step: 7690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:01,584-Speed 5212.29 samples/sec Loss 11.1330 LearningRate 0.0954 Epoch: 0 Global Step: 7700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:03,557-Speed 5191.06 samples/sec Loss 11.1452 LearningRate 0.0954 Epoch: 0 Global Step: 7710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:05,523-Speed 5211.60 samples/sec Loss 11.1504 LearningRate 0.0954 Epoch: 0 Global Step: 7720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:07,484-Speed 5222.97 samples/sec Loss 11.2616 LearningRate 0.0954 Epoch: 0 Global Step: 7730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:09,450-Speed 5210.06 samples/sec Loss 11.2616 LearningRate 0.0954 Epoch: 0 Global Step: 7740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 
23:45:11,415-Speed 5213.67 samples/sec Loss 11.2230 LearningRate 0.0954 Epoch: 0 Global Step: 7750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:13,391-Speed 5184.61 samples/sec Loss 11.1756 LearningRate 0.0954 Epoch: 0 Global Step: 7760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:15,358-Speed 5207.52 samples/sec Loss 11.1238 LearningRate 0.0954 Epoch: 0 Global Step: 7770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:17,321-Speed 5217.62 samples/sec Loss 11.0329 LearningRate 0.0954 Epoch: 0 Global Step: 7780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:19,278-Speed 5233.60 samples/sec Loss 11.2437 LearningRate 0.0954 Epoch: 0 Global Step: 7790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:45:21,244-Speed 5210.77 samples/sec Loss 11.1807 LearningRate 0.0954 Epoch: 0 Global Step: 7800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:45:23,210-Speed 5210.18 samples/sec Loss 11.1156 LearningRate 0.0954 Epoch: 0 Global Step: 7810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:45:25,184-Speed 5188.91 samples/sec Loss 10.9682 LearningRate 0.0954 Epoch: 0 Global Step: 7820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:45:27,157-Speed 5192.37 samples/sec Loss 11.2149 LearningRate 0.0954 Epoch: 0 Global Step: 7830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:45:29,116-Speed 5226.98 samples/sec Loss 11.0889 LearningRate 0.0954 Epoch: 0 Global Step: 7840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:45:31,078-Speed 5222.66 samples/sec Loss 11.1160 LearningRate 0.0954 Epoch: 0 Global Step: 7850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:45:33,042-Speed 5215.18 samples/sec Loss 11.3775 LearningRate 0.0953 Epoch: 0 Global Step: 7860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:45:35,018-Speed 5186.30 samples/sec Loss 11.0216 
LearningRate 0.0953 Epoch: 0 Global Step: 7870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:45:36,979-Speed 5221.49 samples/sec Loss 11.0605 LearningRate 0.0953 Epoch: 0 Global Step: 7880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:45:38,945-Speed 5212.51 samples/sec Loss 11.0918 LearningRate 0.0953 Epoch: 0 Global Step: 7890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:40,916-Speed 5195.18 samples/sec Loss 11.0222 LearningRate 0.0953 Epoch: 0 Global Step: 7900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:42,878-Speed 5220.95 samples/sec Loss 10.9590 LearningRate 0.0953 Epoch: 0 Global Step: 7910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:44,851-Speed 5193.17 samples/sec Loss 11.0220 LearningRate 0.0953 Epoch: 0 Global Step: 7920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:46,842-Speed 5143.51 samples/sec Loss 11.0604 LearningRate 0.0953 Epoch: 0 Global Step: 7930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:48,816-Speed 5190.68 samples/sec Loss 11.0027 LearningRate 0.0953 Epoch: 0 Global Step: 7940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:50,781-Speed 5211.95 samples/sec Loss 10.9443 LearningRate 0.0953 Epoch: 0 Global Step: 7950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:52,753-Speed 5193.89 samples/sec Loss 10.8503 LearningRate 0.0953 Epoch: 0 Global Step: 7960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:54,717-Speed 5215.78 samples/sec Loss 10.9934 LearningRate 0.0953 Epoch: 0 Global Step: 7970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:56,680-Speed 5218.10 samples/sec Loss 10.9275 LearningRate 0.0953 Epoch: 0 Global Step: 7980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:45:58,636-Speed 5238.19 samples/sec Loss 10.8363 LearningRate 0.0953 Epoch: 0 Global Step: 
7990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:46:00,607-Speed 5198.21 samples/sec Loss 11.0066 LearningRate 0.0953 Epoch: 0 Global Step: 8000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:46:27,132-[lfw][8000]XNorm: 22.673254 Training: 2022-04-10 23:46:27,133-[lfw][8000]Accuracy-Flip: 0.99350+-0.00404 Training: 2022-04-10 23:46:27,133-[lfw][8000]Accuracy-Highest: 0.99350 Training: 2022-04-10 23:46:57,902-[cfp_fp][8000]XNorm: 20.982483 Training: 2022-04-10 23:46:57,902-[cfp_fp][8000]Accuracy-Flip: 0.91100+-0.01229 Training: 2022-04-10 23:46:57,903-[cfp_fp][8000]Accuracy-Highest: 0.91100 Training: 2022-04-10 23:47:24,666-[agedb_30][8000]XNorm: 22.578265 Training: 2022-04-10 23:47:24,667-[agedb_30][8000]Accuracy-Flip: 0.94083+-0.01355 Training: 2022-04-10 23:47:24,668-[agedb_30][8000]Accuracy-Highest: 0.94083 Training: 2022-04-10 23:47:26,650-Speed 119.01 samples/sec Loss 10.7484 LearningRate 0.0953 Epoch: 0 Global Step: 8010 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:47:28,616-Speed 5208.51 samples/sec Loss 11.0226 LearningRate 0.0953 Epoch: 0 Global Step: 8020 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:47:30,586-Speed 5198.72 samples/sec Loss 10.9716 LearningRate 0.0952 Epoch: 0 Global Step: 8030 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:47:32,552-Speed 5210.45 samples/sec Loss 10.8200 LearningRate 0.0952 Epoch: 0 Global Step: 8040 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:47:34,518-Speed 5210.81 samples/sec Loss 10.9221 LearningRate 0.0952 Epoch: 0 Global Step: 8050 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:47:36,492-Speed 5190.75 samples/sec Loss 10.8677 LearningRate 0.0952 Epoch: 0 Global Step: 8060 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:47:38,481-Speed 5149.13 samples/sec Loss 11.0833 LearningRate 0.0952 Epoch: 0 Global Step: 8070 Fp16 Grad Scale: 65536 Required: 22 hours 
Training: 2022-04-10 23:47:40,447-Speed 5211.25 samples/sec Loss 10.9839 LearningRate 0.0952 Epoch: 0 Global Step: 8080 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:47:42,409-Speed 5219.87 samples/sec Loss 10.8409 LearningRate 0.0952 Epoch: 0 Global Step: 8090 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:47:44,371-Speed 5220.50 samples/sec Loss 11.0080 LearningRate 0.0952 Epoch: 0 Global Step: 8100 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:47:46,354-Speed 5167.08 samples/sec Loss 10.9720 LearningRate 0.0952 Epoch: 0 Global Step: 8110 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:47:48,343-Speed 5151.60 samples/sec Loss 10.8505 LearningRate 0.0952 Epoch: 0 Global Step: 8120 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:47:50,311-Speed 5203.38 samples/sec Loss 10.9147 LearningRate 0.0952 Epoch: 0 Global Step: 8130 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:47:52,286-Speed 5185.67 samples/sec Loss 10.8203 LearningRate 0.0952 Epoch: 0 Global Step: 8140 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:47:54,265-Speed 5176.36 samples/sec Loss 10.8189 LearningRate 0.0952 Epoch: 0 Global Step: 8150 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:47:56,236-Speed 5196.78 samples/sec Loss 10.8050 LearningRate 0.0952 Epoch: 0 Global Step: 8160 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:47:58,206-Speed 5200.83 samples/sec Loss 10.7922 LearningRate 0.0952 Epoch: 0 Global Step: 8170 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:48:00,187-Speed 5170.35 samples/sec Loss 10.8822 LearningRate 0.0952 Epoch: 0 Global Step: 8180 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:48:02,172-Speed 5161.46 samples/sec Loss 10.8443 LearningRate 0.0952 Epoch: 0 Global Step: 8190 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:48:04,155-Speed 5168.88 
samples/sec Loss 10.9688 LearningRate 0.0951 Epoch: 0 Global Step: 8200 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:06,148-Speed 5141.27 samples/sec Loss 10.7657 LearningRate 0.0951 Epoch: 0 Global Step: 8210 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:08,128-Speed 5173.05 samples/sec Loss 10.8585 LearningRate 0.0951 Epoch: 0 Global Step: 8220 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:10,101-Speed 5191.97 samples/sec Loss 10.8987 LearningRate 0.0951 Epoch: 0 Global Step: 8230 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:12,088-Speed 5155.63 samples/sec Loss 10.8834 LearningRate 0.0951 Epoch: 0 Global Step: 8240 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:14,062-Speed 5187.90 samples/sec Loss 10.8857 LearningRate 0.0951 Epoch: 0 Global Step: 8250 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:16,077-Speed 5082.79 samples/sec Loss 10.8627 LearningRate 0.0951 Epoch: 0 Global Step: 8260 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:18,077-Speed 5122.59 samples/sec Loss 10.7442 LearningRate 0.0951 Epoch: 0 Global Step: 8270 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:20,071-Speed 5136.44 samples/sec Loss 10.9115 LearningRate 0.0951 Epoch: 0 Global Step: 8280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:22,062-Speed 5146.27 samples/sec Loss 10.6457 LearningRate 0.0951 Epoch: 0 Global Step: 8290 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:24,044-Speed 5168.97 samples/sec Loss 10.7962 LearningRate 0.0951 Epoch: 0 Global Step: 8300 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-10 23:48:26,022-Speed 5177.25 samples/sec Loss 10.8166 LearningRate 0.0951 Epoch: 0 Global Step: 8310 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-10 23:48:28,014-Speed 5142.65 samples/sec Loss 10.6097 LearningRate 
0.0951 Epoch: 0 Global Step: 8320 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:29,992-Speed 5180.18 samples/sec Loss 10.5514 LearningRate 0.0951 Epoch: 0 Global Step: 8330 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:31,971-Speed 5175.75 samples/sec Loss 10.6915 LearningRate 0.0951 Epoch: 0 Global Step: 8340 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:33,945-Speed 5188.75 samples/sec Loss 10.6387 LearningRate 0.0951 Epoch: 0 Global Step: 8350 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:35,921-Speed 5183.58 samples/sec Loss 10.6332 LearningRate 0.0951 Epoch: 0 Global Step: 8360 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:37,907-Speed 5158.30 samples/sec Loss 10.7014 LearningRate 0.0950 Epoch: 0 Global Step: 8370 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:39,887-Speed 5172.40 samples/sec Loss 10.7456 LearningRate 0.0950 Epoch: 0 Global Step: 8380 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:41,876-Speed 5151.05 samples/sec Loss 10.7076 LearningRate 0.0950 Epoch: 0 Global Step: 8390 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:43,857-Speed 5171.49 samples/sec Loss 10.7395 LearningRate 0.0950 Epoch: 0 Global Step: 8400 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:45,833-Speed 5183.51 samples/sec Loss 10.5926 LearningRate 0.0950 Epoch: 0 Global Step: 8410 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:47,806-Speed 5192.03 samples/sec Loss 10.6003 LearningRate 0.0950 Epoch: 0 Global Step: 8420 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-10 23:48:49,770-Speed 5216.81 samples/sec Loss 10.6042 LearningRate 0.0950 Epoch: 0 Global Step: 8430 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:51,750-Speed 5172.71 samples/sec Loss 10.7036 LearningRate 0.0950 Epoch: 0 Global Step: 8440 Fp16 Grad 
Scale: 131072 Required: 22 hours Training: 2022-04-10 23:48:53,710-Speed 5224.71 samples/sec Loss 10.6270 LearningRate 0.0950 Epoch: 0 Global Step: 8450 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:48:55,687-Speed 5181.45 samples/sec Loss 10.6999 LearningRate 0.0950 Epoch: 0 Global Step: 8460 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:48:57,663-Speed 5183.97 samples/sec Loss 10.6783 LearningRate 0.0950 Epoch: 0 Global Step: 8470 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:48:59,630-Speed 5207.13 samples/sec Loss 10.6098 LearningRate 0.0950 Epoch: 0 Global Step: 8480 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:01,615-Speed 5162.86 samples/sec Loss 10.6932 LearningRate 0.0950 Epoch: 0 Global Step: 8490 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:03,580-Speed 5212.15 samples/sec Loss 10.4714 LearningRate 0.0950 Epoch: 0 Global Step: 8500 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:05,547-Speed 5207.37 samples/sec Loss 10.6158 LearningRate 0.0950 Epoch: 0 Global Step: 8510 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:07,515-Speed 5204.11 samples/sec Loss 10.5543 LearningRate 0.0950 Epoch: 0 Global Step: 8520 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:09,479-Speed 5217.98 samples/sec Loss 10.5432 LearningRate 0.0950 Epoch: 0 Global Step: 8530 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:11,443-Speed 5213.26 samples/sec Loss 10.5031 LearningRate 0.0949 Epoch: 0 Global Step: 8540 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:13,408-Speed 5212.59 samples/sec Loss 10.7474 LearningRate 0.0949 Epoch: 0 Global Step: 8550 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:49:15,377-Speed 5203.71 samples/sec Loss 10.6516 LearningRate 0.0949 Epoch: 0 Global Step: 8560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 
23:49:17,359-Speed 5168.25 samples/sec Loss 10.5537 LearningRate 0.0949 Epoch: 0 Global Step: 8570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:49:19,316-Speed 5233.19 samples/sec Loss 10.5065 LearningRate 0.0949 Epoch: 0 Global Step: 8580 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:21,286-Speed 5200.15 samples/sec Loss 10.5648 LearningRate 0.0949 Epoch: 0 Global Step: 8590 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:23,259-Speed 5192.40 samples/sec Loss 10.5341 LearningRate 0.0949 Epoch: 0 Global Step: 8600 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:25,240-Speed 5171.96 samples/sec Loss 10.6571 LearningRate 0.0949 Epoch: 0 Global Step: 8610 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:27,218-Speed 5178.12 samples/sec Loss 10.5250 LearningRate 0.0949 Epoch: 0 Global Step: 8620 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:29,187-Speed 5203.38 samples/sec Loss 10.4177 LearningRate 0.0949 Epoch: 0 Global Step: 8630 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:31,149-Speed 5221.28 samples/sec Loss 10.5071 LearningRate 0.0949 Epoch: 0 Global Step: 8640 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:33,109-Speed 5223.51 samples/sec Loss 10.5249 LearningRate 0.0949 Epoch: 0 Global Step: 8650 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:35,068-Speed 5229.94 samples/sec Loss 10.6216 LearningRate 0.0949 Epoch: 0 Global Step: 8660 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:37,030-Speed 5219.44 samples/sec Loss 10.4342 LearningRate 0.0949 Epoch: 0 Global Step: 8670 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:38,994-Speed 5215.39 samples/sec Loss 10.4600 LearningRate 0.0949 Epoch: 0 Global Step: 8680 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:40,960-Speed 5211.52 samples/sec Loss 10.4720 
LearningRate 0.0949 Epoch: 0 Global Step: 8690 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:42,935-Speed 5188.64 samples/sec Loss 10.4096 LearningRate 0.0949 Epoch: 0 Global Step: 8700 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:44,913-Speed 5178.56 samples/sec Loss 10.5159 LearningRate 0.0948 Epoch: 0 Global Step: 8710 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:46,890-Speed 5181.59 samples/sec Loss 10.4835 LearningRate 0.0948 Epoch: 0 Global Step: 8720 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:48,854-Speed 5214.91 samples/sec Loss 10.4729 LearningRate 0.0948 Epoch: 0 Global Step: 8730 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:50,816-Speed 5221.56 samples/sec Loss 10.3741 LearningRate 0.0948 Epoch: 0 Global Step: 8740 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:52,778-Speed 5218.68 samples/sec Loss 10.3921 LearningRate 0.0948 Epoch: 0 Global Step: 8750 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:54,742-Speed 5217.48 samples/sec Loss 10.3239 LearningRate 0.0948 Epoch: 0 Global Step: 8760 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:56,704-Speed 5220.32 samples/sec Loss 10.4078 LearningRate 0.0948 Epoch: 0 Global Step: 8770 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:49:58,668-Speed 5215.80 samples/sec Loss 10.4708 LearningRate 0.0948 Epoch: 0 Global Step: 8780 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:50:00,645-Speed 5180.21 samples/sec Loss 10.4436 LearningRate 0.0948 Epoch: 0 Global Step: 8790 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:50:02,626-Speed 5172.11 samples/sec Loss 10.5200 LearningRate 0.0948 Epoch: 0 Global Step: 8800 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:50:04,594-Speed 5204.85 samples/sec Loss 10.4601 LearningRate 0.0948 Epoch: 0 Global Step: 8810 Fp16 
Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:50:06,566-Speed 5195.00 samples/sec Loss 10.3216 LearningRate 0.0948 Epoch: 0 Global Step: 8820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:50:08,530-Speed 5216.36 samples/sec Loss 10.4414 LearningRate 0.0948 Epoch: 0 Global Step: 8830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:50:10,495-Speed 5211.30 samples/sec Loss 10.3062 LearningRate 0.0948 Epoch: 0 Global Step: 8840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:50:12,471-Speed 5185.33 samples/sec Loss 10.2915 LearningRate 0.0948 Epoch: 0 Global Step: 8850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:50:14,432-Speed 5222.89 samples/sec Loss 10.4471 LearningRate 0.0948 Epoch: 0 Global Step: 8860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:50:16,397-Speed 5211.27 samples/sec Loss 10.3102 LearningRate 0.0948 Epoch: 0 Global Step: 8870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:50:18,364-Speed 5208.98 samples/sec Loss 10.3055 LearningRate 0.0948 Epoch: 0 Global Step: 8880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:50:20,327-Speed 5218.28 samples/sec Loss 10.3192 LearningRate 0.0947 Epoch: 0 Global Step: 8890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:50:22,314-Speed 5155.75 samples/sec Loss 10.2351 LearningRate 0.0947 Epoch: 0 Global Step: 8900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:50:24,288-Speed 5188.84 samples/sec Loss 10.3218 LearningRate 0.0947 Epoch: 0 Global Step: 8910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:50:26,256-Speed 5204.97 samples/sec Loss 10.3728 LearningRate 0.0947 Epoch: 0 Global Step: 8920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:50:28,220-Speed 5215.88 samples/sec Loss 10.4431 LearningRate 0.0947 Epoch: 0 Global Step: 8930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 
2022-04-10 23:50:30,188-Speed 5204.26 samples/sec Loss 10.4728 LearningRate 0.0947 Epoch: 0 Global Step: 8940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:50:32,156-Speed 5206.62 samples/sec Loss 10.4493 LearningRate 0.0947 Epoch: 0 Global Step: 8950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:50:34,121-Speed 5212.83 samples/sec Loss 10.2540 LearningRate 0.0947 Epoch: 0 Global Step: 8960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:50:36,099-Speed 5177.81 samples/sec Loss 10.2102 LearningRate 0.0947 Epoch: 0 Global Step: 8970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:50:38,076-Speed 5181.29 samples/sec Loss 10.2127 LearningRate 0.0947 Epoch: 0 Global Step: 8980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:50:40,046-Speed 5199.53 samples/sec Loss 10.3595 LearningRate 0.0947 Epoch: 0 Global Step: 8990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:50:42,016-Speed 5200.20 samples/sec Loss 10.3882 LearningRate 0.0947 Epoch: 0 Global Step: 9000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:50:43,974-Speed 5230.19 samples/sec Loss 10.3670 LearningRate 0.0947 Epoch: 0 Global Step: 9010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:50:45,938-Speed 5217.50 samples/sec Loss 10.1934 LearningRate 0.0947 Epoch: 0 Global Step: 9020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:50:47,901-Speed 5218.97 samples/sec Loss 10.2287 LearningRate 0.0947 Epoch: 0 Global Step: 9030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:50:49,866-Speed 5210.90 samples/sec Loss 10.2810 LearningRate 0.0947 Epoch: 0 Global Step: 9040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:50:51,835-Speed 5203.52 samples/sec Loss 10.1552 LearningRate 0.0947 Epoch: 0 Global Step: 9050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:50:53,815-Speed 5174.26 samples/sec Loss 
10.3293 LearningRate 0.0946 Epoch: 0 Global Step: 9060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:50:55,782-Speed 5204.91 samples/sec Loss 10.3574 LearningRate 0.0946 Epoch: 0 Global Step: 9070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:50:57,753-Speed 5197.84 samples/sec Loss 10.3135 LearningRate 0.0946 Epoch: 0 Global Step: 9080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:50:59,717-Speed 5214.80 samples/sec Loss 10.2398 LearningRate 0.0946 Epoch: 0 Global Step: 9090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:01,686-Speed 5204.68 samples/sec Loss 10.3280 LearningRate 0.0946 Epoch: 0 Global Step: 9100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:03,648-Speed 5221.66 samples/sec Loss 10.2432 LearningRate 0.0946 Epoch: 0 Global Step: 9110 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:05,611-Speed 5216.46 samples/sec Loss 10.1396 LearningRate 0.0946 Epoch: 0 Global Step: 9120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:07,576-Speed 5214.89 samples/sec Loss 10.1432 LearningRate 0.0946 Epoch: 0 Global Step: 9130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:09,551-Speed 5185.64 samples/sec Loss 10.1130 LearningRate 0.0946 Epoch: 0 Global Step: 9140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:11,520-Speed 5202.18 samples/sec Loss 10.1810 LearningRate 0.0946 Epoch: 0 Global Step: 9150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:13,485-Speed 5212.67 samples/sec Loss 10.3157 LearningRate 0.0946 Epoch: 0 Global Step: 9160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:15,455-Speed 5200.14 samples/sec Loss 10.1670 LearningRate 0.0946 Epoch: 0 Global Step: 9170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:17,424-Speed 5202.55 samples/sec Loss 10.1677 LearningRate 0.0946 Epoch: 0 Global Step: 9180 
Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:19,388-Speed 5215.92 samples/sec Loss 10.1971 LearningRate 0.0946 Epoch: 0 Global Step: 9190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:21,353-Speed 5212.03 samples/sec Loss 10.2439 LearningRate 0.0946 Epoch: 0 Global Step: 9200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:23,326-Speed 5190.90 samples/sec Loss 10.1963 LearningRate 0.0946 Epoch: 0 Global Step: 9210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:51:25,298-Speed 5197.31 samples/sec Loss 10.2037 LearningRate 0.0946 Epoch: 0 Global Step: 9220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:51:27,254-Speed 5235.35 samples/sec Loss 10.2073 LearningRate 0.0945 Epoch: 0 Global Step: 9230 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:29,222-Speed 5205.81 samples/sec Loss 10.0458 LearningRate 0.0945 Epoch: 0 Global Step: 9240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:31,194-Speed 5194.81 samples/sec Loss 10.1575 LearningRate 0.0945 Epoch: 0 Global Step: 9250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:33,163-Speed 5202.42 samples/sec Loss 10.2823 LearningRate 0.0945 Epoch: 0 Global Step: 9260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:35,142-Speed 5176.35 samples/sec Loss 10.1626 LearningRate 0.0945 Epoch: 0 Global Step: 9270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:37,108-Speed 5210.78 samples/sec Loss 9.9642 LearningRate 0.0945 Epoch: 0 Global Step: 9280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:39,072-Speed 5213.88 samples/sec Loss 10.1704 LearningRate 0.0945 Epoch: 0 Global Step: 9290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:41,040-Speed 5206.83 samples/sec Loss 10.2728 LearningRate 0.0945 Epoch: 0 Global Step: 9300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 
2022-04-10 23:51:43,006-Speed 5208.28 samples/sec Loss 10.0545 LearningRate 0.0945 Epoch: 0 Global Step: 9310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:44,972-Speed 5211.14 samples/sec Loss 10.1063 LearningRate 0.0945 Epoch: 0 Global Step: 9320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:51:46,953-Speed 5170.65 samples/sec Loss 10.0568 LearningRate 0.0945 Epoch: 0 Global Step: 9330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:51:48,922-Speed 5203.41 samples/sec Loss 10.1300 LearningRate 0.0945 Epoch: 0 Global Step: 9340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:51:50,895-Speed 5192.73 samples/sec Loss 10.0812 LearningRate 0.0945 Epoch: 0 Global Step: 9350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:51:52,864-Speed 5200.66 samples/sec Loss 10.0439 LearningRate 0.0945 Epoch: 0 Global Step: 9360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:51:54,831-Speed 5208.99 samples/sec Loss 10.1025 LearningRate 0.0945 Epoch: 0 Global Step: 9370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:51:56,796-Speed 5212.62 samples/sec Loss 10.1072 LearningRate 0.0945 Epoch: 0 Global Step: 9380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:51:58,766-Speed 5199.84 samples/sec Loss 10.2060 LearningRate 0.0945 Epoch: 0 Global Step: 9390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:52:00,737-Speed 5195.07 samples/sec Loss 9.9970 LearningRate 0.0944 Epoch: 0 Global Step: 9400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:52:02,721-Speed 5165.01 samples/sec Loss 9.8192 LearningRate 0.0944 Epoch: 0 Global Step: 9410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:52:04,700-Speed 5176.53 samples/sec Loss 10.0301 LearningRate 0.0944 Epoch: 0 Global Step: 9420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:52:06,667-Speed 5205.71 
samples/sec Loss 9.9533 LearningRate 0.0944 Epoch: 0 Global Step: 9430 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-10 23:52:08,633-Speed 5213.38 samples/sec Loss 9.9979 LearningRate 0.0944 Epoch: 0 Global Step: 9440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:52:10,608-Speed 5186.13 samples/sec Loss 10.0570 LearningRate 0.0944 Epoch: 0 Global Step: 9450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:52:12,574-Speed 5209.34 samples/sec Loss 9.9194 LearningRate 0.0944 Epoch: 0 Global Step: 9460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:52:14,543-Speed 5202.47 samples/sec Loss 9.9957 LearningRate 0.0944 Epoch: 0 Global Step: 9470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:52:16,508-Speed 5212.20 samples/sec Loss 10.0004 LearningRate 0.0944 Epoch: 0 Global Step: 9480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:52:18,491-Speed 5168.00 samples/sec Loss 9.9809 LearningRate 0.0944 Epoch: 0 Global Step: 9490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:52:20,457-Speed 5209.13 samples/sec Loss 10.0189 LearningRate 0.0944 Epoch: 0 Global Step: 9500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:52:22,430-Speed 5193.01 samples/sec Loss 10.0233 LearningRate 0.0944 Epoch: 0 Global Step: 9510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:52:24,404-Speed 5189.01 samples/sec Loss 9.9669 LearningRate 0.0944 Epoch: 0 Global Step: 9520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:52:26,363-Speed 5228.66 samples/sec Loss 9.9615 LearningRate 0.0944 Epoch: 0 Global Step: 9530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:52:28,348-Speed 5159.34 samples/sec Loss 9.9306 LearningRate 0.0944 Epoch: 0 Global Step: 9540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:52:30,323-Speed 5188.24 samples/sec Loss 9.9598 LearningRate 0.0944 Epoch: 0 
Global Step: 9550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:52:32,290-Speed 5208.07 samples/sec Loss 9.9754 LearningRate 0.0944 Epoch: 0 Global Step: 9560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:52:34,255-Speed 5211.13 samples/sec Loss 9.8500 LearningRate 0.0943 Epoch: 0 Global Step: 9570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:52:36,224-Speed 5203.60 samples/sec Loss 9.9769 LearningRate 0.0943 Epoch: 0 Global Step: 9580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:52:38,194-Speed 5198.29 samples/sec Loss 9.8732 LearningRate 0.0943 Epoch: 0 Global Step: 9590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:52:40,173-Speed 5178.18 samples/sec Loss 9.8827 LearningRate 0.0943 Epoch: 0 Global Step: 9600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:52:42,141-Speed 5202.49 samples/sec Loss 10.0684 LearningRate 0.0943 Epoch: 0 Global Step: 9610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:52:44,113-Speed 5196.26 samples/sec Loss 9.8182 LearningRate 0.0943 Epoch: 0 Global Step: 9620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:52:46,097-Speed 5162.54 samples/sec Loss 9.9258 LearningRate 0.0943 Epoch: 0 Global Step: 9630 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:52:48,086-Speed 5150.36 samples/sec Loss 9.8165 LearningRate 0.0943 Epoch: 0 Global Step: 9640 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:52:50,061-Speed 5186.98 samples/sec Loss 9.8548 LearningRate 0.0943 Epoch: 0 Global Step: 9650 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:52:52,044-Speed 5166.88 samples/sec Loss 9.9433 LearningRate 0.0943 Epoch: 0 Global Step: 9660 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:52:54,027-Speed 5165.18 samples/sec Loss 9.8748 LearningRate 0.0943 Epoch: 0 Global Step: 9670 Fp16 Grad Scale: 65536 Required: 21 hours 
Training: 2022-04-10 23:52:55,991-Speed 5213.43 samples/sec Loss 9.8425 LearningRate 0.0943 Epoch: 0 Global Step: 9680 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:52:57,963-Speed 5196.22 samples/sec Loss 9.8512 LearningRate 0.0943 Epoch: 0 Global Step: 9690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:52:59,980-Speed 5078.43 samples/sec Loss 9.8721 LearningRate 0.0943 Epoch: 0 Global Step: 9700 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:53:01,950-Speed 5199.35 samples/sec Loss 9.9364 LearningRate 0.0943 Epoch: 0 Global Step: 9710 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:53:03,923-Speed 5191.13 samples/sec Loss 9.8444 LearningRate 0.0943 Epoch: 0 Global Step: 9720 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:53:05,905-Speed 5167.74 samples/sec Loss 9.9496 LearningRate 0.0943 Epoch: 0 Global Step: 9730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:53:07,892-Speed 5156.28 samples/sec Loss 9.8138 LearningRate 0.0942 Epoch: 0 Global Step: 9740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:53:09,868-Speed 5185.27 samples/sec Loss 9.8097 LearningRate 0.0942 Epoch: 0 Global Step: 9750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:53:11,844-Speed 5182.31 samples/sec Loss 9.8196 LearningRate 0.0942 Epoch: 0 Global Step: 9760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:53:13,815-Speed 5197.28 samples/sec Loss 9.8418 LearningRate 0.0942 Epoch: 0 Global Step: 9770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:53:15,782-Speed 5208.27 samples/sec Loss 9.8590 LearningRate 0.0942 Epoch: 0 Global Step: 9780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:53:17,751-Speed 5202.89 samples/sec Loss 9.7989 LearningRate 0.0942 Epoch: 0 Global Step: 9790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:53:19,718-Speed 5205.66 samples/sec 
Loss 9.7600 LearningRate 0.0942 Epoch: 0 Global Step: 9800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:53:21,688-Speed 5200.79 samples/sec Loss 9.7654 LearningRate 0.0942 Epoch: 0 Global Step: 9810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:53:23,658-Speed 5198.51 samples/sec Loss 9.7086 LearningRate 0.0942 Epoch: 0 Global Step: 9820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:53:25,644-Speed 5158.58 samples/sec Loss 9.8364 LearningRate 0.0942 Epoch: 0 Global Step: 9830 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-10 23:53:27,633-Speed 5150.78 samples/sec Loss 9.7242 LearningRate 0.0942 Epoch: 0 Global Step: 9840 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-10 23:53:29,607-Speed 5189.37 samples/sec Loss 9.8107 LearningRate 0.0942 Epoch: 0 Global Step: 9850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:53:31,582-Speed 5187.54 samples/sec Loss 9.7373 LearningRate 0.0942 Epoch: 0 Global Step: 9860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:53:33,558-Speed 5182.24 samples/sec Loss 9.6751 LearningRate 0.0942 Epoch: 0 Global Step: 9870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:53:35,524-Speed 5210.31 samples/sec Loss 9.8749 LearningRate 0.0942 Epoch: 0 Global Step: 9880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:53:37,494-Speed 5200.00 samples/sec Loss 9.7434 LearningRate 0.0942 Epoch: 0 Global Step: 9890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:53:39,461-Speed 5208.38 samples/sec Loss 9.7792 LearningRate 0.0942 Epoch: 0 Global Step: 9900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:53:41,421-Speed 5224.48 samples/sec Loss 9.7450 LearningRate 0.0942 Epoch: 0 Global Step: 9910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:53:43,386-Speed 5213.39 samples/sec Loss 9.8940 LearningRate 0.0941 Epoch: 0 Global Step: 
9920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:53:45,353-Speed 5206.97 samples/sec Loss 9.6879 LearningRate 0.0941 Epoch: 0 Global Step: 9930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:53:47,320-Speed 5209.66 samples/sec Loss 9.7349 LearningRate 0.0941 Epoch: 0 Global Step: 9940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:53:49,304-Speed 5162.04 samples/sec Loss 9.6579 LearningRate 0.0941 Epoch: 0 Global Step: 9950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:53:51,270-Speed 5210.89 samples/sec Loss 9.7287 LearningRate 0.0941 Epoch: 0 Global Step: 9960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:53:53,241-Speed 5196.32 samples/sec Loss 9.5503 LearningRate 0.0941 Epoch: 0 Global Step: 9970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:53:55,221-Speed 5175.18 samples/sec Loss 9.6724 LearningRate 0.0941 Epoch: 0 Global Step: 9980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:53:57,187-Speed 5208.11 samples/sec Loss 9.7606 LearningRate 0.0941 Epoch: 0 Global Step: 9990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:53:59,159-Speed 5194.12 samples/sec Loss 9.7332 LearningRate 0.0941 Epoch: 0 Global Step: 10000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:54:25,791-[lfw][10000]XNorm: 22.069092 Training: 2022-04-10 23:54:25,792-[lfw][10000]Accuracy-Flip: 0.99567+-0.00318 Training: 2022-04-10 23:54:25,792-[lfw][10000]Accuracy-Highest: 0.99567 Training: 2022-04-10 23:54:56,584-[cfp_fp][10000]XNorm: 19.738330 Training: 2022-04-10 23:54:56,585-[cfp_fp][10000]Accuracy-Flip: 0.93457+-0.01229 Training: 2022-04-10 23:54:56,585-[cfp_fp][10000]Accuracy-Highest: 0.93457 Training: 2022-04-10 23:55:23,265-[agedb_30][10000]XNorm: 21.982636 Training: 2022-04-10 23:55:23,266-[agedb_30][10000]Accuracy-Flip: 0.95100+-0.01119 Training: 2022-04-10 23:55:23,266-[agedb_30][10000]Accuracy-Highest: 0.95100 
Training: 2022-04-10 23:55:25,251-Speed 118.94 samples/sec Loss 9.7167 LearningRate 0.0941 Epoch: 0 Global Step: 10010 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:55:27,225-Speed 5190.80 samples/sec Loss 9.5598 LearningRate 0.0941 Epoch: 0 Global Step: 10020 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:55:29,189-Speed 5215.47 samples/sec Loss 9.7586 LearningRate 0.0941 Epoch: 0 Global Step: 10030 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:55:31,152-Speed 5217.08 samples/sec Loss 9.6350 LearningRate 0.0941 Epoch: 0 Global Step: 10040 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:55:33,104-Speed 5247.21 samples/sec Loss 9.6097 LearningRate 0.0941 Epoch: 0 Global Step: 10050 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:55:35,074-Speed 5201.06 samples/sec Loss 9.7271 LearningRate 0.0941 Epoch: 0 Global Step: 10060 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:55:37,056-Speed 5167.13 samples/sec Loss 9.6142 LearningRate 0.0941 Epoch: 0 Global Step: 10070 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:55:39,024-Speed 5205.13 samples/sec Loss 9.6734 LearningRate 0.0941 Epoch: 0 Global Step: 10080 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:55:40,994-Speed 5199.24 samples/sec Loss 9.6975 LearningRate 0.0940 Epoch: 0 Global Step: 10090 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:55:42,963-Speed 5202.18 samples/sec Loss 9.6010 LearningRate 0.0940 Epoch: 0 Global Step: 10100 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:55:44,931-Speed 5205.76 samples/sec Loss 9.7669 LearningRate 0.0940 Epoch: 0 Global Step: 10110 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:55:46,901-Speed 5199.40 samples/sec Loss 9.4509 LearningRate 0.0940 Epoch: 0 Global Step: 10120 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:55:48,873-Speed 5196.18 
samples/sec Loss 9.6564 LearningRate 0.0940 Epoch: 0 Global Step: 10130 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:55:50,842-Speed 5202.41 samples/sec Loss 9.7226 LearningRate 0.0940 Epoch: 0 Global Step: 10140 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:55:52,814-Speed 5193.12 samples/sec Loss 9.6855 LearningRate 0.0940 Epoch: 0 Global Step: 10150 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:55:54,785-Speed 5198.51 samples/sec Loss 9.5438 LearningRate 0.0940 Epoch: 0 Global Step: 10160 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:55:56,780-Speed 5135.29 samples/sec Loss 9.6012 LearningRate 0.0940 Epoch: 0 Global Step: 10170 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:55:58,752-Speed 5194.13 samples/sec Loss 9.5308 LearningRate 0.0940 Epoch: 0 Global Step: 10180 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:00,725-Speed 5191.20 samples/sec Loss 9.5459 LearningRate 0.0940 Epoch: 0 Global Step: 10190 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:02,696-Speed 5196.59 samples/sec Loss 9.6354 LearningRate 0.0940 Epoch: 0 Global Step: 10200 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:04,675-Speed 5174.83 samples/sec Loss 9.6678 LearningRate 0.0940 Epoch: 0 Global Step: 10210 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:06,655-Speed 5174.38 samples/sec Loss 9.6696 LearningRate 0.0940 Epoch: 0 Global Step: 10220 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:08,636-Speed 5170.78 samples/sec Loss 9.4588 LearningRate 0.0940 Epoch: 0 Global Step: 10230 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:10,621-Speed 5161.20 samples/sec Loss 9.5413 LearningRate 0.0940 Epoch: 0 Global Step: 10240 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:12,618-Speed 5129.18 samples/sec Loss 9.6109 LearningRate 0.0940 Epoch: 0 
Global Step: 10250 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:14,612-Speed 5138.24 samples/sec Loss 9.5140 LearningRate 0.0939 Epoch: 0 Global Step: 10260 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:16,599-Speed 5153.63 samples/sec Loss 9.4870 LearningRate 0.0939 Epoch: 0 Global Step: 10270 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:18,577-Speed 5180.79 samples/sec Loss 9.5985 LearningRate 0.0939 Epoch: 0 Global Step: 10280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:56:20,544-Speed 5207.44 samples/sec Loss 9.6901 LearningRate 0.0939 Epoch: 0 Global Step: 10290 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:22,520-Speed 5181.58 samples/sec Loss 9.4973 LearningRate 0.0939 Epoch: 0 Global Step: 10300 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:24,503-Speed 5165.80 samples/sec Loss 9.5013 LearningRate 0.0939 Epoch: 0 Global Step: 10310 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:26,486-Speed 5167.28 samples/sec Loss 9.4598 LearningRate 0.0939 Epoch: 0 Global Step: 10320 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:28,482-Speed 5131.14 samples/sec Loss 9.6714 LearningRate 0.0939 Epoch: 0 Global Step: 10330 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:30,473-Speed 5145.62 samples/sec Loss 9.6168 LearningRate 0.0939 Epoch: 0 Global Step: 10340 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:32,452-Speed 5177.19 samples/sec Loss 9.4865 LearningRate 0.0939 Epoch: 0 Global Step: 10350 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:34,432-Speed 5171.37 samples/sec Loss 9.5068 LearningRate 0.0939 Epoch: 0 Global Step: 10360 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:36,445-Speed 5089.03 samples/sec Loss 9.4800 LearningRate 0.0939 Epoch: 0 Global Step: 10370 Fp16 Grad Scale: 65536 Required: 22 
hours Training: 2022-04-10 23:56:38,427-Speed 5170.12 samples/sec Loss 9.5008 LearningRate 0.0939 Epoch: 0 Global Step: 10380 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:56:40,408-Speed 5169.16 samples/sec Loss 9.5365 LearningRate 0.0939 Epoch: 0 Global Step: 10390 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:56:42,386-Speed 5177.94 samples/sec Loss 9.5147 LearningRate 0.0939 Epoch: 0 Global Step: 10400 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:56:44,366-Speed 5173.95 samples/sec Loss 9.5171 LearningRate 0.0939 Epoch: 0 Global Step: 10410 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:56:46,356-Speed 5149.00 samples/sec Loss 9.4941 LearningRate 0.0939 Epoch: 0 Global Step: 10420 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:56:48,352-Speed 5130.84 samples/sec Loss 9.3969 LearningRate 0.0938 Epoch: 0 Global Step: 10430 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:56:50,331-Speed 5177.15 samples/sec Loss 9.3582 LearningRate 0.0938 Epoch: 0 Global Step: 10440 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:56:52,326-Speed 5134.50 samples/sec Loss 9.4669 LearningRate 0.0938 Epoch: 0 Global Step: 10450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:56:54,311-Speed 5159.93 samples/sec Loss 9.4229 LearningRate 0.0938 Epoch: 0 Global Step: 10460 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:56:56,285-Speed 5188.49 samples/sec Loss 9.4129 LearningRate 0.0938 Epoch: 0 Global Step: 10470 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:56:58,259-Speed 5190.51 samples/sec Loss 9.5228 LearningRate 0.0938 Epoch: 0 Global Step: 10480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:57:00,238-Speed 5177.03 samples/sec Loss 9.4148 LearningRate 0.0938 Epoch: 0 Global Step: 10490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 
23:57:02,210-Speed 5193.81 samples/sec Loss 9.4143 LearningRate 0.0938 Epoch: 0 Global Step: 10500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:57:04,185-Speed 5184.99 samples/sec Loss 9.5161 LearningRate 0.0938 Epoch: 0 Global Step: 10510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-10 23:57:06,152-Speed 5207.75 samples/sec Loss 9.3612 LearningRate 0.0938 Epoch: 0 Global Step: 10520 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:57:08,123-Speed 5197.85 samples/sec Loss 9.3755 LearningRate 0.0938 Epoch: 0 Global Step: 10530 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-10 23:57:10,101-Speed 5178.85 samples/sec Loss 9.4955 LearningRate 0.0938 Epoch: 0 Global Step: 10540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:57:12,089-Speed 5153.32 samples/sec Loss 9.4824 LearningRate 0.0938 Epoch: 0 Global Step: 10550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:57:14,062-Speed 5190.72 samples/sec Loss 9.4193 LearningRate 0.0938 Epoch: 0 Global Step: 10560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:57:16,035-Speed 5192.27 samples/sec Loss 9.4951 LearningRate 0.0938 Epoch: 0 Global Step: 10570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:57:18,004-Speed 5201.74 samples/sec Loss 9.3483 LearningRate 0.0938 Epoch: 0 Global Step: 10580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:57:19,976-Speed 5194.70 samples/sec Loss 9.5174 LearningRate 0.0938 Epoch: 0 Global Step: 10590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:57:21,956-Speed 5174.14 samples/sec Loss 9.4109 LearningRate 0.0938 Epoch: 0 Global Step: 10600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:57:23,922-Speed 5210.98 samples/sec Loss 9.4113 LearningRate 0.0937 Epoch: 0 Global Step: 10610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:57:25,885-Speed 5217.45 samples/sec Loss 9.3382 
LearningRate 0.0937 Epoch: 0 Global Step: 10620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:57:27,854-Speed 5203.68 samples/sec Loss 9.3105 LearningRate 0.0937 Epoch: 0 Global Step: 10630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:57:29,819-Speed 5212.50 samples/sec Loss 9.2227 LearningRate 0.0937 Epoch: 0 Global Step: 10640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:57:31,786-Speed 5206.53 samples/sec Loss 9.2839 LearningRate 0.0937 Epoch: 0 Global Step: 10650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:57:33,754-Speed 5206.69 samples/sec Loss 9.3553 LearningRate 0.0937 Epoch: 0 Global Step: 10660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:57:35,723-Speed 5199.87 samples/sec Loss 9.3613 LearningRate 0.0937 Epoch: 0 Global Step: 10670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:57:37,707-Speed 5164.14 samples/sec Loss 9.3387 LearningRate 0.0937 Epoch: 0 Global Step: 10680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:57:39,679-Speed 5193.83 samples/sec Loss 9.3712 LearningRate 0.0937 Epoch: 0 Global Step: 10690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:57:41,651-Speed 5194.32 samples/sec Loss 9.3856 LearningRate 0.0937 Epoch: 0 Global Step: 10700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:57:43,626-Speed 5188.50 samples/sec Loss 9.3714 LearningRate 0.0937 Epoch: 0 Global Step: 10710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:57:45,598-Speed 5192.64 samples/sec Loss 9.2498 LearningRate 0.0937 Epoch: 0 Global Step: 10720 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-10 23:57:47,550-Speed 5246.73 samples/sec Loss 9.4473 LearningRate 0.0937 Epoch: 0 Global Step: 10730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:57:49,527-Speed 5182.06 samples/sec Loss 9.2351 LearningRate 0.0937 Epoch: 0 Global Step: 
10740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:57:51,491-Speed 5215.35 samples/sec Loss 9.3426 LearningRate 0.0937 Epoch: 0 Global Step: 10750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:57:53,459-Speed 5206.18 samples/sec Loss 9.2593 LearningRate 0.0937 Epoch: 0 Global Step: 10760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:57:55,426-Speed 5207.85 samples/sec Loss 9.1992 LearningRate 0.0937 Epoch: 0 Global Step: 10770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:57:57,394-Speed 5205.60 samples/sec Loss 9.1776 LearningRate 0.0936 Epoch: 0 Global Step: 10780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:57:59,371-Speed 5182.09 samples/sec Loss 9.4061 LearningRate 0.0936 Epoch: 0 Global Step: 10790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:58:01,347-Speed 5182.92 samples/sec Loss 9.3135 LearningRate 0.0936 Epoch: 0 Global Step: 10800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:58:03,323-Speed 5182.84 samples/sec Loss 9.2741 LearningRate 0.0936 Epoch: 0 Global Step: 10810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:58:05,290-Speed 5208.40 samples/sec Loss 9.2884 LearningRate 0.0936 Epoch: 0 Global Step: 10820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:58:07,259-Speed 5200.43 samples/sec Loss 9.2837 LearningRate 0.0936 Epoch: 0 Global Step: 10830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:58:09,225-Speed 5211.51 samples/sec Loss 9.2047 LearningRate 0.0936 Epoch: 0 Global Step: 10840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:58:11,191-Speed 5209.34 samples/sec Loss 9.1548 LearningRate 0.0936 Epoch: 0 Global Step: 10850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:58:13,175-Speed 5165.11 samples/sec Loss 9.3246 LearningRate 0.0936 Epoch: 0 Global Step: 10860 Fp16 Grad Scale: 131072 Required: 21 hours 
Training: 2022-04-10 23:58:15,152-Speed 5179.76 samples/sec Loss 9.3882 LearningRate 0.0936 Epoch: 0 Global Step: 10870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:58:17,134-Speed 5168.64 samples/sec Loss 9.1635 LearningRate 0.0936 Epoch: 0 Global Step: 10880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:58:19,103-Speed 5204.48 samples/sec Loss 9.1814 LearningRate 0.0936 Epoch: 0 Global Step: 10890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:58:21,071-Speed 5202.96 samples/sec Loss 9.3091 LearningRate 0.0936 Epoch: 0 Global Step: 10900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:58:23,034-Speed 5219.06 samples/sec Loss 9.2097 LearningRate 0.0936 Epoch: 0 Global Step: 10910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:58:25,003-Speed 5203.20 samples/sec Loss 9.3081 LearningRate 0.0936 Epoch: 0 Global Step: 10920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:58:26,970-Speed 5206.46 samples/sec Loss 9.1470 LearningRate 0.0936 Epoch: 0 Global Step: 10930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:58:28,940-Speed 5199.35 samples/sec Loss 9.2449 LearningRate 0.0936 Epoch: 0 Global Step: 10940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:58:30,915-Speed 5187.34 samples/sec Loss 9.2490 LearningRate 0.0935 Epoch: 0 Global Step: 10950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:58:32,887-Speed 5193.11 samples/sec Loss 9.1899 LearningRate 0.0935 Epoch: 0 Global Step: 10960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:58:34,863-Speed 5185.58 samples/sec Loss 9.1666 LearningRate 0.0935 Epoch: 0 Global Step: 10970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:58:36,864-Speed 5118.77 samples/sec Loss 9.2410 LearningRate 0.0935 Epoch: 0 Global Step: 10980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:58:38,838-Speed 5190.94 
samples/sec Loss 9.2464 LearningRate 0.0935 Epoch: 0 Global Step: 10990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:58:40,831-Speed 5139.51 samples/sec Loss 9.1492 LearningRate 0.0935 Epoch: 0 Global Step: 11000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:58:42,799-Speed 5203.69 samples/sec Loss 9.2692 LearningRate 0.0935 Epoch: 0 Global Step: 11010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:58:44,764-Speed 5212.58 samples/sec Loss 9.3502 LearningRate 0.0935 Epoch: 0 Global Step: 11020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:58:46,743-Speed 5177.44 samples/sec Loss 9.2486 LearningRate 0.0935 Epoch: 0 Global Step: 11030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:58:48,717-Speed 5189.07 samples/sec Loss 9.2221 LearningRate 0.0935 Epoch: 0 Global Step: 11040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:58:50,685-Speed 5203.54 samples/sec Loss 9.2830 LearningRate 0.0935 Epoch: 0 Global Step: 11050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:58:52,666-Speed 5171.95 samples/sec Loss 9.1744 LearningRate 0.0935 Epoch: 0 Global Step: 11060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:58:54,633-Speed 5205.89 samples/sec Loss 9.0909 LearningRate 0.0935 Epoch: 0 Global Step: 11070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:58:56,610-Speed 5181.86 samples/sec Loss 9.1859 LearningRate 0.0935 Epoch: 0 Global Step: 11080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:58:58,594-Speed 5164.76 samples/sec Loss 9.1257 LearningRate 0.0935 Epoch: 0 Global Step: 11090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:59:00,568-Speed 5190.11 samples/sec Loss 9.1346 LearningRate 0.0935 Epoch: 0 Global Step: 11100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:59:02,544-Speed 5181.46 samples/sec Loss 9.1166 LearningRate 0.0935 
Epoch: 0 Global Step: 11110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:59:04,513-Speed 5202.72 samples/sec Loss 9.2084 LearningRate 0.0934 Epoch: 0 Global Step: 11120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:59:06,482-Speed 5203.39 samples/sec Loss 9.1849 LearningRate 0.0934 Epoch: 0 Global Step: 11130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:59:08,447-Speed 5213.10 samples/sec Loss 9.2066 LearningRate 0.0934 Epoch: 0 Global Step: 11140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:59:10,416-Speed 5201.05 samples/sec Loss 9.1636 LearningRate 0.0934 Epoch: 0 Global Step: 11150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:59:12,404-Speed 5152.09 samples/sec Loss 9.1946 LearningRate 0.0934 Epoch: 0 Global Step: 11160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:59:14,379-Speed 5186.92 samples/sec Loss 9.1094 LearningRate 0.0934 Epoch: 0 Global Step: 11170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:59:16,353-Speed 5188.46 samples/sec Loss 9.1360 LearningRate 0.0934 Epoch: 0 Global Step: 11180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:59:18,322-Speed 5203.19 samples/sec Loss 9.1174 LearningRate 0.0934 Epoch: 0 Global Step: 11190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:59:20,281-Speed 5230.64 samples/sec Loss 9.1613 LearningRate 0.0934 Epoch: 0 Global Step: 11200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:59:22,270-Speed 5149.35 samples/sec Loss 9.0408 LearningRate 0.0934 Epoch: 0 Global Step: 11210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:59:24,244-Speed 5190.12 samples/sec Loss 9.1599 LearningRate 0.0934 Epoch: 0 Global Step: 11220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:59:26,227-Speed 5165.17 samples/sec Loss 9.0735 LearningRate 0.0934 Epoch: 0 Global Step: 11230 Fp16 Grad Scale: 
65536 Required: 21 hours Training: 2022-04-10 23:59:28,224-Speed 5128.94 samples/sec Loss 9.2234 LearningRate 0.0934 Epoch: 0 Global Step: 11240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:59:30,204-Speed 5174.95 samples/sec Loss 9.1218 LearningRate 0.0934 Epoch: 0 Global Step: 11250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:59:32,177-Speed 5191.72 samples/sec Loss 9.0069 LearningRate 0.0934 Epoch: 0 Global Step: 11260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:59:34,144-Speed 5205.65 samples/sec Loss 9.1015 LearningRate 0.0934 Epoch: 0 Global Step: 11270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:59:36,123-Speed 5177.15 samples/sec Loss 9.0526 LearningRate 0.0934 Epoch: 0 Global Step: 11280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:59:38,105-Speed 5167.12 samples/sec Loss 9.1091 LearningRate 0.0934 Epoch: 0 Global Step: 11290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:59:40,081-Speed 5185.40 samples/sec Loss 9.1192 LearningRate 0.0933 Epoch: 0 Global Step: 11300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:59:42,051-Speed 5199.99 samples/sec Loss 9.0851 LearningRate 0.0933 Epoch: 0 Global Step: 11310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:59:44,021-Speed 5199.72 samples/sec Loss 9.0659 LearningRate 0.0933 Epoch: 0 Global Step: 11320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-10 23:59:45,981-Speed 5224.97 samples/sec Loss 9.1522 LearningRate 0.0933 Epoch: 0 Global Step: 11330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:59:47,953-Speed 5194.19 samples/sec Loss 9.0574 LearningRate 0.0933 Epoch: 0 Global Step: 11340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:59:49,921-Speed 5205.70 samples/sec Loss 9.0870 LearningRate 0.0933 Epoch: 0 Global Step: 11350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 
23:59:51,890-Speed 5203.15 samples/sec Loss 9.0409 LearningRate 0.0933 Epoch: 0 Global Step: 11360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:59:53,859-Speed 5203.24 samples/sec Loss 9.1446 LearningRate 0.0933 Epoch: 0 Global Step: 11370 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:59:55,839-Speed 5171.46 samples/sec Loss 9.1950 LearningRate 0.0933 Epoch: 0 Global Step: 11380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:59:57,810-Speed 5198.62 samples/sec Loss 9.0300 LearningRate 0.0933 Epoch: 0 Global Step: 11390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-10 23:59:59,779-Speed 5201.56 samples/sec Loss 9.1011 LearningRate 0.0933 Epoch: 0 Global Step: 11400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:00:01,748-Speed 5204.56 samples/sec Loss 9.1627 LearningRate 0.0933 Epoch: 0 Global Step: 11410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:00:03,713-Speed 5211.10 samples/sec Loss 9.0755 LearningRate 0.0933 Epoch: 0 Global Step: 11420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:00:05,683-Speed 5200.44 samples/sec Loss 9.0787 LearningRate 0.0933 Epoch: 0 Global Step: 11430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:07,650-Speed 5207.87 samples/sec Loss 8.9457 LearningRate 0.0933 Epoch: 0 Global Step: 11440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:09,622-Speed 5194.28 samples/sec Loss 8.9456 LearningRate 0.0933 Epoch: 0 Global Step: 11450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:11,597-Speed 5186.58 samples/sec Loss 9.0506 LearningRate 0.0933 Epoch: 0 Global Step: 11460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:13,567-Speed 5199.56 samples/sec Loss 9.1035 LearningRate 0.0932 Epoch: 0 Global Step: 11470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:15,537-Speed 5199.36 samples/sec Loss 9.0572 
LearningRate 0.0932 Epoch: 0 Global Step: 11480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:17,516-Speed 5175.89 samples/sec Loss 9.1363 LearningRate 0.0932 Epoch: 0 Global Step: 11490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:19,489-Speed 5190.72 samples/sec Loss 9.0181 LearningRate 0.0932 Epoch: 0 Global Step: 11500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:21,473-Speed 5165.52 samples/sec Loss 9.1950 LearningRate 0.0932 Epoch: 0 Global Step: 11510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:23,447-Speed 5187.47 samples/sec Loss 8.9539 LearningRate 0.0932 Epoch: 0 Global Step: 11520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:25,416-Speed 5204.33 samples/sec Loss 8.9963 LearningRate 0.0932 Epoch: 0 Global Step: 11530 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:27,404-Speed 5151.77 samples/sec Loss 9.0504 LearningRate 0.0932 Epoch: 0 Global Step: 11540 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:29,385-Speed 5171.40 samples/sec Loss 8.8252 LearningRate 0.0932 Epoch: 0 Global Step: 11550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:31,357-Speed 5192.31 samples/sec Loss 8.9858 LearningRate 0.0932 Epoch: 0 Global Step: 11560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:33,332-Speed 5187.79 samples/sec Loss 9.1483 LearningRate 0.0932 Epoch: 0 Global Step: 11570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:35,307-Speed 5186.66 samples/sec Loss 9.1010 LearningRate 0.0932 Epoch: 0 Global Step: 11580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:37,282-Speed 5186.01 samples/sec Loss 8.8995 LearningRate 0.0932 Epoch: 0 Global Step: 11590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:39,274-Speed 5141.76 samples/sec Loss 8.9428 LearningRate 0.0932 Epoch: 0 Global Step: 
11600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:41,251-Speed 5180.46 samples/sec Loss 8.8776 LearningRate 0.0932 Epoch: 0 Global Step: 11610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:43,224-Speed 5193.83 samples/sec Loss 8.9258 LearningRate 0.0932 Epoch: 0 Global Step: 11620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:45,198-Speed 5189.95 samples/sec Loss 8.8595 LearningRate 0.0932 Epoch: 0 Global Step: 11630 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:00:47,178-Speed 5172.95 samples/sec Loss 9.0582 LearningRate 0.0931 Epoch: 0 Global Step: 11640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:49,148-Speed 5198.12 samples/sec Loss 8.9458 LearningRate 0.0931 Epoch: 0 Global Step: 11650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:51,122-Speed 5191.36 samples/sec Loss 8.9872 LearningRate 0.0931 Epoch: 0 Global Step: 11660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:53,098-Speed 5181.76 samples/sec Loss 8.9336 LearningRate 0.0931 Epoch: 0 Global Step: 11670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:00:55,063-Speed 5213.34 samples/sec Loss 9.0177 LearningRate 0.0931 Epoch: 0 Global Step: 11680 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:00:57,037-Speed 5188.93 samples/sec Loss 8.9697 LearningRate 0.0931 Epoch: 0 Global Step: 11690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:00:59,023-Speed 5158.34 samples/sec Loss 8.8511 LearningRate 0.0931 Epoch: 0 Global Step: 11700 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:01,007-Speed 5161.81 samples/sec Loss 9.0270 LearningRate 0.0931 Epoch: 0 Global Step: 11710 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:02,977-Speed 5199.66 samples/sec Loss 8.8876 LearningRate 0.0931 Epoch: 0 Global Step: 11720 Fp16 Grad Scale: 65536 Required: 21 hours 
Training: 2022-04-11 00:01:04,969-Speed 5144.99 samples/sec Loss 9.0283 LearningRate 0.0931 Epoch: 0 Global Step: 11730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:06,941-Speed 5194.57 samples/sec Loss 8.8537 LearningRate 0.0931 Epoch: 0 Global Step: 11740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:08,911-Speed 5197.45 samples/sec Loss 8.8923 LearningRate 0.0931 Epoch: 0 Global Step: 11750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:10,887-Speed 5184.24 samples/sec Loss 8.9637 LearningRate 0.0931 Epoch: 0 Global Step: 11760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:12,859-Speed 5198.89 samples/sec Loss 8.9699 LearningRate 0.0931 Epoch: 0 Global Step: 11770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:14,832-Speed 5189.84 samples/sec Loss 8.8992 LearningRate 0.0931 Epoch: 0 Global Step: 11780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:01:16,825-Speed 5139.36 samples/sec Loss 9.0149 LearningRate 0.0931 Epoch: 0 Global Step: 11790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:01:18,797-Speed 5195.47 samples/sec Loss 8.8607 LearningRate 0.0931 Epoch: 0 Global Step: 11800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:01:20,768-Speed 5196.39 samples/sec Loss 8.8303 LearningRate 0.0930 Epoch: 0 Global Step: 11810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:01:22,749-Speed 5170.46 samples/sec Loss 8.8932 LearningRate 0.0930 Epoch: 0 Global Step: 11820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:01:24,735-Speed 5159.97 samples/sec Loss 8.8602 LearningRate 0.0930 Epoch: 0 Global Step: 11830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:01:26,704-Speed 5201.93 samples/sec Loss 8.7759 LearningRate 0.0930 Epoch: 0 Global Step: 11840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:28,678-Speed 5190.15 
samples/sec Loss 8.9265 LearningRate 0.0930 Epoch: 0 Global Step: 11850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:30,658-Speed 5174.44 samples/sec Loss 8.7354 LearningRate 0.0930 Epoch: 0 Global Step: 11860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:32,630-Speed 5192.18 samples/sec Loss 8.8776 LearningRate 0.0930 Epoch: 0 Global Step: 11870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:34,611-Speed 5172.77 samples/sec Loss 8.9146 LearningRate 0.0930 Epoch: 0 Global Step: 11880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:36,581-Speed 5197.90 samples/sec Loss 8.8939 LearningRate 0.0930 Epoch: 0 Global Step: 11890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:38,551-Speed 5200.48 samples/sec Loss 8.7895 LearningRate 0.0930 Epoch: 0 Global Step: 11900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:40,523-Speed 5193.75 samples/sec Loss 8.8247 LearningRate 0.0930 Epoch: 0 Global Step: 11910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:42,504-Speed 5171.32 samples/sec Loss 8.7336 LearningRate 0.0930 Epoch: 0 Global Step: 11920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:44,499-Speed 5133.97 samples/sec Loss 8.8767 LearningRate 0.0930 Epoch: 0 Global Step: 11930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:46,478-Speed 5176.24 samples/sec Loss 8.7834 LearningRate 0.0930 Epoch: 0 Global Step: 11940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:01:48,442-Speed 5217.15 samples/sec Loss 8.9660 LearningRate 0.0930 Epoch: 0 Global Step: 11950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:50,424-Speed 5166.68 samples/sec Loss 8.8223 LearningRate 0.0930 Epoch: 0 Global Step: 11960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:52,393-Speed 5203.97 samples/sec Loss 8.9481 LearningRate 0.0930 Epoch: 0 
Global Step: 11970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:54,361-Speed 5205.00 samples/sec Loss 8.7800 LearningRate 0.0930 Epoch: 0 Global Step: 11980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:56,345-Speed 5162.76 samples/sec Loss 8.8498 LearningRate 0.0929 Epoch: 0 Global Step: 11990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:01:58,333-Speed 5152.34 samples/sec Loss 8.7447 LearningRate 0.0929 Epoch: 0 Global Step: 12000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:02:25,123-[lfw][12000]XNorm: 23.293650 Training: 2022-04-11 00:02:25,124-[lfw][12000]Accuracy-Flip: 0.99683+-0.00263 Training: 2022-04-11 00:02:25,124-[lfw][12000]Accuracy-Highest: 0.99683 Training: 2022-04-11 00:02:56,076-[cfp_fp][12000]XNorm: 20.768611 Training: 2022-04-11 00:02:56,076-[cfp_fp][12000]Accuracy-Flip: 0.94629+-0.00721 Training: 2022-04-11 00:02:56,077-[cfp_fp][12000]Accuracy-Highest: 0.94629 Training: 2022-04-11 00:03:22,783-[agedb_30][12000]XNorm: 22.674014 Training: 2022-04-11 00:03:22,784-[agedb_30][12000]Accuracy-Flip: 0.95667+-0.00816 Training: 2022-04-11 00:03:22,784-[agedb_30][12000]Accuracy-Highest: 0.95667 Training: 2022-04-11 00:03:24,774-Speed 118.46 samples/sec Loss 8.8107 LearningRate 0.0929 Epoch: 0 Global Step: 12010 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-11 00:03:26,735-Speed 5223.18 samples/sec Loss 8.8226 LearningRate 0.0929 Epoch: 0 Global Step: 12020 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-11 00:03:28,699-Speed 5215.33 samples/sec Loss 8.8606 LearningRate 0.0929 Epoch: 0 Global Step: 12030 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-11 00:03:30,661-Speed 5220.50 samples/sec Loss 8.8234 LearningRate 0.0929 Epoch: 0 Global Step: 12040 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-11 00:03:32,626-Speed 5212.88 samples/sec Loss 8.8684 LearningRate 0.0929 Epoch: 0 Global Step: 12050 Fp16 Grad Scale: 
131072 Required: 22 hours Training: 2022-04-11 00:03:34,600-Speed 5189.04 samples/sec Loss 8.8025 LearningRate 0.0929 Epoch: 0 Global Step: 12060 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-11 00:03:36,569-Speed 5201.19 samples/sec Loss 8.8658 LearningRate 0.0929 Epoch: 0 Global Step: 12070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-11 00:03:38,541-Speed 5196.60 samples/sec Loss 8.5973 LearningRate 0.0929 Epoch: 0 Global Step: 12080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-11 00:03:40,510-Speed 5201.28 samples/sec Loss 8.8146 LearningRate 0.0929 Epoch: 0 Global Step: 12090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-11 00:03:42,479-Speed 5202.07 samples/sec Loss 8.8073 LearningRate 0.0929 Epoch: 0 Global Step: 12100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-11 00:03:44,450-Speed 5196.96 samples/sec Loss 8.8047 LearningRate 0.0929 Epoch: 0 Global Step: 12110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-11 00:03:46,419-Speed 5204.42 samples/sec Loss 8.8645 LearningRate 0.0929 Epoch: 0 Global Step: 12120 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-11 00:03:48,385-Speed 5209.34 samples/sec Loss 8.6964 LearningRate 0.0929 Epoch: 0 Global Step: 12130 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-11 00:03:50,353-Speed 5205.77 samples/sec Loss 8.6825 LearningRate 0.0929 Epoch: 0 Global Step: 12140 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-11 00:03:52,337-Speed 5162.81 samples/sec Loss 8.6851 LearningRate 0.0929 Epoch: 0 Global Step: 12150 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-11 00:03:54,312-Speed 5186.38 samples/sec Loss 8.7547 LearningRate 0.0928 Epoch: 0 Global Step: 12160 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-11 00:03:56,283-Speed 5195.65 samples/sec Loss 8.7315 LearningRate 0.0928 Epoch: 0 Global Step: 12170 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-11 
00:03:58,269-Speed 5157.51 samples/sec Loss 8.7408 LearningRate 0.0928 Epoch: 0 Global Step: 12180 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-11 00:04:00,249-Speed 5176.35 samples/sec Loss 8.7706 LearningRate 0.0928 Epoch: 0 Global Step: 12190 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-11 00:04:02,225-Speed 5182.38 samples/sec Loss 8.7353 LearningRate 0.0928 Epoch: 0 Global Step: 12200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:04,209-Speed 5163.04 samples/sec Loss 8.8154 LearningRate 0.0928 Epoch: 0 Global Step: 12210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:06,185-Speed 5185.14 samples/sec Loss 8.8036 LearningRate 0.0928 Epoch: 0 Global Step: 12220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:08,163-Speed 5177.01 samples/sec Loss 8.5942 LearningRate 0.0928 Epoch: 0 Global Step: 12230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:04:10,144-Speed 5172.46 samples/sec Loss 8.8617 LearningRate 0.0928 Epoch: 0 Global Step: 12240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:04:12,121-Speed 5181.37 samples/sec Loss 8.7813 LearningRate 0.0928 Epoch: 0 Global Step: 12250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:04:14,098-Speed 5180.10 samples/sec Loss 8.6295 LearningRate 0.0928 Epoch: 0 Global Step: 12260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:04:16,079-Speed 5171.21 samples/sec Loss 8.7449 LearningRate 0.0928 Epoch: 0 Global Step: 12270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:18,056-Speed 5179.55 samples/sec Loss 8.7646 LearningRate 0.0928 Epoch: 0 Global Step: 12280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:20,033-Speed 5182.18 samples/sec Loss 8.7530 LearningRate 0.0928 Epoch: 0 Global Step: 12290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:22,011-Speed 5179.33 samples/sec Loss 8.6499 
LearningRate 0.0928 Epoch: 0 Global Step: 12300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:23,989-Speed 5178.61 samples/sec Loss 8.6314 LearningRate 0.0928 Epoch: 0 Global Step: 12310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:25,968-Speed 5177.56 samples/sec Loss 8.6793 LearningRate 0.0928 Epoch: 0 Global Step: 12320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:27,948-Speed 5172.21 samples/sec Loss 8.7821 LearningRate 0.0927 Epoch: 0 Global Step: 12330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:29,931-Speed 5165.42 samples/sec Loss 8.7777 LearningRate 0.0927 Epoch: 0 Global Step: 12340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:31,914-Speed 5167.20 samples/sec Loss 8.6241 LearningRate 0.0927 Epoch: 0 Global Step: 12350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:33,897-Speed 5163.64 samples/sec Loss 8.6515 LearningRate 0.0927 Epoch: 0 Global Step: 12360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:35,896-Speed 5124.70 samples/sec Loss 8.6107 LearningRate 0.0927 Epoch: 0 Global Step: 12370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:04:37,872-Speed 5183.12 samples/sec Loss 8.6325 LearningRate 0.0927 Epoch: 0 Global Step: 12380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:39,876-Speed 5112.76 samples/sec Loss 8.6104 LearningRate 0.0927 Epoch: 0 Global Step: 12390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:41,857-Speed 5170.47 samples/sec Loss 8.6813 LearningRate 0.0927 Epoch: 0 Global Step: 12400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:43,856-Speed 5124.82 samples/sec Loss 8.7198 LearningRate 0.0927 Epoch: 0 Global Step: 12410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:45,838-Speed 5169.75 samples/sec Loss 8.7362 LearningRate 0.0927 Epoch: 0 Global Step: 12420 Fp16 
Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:47,825-Speed 5152.77 samples/sec Loss 8.5866 LearningRate 0.0927 Epoch: 0 Global Step: 12430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:49,802-Speed 5181.62 samples/sec Loss 8.6243 LearningRate 0.0927 Epoch: 0 Global Step: 12440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:51,785-Speed 5166.43 samples/sec Loss 8.5635 LearningRate 0.0927 Epoch: 0 Global Step: 12450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:53,769-Speed 5163.81 samples/sec Loss 8.7094 LearningRate 0.0927 Epoch: 0 Global Step: 12460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:55,763-Speed 5136.08 samples/sec Loss 8.6933 LearningRate 0.0927 Epoch: 0 Global Step: 12470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:04:57,740-Speed 5183.58 samples/sec Loss 8.7643 LearningRate 0.0927 Epoch: 0 Global Step: 12480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:04:59,719-Speed 5176.05 samples/sec Loss 8.5656 LearningRate 0.0927 Epoch: 0 Global Step: 12490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:01,707-Speed 5152.63 samples/sec Loss 8.5749 LearningRate 0.0927 Epoch: 0 Global Step: 12500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:03,685-Speed 5178.58 samples/sec Loss 8.5422 LearningRate 0.0926 Epoch: 0 Global Step: 12510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:05,667-Speed 5167.12 samples/sec Loss 8.7641 LearningRate 0.0926 Epoch: 0 Global Step: 12520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:07,648-Speed 5169.72 samples/sec Loss 8.5776 LearningRate 0.0926 Epoch: 0 Global Step: 12530 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:09,633-Speed 5160.78 samples/sec Loss 8.4784 LearningRate 0.0926 Epoch: 0 Global Step: 12540 Fp16 Grad Scale: 131072 Required: 21 hours Training: 
2022-04-11 00:05:11,633-Speed 5121.34 samples/sec Loss 8.6318 LearningRate 0.0926 Epoch: 0 Global Step: 12550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:13,617-Speed 5165.39 samples/sec Loss 8.5477 LearningRate 0.0926 Epoch: 0 Global Step: 12560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:15,594-Speed 5178.96 samples/sec Loss 8.5665 LearningRate 0.0926 Epoch: 0 Global Step: 12570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:17,568-Speed 5191.26 samples/sec Loss 8.7011 LearningRate 0.0926 Epoch: 0 Global Step: 12580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:19,545-Speed 5181.29 samples/sec Loss 8.5940 LearningRate 0.0926 Epoch: 0 Global Step: 12590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:05:21,543-Speed 5127.26 samples/sec Loss 8.4905 LearningRate 0.0926 Epoch: 0 Global Step: 12600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:05:23,518-Speed 5186.27 samples/sec Loss 8.7291 LearningRate 0.0926 Epoch: 0 Global Step: 12610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:05:25,502-Speed 5160.52 samples/sec Loss 8.5754 LearningRate 0.0926 Epoch: 0 Global Step: 12620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:05:27,470-Speed 5204.77 samples/sec Loss 8.5217 LearningRate 0.0926 Epoch: 0 Global Step: 12630 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:05:29,446-Speed 5185.10 samples/sec Loss 8.5240 LearningRate 0.0926 Epoch: 0 Global Step: 12640 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:05:31,429-Speed 5166.41 samples/sec Loss 8.5608 LearningRate 0.0926 Epoch: 0 Global Step: 12650 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:05:33,417-Speed 5151.70 samples/sec Loss 8.5360 LearningRate 0.0926 Epoch: 0 Global Step: 12660 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:05:35,404-Speed 5155.59 samples/sec 
Loss 8.5767 LearningRate 0.0926 Epoch: 0 Global Step: 12670 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:05:37,377-Speed 5192.84 samples/sec Loss 8.6906 LearningRate 0.0925 Epoch: 0 Global Step: 12680 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:05:39,354-Speed 5181.34 samples/sec Loss 8.5994 LearningRate 0.0925 Epoch: 0 Global Step: 12690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:41,327-Speed 5192.09 samples/sec Loss 8.5083 LearningRate 0.0925 Epoch: 0 Global Step: 12700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:43,296-Speed 5202.15 samples/sec Loss 8.6624 LearningRate 0.0925 Epoch: 0 Global Step: 12710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:45,265-Speed 5200.69 samples/sec Loss 8.5883 LearningRate 0.0925 Epoch: 0 Global Step: 12720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:47,234-Speed 5203.68 samples/sec Loss 8.5640 LearningRate 0.0925 Epoch: 0 Global Step: 12730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:49,200-Speed 5209.82 samples/sec Loss 8.5363 LearningRate 0.0925 Epoch: 0 Global Step: 12740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:51,183-Speed 5165.63 samples/sec Loss 8.5663 LearningRate 0.0925 Epoch: 0 Global Step: 12750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:53,178-Speed 5133.92 samples/sec Loss 8.5429 LearningRate 0.0925 Epoch: 0 Global Step: 12760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:55,161-Speed 5166.32 samples/sec Loss 8.6292 LearningRate 0.0925 Epoch: 0 Global Step: 12770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:57,129-Speed 5205.91 samples/sec Loss 8.5763 LearningRate 0.0925 Epoch: 0 Global Step: 12780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:05:59,092-Speed 5219.46 samples/sec Loss 8.5046 LearningRate 0.0925 Epoch: 0 
Global Step: 12790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:01,064-Speed 5193.89 samples/sec Loss 8.5844 LearningRate 0.0925 Epoch: 0 Global Step: 12800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:03,047-Speed 5163.27 samples/sec Loss 8.6223 LearningRate 0.0925 Epoch: 0 Global Step: 12810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:05,015-Speed 5207.00 samples/sec Loss 8.4672 LearningRate 0.0925 Epoch: 0 Global Step: 12820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:06,991-Speed 5183.72 samples/sec Loss 8.5762 LearningRate 0.0925 Epoch: 0 Global Step: 12830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:08,970-Speed 5174.61 samples/sec Loss 8.6245 LearningRate 0.0925 Epoch: 0 Global Step: 12840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:10,949-Speed 5177.26 samples/sec Loss 8.5596 LearningRate 0.0924 Epoch: 0 Global Step: 12850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:12,923-Speed 5189.08 samples/sec Loss 8.5841 LearningRate 0.0924 Epoch: 0 Global Step: 12860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:14,899-Speed 5181.84 samples/sec Loss 8.5218 LearningRate 0.0924 Epoch: 0 Global Step: 12870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:16,874-Speed 5188.82 samples/sec Loss 8.5048 LearningRate 0.0924 Epoch: 0 Global Step: 12880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:18,839-Speed 5212.17 samples/sec Loss 8.4644 LearningRate 0.0924 Epoch: 0 Global Step: 12890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:20,806-Speed 5208.01 samples/sec Loss 8.4981 LearningRate 0.0924 Epoch: 0 Global Step: 12900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:22,773-Speed 5207.29 samples/sec Loss 8.4399 LearningRate 0.0924 Epoch: 0 Global Step: 12910 Fp16 Grad Scale: 131072 
Required: 21 hours Training: 2022-04-11 00:06:24,749-Speed 5185.27 samples/sec Loss 8.5838 LearningRate 0.0924 Epoch: 0 Global Step: 12920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:26,722-Speed 5190.02 samples/sec Loss 8.5780 LearningRate 0.0924 Epoch: 0 Global Step: 12930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:28,696-Speed 5189.96 samples/sec Loss 8.5042 LearningRate 0.0924 Epoch: 0 Global Step: 12940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:30,669-Speed 5192.07 samples/sec Loss 8.5631 LearningRate 0.0924 Epoch: 0 Global Step: 12950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:32,637-Speed 5205.49 samples/sec Loss 8.5397 LearningRate 0.0924 Epoch: 0 Global Step: 12960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:06:34,611-Speed 5188.38 samples/sec Loss 8.5139 LearningRate 0.0924 Epoch: 0 Global Step: 12970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:06:36,597-Speed 5157.72 samples/sec Loss 8.5284 LearningRate 0.0924 Epoch: 0 Global Step: 12980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:06:38,568-Speed 5196.87 samples/sec Loss 8.5450 LearningRate 0.0924 Epoch: 0 Global Step: 12990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:06:40,544-Speed 5183.88 samples/sec Loss 8.4664 LearningRate 0.0924 Epoch: 0 Global Step: 13000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:06:42,519-Speed 5187.69 samples/sec Loss 8.5043 LearningRate 0.0924 Epoch: 0 Global Step: 13010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:06:44,489-Speed 5200.99 samples/sec Loss 8.3449 LearningRate 0.0924 Epoch: 0 Global Step: 13020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:06:46,469-Speed 5172.93 samples/sec Loss 8.4153 LearningRate 0.0923 Epoch: 0 Global Step: 13030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 
00:06:48,442-Speed 5191.28 samples/sec Loss 8.4323 LearningRate 0.0923 Epoch: 0 Global Step: 13040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:06:50,414-Speed 5193.46 samples/sec Loss 8.4250 LearningRate 0.0923 Epoch: 0 Global Step: 13050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:06:52,383-Speed 5203.02 samples/sec Loss 8.4858 LearningRate 0.0923 Epoch: 0 Global Step: 13060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:54,367-Speed 5162.35 samples/sec Loss 8.3549 LearningRate 0.0923 Epoch: 0 Global Step: 13070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:56,339-Speed 5195.51 samples/sec Loss 8.3753 LearningRate 0.0923 Epoch: 0 Global Step: 13080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:06:58,323-Speed 5162.76 samples/sec Loss 8.3076 LearningRate 0.0923 Epoch: 0 Global Step: 13090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:00,297-Speed 5190.58 samples/sec Loss 8.3393 LearningRate 0.0923 Epoch: 0 Global Step: 13100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:02,272-Speed 5187.03 samples/sec Loss 8.3482 LearningRate 0.0923 Epoch: 0 Global Step: 13110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:04,243-Speed 5196.56 samples/sec Loss 8.4620 LearningRate 0.0923 Epoch: 0 Global Step: 13120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:06,215-Speed 5193.11 samples/sec Loss 8.4746 LearningRate 0.0923 Epoch: 0 Global Step: 13130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:08,185-Speed 5199.58 samples/sec Loss 8.4289 LearningRate 0.0923 Epoch: 0 Global Step: 13140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:10,156-Speed 5197.56 samples/sec Loss 8.3576 LearningRate 0.0923 Epoch: 0 Global Step: 13150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:12,129-Speed 5191.15 samples/sec Loss 
8.2564 LearningRate 0.0923 Epoch: 0 Global Step: 13160 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:07:14,104-Speed 5186.13 samples/sec Loss 8.3715 LearningRate 0.0923 Epoch: 0 Global Step: 13170 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:07:16,075-Speed 5199.14 samples/sec Loss 8.4738 LearningRate 0.0923 Epoch: 0 Global Step: 13180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:18,049-Speed 5187.81 samples/sec Loss 8.4075 LearningRate 0.0923 Epoch: 0 Global Step: 13190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:20,026-Speed 5183.35 samples/sec Loss 8.2814 LearningRate 0.0922 Epoch: 0 Global Step: 13200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:21,997-Speed 5195.56 samples/sec Loss 8.5136 LearningRate 0.0922 Epoch: 0 Global Step: 13210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:23,971-Speed 5190.03 samples/sec Loss 8.4028 LearningRate 0.0922 Epoch: 0 Global Step: 13220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:25,944-Speed 5192.33 samples/sec Loss 8.3162 LearningRate 0.0922 Epoch: 0 Global Step: 13230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:27,908-Speed 5213.32 samples/sec Loss 8.4026 LearningRate 0.0922 Epoch: 0 Global Step: 13240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:07:29,892-Speed 5164.71 samples/sec Loss 8.4436 LearningRate 0.0922 Epoch: 0 Global Step: 13250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:07:31,864-Speed 5193.65 samples/sec Loss 8.3739 LearningRate 0.0922 Epoch: 0 Global Step: 13260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:07:33,845-Speed 5170.27 samples/sec Loss 8.4800 LearningRate 0.0922 Epoch: 0 Global Step: 13270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:07:35,819-Speed 5189.49 samples/sec Loss 8.3304 LearningRate 0.0922 Epoch: 0 Global 
Step: 13280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:07:37,788-Speed 5202.87 samples/sec Loss 8.4186 LearningRate 0.0922 Epoch: 0 Global Step: 13290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:07:39,764-Speed 5184.11 samples/sec Loss 8.3532 LearningRate 0.0922 Epoch: 0 Global Step: 13300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:07:41,743-Speed 5176.37 samples/sec Loss 8.4844 LearningRate 0.0922 Epoch: 0 Global Step: 13310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:07:43,713-Speed 5198.78 samples/sec Loss 8.3020 LearningRate 0.0922 Epoch: 0 Global Step: 13320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:07:45,681-Speed 5205.22 samples/sec Loss 8.4001 LearningRate 0.0922 Epoch: 0 Global Step: 13330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:07:47,653-Speed 5195.53 samples/sec Loss 8.3310 LearningRate 0.0922 Epoch: 0 Global Step: 13340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:49,622-Speed 5202.20 samples/sec Loss 8.3069 LearningRate 0.0922 Epoch: 0 Global Step: 13350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:51,593-Speed 5195.85 samples/sec Loss 8.2152 LearningRate 0.0922 Epoch: 0 Global Step: 13360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:53,568-Speed 5188.44 samples/sec Loss 8.2439 LearningRate 0.0922 Epoch: 0 Global Step: 13370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:55,538-Speed 5197.10 samples/sec Loss 8.4134 LearningRate 0.0921 Epoch: 0 Global Step: 13380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:57,509-Speed 5196.95 samples/sec Loss 8.3077 LearningRate 0.0921 Epoch: 0 Global Step: 13390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:07:59,483-Speed 5190.48 samples/sec Loss 8.2561 LearningRate 0.0921 Epoch: 0 Global Step: 13400 Fp16 Grad Scale: 131072 Required: 21 
hours Training: 2022-04-11 00:08:01,480-Speed 5130.08 samples/sec Loss 8.3317 LearningRate 0.0921 Epoch: 0 Global Step: 13410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:08:03,463-Speed 5165.12 samples/sec Loss 8.2587 LearningRate 0.0921 Epoch: 0 Global Step: 13420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:08:05,433-Speed 5200.49 samples/sec Loss 8.2591 LearningRate 0.0921 Epoch: 0 Global Step: 13430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:08:07,404-Speed 5195.66 samples/sec Loss 8.2864 LearningRate 0.0921 Epoch: 0 Global Step: 13440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:08:09,383-Speed 5175.52 samples/sec Loss 8.2926 LearningRate 0.0921 Epoch: 0 Global Step: 13450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:08:11,361-Speed 5178.59 samples/sec Loss 8.2994 LearningRate 0.0921 Epoch: 0 Global Step: 13460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:08:13,335-Speed 5191.19 samples/sec Loss 8.3299 LearningRate 0.0921 Epoch: 0 Global Step: 13470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:08:15,311-Speed 5182.40 samples/sec Loss 8.1928 LearningRate 0.0921 Epoch: 0 Global Step: 13480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:08:17,283-Speed 5193.93 samples/sec Loss 8.3103 LearningRate 0.0921 Epoch: 0 Global Step: 13490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:08:19,253-Speed 5200.41 samples/sec Loss 8.1849 LearningRate 0.0921 Epoch: 0 Global Step: 13500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:08:21,228-Speed 5188.11 samples/sec Loss 8.2252 LearningRate 0.0921 Epoch: 0 Global Step: 13510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:08:23,204-Speed 5182.75 samples/sec Loss 8.2225 LearningRate 0.0921 Epoch: 0 Global Step: 13520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 
00:08:25,175-Speed 5197.40 samples/sec Loss 8.2592 LearningRate 0.0921 Epoch: 0 Global Step: 13530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:08:27,148-Speed 5191.42 samples/sec Loss 8.2955 LearningRate 0.0921 Epoch: 0 Global Step: 13540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:08:29,118-Speed 5202.34 samples/sec Loss 8.2605 LearningRate 0.0920 Epoch: 0 Global Step: 13550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:08:31,088-Speed 5199.20 samples/sec Loss 8.3493 LearningRate 0.0920 Epoch: 0 Global Step: 13560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:08:33,057-Speed 5201.05 samples/sec Loss 8.4467 LearningRate 0.0920 Epoch: 0 Global Step: 13570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:08:35,038-Speed 5169.71 samples/sec Loss 8.3759 LearningRate 0.0920 Epoch: 0 Global Step: 13580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:08:37,039-Speed 5119.39 samples/sec Loss 8.3035 LearningRate 0.0920 Epoch: 0 Global Step: 13590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:08:39,031-Speed 5143.31 samples/sec Loss 8.2864 LearningRate 0.0920 Epoch: 0 Global Step: 13600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:08:41,023-Speed 5142.09 samples/sec Loss 8.3820 LearningRate 0.0920 Epoch: 0 Global Step: 13610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:08:43,008-Speed 5159.65 samples/sec Loss 8.2750 LearningRate 0.0920 Epoch: 0 Global Step: 13620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:08:45,001-Speed 5141.58 samples/sec Loss 8.1692 LearningRate 0.0920 Epoch: 0 Global Step: 13630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:08:46,973-Speed 5194.78 samples/sec Loss 8.1934 LearningRate 0.0920 Epoch: 0 Global Step: 13640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:08:48,942-Speed 5200.88 samples/sec Loss 8.2390 
LearningRate 0.0920 Epoch: 0 Global Step: 13650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:08:50,905-Speed 5219.05 samples/sec Loss 8.2471 LearningRate 0.0920 Epoch: 0 Global Step: 13660 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:08:52,872-Speed 5207.24 samples/sec Loss 8.2681 LearningRate 0.0920 Epoch: 0 Global Step: 13670 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:08:54,844-Speed 5192.96 samples/sec Loss 8.2383 LearningRate 0.0920 Epoch: 0 Global Step: 13680 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:08:56,816-Speed 5194.26 samples/sec Loss 8.4089 LearningRate 0.0920 Epoch: 0 Global Step: 13690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:08:58,785-Speed 5203.79 samples/sec Loss 8.2694 LearningRate 0.0920 Epoch: 0 Global Step: 13700 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:09:00,753-Speed 5203.89 samples/sec Loss 8.2595 LearningRate 0.0920 Epoch: 0 Global Step: 13710 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:09:02,724-Speed 5198.73 samples/sec Loss 8.3100 LearningRate 0.0919 Epoch: 0 Global Step: 13720 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:09:04,697-Speed 5190.51 samples/sec Loss 8.2882 LearningRate 0.0919 Epoch: 0 Global Step: 13730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:09:06,666-Speed 5203.73 samples/sec Loss 8.2707 LearningRate 0.0919 Epoch: 0 Global Step: 13740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:09:08,648-Speed 5169.06 samples/sec Loss 8.2458 LearningRate 0.0919 Epoch: 0 Global Step: 13750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:09:10,619-Speed 5196.70 samples/sec Loss 8.2159 LearningRate 0.0919 Epoch: 0 Global Step: 13760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:09:12,594-Speed 5186.29 samples/sec Loss 8.1638 LearningRate 0.0919 Epoch: 0 Global Step: 13770 Fp16 
Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:09:14,568-Speed 5188.04 samples/sec Loss 8.3412 LearningRate 0.0919 Epoch: 0 Global Step: 13780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:09:16,559-Speed 5144.66 samples/sec Loss 8.2632 LearningRate 0.0919 Epoch: 0 Global Step: 13790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:09:18,542-Speed 5165.94 samples/sec Loss 8.2050 LearningRate 0.0919 Epoch: 0 Global Step: 13800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:09:20,514-Speed 5193.56 samples/sec Loss 8.2393 LearningRate 0.0919 Epoch: 0 Global Step: 13810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:09:22,492-Speed 5179.95 samples/sec Loss 8.1775 LearningRate 0.0919 Epoch: 0 Global Step: 13820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:09:24,475-Speed 5163.79 samples/sec Loss 8.2442 LearningRate 0.0919 Epoch: 0 Global Step: 13830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:09:26,452-Speed 5182.35 samples/sec Loss 8.2778 LearningRate 0.0919 Epoch: 0 Global Step: 13840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:09:28,434-Speed 5170.14 samples/sec Loss 8.2815 LearningRate 0.0919 Epoch: 0 Global Step: 13850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:09:30,417-Speed 5165.90 samples/sec Loss 8.1452 LearningRate 0.0919 Epoch: 0 Global Step: 13860 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:09:32,374-Speed 5234.02 samples/sec Loss 8.2136 LearningRate 0.0919 Epoch: 0 Global Step: 13870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:09:34,351-Speed 5181.04 samples/sec Loss 8.1890 LearningRate 0.0919 Epoch: 0 Global Step: 13880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:09:36,329-Speed 5177.08 samples/sec Loss 8.2374 LearningRate 0.0919 Epoch: 0 Global Step: 13890 Fp16 Grad Scale: 65536 Required: 21 hours 
Training: 2022-04-11 00:09:38,316-Speed 5155.00 samples/sec Loss 8.3872 LearningRate 0.0918 Epoch: 0 Global Step: 13900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:09:40,293-Speed 5182.03 samples/sec Loss 8.1122 LearningRate 0.0918 Epoch: 0 Global Step: 13910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:09:42,271-Speed 5179.06 samples/sec Loss 8.1905 LearningRate 0.0918 Epoch: 0 Global Step: 13920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:09:44,247-Speed 5184.00 samples/sec Loss 8.1770 LearningRate 0.0918 Epoch: 0 Global Step: 13930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:09:46,234-Speed 5154.82 samples/sec Loss 8.2986 LearningRate 0.0918 Epoch: 0 Global Step: 13940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:09:48,217-Speed 5166.90 samples/sec Loss 8.1779 LearningRate 0.0918 Epoch: 0 Global Step: 13950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:09:50,186-Speed 5202.31 samples/sec Loss 8.0985 LearningRate 0.0918 Epoch: 0 Global Step: 13960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:09:52,156-Speed 5199.40 samples/sec Loss 8.0839 LearningRate 0.0918 Epoch: 0 Global Step: 13970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:09:54,127-Speed 5197.53 samples/sec Loss 8.1598 LearningRate 0.0918 Epoch: 0 Global Step: 13980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:09:56,094-Speed 5207.39 samples/sec Loss 8.1969 LearningRate 0.0918 Epoch: 0 Global Step: 13990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:09:58,072-Speed 5177.78 samples/sec Loss 8.0699 LearningRate 0.0918 Epoch: 0 Global Step: 14000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:10:24,626-[lfw][14000]XNorm: 21.324625 Training: 2022-04-11 00:10:24,627-[lfw][14000]Accuracy-Flip: 0.99600+-0.00309 Training: 2022-04-11 00:10:24,628-[lfw][14000]Accuracy-Highest: 0.99683 
Training: 2022-04-11 00:10:55,329-[cfp_fp][14000]XNorm: 19.522159 Training: 2022-04-11 00:10:55,330-[cfp_fp][14000]Accuracy-Flip: 0.95471+-0.00975 Training: 2022-04-11 00:10:55,330-[cfp_fp][14000]Accuracy-Highest: 0.95471 Training: 2022-04-11 00:11:21,786-[agedb_30][14000]XNorm: 21.308208 Training: 2022-04-11 00:11:21,787-[agedb_30][14000]Accuracy-Flip: 0.96100+-0.00810 Training: 2022-04-11 00:11:21,787-[agedb_30][14000]Accuracy-Highest: 0.96100 Training: 2022-04-11 00:11:23,763-Speed 119.50 samples/sec Loss 8.2451 LearningRate 0.0918 Epoch: 0 Global Step: 14010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:11:25,737-Speed 5188.70 samples/sec Loss 8.0861 LearningRate 0.0918 Epoch: 0 Global Step: 14020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:11:27,706-Speed 5203.82 samples/sec Loss 8.1237 LearningRate 0.0918 Epoch: 0 Global Step: 14030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:11:29,673-Speed 5206.94 samples/sec Loss 8.1984 LearningRate 0.0918 Epoch: 0 Global Step: 14040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:11:31,637-Speed 5215.73 samples/sec Loss 8.1579 LearningRate 0.0918 Epoch: 0 Global Step: 14050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:11:33,609-Speed 5193.72 samples/sec Loss 8.0921 LearningRate 0.0918 Epoch: 0 Global Step: 14060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:11:35,573-Speed 5214.88 samples/sec Loss 8.0921 LearningRate 0.0917 Epoch: 0 Global Step: 14070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:11:37,543-Speed 5200.94 samples/sec Loss 8.1911 LearningRate 0.0917 Epoch: 0 Global Step: 14080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:11:39,515-Speed 5194.91 samples/sec Loss 8.2035 LearningRate 0.0917 Epoch: 0 Global Step: 14090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:11:41,481-Speed 5209.79 samples/sec Loss 8.1428 LearningRate 0.0917 
Epoch: 0 Global Step: 14100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:11:43,451-Speed 5201.55 samples/sec Loss 8.2512 LearningRate 0.0917 Epoch: 0 Global Step: 14110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:11:45,423-Speed 5194.53 samples/sec Loss 8.1599 LearningRate 0.0917 Epoch: 0 Global Step: 14120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:11:47,394-Speed 5194.40 samples/sec Loss 8.1800 LearningRate 0.0917 Epoch: 0 Global Step: 14130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:11:49,387-Speed 5141.92 samples/sec Loss 8.1594 LearningRate 0.0917 Epoch: 0 Global Step: 14140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:11:51,365-Speed 5175.88 samples/sec Loss 8.1951 LearningRate 0.0917 Epoch: 0 Global Step: 14150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:11:53,334-Speed 5202.76 samples/sec Loss 8.2259 LearningRate 0.0917 Epoch: 0 Global Step: 14160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:11:55,303-Speed 5203.30 samples/sec Loss 8.0254 LearningRate 0.0917 Epoch: 0 Global Step: 14170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:11:57,276-Speed 5191.17 samples/sec Loss 8.1469 LearningRate 0.0917 Epoch: 0 Global Step: 14180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:11:59,270-Speed 5137.42 samples/sec Loss 8.2413 LearningRate 0.0917 Epoch: 0 Global Step: 14190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:12:01,256-Speed 5159.87 samples/sec Loss 8.1265 LearningRate 0.0917 Epoch: 0 Global Step: 14200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:12:03,220-Speed 5214.31 samples/sec Loss 8.1689 LearningRate 0.0917 Epoch: 0 Global Step: 14210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:12:05,185-Speed 5214.39 samples/sec Loss 8.0974 LearningRate 0.0917 Epoch: 0 Global Step: 14220 Fp16 Grad Scale: 
65536 Required: 21 hours Training: 2022-04-11 00:12:07,157-Speed 5194.07 samples/sec Loss 8.1042 LearningRate 0.0917 Epoch: 0 Global Step: 14230 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:12:09,125-Speed 5205.32 samples/sec Loss 8.1553 LearningRate 0.0917 Epoch: 0 Global Step: 14240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:12:11,105-Speed 5173.64 samples/sec Loss 8.0496 LearningRate 0.0916 Epoch: 0 Global Step: 14250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:12:13,075-Speed 5199.64 samples/sec Loss 8.1220 LearningRate 0.0916 Epoch: 0 Global Step: 14260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:12:15,044-Speed 5202.09 samples/sec Loss 7.9487 LearningRate 0.0916 Epoch: 0 Global Step: 14270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:12:17,030-Speed 5155.87 samples/sec Loss 8.1023 LearningRate 0.0916 Epoch: 0 Global Step: 14280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:12:19,004-Speed 5190.74 samples/sec Loss 8.0772 LearningRate 0.0916 Epoch: 0 Global Step: 14290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:12:20,976-Speed 5191.68 samples/sec Loss 8.0133 LearningRate 0.0916 Epoch: 0 Global Step: 14300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:12:22,958-Speed 5172.91 samples/sec Loss 8.1299 LearningRate 0.0916 Epoch: 0 Global Step: 14310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:12:24,935-Speed 5180.06 samples/sec Loss 7.9602 LearningRate 0.0916 Epoch: 0 Global Step: 14320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:12:26,899-Speed 5215.75 samples/sec Loss 8.0497 LearningRate 0.0916 Epoch: 0 Global Step: 14330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:12:28,868-Speed 5201.85 samples/sec Loss 8.1675 LearningRate 0.0916 Epoch: 0 Global Step: 14340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 
00:12:30,838-Speed 5202.07 samples/sec Loss 8.0165 LearningRate 0.0916 Epoch: 0 Global Step: 14350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:12:32,806-Speed 5203.97 samples/sec Loss 8.1738 LearningRate 0.0916 Epoch: 0 Global Step: 14360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:12:34,786-Speed 5172.42 samples/sec Loss 8.1513 LearningRate 0.0916 Epoch: 0 Global Step: 14370 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:12:36,760-Speed 5188.84 samples/sec Loss 8.1246 LearningRate 0.0916 Epoch: 0 Global Step: 14380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:12:38,738-Speed 5179.42 samples/sec Loss 8.0381 LearningRate 0.0916 Epoch: 0 Global Step: 14390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:12:40,711-Speed 5192.54 samples/sec Loss 8.0752 LearningRate 0.0916 Epoch: 0 Global Step: 14400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:12:42,698-Speed 5155.65 samples/sec Loss 8.0534 LearningRate 0.0916 Epoch: 0 Global Step: 14410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:12:44,669-Speed 5197.18 samples/sec Loss 8.0748 LearningRate 0.0915 Epoch: 0 Global Step: 14420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:12:46,656-Speed 5154.85 samples/sec Loss 8.0451 LearningRate 0.0915 Epoch: 0 Global Step: 14430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:12:48,629-Speed 5190.28 samples/sec Loss 8.1029 LearningRate 0.0915 Epoch: 0 Global Step: 14440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:12:50,601-Speed 5196.49 samples/sec Loss 8.0589 LearningRate 0.0915 Epoch: 0 Global Step: 14450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:12:52,574-Speed 5192.14 samples/sec Loss 8.1188 LearningRate 0.0915 Epoch: 0 Global Step: 14460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:12:54,547-Speed 5189.78 samples/sec Loss 8.1322 
LearningRate 0.0915 Epoch: 0 Global Step: 14470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:12:56,518-Speed 5196.36 samples/sec Loss 8.0704 LearningRate 0.0915 Epoch: 0 Global Step: 14480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:12:58,498-Speed 5174.30 samples/sec Loss 8.1285 LearningRate 0.0915 Epoch: 0 Global Step: 14490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:13:00,488-Speed 5147.03 samples/sec Loss 8.0941 LearningRate 0.0915 Epoch: 0 Global Step: 14500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:13:02,466-Speed 5181.41 samples/sec Loss 8.0864 LearningRate 0.0915 Epoch: 0 Global Step: 14510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:13:04,450-Speed 5162.12 samples/sec Loss 8.0702 LearningRate 0.0915 Epoch: 0 Global Step: 14520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:13:06,425-Speed 5186.90 samples/sec Loss 7.8732 LearningRate 0.0915 Epoch: 0 Global Step: 14530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:13:08,393-Speed 5203.89 samples/sec Loss 8.0826 LearningRate 0.0915 Epoch: 0 Global Step: 14540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:13:10,363-Speed 5201.85 samples/sec Loss 8.1775 LearningRate 0.0915 Epoch: 0 Global Step: 14550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:13:12,330-Speed 5207.39 samples/sec Loss 8.0407 LearningRate 0.0915 Epoch: 0 Global Step: 14560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:13:14,298-Speed 5203.81 samples/sec Loss 7.9653 LearningRate 0.0915 Epoch: 0 Global Step: 14570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:13:16,276-Speed 5178.91 samples/sec Loss 8.0649 LearningRate 0.0915 Epoch: 0 Global Step: 14580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:13:18,261-Speed 5159.80 samples/sec Loss 8.0665 LearningRate 0.0914 Epoch: 0 Global Step: 14590 Fp16 
Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:13:20,249-Speed 5152.27 samples/sec Loss 8.0430 LearningRate 0.0914 Epoch: 0 Global Step: 14600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:13:22,229-Speed 5172.65 samples/sec Loss 7.9954 LearningRate 0.0914 Epoch: 0 Global Step: 14610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:13:24,209-Speed 5176.01 samples/sec Loss 7.8291 LearningRate 0.0914 Epoch: 0 Global Step: 14620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:13:26,186-Speed 5180.37 samples/sec Loss 8.0417 LearningRate 0.0914 Epoch: 0 Global Step: 14630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:13:28,157-Speed 5196.86 samples/sec Loss 8.0431 LearningRate 0.0914 Epoch: 0 Global Step: 14640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:13:30,131-Speed 5190.46 samples/sec Loss 8.1004 LearningRate 0.0914 Epoch: 0 Global Step: 14650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:13:32,108-Speed 5180.39 samples/sec Loss 7.9685 LearningRate 0.0914 Epoch: 0 Global Step: 14660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:13:34,086-Speed 5178.14 samples/sec Loss 7.9073 LearningRate 0.0914 Epoch: 0 Global Step: 14670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:13:36,060-Speed 5190.46 samples/sec Loss 8.0066 LearningRate 0.0914 Epoch: 0 Global Step: 14680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:13:38,035-Speed 5185.89 samples/sec Loss 8.0197 LearningRate 0.0914 Epoch: 0 Global Step: 14690 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:13:39,998-Speed 5219.25 samples/sec Loss 8.0448 LearningRate 0.0914 Epoch: 0 Global Step: 14700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:13:41,961-Speed 5217.44 samples/sec Loss 7.9415 LearningRate 0.0914 Epoch: 0 Global Step: 14710 Fp16 Grad Scale: 65536 Required: 21 hours 
Training: 2022-04-11 00:13:43,935-Speed 5190.58 samples/sec Loss 8.0294 LearningRate 0.0914 Epoch: 0 Global Step: 14720 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:13:45,902-Speed 5207.10 samples/sec Loss 8.0454 LearningRate 0.0914 Epoch: 0 Global Step: 14730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:13:47,880-Speed 5178.81 samples/sec Loss 7.9579 LearningRate 0.0914 Epoch: 0 Global Step: 14740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:13:49,885-Speed 5107.69 samples/sec Loss 8.0044 LearningRate 0.0914 Epoch: 0 Global Step: 14750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:13:51,858-Speed 5192.63 samples/sec Loss 7.9280 LearningRate 0.0914 Epoch: 0 Global Step: 14760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:13:53,829-Speed 5196.29 samples/sec Loss 8.0198 LearningRate 0.0913 Epoch: 0 Global Step: 14770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:13:55,805-Speed 5185.27 samples/sec Loss 7.9187 LearningRate 0.0913 Epoch: 0 Global Step: 14780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:13:57,775-Speed 5198.71 samples/sec Loss 7.9671 LearningRate 0.0913 Epoch: 0 Global Step: 14790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:13:59,750-Speed 5185.40 samples/sec Loss 8.0213 LearningRate 0.0913 Epoch: 0 Global Step: 14800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:14:01,747-Speed 5130.74 samples/sec Loss 7.9529 LearningRate 0.0913 Epoch: 0 Global Step: 14810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:14:03,722-Speed 5187.91 samples/sec Loss 7.9574 LearningRate 0.0913 Epoch: 0 Global Step: 14820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:14:05,710-Speed 5152.51 samples/sec Loss 7.9839 LearningRate 0.0913 Epoch: 0 Global Step: 14830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:14:07,684-Speed 5190.07 
samples/sec Loss 8.0016 LearningRate 0.0913 Epoch: 0 Global Step: 14840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:14:09,664-Speed 5173.74 samples/sec Loss 8.0038 LearningRate 0.0913 Epoch: 0 Global Step: 14850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:14:11,637-Speed 5191.60 samples/sec Loss 7.9704 LearningRate 0.0913 Epoch: 0 Global Step: 14860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:14:13,613-Speed 5183.21 samples/sec Loss 7.9194 LearningRate 0.0913 Epoch: 0 Global Step: 14870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:14:15,585-Speed 5192.66 samples/sec Loss 7.9893 LearningRate 0.0913 Epoch: 0 Global Step: 14880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:14:17,553-Speed 5204.83 samples/sec Loss 8.0771 LearningRate 0.0913 Epoch: 0 Global Step: 14890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:14:19,515-Speed 5222.07 samples/sec Loss 7.9804 LearningRate 0.0913 Epoch: 0 Global Step: 14900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:14:21,491-Speed 5183.37 samples/sec Loss 7.9000 LearningRate 0.0913 Epoch: 0 Global Step: 14910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:14:23,472-Speed 5171.36 samples/sec Loss 8.0532 LearningRate 0.0913 Epoch: 0 Global Step: 14920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:14:25,461-Speed 5151.10 samples/sec Loss 7.8211 LearningRate 0.0913 Epoch: 0 Global Step: 14930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:14:27,440-Speed 5175.06 samples/sec Loss 7.9498 LearningRate 0.0912 Epoch: 0 Global Step: 14940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:14:29,412-Speed 5194.48 samples/sec Loss 7.9409 LearningRate 0.0912 Epoch: 0 Global Step: 14950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:14:31,393-Speed 5173.83 samples/sec Loss 8.0348 LearningRate 0.0912 
Epoch: 0 Global Step: 14960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:14:33,360-Speed 5208.15 samples/sec Loss 7.9622 LearningRate 0.0912 Epoch: 0 Global Step: 14970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:14:35,328-Speed 5203.38 samples/sec Loss 7.8296 LearningRate 0.0912 Epoch: 0 Global Step: 14980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:14:37,316-Speed 5153.91 samples/sec Loss 8.0231 LearningRate 0.0912 Epoch: 0 Global Step: 14990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:14:39,294-Speed 5178.38 samples/sec Loss 7.8903 LearningRate 0.0912 Epoch: 0 Global Step: 15000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:14:41,272-Speed 5177.42 samples/sec Loss 7.9374 LearningRate 0.0912 Epoch: 0 Global Step: 15010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:14:43,247-Speed 5187.08 samples/sec Loss 7.9868 LearningRate 0.0912 Epoch: 0 Global Step: 15020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:14:45,234-Speed 5154.03 samples/sec Loss 7.8587 LearningRate 0.0912 Epoch: 0 Global Step: 15030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:14:47,208-Speed 5190.07 samples/sec Loss 8.0489 LearningRate 0.0912 Epoch: 0 Global Step: 15040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:14:49,188-Speed 5174.18 samples/sec Loss 7.9010 LearningRate 0.0912 Epoch: 0 Global Step: 15050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:14:51,161-Speed 5190.66 samples/sec Loss 7.9871 LearningRate 0.0912 Epoch: 0 Global Step: 15060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:14:53,140-Speed 5177.45 samples/sec Loss 8.0115 LearningRate 0.0912 Epoch: 0 Global Step: 15070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:14:55,118-Speed 5178.38 samples/sec Loss 7.8542 LearningRate 0.0912 Epoch: 0 Global Step: 15080 Fp16 Grad Scale: 
131072 Required: 21 hours Training: 2022-04-11 00:14:57,089-Speed 5197.53 samples/sec Loss 7.9134 LearningRate 0.0912 Epoch: 0 Global Step: 15090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:14:59,060-Speed 5197.07 samples/sec Loss 7.8499 LearningRate 0.0912 Epoch: 0 Global Step: 15100 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:15:01,018-Speed 5231.81 samples/sec Loss 7.9013 LearningRate 0.0912 Epoch: 0 Global Step: 15110 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:15:02,988-Speed 5198.11 samples/sec Loss 7.8479 LearningRate 0.0911 Epoch: 0 Global Step: 15120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:15:04,973-Speed 5160.07 samples/sec Loss 7.9078 LearningRate 0.0911 Epoch: 0 Global Step: 15130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:15:06,953-Speed 5173.60 samples/sec Loss 7.9589 LearningRate 0.0911 Epoch: 0 Global Step: 15140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:15:08,941-Speed 5153.59 samples/sec Loss 7.8084 LearningRate 0.0911 Epoch: 0 Global Step: 15150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:15:10,913-Speed 5194.74 samples/sec Loss 7.8685 LearningRate 0.0911 Epoch: 0 Global Step: 15160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:15:12,886-Speed 5192.00 samples/sec Loss 7.7558 LearningRate 0.0911 Epoch: 0 Global Step: 15170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:15:14,857-Speed 5196.41 samples/sec Loss 7.7530 LearningRate 0.0911 Epoch: 0 Global Step: 15180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:15:16,827-Speed 5201.03 samples/sec Loss 7.8211 LearningRate 0.0911 Epoch: 0 Global Step: 15190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:15:18,799-Speed 5194.71 samples/sec Loss 7.8555 LearningRate 0.0911 Epoch: 0 Global Step: 15200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 
00:15:20,769-Speed 5198.77 samples/sec Loss 7.8305 LearningRate 0.0911 Epoch: 0 Global Step: 15210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:15:22,739-Speed 5199.21 samples/sec Loss 7.8401 LearningRate 0.0911 Epoch: 0 Global Step: 15220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:15:24,725-Speed 5158.60 samples/sec Loss 7.7336 LearningRate 0.0911 Epoch: 0 Global Step: 15230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:15:26,695-Speed 5198.47 samples/sec Loss 7.9297 LearningRate 0.0911 Epoch: 0 Global Step: 15240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:15:28,675-Speed 5174.06 samples/sec Loss 7.8334 LearningRate 0.0911 Epoch: 0 Global Step: 15250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:15:30,656-Speed 5172.99 samples/sec Loss 7.9512 LearningRate 0.0911 Epoch: 0 Global Step: 15260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:15:32,628-Speed 5193.06 samples/sec Loss 7.7411 LearningRate 0.0911 Epoch: 0 Global Step: 15270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:15:34,600-Speed 5194.92 samples/sec Loss 7.8712 LearningRate 0.0911 Epoch: 0 Global Step: 15280 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:15:36,575-Speed 5187.51 samples/sec Loss 7.7969 LearningRate 0.0910 Epoch: 0 Global Step: 15290 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:15:38,543-Speed 5203.24 samples/sec Loss 7.9403 LearningRate 0.0910 Epoch: 0 Global Step: 15300 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:15:40,512-Speed 5203.66 samples/sec Loss 7.8157 LearningRate 0.0910 Epoch: 0 Global Step: 15310 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:15:42,480-Speed 5203.96 samples/sec Loss 7.9314 LearningRate 0.0910 Epoch: 0 Global Step: 15320 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:15:44,447-Speed 5206.52 samples/sec Loss 
7.9805 LearningRate 0.0910 Epoch: 0 Global Step: 15330 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:15:46,415-Speed 5206.71 samples/sec Loss 7.8105 LearningRate 0.0910 Epoch: 0 Global Step: 15340 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:15:48,383-Speed 5204.13 samples/sec Loss 7.8075 LearningRate 0.0910 Epoch: 0 Global Step: 15350 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:15:50,357-Speed 5189.16 samples/sec Loss 7.8450 LearningRate 0.0910 Epoch: 0 Global Step: 15360 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:15:52,331-Speed 5191.15 samples/sec Loss 7.7854 LearningRate 0.0910 Epoch: 0 Global Step: 15370 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:15:54,305-Speed 5189.83 samples/sec Loss 7.7663 LearningRate 0.0910 Epoch: 0 Global Step: 15380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:15:56,274-Speed 5200.14 samples/sec Loss 7.7702 LearningRate 0.0910 Epoch: 0 Global Step: 15390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:15:58,244-Speed 5201.64 samples/sec Loss 7.7890 LearningRate 0.0910 Epoch: 0 Global Step: 15400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:16:00,223-Speed 5173.77 samples/sec Loss 7.8691 LearningRate 0.0910 Epoch: 0 Global Step: 15410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:16:02,195-Speed 5195.07 samples/sec Loss 7.8189 LearningRate 0.0910 Epoch: 0 Global Step: 15420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:16:04,172-Speed 5181.98 samples/sec Loss 7.7353 LearningRate 0.0910 Epoch: 0 Global Step: 15430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:16:06,147-Speed 5185.57 samples/sec Loss 7.7921 LearningRate 0.0910 Epoch: 0 Global Step: 15440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:16:08,133-Speed 5157.60 samples/sec Loss 7.8722 LearningRate 0.0910 Epoch: 0 Global Step: 15450 
Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:16:10,113-Speed 5175.85 samples/sec Loss 7.7612 LearningRate 0.0910 Epoch: 0 Global Step: 15460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:16:12,082-Speed 5201.13 samples/sec Loss 7.8807 LearningRate 0.0909 Epoch: 0 Global Step: 15470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:16:14,052-Speed 5201.39 samples/sec Loss 7.8953 LearningRate 0.0909 Epoch: 0 Global Step: 15480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:16:16,017-Speed 5211.39 samples/sec Loss 7.6757 LearningRate 0.0909 Epoch: 0 Global Step: 15490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:16:17,989-Speed 5195.13 samples/sec Loss 7.6935 LearningRate 0.0909 Epoch: 0 Global Step: 15500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:16:19,958-Speed 5202.91 samples/sec Loss 7.8210 LearningRate 0.0909 Epoch: 0 Global Step: 15510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:16:21,926-Speed 5202.42 samples/sec Loss 7.7684 LearningRate 0.0909 Epoch: 0 Global Step: 15520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:16:23,906-Speed 5175.40 samples/sec Loss 7.8203 LearningRate 0.0909 Epoch: 0 Global Step: 15530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:16:25,898-Speed 5142.47 samples/sec Loss 7.6869 LearningRate 0.0909 Epoch: 0 Global Step: 15540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:16:27,872-Speed 5186.90 samples/sec Loss 7.6315 LearningRate 0.0909 Epoch: 0 Global Step: 15550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:16:29,878-Speed 5108.07 samples/sec Loss 7.7570 LearningRate 0.0909 Epoch: 0 Global Step: 15560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:16:31,858-Speed 5174.79 samples/sec Loss 7.8414 LearningRate 0.0909 Epoch: 0 Global Step: 15570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 
2022-04-11 00:16:33,828-Speed 5200.04 samples/sec Loss 7.6687 LearningRate 0.0909 Epoch: 0 Global Step: 15580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:16:35,828-Speed 5121.32 samples/sec Loss 7.7466 LearningRate 0.0909 Epoch: 0 Global Step: 15590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:16:37,813-Speed 5159.50 samples/sec Loss 7.8848 LearningRate 0.0909 Epoch: 0 Global Step: 15600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:16:39,792-Speed 5179.97 samples/sec Loss 7.7549 LearningRate 0.0909 Epoch: 0 Global Step: 15610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:16:41,781-Speed 5151.68 samples/sec Loss 7.6453 LearningRate 0.0909 Epoch: 0 Global Step: 15620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:16:43,751-Speed 5198.45 samples/sec Loss 7.8823 LearningRate 0.0909 Epoch: 0 Global Step: 15630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:16:45,722-Speed 5195.81 samples/sec Loss 7.7623 LearningRate 0.0908 Epoch: 0 Global Step: 15640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:16:47,703-Speed 5172.23 samples/sec Loss 7.7521 LearningRate 0.0908 Epoch: 0 Global Step: 15650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:16:49,673-Speed 5199.98 samples/sec Loss 7.7676 LearningRate 0.0908 Epoch: 0 Global Step: 15660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:16:51,648-Speed 5186.41 samples/sec Loss 7.7938 LearningRate 0.0908 Epoch: 0 Global Step: 15670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:16:53,624-Speed 5183.33 samples/sec Loss 7.7956 LearningRate 0.0908 Epoch: 0 Global Step: 15680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:16:55,588-Speed 5215.90 samples/sec Loss 7.7567 LearningRate 0.0908 Epoch: 0 Global Step: 15690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:16:57,560-Speed 5193.98 
samples/sec Loss 7.7856 LearningRate 0.0908 Epoch: 0 Global Step: 15700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:16:59,546-Speed 5158.13 samples/sec Loss 7.7076 LearningRate 0.0908 Epoch: 0 Global Step: 15710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:17:01,516-Speed 5200.79 samples/sec Loss 7.6439 LearningRate 0.0908 Epoch: 0 Global Step: 15720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:17:03,487-Speed 5196.45 samples/sec Loss 7.7436 LearningRate 0.0908 Epoch: 0 Global Step: 15730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:17:05,469-Speed 5168.28 samples/sec Loss 7.8438 LearningRate 0.0908 Epoch: 0 Global Step: 15740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:17:07,439-Speed 5199.71 samples/sec Loss 7.6470 LearningRate 0.0908 Epoch: 0 Global Step: 15750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:17:09,421-Speed 5169.74 samples/sec Loss 7.6920 LearningRate 0.0908 Epoch: 0 Global Step: 15760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:17:11,400-Speed 5174.36 samples/sec Loss 7.6445 LearningRate 0.0908 Epoch: 0 Global Step: 15770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:17:13,380-Speed 5175.43 samples/sec Loss 7.7798 LearningRate 0.0908 Epoch: 0 Global Step: 15780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:17:15,363-Speed 5164.05 samples/sec Loss 7.7174 LearningRate 0.0908 Epoch: 0 Global Step: 15790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:17:17,344-Speed 5170.36 samples/sec Loss 7.7100 LearningRate 0.0908 Epoch: 0 Global Step: 15800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:17:19,313-Speed 5203.46 samples/sec Loss 7.7185 LearningRate 0.0908 Epoch: 0 Global Step: 15810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:17:21,284-Speed 5196.14 samples/sec Loss 7.6355 LearningRate 0.0907 Epoch: 
0 Global Step: 15820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:17:23,272-Speed 5154.06 samples/sec Loss 7.6929 LearningRate 0.0907 Epoch: 0 Global Step: 15830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:17:25,244-Speed 5193.62 samples/sec Loss 7.7688 LearningRate 0.0907 Epoch: 0 Global Step: 15840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:17:27,224-Speed 5173.23 samples/sec Loss 7.6105 LearningRate 0.0907 Epoch: 0 Global Step: 15850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:17:29,198-Speed 5189.71 samples/sec Loss 7.6250 LearningRate 0.0907 Epoch: 0 Global Step: 15860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:17:31,167-Speed 5201.67 samples/sec Loss 7.6086 LearningRate 0.0907 Epoch: 0 Global Step: 15870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:17:33,154-Speed 5155.49 samples/sec Loss 7.6959 LearningRate 0.0907 Epoch: 0 Global Step: 15880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:17:35,124-Speed 5202.04 samples/sec Loss 7.7248 LearningRate 0.0907 Epoch: 0 Global Step: 15890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:17:37,094-Speed 5197.45 samples/sec Loss 7.8137 LearningRate 0.0907 Epoch: 0 Global Step: 15900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:17:39,067-Speed 5192.70 samples/sec Loss 7.5788 LearningRate 0.0907 Epoch: 0 Global Step: 15910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:17:41,040-Speed 5191.43 samples/sec Loss 7.7147 LearningRate 0.0907 Epoch: 0 Global Step: 15920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:17:43,015-Speed 5188.22 samples/sec Loss 7.6550 LearningRate 0.0907 Epoch: 0 Global Step: 15930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:17:44,999-Speed 5160.43 samples/sec Loss 7.6790 LearningRate 0.0907 Epoch: 0 Global Step: 15940 Fp16 Grad Scale: 262144 
Required: 21 hours Training: 2022-04-11 00:17:46,978-Speed 5177.29 samples/sec Loss 7.7441 LearningRate 0.0907 Epoch: 0 Global Step: 15950 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:17:48,945-Speed 5207.71 samples/sec Loss 7.6613 LearningRate 0.0907 Epoch: 0 Global Step: 15960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:17:50,912-Speed 5206.34 samples/sec Loss 7.6558 LearningRate 0.0907 Epoch: 0 Global Step: 15970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:17:52,882-Speed 5199.40 samples/sec Loss 7.7776 LearningRate 0.0907 Epoch: 0 Global Step: 15980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:17:54,851-Speed 5204.45 samples/sec Loss 7.6926 LearningRate 0.0906 Epoch: 0 Global Step: 15990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:17:56,822-Speed 5195.54 samples/sec Loss 7.5338 LearningRate 0.0906 Epoch: 0 Global Step: 16000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:18:23,377-[lfw][16000]XNorm: 23.279770 Training: 2022-04-11 00:18:23,378-[lfw][16000]Accuracy-Flip: 0.99633+-0.00245 Training: 2022-04-11 00:18:23,378-[lfw][16000]Accuracy-Highest: 0.99683 Training: 2022-04-11 00:18:54,218-[cfp_fp][16000]XNorm: 20.468648 Training: 2022-04-11 00:18:54,219-[cfp_fp][16000]Accuracy-Flip: 0.95429+-0.00952 Training: 2022-04-11 00:18:54,219-[cfp_fp][16000]Accuracy-Highest: 0.95471 Training: 2022-04-11 00:19:20,749-[agedb_30][16000]XNorm: 22.584027 Training: 2022-04-11 00:19:20,749-[agedb_30][16000]Accuracy-Flip: 0.96533+-0.00809 Training: 2022-04-11 00:19:20,750-[agedb_30][16000]Accuracy-Highest: 0.96533 Training: 2022-04-11 00:19:22,725-Speed 119.21 samples/sec Loss 7.5786 LearningRate 0.0906 Epoch: 0 Global Step: 16010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:19:24,693-Speed 5206.37 samples/sec Loss 7.6345 LearningRate 0.0906 Epoch: 0 Global Step: 16020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 
00:19:26,661-Speed 5205.20 samples/sec Loss 7.6810 LearningRate 0.0906 Epoch: 0 Global Step: 16030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:19:28,624-Speed 5216.55 samples/sec Loss 7.6008 LearningRate 0.0906 Epoch: 0 Global Step: 16040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:19:30,590-Speed 5211.81 samples/sec Loss 7.6911 LearningRate 0.0906 Epoch: 0 Global Step: 16050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:19:32,553-Speed 5217.84 samples/sec Loss 7.6350 LearningRate 0.0906 Epoch: 0 Global Step: 16060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:19:34,520-Speed 5208.45 samples/sec Loss 7.6234 LearningRate 0.0906 Epoch: 0 Global Step: 16070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:19:36,489-Speed 5201.56 samples/sec Loss 7.5938 LearningRate 0.0906 Epoch: 0 Global Step: 16080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:19:38,458-Speed 5202.53 samples/sec Loss 7.6275 LearningRate 0.0906 Epoch: 0 Global Step: 16090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:19:40,440-Speed 5167.32 samples/sec Loss 7.6799 LearningRate 0.0906 Epoch: 0 Global Step: 16100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:19:42,412-Speed 5193.87 samples/sec Loss 7.5376 LearningRate 0.0906 Epoch: 0 Global Step: 16110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:19:44,378-Speed 5209.39 samples/sec Loss 7.7186 LearningRate 0.0906 Epoch: 0 Global Step: 16120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:19:46,361-Speed 5168.18 samples/sec Loss 7.7378 LearningRate 0.0906 Epoch: 0 Global Step: 16130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:19:48,340-Speed 5174.18 samples/sec Loss 7.6085 LearningRate 0.0906 Epoch: 0 Global Step: 16140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:19:50,312-Speed 5196.10 samples/sec Loss 
7.7740 LearningRate 0.0906 Epoch: 0 Global Step: 16150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:19:52,293-Speed 5169.03 samples/sec Loss 7.6292 LearningRate 0.0906 Epoch: 0 Global Step: 16160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:19:54,265-Speed 5194.86 samples/sec Loss 7.6755 LearningRate 0.0905 Epoch: 0 Global Step: 16170 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:19:56,236-Speed 5196.51 samples/sec Loss 7.5744 LearningRate 0.0905 Epoch: 0 Global Step: 16180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:19:58,210-Speed 5190.08 samples/sec Loss 7.6369 LearningRate 0.0905 Epoch: 0 Global Step: 16190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:00,198-Speed 5153.19 samples/sec Loss 7.6127 LearningRate 0.0905 Epoch: 0 Global Step: 16200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:02,178-Speed 5174.49 samples/sec Loss 7.6359 LearningRate 0.0905 Epoch: 0 Global Step: 16210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:04,169-Speed 5144.27 samples/sec Loss 7.7301 LearningRate 0.0905 Epoch: 0 Global Step: 16220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:06,156-Speed 5155.40 samples/sec Loss 7.5789 LearningRate 0.0905 Epoch: 0 Global Step: 16230 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:08,135-Speed 5174.87 samples/sec Loss 7.6225 LearningRate 0.0905 Epoch: 0 Global Step: 16240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:10,105-Speed 5200.58 samples/sec Loss 7.6632 LearningRate 0.0905 Epoch: 0 Global Step: 16250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:12,073-Speed 5205.00 samples/sec Loss 7.5857 LearningRate 0.0905 Epoch: 0 Global Step: 16260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:14,041-Speed 5204.36 samples/sec Loss 7.7052 LearningRate 0.0905 Epoch: 0 Global Step: 
16270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:16,009-Speed 5204.90 samples/sec Loss 7.5973 LearningRate 0.0905 Epoch: 0 Global Step: 16280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:17,970-Speed 5223.24 samples/sec Loss 7.6059 LearningRate 0.0905 Epoch: 0 Global Step: 16290 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:20:19,937-Speed 5208.04 samples/sec Loss 7.5772 LearningRate 0.0905 Epoch: 0 Global Step: 16300 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:20:21,911-Speed 5190.44 samples/sec Loss 7.6164 LearningRate 0.0905 Epoch: 0 Global Step: 16310 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:20:23,893-Speed 5166.45 samples/sec Loss 7.5308 LearningRate 0.0905 Epoch: 0 Global Step: 16320 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:20:25,859-Speed 5211.12 samples/sec Loss 7.6906 LearningRate 0.0905 Epoch: 0 Global Step: 16330 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:20:27,843-Speed 5163.64 samples/sec Loss 7.6100 LearningRate 0.0904 Epoch: 0 Global Step: 16340 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:20:29,815-Speed 5195.69 samples/sec Loss 7.4556 LearningRate 0.0904 Epoch: 0 Global Step: 16350 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:20:31,780-Speed 5210.95 samples/sec Loss 7.4793 LearningRate 0.0904 Epoch: 0 Global Step: 16360 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:20:33,749-Speed 5204.00 samples/sec Loss 7.5223 LearningRate 0.0904 Epoch: 0 Global Step: 16370 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:20:35,717-Speed 5205.46 samples/sec Loss 7.5627 LearningRate 0.0904 Epoch: 0 Global Step: 16380 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:20:37,682-Speed 5212.08 samples/sec Loss 7.5154 LearningRate 0.0904 Epoch: 0 Global Step: 16390 Fp16 Grad Scale: 65536 Required: 21 hours 
Training: 2022-04-11 00:20:39,667-Speed 5159.96 samples/sec Loss 7.6062 LearningRate 0.0904 Epoch: 0 Global Step: 16400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:41,671-Speed 5110.60 samples/sec Loss 7.6078 LearningRate 0.0904 Epoch: 0 Global Step: 16410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:43,645-Speed 5190.52 samples/sec Loss 7.4468 LearningRate 0.0904 Epoch: 0 Global Step: 16420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:45,615-Speed 5198.76 samples/sec Loss 7.5715 LearningRate 0.0904 Epoch: 0 Global Step: 16430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:47,583-Speed 5207.24 samples/sec Loss 7.5733 LearningRate 0.0904 Epoch: 0 Global Step: 16440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:49,549-Speed 5208.04 samples/sec Loss 7.6487 LearningRate 0.0904 Epoch: 0 Global Step: 16450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:51,519-Speed 5201.80 samples/sec Loss 7.5732 LearningRate 0.0904 Epoch: 0 Global Step: 16460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:53,486-Speed 5208.17 samples/sec Loss 7.6014 LearningRate 0.0904 Epoch: 0 Global Step: 16470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:55,462-Speed 5183.33 samples/sec Loss 7.5985 LearningRate 0.0904 Epoch: 0 Global Step: 16480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:20:57,429-Speed 5205.83 samples/sec Loss 7.6047 LearningRate 0.0904 Epoch: 0 Global Step: 16490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:20:59,403-Speed 5190.49 samples/sec Loss 7.5532 LearningRate 0.0904 Epoch: 0 Global Step: 16500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:21:01,373-Speed 5200.00 samples/sec Loss 7.6257 LearningRate 0.0904 Epoch: 0 Global Step: 16510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:21:03,349-Speed 5184.17 
samples/sec Loss 7.4813 LearningRate 0.0903 Epoch: 0 Global Step: 16520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:21:05,309-Speed 5223.65 samples/sec Loss 7.6933 LearningRate 0.0903 Epoch: 0 Global Step: 16530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:21:07,276-Speed 5208.51 samples/sec Loss 7.6183 LearningRate 0.0903 Epoch: 0 Global Step: 16540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:21:09,246-Speed 5200.35 samples/sec Loss 7.6025 LearningRate 0.0903 Epoch: 0 Global Step: 16550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:21:11,212-Speed 5211.10 samples/sec Loss 7.5081 LearningRate 0.0903 Epoch: 0 Global Step: 16560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:21:13,182-Speed 5199.25 samples/sec Loss 7.5144 LearningRate 0.0903 Epoch: 0 Global Step: 16570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:21:15,149-Speed 5207.91 samples/sec Loss 7.4880 LearningRate 0.0903 Epoch: 0 Global Step: 16580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:21:17,117-Speed 5205.92 samples/sec Loss 7.6134 LearningRate 0.0903 Epoch: 0 Global Step: 16590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:21:19,085-Speed 5204.51 samples/sec Loss 7.5503 LearningRate 0.0903 Epoch: 0 Global Step: 16600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:21:21,054-Speed 5201.55 samples/sec Loss 7.5517 LearningRate 0.0903 Epoch: 0 Global Step: 16610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:21:23,025-Speed 5198.13 samples/sec Loss 7.4628 LearningRate 0.0903 Epoch: 0 Global Step: 16620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:21:25,013-Speed 5151.73 samples/sec Loss 7.5085 LearningRate 0.0903 Epoch: 0 Global Step: 16630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:21:26,974-Speed 5224.22 samples/sec Loss 7.5670 LearningRate 0.0903 Epoch: 0 
Global Step: 16640 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:21:28,950-Speed 5184.21 samples/sec Loss 7.5431 LearningRate 0.0903 Epoch: 0 Global Step: 16650 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:21:30,925-Speed 5184.66 samples/sec Loss 7.5762 LearningRate 0.0903 Epoch: 0 Global Step: 16660 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:21:32,892-Speed 5209.04 samples/sec Loss 7.4327 LearningRate 0.0903 Epoch: 0 Global Step: 16670 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:21:35,094-Speed 4652.31 samples/sec Loss 7.5783 LearningRate 0.0903 Epoch: 0 Global Step: 16680 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:21:37,046-Speed 5246.16 samples/sec Loss 7.5542 LearningRate 0.0903 Epoch: 0 Global Step: 16690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:22:07,292-Speed 338.58 samples/sec Loss 6.8647 LearningRate 0.0902 Epoch: 1 Global Step: 16700 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:22:09,250-Speed 5232.54 samples/sec Loss 6.7726 LearningRate 0.0902 Epoch: 1 Global Step: 16710 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:22:12,412-Speed 3239.70 samples/sec Loss 6.8195 LearningRate 0.0902 Epoch: 1 Global Step: 16720 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:22:14,390-Speed 5177.50 samples/sec Loss 6.6755 LearningRate 0.0902 Epoch: 1 Global Step: 16730 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:22:16,527-Speed 4794.13 samples/sec Loss 6.7024 LearningRate 0.0902 Epoch: 1 Global Step: 16740 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:22:18,508-Speed 5171.86 samples/sec Loss 6.7702 LearningRate 0.0902 Epoch: 1 Global Step: 16750 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:22:20,480-Speed 5195.38 samples/sec Loss 6.8092 LearningRate 0.0902 Epoch: 1 Global Step: 16760 Fp16 Grad Scale: 32768 Required: 21 
hours Training: 2022-04-11 00:22:22,469-Speed 5150.10 samples/sec Loss 6.6433 LearningRate 0.0902 Epoch: 1 Global Step: 16770 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:22:24,441-Speed 5193.32 samples/sec Loss 6.7560 LearningRate 0.0902 Epoch: 1 Global Step: 16780 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:22:26,443-Speed 5117.41 samples/sec Loss 6.7246 LearningRate 0.0902 Epoch: 1 Global Step: 16790 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:22:28,413-Speed 5199.63 samples/sec Loss 6.8278 LearningRate 0.0902 Epoch: 1 Global Step: 16800 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:22:30,378-Speed 5213.06 samples/sec Loss 6.7682 LearningRate 0.0902 Epoch: 1 Global Step: 16810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:22:32,343-Speed 5213.80 samples/sec Loss 6.8311 LearningRate 0.0902 Epoch: 1 Global Step: 16820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:22:34,312-Speed 5201.80 samples/sec Loss 6.7392 LearningRate 0.0902 Epoch: 1 Global Step: 16830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:22:36,299-Speed 5155.23 samples/sec Loss 6.7002 LearningRate 0.0902 Epoch: 1 Global Step: 16840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:22:38,379-Speed 4927.19 samples/sec Loss 6.7196 LearningRate 0.0902 Epoch: 1 Global Step: 16850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:22:40,372-Speed 5140.90 samples/sec Loss 6.6854 LearningRate 0.0902 Epoch: 1 Global Step: 16860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:22:42,337-Speed 5212.07 samples/sec Loss 6.8870 LearningRate 0.0901 Epoch: 1 Global Step: 16870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:22:44,307-Speed 5199.11 samples/sec Loss 6.8571 LearningRate 0.0901 Epoch: 1 Global Step: 16880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:22:46,274-Speed 5208.13 
samples/sec Loss 6.7926 LearningRate 0.0901 Epoch: 1 Global Step: 16890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:22:48,256-Speed 5167.96 samples/sec Loss 6.7312 LearningRate 0.0901 Epoch: 1 Global Step: 16900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:22:50,220-Speed 5214.61 samples/sec Loss 6.7356 LearningRate 0.0901 Epoch: 1 Global Step: 16910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:22:52,184-Speed 5215.19 samples/sec Loss 6.8476 LearningRate 0.0901 Epoch: 1 Global Step: 16920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:22:54,162-Speed 5178.96 samples/sec Loss 6.7529 LearningRate 0.0901 Epoch: 1 Global Step: 16930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:22:56,132-Speed 5199.61 samples/sec Loss 6.8994 LearningRate 0.0901 Epoch: 1 Global Step: 16940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:22:58,101-Speed 5203.08 samples/sec Loss 6.8034 LearningRate 0.0901 Epoch: 1 Global Step: 16950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:23:00,106-Speed 5108.50 samples/sec Loss 6.7759 LearningRate 0.0901 Epoch: 1 Global Step: 16960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:23:02,085-Speed 5175.86 samples/sec Loss 6.7269 LearningRate 0.0901 Epoch: 1 Global Step: 16970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:23:04,057-Speed 5196.33 samples/sec Loss 6.7496 LearningRate 0.0901 Epoch: 1 Global Step: 16980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:23:06,021-Speed 5215.78 samples/sec Loss 6.8437 LearningRate 0.0901 Epoch: 1 Global Step: 16990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:08,271-Speed 4552.19 samples/sec Loss 6.8217 LearningRate 0.0901 Epoch: 1 Global Step: 17000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:10,661-Speed 4285.45 samples/sec Loss 6.7169 LearningRate 0.0901 
Epoch: 1 Global Step: 17010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:12,648-Speed 5155.31 samples/sec Loss 6.7916 LearningRate 0.0901 Epoch: 1 Global Step: 17020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:14,614-Speed 5208.83 samples/sec Loss 6.6721 LearningRate 0.0901 Epoch: 1 Global Step: 17030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:16,596-Speed 5168.62 samples/sec Loss 6.7360 LearningRate 0.0901 Epoch: 1 Global Step: 17040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:18,579-Speed 5165.02 samples/sec Loss 6.7561 LearningRate 0.0900 Epoch: 1 Global Step: 17050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:20,554-Speed 5187.36 samples/sec Loss 6.7832 LearningRate 0.0900 Epoch: 1 Global Step: 17060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:22,532-Speed 5179.61 samples/sec Loss 6.6995 LearningRate 0.0900 Epoch: 1 Global Step: 17070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:24,630-Speed 4882.15 samples/sec Loss 6.7377 LearningRate 0.0900 Epoch: 1 Global Step: 17080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:26,598-Speed 5203.93 samples/sec Loss 6.8734 LearningRate 0.0900 Epoch: 1 Global Step: 17090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:23:28,573-Speed 5186.68 samples/sec Loss 6.8528 LearningRate 0.0900 Epoch: 1 Global Step: 17100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:23:30,546-Speed 5192.34 samples/sec Loss 6.7829 LearningRate 0.0900 Epoch: 1 Global Step: 17110 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:32,514-Speed 5205.93 samples/sec Loss 6.8376 LearningRate 0.0900 Epoch: 1 Global Step: 17120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:34,497-Speed 5165.30 samples/sec Loss 6.8683 LearningRate 0.0900 Epoch: 1 Global Step: 17130 Fp16 Grad Scale: 65536 
Required: 21 hours Training: 2022-04-11 00:23:36,484-Speed 5153.82 samples/sec Loss 6.7675 LearningRate 0.0900 Epoch: 1 Global Step: 17140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:38,451-Speed 5209.47 samples/sec Loss 6.9456 LearningRate 0.0900 Epoch: 1 Global Step: 17150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:40,419-Speed 5204.72 samples/sec Loss 6.7600 LearningRate 0.0900 Epoch: 1 Global Step: 17160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:42,388-Speed 5203.18 samples/sec Loss 6.8629 LearningRate 0.0900 Epoch: 1 Global Step: 17170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:44,356-Speed 5202.93 samples/sec Loss 6.8603 LearningRate 0.0900 Epoch: 1 Global Step: 17180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:46,340-Speed 5164.77 samples/sec Loss 6.7226 LearningRate 0.0900 Epoch: 1 Global Step: 17190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:48,303-Speed 5216.32 samples/sec Loss 6.8732 LearningRate 0.0900 Epoch: 1 Global Step: 17200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:50,269-Speed 5211.56 samples/sec Loss 6.7795 LearningRate 0.0900 Epoch: 1 Global Step: 17210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:23:52,248-Speed 5176.38 samples/sec Loss 6.8563 LearningRate 0.0899 Epoch: 1 Global Step: 17220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:23:54,228-Speed 5173.11 samples/sec Loss 6.8225 LearningRate 0.0899 Epoch: 1 Global Step: 17230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:23:56,195-Speed 5206.62 samples/sec Loss 6.7481 LearningRate 0.0899 Epoch: 1 Global Step: 17240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:23:58,166-Speed 5198.09 samples/sec Loss 6.7403 LearningRate 0.0899 Epoch: 1 Global Step: 17250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 
00:24:00,138-Speed 5193.21 samples/sec Loss 6.8111 LearningRate 0.0899 Epoch: 1 Global Step: 17260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:02,112-Speed 5189.45 samples/sec Loss 6.8810 LearningRate 0.0899 Epoch: 1 Global Step: 17270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:04,077-Speed 5213.51 samples/sec Loss 6.8480 LearningRate 0.0899 Epoch: 1 Global Step: 17280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:06,043-Speed 5212.09 samples/sec Loss 6.8271 LearningRate 0.0899 Epoch: 1 Global Step: 17290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:08,007-Speed 5215.17 samples/sec Loss 6.8098 LearningRate 0.0899 Epoch: 1 Global Step: 17300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:09,975-Speed 5203.86 samples/sec Loss 6.9129 LearningRate 0.0899 Epoch: 1 Global Step: 17310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:11,949-Speed 5189.16 samples/sec Loss 6.8937 LearningRate 0.0899 Epoch: 1 Global Step: 17320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:13,916-Speed 5207.81 samples/sec Loss 6.9060 LearningRate 0.0899 Epoch: 1 Global Step: 17330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:15,891-Speed 5186.71 samples/sec Loss 6.8071 LearningRate 0.0899 Epoch: 1 Global Step: 17340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:24:17,860-Speed 5200.85 samples/sec Loss 6.9045 LearningRate 0.0899 Epoch: 1 Global Step: 17350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:24:19,831-Speed 5196.52 samples/sec Loss 6.8026 LearningRate 0.0899 Epoch: 1 Global Step: 17360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:24:21,803-Speed 5194.83 samples/sec Loss 6.8380 LearningRate 0.0899 Epoch: 1 Global Step: 17370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:24:23,777-Speed 5191.66 samples/sec Loss 6.8573 
LearningRate 0.0899 Epoch: 1 Global Step: 17380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:24:25,742-Speed 5212.15 samples/sec Loss 6.8426 LearningRate 0.0899 Epoch: 1 Global Step: 17390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:24:27,705-Speed 5216.87 samples/sec Loss 6.8327 LearningRate 0.0898 Epoch: 1 Global Step: 17400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:24:29,663-Speed 5232.25 samples/sec Loss 6.7782 LearningRate 0.0898 Epoch: 1 Global Step: 17410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:31,630-Speed 5207.48 samples/sec Loss 6.8883 LearningRate 0.0898 Epoch: 1 Global Step: 17420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:33,592-Speed 5221.40 samples/sec Loss 6.8713 LearningRate 0.0898 Epoch: 1 Global Step: 17430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:35,561-Speed 5203.05 samples/sec Loss 6.7501 LearningRate 0.0898 Epoch: 1 Global Step: 17440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:37,534-Speed 5192.20 samples/sec Loss 6.8069 LearningRate 0.0898 Epoch: 1 Global Step: 17450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:39,505-Speed 5196.79 samples/sec Loss 6.8376 LearningRate 0.0898 Epoch: 1 Global Step: 17460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:41,498-Speed 5138.52 samples/sec Loss 7.0386 LearningRate 0.0898 Epoch: 1 Global Step: 17470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:43,482-Speed 5164.29 samples/sec Loss 6.8744 LearningRate 0.0898 Epoch: 1 Global Step: 17480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:45,447-Speed 5212.67 samples/sec Loss 6.9111 LearningRate 0.0898 Epoch: 1 Global Step: 17490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:47,422-Speed 5186.88 samples/sec Loss 6.7930 LearningRate 0.0898 Epoch: 1 Global Step: 17500 Fp16 
Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:49,412-Speed 5146.65 samples/sec Loss 6.8124 LearningRate 0.0898 Epoch: 1 Global Step: 17510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:24:51,378-Speed 5210.69 samples/sec Loss 6.8270 LearningRate 0.0898 Epoch: 1 Global Step: 17520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:24:53,345-Speed 5208.70 samples/sec Loss 6.9205 LearningRate 0.0898 Epoch: 1 Global Step: 17530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:55,313-Speed 5203.85 samples/sec Loss 6.8061 LearningRate 0.0898 Epoch: 1 Global Step: 17540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:57,277-Speed 5215.30 samples/sec Loss 6.9097 LearningRate 0.0898 Epoch: 1 Global Step: 17550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:24:59,265-Speed 5152.27 samples/sec Loss 6.8120 LearningRate 0.0898 Epoch: 1 Global Step: 17560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:01,237-Speed 5193.76 samples/sec Loss 6.8761 LearningRate 0.0898 Epoch: 1 Global Step: 17570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:03,229-Speed 5144.23 samples/sec Loss 6.9480 LearningRate 0.0897 Epoch: 1 Global Step: 17580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:05,199-Speed 5198.30 samples/sec Loss 6.7690 LearningRate 0.0897 Epoch: 1 Global Step: 17590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:07,166-Speed 5209.33 samples/sec Loss 6.8876 LearningRate 0.0897 Epoch: 1 Global Step: 17600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:09,132-Speed 5209.34 samples/sec Loss 6.8213 LearningRate 0.0897 Epoch: 1 Global Step: 17610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:11,103-Speed 5196.87 samples/sec Loss 6.8912 LearningRate 0.0897 Epoch: 1 Global Step: 17620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 
2022-04-11 00:25:13,073-Speed 5201.25 samples/sec Loss 6.9202 LearningRate 0.0897 Epoch: 1 Global Step: 17630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:25:15,053-Speed 5171.65 samples/sec Loss 6.7919 LearningRate 0.0897 Epoch: 1 Global Step: 17640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:25:17,021-Speed 5204.71 samples/sec Loss 6.8525 LearningRate 0.0897 Epoch: 1 Global Step: 17650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:25:18,988-Speed 5207.29 samples/sec Loss 6.8523 LearningRate 0.0897 Epoch: 1 Global Step: 17660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:25:20,964-Speed 5185.46 samples/sec Loss 6.9030 LearningRate 0.0897 Epoch: 1 Global Step: 17670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:25:22,943-Speed 5175.01 samples/sec Loss 6.8568 LearningRate 0.0897 Epoch: 1 Global Step: 17680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:25:24,903-Speed 5226.08 samples/sec Loss 6.8309 LearningRate 0.0897 Epoch: 1 Global Step: 17690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:26,876-Speed 5193.19 samples/sec Loss 6.8581 LearningRate 0.0897 Epoch: 1 Global Step: 17700 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:28,857-Speed 5171.66 samples/sec Loss 6.7884 LearningRate 0.0897 Epoch: 1 Global Step: 17710 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:30,826-Speed 5201.68 samples/sec Loss 6.7762 LearningRate 0.0897 Epoch: 1 Global Step: 17720 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:32,793-Speed 5207.63 samples/sec Loss 6.8424 LearningRate 0.0897 Epoch: 1 Global Step: 17730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:34,766-Speed 5191.80 samples/sec Loss 6.8274 LearningRate 0.0897 Epoch: 1 Global Step: 17740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:36,743-Speed 5180.29 samples/sec 
Loss 6.9145 LearningRate 0.0896 Epoch: 1 Global Step: 17750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:38,718-Speed 5186.87 samples/sec Loss 6.8989 LearningRate 0.0896 Epoch: 1 Global Step: 17760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:40,688-Speed 5199.76 samples/sec Loss 6.9463 LearningRate 0.0896 Epoch: 1 Global Step: 17770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:42,661-Speed 5193.33 samples/sec Loss 6.7175 LearningRate 0.0896 Epoch: 1 Global Step: 17780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:44,641-Speed 5172.33 samples/sec Loss 6.9723 LearningRate 0.0896 Epoch: 1 Global Step: 17790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:25:46,598-Speed 5233.73 samples/sec Loss 6.7912 LearningRate 0.0896 Epoch: 1 Global Step: 17800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:48,568-Speed 5199.54 samples/sec Loss 6.8776 LearningRate 0.0896 Epoch: 1 Global Step: 17810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:50,542-Speed 5189.73 samples/sec Loss 6.8613 LearningRate 0.0896 Epoch: 1 Global Step: 17820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:52,546-Speed 5112.18 samples/sec Loss 7.0132 LearningRate 0.0896 Epoch: 1 Global Step: 17830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:54,525-Speed 5176.55 samples/sec Loss 6.9223 LearningRate 0.0896 Epoch: 1 Global Step: 17840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:56,492-Speed 5205.52 samples/sec Loss 6.8848 LearningRate 0.0896 Epoch: 1 Global Step: 17850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:25:58,458-Speed 5210.03 samples/sec Loss 6.8284 LearningRate 0.0896 Epoch: 1 Global Step: 17860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:26:00,429-Speed 5198.84 samples/sec Loss 6.8839 LearningRate 0.0896 Epoch: 1 Global Step: 
17870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:26:02,414-Speed 5159.04 samples/sec Loss 6.9344 LearningRate 0.0896 Epoch: 1 Global Step: 17880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:26:04,390-Speed 5184.77 samples/sec Loss 6.9348 LearningRate 0.0896 Epoch: 1 Global Step: 17890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:26:06,367-Speed 5180.56 samples/sec Loss 6.8191 LearningRate 0.0896 Epoch: 1 Global Step: 17900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:26:08,338-Speed 5196.77 samples/sec Loss 6.7144 LearningRate 0.0896 Epoch: 1 Global Step: 17910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:26:10,314-Speed 5185.84 samples/sec Loss 6.8626 LearningRate 0.0896 Epoch: 1 Global Step: 17920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:26:12,284-Speed 5200.54 samples/sec Loss 6.8643 LearningRate 0.0895 Epoch: 1 Global Step: 17930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:26:14,253-Speed 5201.02 samples/sec Loss 6.7585 LearningRate 0.0895 Epoch: 1 Global Step: 17940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:26:16,222-Speed 5203.24 samples/sec Loss 6.8234 LearningRate 0.0895 Epoch: 1 Global Step: 17950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:26:18,193-Speed 5196.06 samples/sec Loss 6.8123 LearningRate 0.0895 Epoch: 1 Global Step: 17960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:26:20,155-Speed 5221.75 samples/sec Loss 6.8843 LearningRate 0.0895 Epoch: 1 Global Step: 17970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:26:22,124-Speed 5201.13 samples/sec Loss 6.6635 LearningRate 0.0895 Epoch: 1 Global Step: 17980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:26:24,100-Speed 5184.31 samples/sec Loss 6.9032 LearningRate 0.0895 Epoch: 1 Global Step: 17990 Fp16 Grad Scale: 65536 Required: 21 hours 
Training: 2022-04-11 00:26:26,079-Speed 5176.36 samples/sec Loss 6.8904 LearningRate 0.0895 Epoch: 1 Global Step: 18000 Fp16 Grad Scale: 65536 Required: 21 hours
Training: 2022-04-11 00:26:52,756-[lfw][18000]XNorm: 22.253343
Training: 2022-04-11 00:26:52,756-[lfw][18000]Accuracy-Flip: 0.99633+-0.00306
Training: 2022-04-11 00:26:52,757-[lfw][18000]Accuracy-Highest: 0.99683
Training: 2022-04-11 00:27:23,773-[cfp_fp][18000]XNorm: 19.864631
Training: 2022-04-11 00:27:23,773-[cfp_fp][18000]Accuracy-Flip: 0.96071+-0.01021
Training: 2022-04-11 00:27:23,774-[cfp_fp][18000]Accuracy-Highest: 0.96071
Training: 2022-04-11 00:27:50,271-[agedb_30][18000]XNorm: 21.965342
Training: 2022-04-11 00:27:50,272-[agedb_30][18000]Accuracy-Flip: 0.96167+-0.00667
Training: 2022-04-11 00:27:50,272-[agedb_30][18000]Accuracy-Highest: 0.96533
Training: 2022-04-11 00:27:52,255-Speed 118.83 samples/sec Loss 6.8854 LearningRate 0.0895 Epoch: 1 Global Step: 18010 Fp16 Grad Scale: 65536 Required: 21 hours
Training: 2022-04-11 00:27:54,216-Speed 5221.44 samples/sec Loss 7.0036 LearningRate 0.0895 Epoch: 1 Global Step: 18020 Fp16 Grad Scale: 65536 Required: 21 hours
Training: 2022-04-11 00:27:56,183-Speed 5208.62 samples/sec Loss 6.9335 LearningRate 0.0895 Epoch: 1 Global Step: 18030 Fp16 Grad Scale: 65536 Required: 21 hours
Training: 2022-04-11 00:27:58,161-Speed 5178.56 samples/sec Loss 6.8636 LearningRate 0.0895 Epoch: 1 Global Step: 18040 Fp16 Grad Scale: 65536 Required: 21 hours
Training: 2022-04-11 00:28:00,131-Speed 5201.58 samples/sec Loss 6.8269 LearningRate 0.0895 Epoch: 1 Global Step: 18050 Fp16 Grad Scale: 65536 Required: 21 hours
Training: 2022-04-11 00:28:02,103-Speed 5194.21 samples/sec Loss 6.8632 LearningRate 0.0895 Epoch: 1 Global Step: 18060 Fp16 Grad Scale: 65536 Required: 21 hours
Training: 2022-04-11 00:28:04,064-Speed 5223.22 samples/sec Loss 6.9014 LearningRate 0.0895 Epoch: 1 Global Step: 18070 Fp16 Grad Scale: 65536 Required: 21 hours
Training: 2022-04-11 00:28:06,029-Speed
5213.51 samples/sec Loss 6.9152 LearningRate 0.0895 Epoch: 1 Global Step: 18080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:28:08,000-Speed 5197.74 samples/sec Loss 6.8912 LearningRate 0.0895 Epoch: 1 Global Step: 18090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:28:09,989-Speed 5148.16 samples/sec Loss 6.7921 LearningRate 0.0894 Epoch: 1 Global Step: 18100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:28:11,958-Speed 5202.58 samples/sec Loss 6.8594 LearningRate 0.0894 Epoch: 1 Global Step: 18110 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:28:13,929-Speed 5198.50 samples/sec Loss 6.8939 LearningRate 0.0894 Epoch: 1 Global Step: 18120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:28:15,897-Speed 5203.38 samples/sec Loss 6.8471 LearningRate 0.0894 Epoch: 1 Global Step: 18130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:28:17,873-Speed 5183.75 samples/sec Loss 6.9006 LearningRate 0.0894 Epoch: 1 Global Step: 18140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:28:19,849-Speed 5184.06 samples/sec Loss 6.8916 LearningRate 0.0894 Epoch: 1 Global Step: 18150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:28:21,839-Speed 5146.75 samples/sec Loss 6.7342 LearningRate 0.0894 Epoch: 1 Global Step: 18160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:28:23,818-Speed 5176.63 samples/sec Loss 6.8883 LearningRate 0.0894 Epoch: 1 Global Step: 18170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:28:25,801-Speed 5167.16 samples/sec Loss 6.7512 LearningRate 0.0894 Epoch: 1 Global Step: 18180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:28:27,778-Speed 5180.51 samples/sec Loss 6.8547 LearningRate 0.0894 Epoch: 1 Global Step: 18190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:28:29,766-Speed 5153.99 samples/sec Loss 6.8714 LearningRate 0.0894 
Epoch: 1 Global Step: 18200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:28:31,741-Speed 5184.29 samples/sec Loss 6.7909 LearningRate 0.0894 Epoch: 1 Global Step: 18210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:28:33,718-Speed 5182.02 samples/sec Loss 6.8575 LearningRate 0.0894 Epoch: 1 Global Step: 18220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:28:35,694-Speed 5183.59 samples/sec Loss 6.8647 LearningRate 0.0894 Epoch: 1 Global Step: 18230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:28:37,678-Speed 5163.53 samples/sec Loss 6.9408 LearningRate 0.0894 Epoch: 1 Global Step: 18240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:28:39,675-Speed 5128.58 samples/sec Loss 6.9372 LearningRate 0.0894 Epoch: 1 Global Step: 18250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:28:41,661-Speed 5157.99 samples/sec Loss 6.8766 LearningRate 0.0894 Epoch: 1 Global Step: 18260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:28:43,635-Speed 5189.69 samples/sec Loss 6.8568 LearningRate 0.0894 Epoch: 1 Global Step: 18270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:28:45,616-Speed 5171.93 samples/sec Loss 6.9602 LearningRate 0.0893 Epoch: 1 Global Step: 18280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:28:47,592-Speed 5185.26 samples/sec Loss 6.8365 LearningRate 0.0893 Epoch: 1 Global Step: 18290 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:28:49,568-Speed 5181.81 samples/sec Loss 6.8379 LearningRate 0.0893 Epoch: 1 Global Step: 18300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:28:51,542-Speed 5189.58 samples/sec Loss 6.8814 LearningRate 0.0893 Epoch: 1 Global Step: 18310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:28:53,515-Speed 5191.63 samples/sec Loss 6.9027 LearningRate 0.0893 Epoch: 1 Global Step: 18320 Fp16 Grad 
Scale: 131072 Required: 21 hours Training: 2022-04-11 00:28:55,491-Speed 5184.23 samples/sec Loss 6.7988 LearningRate 0.0893 Epoch: 1 Global Step: 18330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:28:57,481-Speed 5148.07 samples/sec Loss 6.9157 LearningRate 0.0893 Epoch: 1 Global Step: 18340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:28:59,475-Speed 5135.34 samples/sec Loss 6.8287 LearningRate 0.0893 Epoch: 1 Global Step: 18350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:29:01,466-Speed 5145.59 samples/sec Loss 6.9081 LearningRate 0.0893 Epoch: 1 Global Step: 18360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:29:03,459-Speed 5139.16 samples/sec Loss 6.8786 LearningRate 0.0893 Epoch: 1 Global Step: 18370 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:29:05,432-Speed 5192.21 samples/sec Loss 6.9122 LearningRate 0.0893 Epoch: 1 Global Step: 18380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:29:07,423-Speed 5145.87 samples/sec Loss 6.8380 LearningRate 0.0893 Epoch: 1 Global Step: 18390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:29:09,406-Speed 5164.76 samples/sec Loss 6.8633 LearningRate 0.0893 Epoch: 1 Global Step: 18400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:29:11,396-Speed 5147.37 samples/sec Loss 6.9013 LearningRate 0.0893 Epoch: 1 Global Step: 18410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:29:13,375-Speed 5176.08 samples/sec Loss 6.7643 LearningRate 0.0893 Epoch: 1 Global Step: 18420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:29:15,351-Speed 5183.00 samples/sec Loss 6.8513 LearningRate 0.0893 Epoch: 1 Global Step: 18430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:29:17,335-Speed 5164.78 samples/sec Loss 6.7727 LearningRate 0.0893 Epoch: 1 Global Step: 18440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 
2022-04-11 00:29:19,321-Speed 5158.06 samples/sec Loss 6.9382 LearningRate 0.0893 Epoch: 1 Global Step: 18450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:29:21,312-Speed 5143.40 samples/sec Loss 6.8080 LearningRate 0.0892 Epoch: 1 Global Step: 18460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:29:23,303-Speed 5145.52 samples/sec Loss 6.9298 LearningRate 0.0892 Epoch: 1 Global Step: 18470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:29:25,289-Speed 5156.89 samples/sec Loss 6.8582 LearningRate 0.0892 Epoch: 1 Global Step: 18480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:29:27,278-Speed 5150.68 samples/sec Loss 6.8669 LearningRate 0.0892 Epoch: 1 Global Step: 18490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:29:29,253-Speed 5186.56 samples/sec Loss 6.7821 LearningRate 0.0892 Epoch: 1 Global Step: 18500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:29:31,233-Speed 5174.83 samples/sec Loss 6.8244 LearningRate 0.0892 Epoch: 1 Global Step: 18510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:29:33,208-Speed 5186.04 samples/sec Loss 6.7867 LearningRate 0.0892 Epoch: 1 Global Step: 18520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:29:35,182-Speed 5187.93 samples/sec Loss 6.8519 LearningRate 0.0892 Epoch: 1 Global Step: 18530 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:29:37,162-Speed 5174.32 samples/sec Loss 6.8398 LearningRate 0.0892 Epoch: 1 Global Step: 18540 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:29:39,143-Speed 5170.94 samples/sec Loss 6.8919 LearningRate 0.0892 Epoch: 1 Global Step: 18550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:29:41,117-Speed 5187.82 samples/sec Loss 6.8943 LearningRate 0.0892 Epoch: 1 Global Step: 18560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:29:43,090-Speed 5191.69 samples/sec 
Loss 6.8663 LearningRate 0.0892 Epoch: 1 Global Step: 18570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:29:45,065-Speed 5188.03 samples/sec Loss 6.8797 LearningRate 0.0892 Epoch: 1 Global Step: 18580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:29:47,038-Speed 5190.31 samples/sec Loss 6.9310 LearningRate 0.0892 Epoch: 1 Global Step: 18590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:29:49,021-Speed 5199.54 samples/sec Loss 6.8625 LearningRate 0.0892 Epoch: 1 Global Step: 18600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:29:50,999-Speed 5178.31 samples/sec Loss 6.8659 LearningRate 0.0892 Epoch: 1 Global Step: 18610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:29:52,974-Speed 5186.08 samples/sec Loss 6.8611 LearningRate 0.0892 Epoch: 1 Global Step: 18620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:29:54,952-Speed 5178.75 samples/sec Loss 6.8688 LearningRate 0.0891 Epoch: 1 Global Step: 18630 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:29:56,918-Speed 5209.89 samples/sec Loss 6.8781 LearningRate 0.0891 Epoch: 1 Global Step: 18640 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:29:58,908-Speed 5148.58 samples/sec Loss 6.8744 LearningRate 0.0891 Epoch: 1 Global Step: 18650 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:00,913-Speed 5106.80 samples/sec Loss 6.8488 LearningRate 0.0891 Epoch: 1 Global Step: 18660 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:02,881-Speed 5205.75 samples/sec Loss 6.8911 LearningRate 0.0891 Epoch: 1 Global Step: 18670 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:04,860-Speed 5176.94 samples/sec Loss 6.8835 LearningRate 0.0891 Epoch: 1 Global Step: 18680 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:06,833-Speed 5190.31 samples/sec Loss 6.9007 LearningRate 0.0891 Epoch: 1 Global 
Step: 18690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:08,816-Speed 5166.99 samples/sec Loss 6.8399 LearningRate 0.0891 Epoch: 1 Global Step: 18700 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:10,799-Speed 5164.69 samples/sec Loss 6.8443 LearningRate 0.0891 Epoch: 1 Global Step: 18710 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:12,759-Speed 5225.96 samples/sec Loss 6.8068 LearningRate 0.0891 Epoch: 1 Global Step: 18720 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:14,732-Speed 5191.37 samples/sec Loss 6.8969 LearningRate 0.0891 Epoch: 1 Global Step: 18730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:16,700-Speed 5206.03 samples/sec Loss 6.7891 LearningRate 0.0891 Epoch: 1 Global Step: 18740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:18,663-Speed 5217.39 samples/sec Loss 6.9094 LearningRate 0.0891 Epoch: 1 Global Step: 18750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:20,626-Speed 5219.96 samples/sec Loss 6.8027 LearningRate 0.0891 Epoch: 1 Global Step: 18760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:22,593-Speed 5205.66 samples/sec Loss 6.7702 LearningRate 0.0891 Epoch: 1 Global Step: 18770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:24,567-Speed 5190.70 samples/sec Loss 6.9327 LearningRate 0.0891 Epoch: 1 Global Step: 18780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:26,531-Speed 5216.18 samples/sec Loss 6.9687 LearningRate 0.0891 Epoch: 1 Global Step: 18790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:28,505-Speed 5187.07 samples/sec Loss 6.9246 LearningRate 0.0891 Epoch: 1 Global Step: 18800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:30,472-Speed 5208.17 samples/sec Loss 6.8685 LearningRate 0.0890 Epoch: 1 Global Step: 18810 Fp16 Grad Scale: 65536 Required: 21 hours 
Training: 2022-04-11 00:30:32,440-Speed 5206.12 samples/sec Loss 6.8567 LearningRate 0.0890 Epoch: 1 Global Step: 18820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:30:34,418-Speed 5179.06 samples/sec Loss 6.8733 LearningRate 0.0890 Epoch: 1 Global Step: 18830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:30:36,405-Speed 5155.92 samples/sec Loss 6.7730 LearningRate 0.0890 Epoch: 1 Global Step: 18840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:30:38,369-Speed 5213.66 samples/sec Loss 6.7997 LearningRate 0.0890 Epoch: 1 Global Step: 18850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:30:40,351-Speed 5168.30 samples/sec Loss 6.9196 LearningRate 0.0890 Epoch: 1 Global Step: 18860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:30:42,318-Speed 5207.22 samples/sec Loss 6.8941 LearningRate 0.0890 Epoch: 1 Global Step: 18870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:30:44,297-Speed 5175.45 samples/sec Loss 6.9561 LearningRate 0.0890 Epoch: 1 Global Step: 18880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:30:46,267-Speed 5200.81 samples/sec Loss 6.9096 LearningRate 0.0890 Epoch: 1 Global Step: 18890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:48,230-Speed 5218.46 samples/sec Loss 6.9030 LearningRate 0.0890 Epoch: 1 Global Step: 18900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:50,206-Speed 5184.43 samples/sec Loss 6.9686 LearningRate 0.0890 Epoch: 1 Global Step: 18910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:52,199-Speed 5140.25 samples/sec Loss 6.9513 LearningRate 0.0890 Epoch: 1 Global Step: 18920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:54,170-Speed 5196.61 samples/sec Loss 6.8223 LearningRate 0.0890 Epoch: 1 Global Step: 18930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:56,138-Speed 5205.49 
samples/sec Loss 6.8418 LearningRate 0.0890 Epoch: 1 Global Step: 18940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:30:58,108-Speed 5200.31 samples/sec Loss 6.8624 LearningRate 0.0890 Epoch: 1 Global Step: 18950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:00,070-Speed 5220.91 samples/sec Loss 6.8998 LearningRate 0.0890 Epoch: 1 Global Step: 18960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:02,051-Speed 5169.90 samples/sec Loss 6.8312 LearningRate 0.0890 Epoch: 1 Global Step: 18970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:04,026-Speed 5192.86 samples/sec Loss 6.8620 LearningRate 0.0890 Epoch: 1 Global Step: 18980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:05,988-Speed 5218.81 samples/sec Loss 6.7881 LearningRate 0.0889 Epoch: 1 Global Step: 18990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:31:07,946-Speed 5231.43 samples/sec Loss 6.8907 LearningRate 0.0889 Epoch: 1 Global Step: 19000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:09,928-Speed 5168.35 samples/sec Loss 6.9337 LearningRate 0.0889 Epoch: 1 Global Step: 19010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:11,900-Speed 5196.09 samples/sec Loss 6.8818 LearningRate 0.0889 Epoch: 1 Global Step: 19020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:13,875-Speed 5185.14 samples/sec Loss 6.7672 LearningRate 0.0889 Epoch: 1 Global Step: 19030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:15,852-Speed 5182.24 samples/sec Loss 6.8878 LearningRate 0.0889 Epoch: 1 Global Step: 19040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:17,817-Speed 5212.48 samples/sec Loss 6.8801 LearningRate 0.0889 Epoch: 1 Global Step: 19050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:19,792-Speed 5188.01 samples/sec Loss 6.8984 LearningRate 0.0889 Epoch: 1 
Global Step: 19060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:21,755-Speed 5217.81 samples/sec Loss 6.7416 LearningRate 0.0889 Epoch: 1 Global Step: 19070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:23,732-Speed 5181.79 samples/sec Loss 6.8522 LearningRate 0.0889 Epoch: 1 Global Step: 19080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:25,714-Speed 5168.06 samples/sec Loss 6.8400 LearningRate 0.0889 Epoch: 1 Global Step: 19090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:27,689-Speed 5187.01 samples/sec Loss 6.8448 LearningRate 0.0889 Epoch: 1 Global Step: 19100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:31:29,663-Speed 5187.76 samples/sec Loss 6.7783 LearningRate 0.0889 Epoch: 1 Global Step: 19110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:31:31,629-Speed 5210.99 samples/sec Loss 6.8181 LearningRate 0.0889 Epoch: 1 Global Step: 19120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:31:33,604-Speed 5185.73 samples/sec Loss 6.7792 LearningRate 0.0889 Epoch: 1 Global Step: 19130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:31:35,564-Speed 5226.64 samples/sec Loss 6.8538 LearningRate 0.0889 Epoch: 1 Global Step: 19140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:37,528-Speed 5217.91 samples/sec Loss 6.8410 LearningRate 0.0889 Epoch: 1 Global Step: 19150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:39,491-Speed 5217.01 samples/sec Loss 6.8987 LearningRate 0.0889 Epoch: 1 Global Step: 19160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:41,455-Speed 5215.96 samples/sec Loss 6.7480 LearningRate 0.0888 Epoch: 1 Global Step: 19170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:43,418-Speed 5216.22 samples/sec Loss 6.8421 LearningRate 0.0888 Epoch: 1 Global Step: 19180 Fp16 Grad Scale: 65536 Required: 
21 hours Training: 2022-04-11 00:31:45,394-Speed 5185.77 samples/sec Loss 6.9757 LearningRate 0.0888 Epoch: 1 Global Step: 19190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:47,364-Speed 5199.54 samples/sec Loss 6.8523 LearningRate 0.0888 Epoch: 1 Global Step: 19200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:49,324-Speed 5225.11 samples/sec Loss 6.9727 LearningRate 0.0888 Epoch: 1 Global Step: 19210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:51,288-Speed 5214.48 samples/sec Loss 6.9086 LearningRate 0.0888 Epoch: 1 Global Step: 19220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:53,260-Speed 5194.85 samples/sec Loss 6.9227 LearningRate 0.0888 Epoch: 1 Global Step: 19230 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:31:55,242-Speed 5169.28 samples/sec Loss 6.8872 LearningRate 0.0888 Epoch: 1 Global Step: 19240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:31:57,210-Speed 5206.56 samples/sec Loss 6.8301 LearningRate 0.0888 Epoch: 1 Global Step: 19250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:31:59,174-Speed 5215.99 samples/sec Loss 6.8457 LearningRate 0.0888 Epoch: 1 Global Step: 19260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:32:01,139-Speed 5210.64 samples/sec Loss 6.8592 LearningRate 0.0888 Epoch: 1 Global Step: 19270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:32:03,110-Speed 5198.45 samples/sec Loss 6.8351 LearningRate 0.0888 Epoch: 1 Global Step: 19280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:32:05,077-Speed 5207.95 samples/sec Loss 6.8995 LearningRate 0.0888 Epoch: 1 Global Step: 19290 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:32:07,054-Speed 5181.80 samples/sec Loss 6.8199 LearningRate 0.0888 Epoch: 1 Global Step: 19300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:32:09,028-Speed 
5187.83 samples/sec Loss 6.8576 LearningRate 0.0888 Epoch: 1 Global Step: 19310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:32:11,005-Speed 5181.67 samples/sec Loss 6.7727 LearningRate 0.0888 Epoch: 1 Global Step: 19320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:32:12,991-Speed 5156.65 samples/sec Loss 6.8271 LearningRate 0.0888 Epoch: 1 Global Step: 19330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:32:14,980-Speed 5151.31 samples/sec Loss 6.8180 LearningRate 0.0887 Epoch: 1 Global Step: 19340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:32:16,944-Speed 5214.16 samples/sec Loss 6.9043 LearningRate 0.0887 Epoch: 1 Global Step: 19350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:32:18,923-Speed 5177.99 samples/sec Loss 6.8533 LearningRate 0.0887 Epoch: 1 Global Step: 19360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:32:20,892-Speed 5202.92 samples/sec Loss 6.8952 LearningRate 0.0887 Epoch: 1 Global Step: 19370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:32:22,874-Speed 5167.97 samples/sec Loss 6.8351 LearningRate 0.0887 Epoch: 1 Global Step: 19380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:32:24,845-Speed 5196.20 samples/sec Loss 6.7590 LearningRate 0.0887 Epoch: 1 Global Step: 19390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:32:26,810-Speed 5213.13 samples/sec Loss 6.8479 LearningRate 0.0887 Epoch: 1 Global Step: 19400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:32:28,776-Speed 5210.35 samples/sec Loss 6.8759 LearningRate 0.0887 Epoch: 1 Global Step: 19410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:32:30,740-Speed 5215.93 samples/sec Loss 6.7845 LearningRate 0.0887 Epoch: 1 Global Step: 19420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:32:32,704-Speed 5214.43 samples/sec Loss 6.7081 
LearningRate 0.0887 Epoch: 1 Global Step: 19430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:32:34,672-Speed 5205.67 samples/sec Loss 6.8564 LearningRate 0.0887 Epoch: 1 Global Step: 19440 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:32:36,675-Speed 5113.30 samples/sec Loss 6.7204 LearningRate 0.0887 Epoch: 1 Global Step: 19450 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:32:38,640-Speed 5212.28 samples/sec Loss 6.9148 LearningRate 0.0887 Epoch: 1 Global Step: 19460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:32:40,608-Speed 5206.34 samples/sec Loss 6.7993 LearningRate 0.0887 Epoch: 1 Global Step: 19470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:32:42,576-Speed 5204.07 samples/sec Loss 6.7910 LearningRate 0.0887 Epoch: 1 Global Step: 19480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:32:44,542-Speed 5210.85 samples/sec Loss 6.7837 LearningRate 0.0887 Epoch: 1 Global Step: 19490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:32:46,525-Speed 5164.36 samples/sec Loss 6.7815 LearningRate 0.0887 Epoch: 1 Global Step: 19500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:32:48,505-Speed 5176.05 samples/sec Loss 6.8595 LearningRate 0.0887 Epoch: 1 Global Step: 19510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:32:50,493-Speed 5150.93 samples/sec Loss 6.8086 LearningRate 0.0886 Epoch: 1 Global Step: 19520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:32:52,468-Speed 5187.95 samples/sec Loss 6.8473 LearningRate 0.0886 Epoch: 1 Global Step: 19530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:32:54,433-Speed 5212.42 samples/sec Loss 6.9445 LearningRate 0.0886 Epoch: 1 Global Step: 19540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:32:56,397-Speed 5215.84 samples/sec Loss 6.7465 LearningRate 0.0886 Epoch: 1 Global Step: 19550 Fp16 
Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:32:58,390-Speed 5138.26 samples/sec Loss 6.7684 LearningRate 0.0886 Epoch: 1 Global Step: 19560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:33:00,357-Speed 5207.62 samples/sec Loss 6.8192 LearningRate 0.0886 Epoch: 1 Global Step: 19570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:33:02,332-Speed 5187.05 samples/sec Loss 6.7888 LearningRate 0.0886 Epoch: 1 Global Step: 19580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:33:04,310-Speed 5179.85 samples/sec Loss 6.9129 LearningRate 0.0886 Epoch: 1 Global Step: 19590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:33:06,295-Speed 5160.16 samples/sec Loss 6.7600 LearningRate 0.0886 Epoch: 1 Global Step: 19600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:33:08,262-Speed 5208.65 samples/sec Loss 6.8152 LearningRate 0.0886 Epoch: 1 Global Step: 19610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:33:10,231-Speed 5201.94 samples/sec Loss 6.8213 LearningRate 0.0886 Epoch: 1 Global Step: 19620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:33:12,199-Speed 5203.67 samples/sec Loss 6.8488 LearningRate 0.0886 Epoch: 1 Global Step: 19630 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:33:14,168-Speed 5202.57 samples/sec Loss 6.8564 LearningRate 0.0886 Epoch: 1 Global Step: 19640 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:33:16,135-Speed 5206.38 samples/sec Loss 6.7165 LearningRate 0.0886 Epoch: 1 Global Step: 19650 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:33:18,102-Speed 5208.73 samples/sec Loss 6.8047 LearningRate 0.0886 Epoch: 1 Global Step: 19660 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:33:20,070-Speed 5206.17 samples/sec Loss 6.8754 LearningRate 0.0886 Epoch: 1 Global Step: 19670 Fp16 Grad Scale: 65536 Required: 21 hours Training: 
2022-04-11 00:33:22,067-Speed 5128.12 samples/sec Loss 6.8752 LearningRate 0.0886 Epoch: 1 Global Step: 19680 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:33:24,057-Speed 5146.82 samples/sec Loss 6.8682 LearningRate 0.0886 Epoch: 1 Global Step: 19690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:33:26,034-Speed 5181.77 samples/sec Loss 6.8355 LearningRate 0.0885 Epoch: 1 Global Step: 19700 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:33:28,008-Speed 5191.41 samples/sec Loss 6.8567 LearningRate 0.0885 Epoch: 1 Global Step: 19710 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:33:29,981-Speed 5190.73 samples/sec Loss 6.8641 LearningRate 0.0885 Epoch: 1 Global Step: 19720 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:33:31,947-Speed 5210.08 samples/sec Loss 6.7743 LearningRate 0.0885 Epoch: 1 Global Step: 19730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:33:33,913-Speed 5210.09 samples/sec Loss 6.8773 LearningRate 0.0885 Epoch: 1 Global Step: 19740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:33:35,877-Speed 5215.88 samples/sec Loss 6.7976 LearningRate 0.0885 Epoch: 1 Global Step: 19750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:33:37,854-Speed 5181.03 samples/sec Loss 6.8551 LearningRate 0.0885 Epoch: 1 Global Step: 19760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:33:39,833-Speed 5176.78 samples/sec Loss 6.8967 LearningRate 0.0885 Epoch: 1 Global Step: 19770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:33:41,812-Speed 5174.37 samples/sec Loss 6.8500 LearningRate 0.0885 Epoch: 1 Global Step: 19780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:33:43,784-Speed 5194.32 samples/sec Loss 6.8717 LearningRate 0.0885 Epoch: 1 Global Step: 19790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:33:45,789-Speed 5109.89 samples/sec 
Loss 6.7881 LearningRate 0.0885 Epoch: 1 Global Step: 19800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:33:47,774-Speed 5160.36 samples/sec Loss 6.8621 LearningRate 0.0885 Epoch: 1 Global Step: 19810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:33:49,742-Speed 5205.73 samples/sec Loss 6.8003 LearningRate 0.0885 Epoch: 1 Global Step: 19820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:33:51,716-Speed 5188.14 samples/sec Loss 6.8851 LearningRate 0.0885 Epoch: 1 Global Step: 19830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:33:53,683-Speed 5207.57 samples/sec Loss 6.7225 LearningRate 0.0885 Epoch: 1 Global Step: 19840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:33:55,652-Speed 5202.76 samples/sec Loss 6.7810 LearningRate 0.0885 Epoch: 1 Global Step: 19850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:33:57,622-Speed 5200.71 samples/sec Loss 6.8620 LearningRate 0.0885 Epoch: 1 Global Step: 19860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:33:59,592-Speed 5198.24 samples/sec Loss 6.7999 LearningRate 0.0884 Epoch: 1 Global Step: 19870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:34:01,566-Speed 5190.85 samples/sec Loss 6.6925 LearningRate 0.0884 Epoch: 1 Global Step: 19880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:34:03,537-Speed 5195.82 samples/sec Loss 6.8613 LearningRate 0.0884 Epoch: 1 Global Step: 19890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:34:05,506-Speed 5202.75 samples/sec Loss 6.7856 LearningRate 0.0884 Epoch: 1 Global Step: 19900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:34:07,476-Speed 5198.79 samples/sec Loss 6.7738 LearningRate 0.0884 Epoch: 1 Global Step: 19910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:34:09,449-Speed 5193.92 samples/sec Loss 6.7731 LearningRate 0.0884 Epoch: 1 Global 
Step: 19920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:34:11,427-Speed 5177.07 samples/sec Loss 6.8296 LearningRate 0.0884 Epoch: 1 Global Step: 19930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:34:13,413-Speed 5159.60 samples/sec Loss 6.7862 LearningRate 0.0884 Epoch: 1 Global Step: 19940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:34:15,393-Speed 5172.78 samples/sec Loss 6.7623 LearningRate 0.0884 Epoch: 1 Global Step: 19950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:34:17,370-Speed 5181.46 samples/sec Loss 6.7800 LearningRate 0.0884 Epoch: 1 Global Step: 19960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:34:19,338-Speed 5203.66 samples/sec Loss 6.7916 LearningRate 0.0884 Epoch: 1 Global Step: 19970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:34:21,307-Speed 5202.86 samples/sec Loss 6.7836 LearningRate 0.0884 Epoch: 1 Global Step: 19980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:34:23,312-Speed 5109.82 samples/sec Loss 6.7949 LearningRate 0.0884 Epoch: 1 Global Step: 19990 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:34:25,304-Speed 5142.22 samples/sec Loss 6.6819 LearningRate 0.0884 Epoch: 1 Global Step: 20000 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:34:51,927-[lfw][20000]XNorm: 22.723473 Training: 2022-04-11 00:34:51,927-[lfw][20000]Accuracy-Flip: 0.99600+-0.00249 Training: 2022-04-11 00:34:51,927-[lfw][20000]Accuracy-Highest: 0.99683 Training: 2022-04-11 00:35:22,846-[cfp_fp][20000]XNorm: 19.780425 Training: 2022-04-11 00:35:22,847-[cfp_fp][20000]Accuracy-Flip: 0.96343+-0.00649 Training: 2022-04-11 00:35:22,847-[cfp_fp][20000]Accuracy-Highest: 0.96343 Training: 2022-04-11 00:35:49,626-[agedb_30][20000]XNorm: 22.169983 Training: 2022-04-11 00:35:49,627-[agedb_30][20000]Accuracy-Flip: 0.96683+-0.00794 Training: 2022-04-11 
00:35:49,627-[agedb_30][20000]Accuracy-Highest: 0.96683 Training: 2022-04-11 00:35:51,604-Speed 118.66 samples/sec Loss 6.8046 LearningRate 0.0884 Epoch: 1 Global Step: 20010 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:35:53,556-Speed 5248.27 samples/sec Loss 6.6818 LearningRate 0.0884 Epoch: 1 Global Step: 20020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:35:55,524-Speed 5205.85 samples/sec Loss 6.8684 LearningRate 0.0884 Epoch: 1 Global Step: 20030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:35:57,484-Speed 5224.08 samples/sec Loss 6.7421 LearningRate 0.0884 Epoch: 1 Global Step: 20040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:35:59,458-Speed 5190.52 samples/sec Loss 6.9254 LearningRate 0.0883 Epoch: 1 Global Step: 20050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:01,440-Speed 5170.39 samples/sec Loss 6.7925 LearningRate 0.0883 Epoch: 1 Global Step: 20060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:03,404-Speed 5215.50 samples/sec Loss 6.7144 LearningRate 0.0883 Epoch: 1 Global Step: 20070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:05,370-Speed 5208.51 samples/sec Loss 6.7486 LearningRate 0.0883 Epoch: 1 Global Step: 20080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:07,333-Speed 5219.28 samples/sec Loss 6.7986 LearningRate 0.0883 Epoch: 1 Global Step: 20090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:09,299-Speed 5209.02 samples/sec Loss 6.8044 LearningRate 0.0883 Epoch: 1 Global Step: 20100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:11,264-Speed 5213.65 samples/sec Loss 6.7958 LearningRate 0.0883 Epoch: 1 Global Step: 20110 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:13,236-Speed 5193.24 samples/sec Loss 6.7661 LearningRate 0.0883 Epoch: 1 Global Step: 20120 Fp16 Grad Scale: 65536 Required: 21 
hours Training: 2022-04-11 00:36:15,200-Speed 5216.43 samples/sec Loss 6.8321 LearningRate 0.0883 Epoch: 1 Global Step: 20130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:36:17,164-Speed 5215.74 samples/sec Loss 6.8508 LearningRate 0.0883 Epoch: 1 Global Step: 20140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:36:19,137-Speed 5190.07 samples/sec Loss 6.8207 LearningRate 0.0883 Epoch: 1 Global Step: 20150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:21,109-Speed 5196.07 samples/sec Loss 6.9039 LearningRate 0.0883 Epoch: 1 Global Step: 20160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:23,079-Speed 5198.57 samples/sec Loss 6.8039 LearningRate 0.0883 Epoch: 1 Global Step: 20170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:25,064-Speed 5162.40 samples/sec Loss 6.9357 LearningRate 0.0883 Epoch: 1 Global Step: 20180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:27,029-Speed 5212.19 samples/sec Loss 6.6533 LearningRate 0.0883 Epoch: 1 Global Step: 20190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:28,996-Speed 5208.59 samples/sec Loss 6.8222 LearningRate 0.0883 Epoch: 1 Global Step: 20200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:30,968-Speed 5194.00 samples/sec Loss 6.8439 LearningRate 0.0883 Epoch: 1 Global Step: 20210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:32,946-Speed 5179.70 samples/sec Loss 6.8490 LearningRate 0.0883 Epoch: 1 Global Step: 20220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:34,918-Speed 5194.04 samples/sec Loss 6.7124 LearningRate 0.0882 Epoch: 1 Global Step: 20230 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:36,908-Speed 5147.02 samples/sec Loss 6.8450 LearningRate 0.0882 Epoch: 1 Global Step: 20240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:38,878-Speed 5197.88 
samples/sec Loss 6.8460 LearningRate 0.0882 Epoch: 1 Global Step: 20250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:36:40,850-Speed 5196.56 samples/sec Loss 6.8451 LearningRate 0.0882 Epoch: 1 Global Step: 20260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:42,816-Speed 5208.21 samples/sec Loss 6.7538 LearningRate 0.0882 Epoch: 1 Global Step: 20270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:44,784-Speed 5204.94 samples/sec Loss 6.7142 LearningRate 0.0882 Epoch: 1 Global Step: 20280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:46,749-Speed 5212.95 samples/sec Loss 6.8905 LearningRate 0.0882 Epoch: 1 Global Step: 20290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:48,718-Speed 5203.19 samples/sec Loss 6.8055 LearningRate 0.0882 Epoch: 1 Global Step: 20300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:50,685-Speed 5207.34 samples/sec Loss 6.7899 LearningRate 0.0882 Epoch: 1 Global Step: 20310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:52,654-Speed 5203.63 samples/sec Loss 6.7358 LearningRate 0.0882 Epoch: 1 Global Step: 20320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:54,618-Speed 5217.00 samples/sec Loss 6.6930 LearningRate 0.0882 Epoch: 1 Global Step: 20330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:56,584-Speed 5207.92 samples/sec Loss 6.7943 LearningRate 0.0882 Epoch: 1 Global Step: 20340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:36:58,558-Speed 5189.10 samples/sec Loss 6.8650 LearningRate 0.0882 Epoch: 1 Global Step: 20350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:00,525-Speed 5209.14 samples/sec Loss 6.8160 LearningRate 0.0882 Epoch: 1 Global Step: 20360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:37:02,493-Speed 5205.79 samples/sec Loss 6.7175 LearningRate 0.0882 Epoch: 1 
Global Step: 20370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:37:04,465-Speed 5193.21 samples/sec Loss 6.7107 LearningRate 0.0882 Epoch: 1 Global Step: 20380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:37:06,438-Speed 5191.30 samples/sec Loss 6.7828 LearningRate 0.0882 Epoch: 1 Global Step: 20390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:08,410-Speed 5195.67 samples/sec Loss 6.8316 LearningRate 0.0882 Epoch: 1 Global Step: 20400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:10,389-Speed 5174.81 samples/sec Loss 6.7551 LearningRate 0.0881 Epoch: 1 Global Step: 20410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:12,364-Speed 5187.04 samples/sec Loss 6.7781 LearningRate 0.0881 Epoch: 1 Global Step: 20420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:14,330-Speed 5211.36 samples/sec Loss 6.8205 LearningRate 0.0881 Epoch: 1 Global Step: 20430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:16,296-Speed 5208.89 samples/sec Loss 6.7317 LearningRate 0.0881 Epoch: 1 Global Step: 20440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:18,278-Speed 5169.59 samples/sec Loss 6.8179 LearningRate 0.0881 Epoch: 1 Global Step: 20450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:20,251-Speed 5190.53 samples/sec Loss 6.8352 LearningRate 0.0881 Epoch: 1 Global Step: 20460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:22,216-Speed 5213.87 samples/sec Loss 6.9057 LearningRate 0.0881 Epoch: 1 Global Step: 20470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:24,184-Speed 5204.34 samples/sec Loss 6.8262 LearningRate 0.0881 Epoch: 1 Global Step: 20480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:26,156-Speed 5193.46 samples/sec Loss 6.8222 LearningRate 0.0881 Epoch: 1 Global Step: 20490 Fp16 Grad Scale: 131072 Required: 
21 hours Training: 2022-04-11 00:37:28,140-Speed 5163.17 samples/sec Loss 6.7336 LearningRate 0.0881 Epoch: 1 Global Step: 20500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:37:30,103-Speed 5218.60 samples/sec Loss 6.9093 LearningRate 0.0881 Epoch: 1 Global Step: 20510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:32,072-Speed 5203.98 samples/sec Loss 6.7068 LearningRate 0.0881 Epoch: 1 Global Step: 20520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:34,040-Speed 5205.70 samples/sec Loss 6.6984 LearningRate 0.0881 Epoch: 1 Global Step: 20530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:36,008-Speed 5202.59 samples/sec Loss 6.7228 LearningRate 0.0881 Epoch: 1 Global Step: 20540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:37,993-Speed 5162.18 samples/sec Loss 6.8265 LearningRate 0.0881 Epoch: 1 Global Step: 20550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:39,960-Speed 5206.35 samples/sec Loss 6.7327 LearningRate 0.0881 Epoch: 1 Global Step: 20560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:41,930-Speed 5201.15 samples/sec Loss 6.8137 LearningRate 0.0881 Epoch: 1 Global Step: 20570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:43,908-Speed 5176.98 samples/sec Loss 6.7944 LearningRate 0.0881 Epoch: 1 Global Step: 20580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:45,899-Speed 5145.26 samples/sec Loss 6.8356 LearningRate 0.0880 Epoch: 1 Global Step: 20590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:47,887-Speed 5151.82 samples/sec Loss 6.7939 LearningRate 0.0880 Epoch: 1 Global Step: 20600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:37:49,880-Speed 5141.25 samples/sec Loss 6.7535 LearningRate 0.0880 Epoch: 1 Global Step: 20610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:37:51,861-Speed 
5168.77 samples/sec Loss 6.8730 LearningRate 0.0880 Epoch: 1 Global Step: 20620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:37:53,846-Speed 5161.89 samples/sec Loss 6.7683 LearningRate 0.0880 Epoch: 1 Global Step: 20630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:37:55,820-Speed 5188.57 samples/sec Loss 6.7877 LearningRate 0.0880 Epoch: 1 Global Step: 20640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:37:57,789-Speed 5202.86 samples/sec Loss 6.7880 LearningRate 0.0880 Epoch: 1 Global Step: 20650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:37:59,756-Speed 5208.13 samples/sec Loss 6.8541 LearningRate 0.0880 Epoch: 1 Global Step: 20660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:38:01,746-Speed 5147.95 samples/sec Loss 6.6901 LearningRate 0.0880 Epoch: 1 Global Step: 20670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:38:03,725-Speed 5175.86 samples/sec Loss 6.7795 LearningRate 0.0880 Epoch: 1 Global Step: 20680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:38:05,685-Speed 5225.46 samples/sec Loss 6.6099 LearningRate 0.0880 Epoch: 1 Global Step: 20690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:07,662-Speed 5180.82 samples/sec Loss 6.8240 LearningRate 0.0880 Epoch: 1 Global Step: 20700 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:09,636-Speed 5189.27 samples/sec Loss 6.7162 LearningRate 0.0880 Epoch: 1 Global Step: 20710 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:11,609-Speed 5193.13 samples/sec Loss 6.7912 LearningRate 0.0880 Epoch: 1 Global Step: 20720 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:13,580-Speed 5196.96 samples/sec Loss 6.7131 LearningRate 0.0880 Epoch: 1 Global Step: 20730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:15,549-Speed 5201.66 samples/sec Loss 6.7661 LearningRate 
0.0880 Epoch: 1 Global Step: 20740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:17,518-Speed 5202.11 samples/sec Loss 6.7083 LearningRate 0.0880 Epoch: 1 Global Step: 20750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:19,486-Speed 5205.97 samples/sec Loss 6.7725 LearningRate 0.0879 Epoch: 1 Global Step: 20760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:21,453-Speed 5208.90 samples/sec Loss 6.7428 LearningRate 0.0879 Epoch: 1 Global Step: 20770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:23,424-Speed 5196.65 samples/sec Loss 6.7873 LearningRate 0.0879 Epoch: 1 Global Step: 20780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:25,390-Speed 5208.90 samples/sec Loss 6.6934 LearningRate 0.0879 Epoch: 1 Global Step: 20790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:38:27,358-Speed 5206.34 samples/sec Loss 6.8283 LearningRate 0.0879 Epoch: 1 Global Step: 20800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:38:29,342-Speed 5161.63 samples/sec Loss 6.7033 LearningRate 0.0879 Epoch: 1 Global Step: 20810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:38:31,316-Speed 5190.06 samples/sec Loss 6.8018 LearningRate 0.0879 Epoch: 1 Global Step: 20820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:38:33,288-Speed 5192.95 samples/sec Loss 6.7611 LearningRate 0.0879 Epoch: 1 Global Step: 20830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:38:35,260-Speed 5195.33 samples/sec Loss 6.7342 LearningRate 0.0879 Epoch: 1 Global Step: 20840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:38:37,244-Speed 5165.05 samples/sec Loss 6.8317 LearningRate 0.0879 Epoch: 1 Global Step: 20850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:38:39,225-Speed 5171.19 samples/sec Loss 6.7980 LearningRate 0.0879 Epoch: 1 Global Step: 20860 Fp16 Grad 
Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:41,192-Speed 5207.13 samples/sec Loss 6.7791 LearningRate 0.0879 Epoch: 1 Global Step: 20870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:43,159-Speed 5205.77 samples/sec Loss 6.7824 LearningRate 0.0879 Epoch: 1 Global Step: 20880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:45,139-Speed 5175.63 samples/sec Loss 6.6252 LearningRate 0.0879 Epoch: 1 Global Step: 20890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:47,107-Speed 5203.68 samples/sec Loss 6.7173 LearningRate 0.0879 Epoch: 1 Global Step: 20900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:49,075-Speed 5205.55 samples/sec Loss 6.8150 LearningRate 0.0879 Epoch: 1 Global Step: 20910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:51,054-Speed 5176.11 samples/sec Loss 6.6834 LearningRate 0.0879 Epoch: 1 Global Step: 20920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:53,020-Speed 5210.22 samples/sec Loss 6.7139 LearningRate 0.0879 Epoch: 1 Global Step: 20930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:54,989-Speed 5201.59 samples/sec Loss 6.6946 LearningRate 0.0878 Epoch: 1 Global Step: 20940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:56,976-Speed 5156.16 samples/sec Loss 6.7450 LearningRate 0.0878 Epoch: 1 Global Step: 20950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:38:58,956-Speed 5172.32 samples/sec Loss 6.7068 LearningRate 0.0878 Epoch: 1 Global Step: 20960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:39:00,922-Speed 5211.27 samples/sec Loss 6.7680 LearningRate 0.0878 Epoch: 1 Global Step: 20970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:39:02,900-Speed 5179.59 samples/sec Loss 6.6926 LearningRate 0.0878 Epoch: 1 Global Step: 20980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 
00:39:04,867-Speed 5207.24 samples/sec Loss 6.7205 LearningRate 0.0878 Epoch: 1 Global Step: 20990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:39:06,837-Speed 5200.01 samples/sec Loss 6.6469 LearningRate 0.0878 Epoch: 1 Global Step: 21000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:39:08,803-Speed 5210.39 samples/sec Loss 6.6809 LearningRate 0.0878 Epoch: 1 Global Step: 21010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:39:10,785-Speed 5167.86 samples/sec Loss 6.7801 LearningRate 0.0878 Epoch: 1 Global Step: 21020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:39:12,755-Speed 5198.60 samples/sec Loss 6.8111 LearningRate 0.0878 Epoch: 1 Global Step: 21030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:39:14,739-Speed 5163.46 samples/sec Loss 6.7311 LearningRate 0.0878 Epoch: 1 Global Step: 21040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:39:16,720-Speed 5170.87 samples/sec Loss 6.6380 LearningRate 0.0878 Epoch: 1 Global Step: 21050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:39:18,684-Speed 5214.82 samples/sec Loss 6.6751 LearningRate 0.0878 Epoch: 1 Global Step: 21060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:39:20,652-Speed 5206.73 samples/sec Loss 6.6127 LearningRate 0.0878 Epoch: 1 Global Step: 21070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:39:22,628-Speed 5183.23 samples/sec Loss 6.6851 LearningRate 0.0878 Epoch: 1 Global Step: 21080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:39:24,597-Speed 5204.36 samples/sec Loss 6.6895 LearningRate 0.0878 Epoch: 1 Global Step: 21090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:39:26,573-Speed 5183.85 samples/sec Loss 6.5627 LearningRate 0.0878 Epoch: 1 Global Step: 21100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:39:28,540-Speed 5207.72 samples/sec Loss 
6.7660 LearningRate 0.0878 Epoch: 1 Global Step: 21110 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:39:30,507-Speed 5205.74 samples/sec Loss 6.7422 LearningRate 0.0877 Epoch: 1 Global Step: 21120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:39:32,475-Speed 5206.27 samples/sec Loss 6.6708 LearningRate 0.0877 Epoch: 1 Global Step: 21130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:39:34,441-Speed 5210.44 samples/sec Loss 6.6060 LearningRate 0.0877 Epoch: 1 Global Step: 21140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:39:36,406-Speed 5211.46 samples/sec Loss 6.6342 LearningRate 0.0877 Epoch: 1 Global Step: 21150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:39:38,376-Speed 5201.14 samples/sec Loss 6.6732 LearningRate 0.0877 Epoch: 1 Global Step: 21160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:39:40,359-Speed 5163.44 samples/sec Loss 6.6514 LearningRate 0.0877 Epoch: 1 Global Step: 21170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:39:42,333-Speed 5189.47 samples/sec Loss 6.6323 LearningRate 0.0877 Epoch: 1 Global Step: 21180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:39:44,302-Speed 5204.58 samples/sec Loss 6.6645 LearningRate 0.0877 Epoch: 1 Global Step: 21190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:39:46,273-Speed 5196.16 samples/sec Loss 6.7055 LearningRate 0.0877 Epoch: 1 Global Step: 21200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:39:48,238-Speed 5212.06 samples/sec Loss 6.6171 LearningRate 0.0877 Epoch: 1 Global Step: 21210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:39:50,206-Speed 5205.17 samples/sec Loss 6.7181 LearningRate 0.0877 Epoch: 1 Global Step: 21220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:39:52,173-Speed 5207.39 samples/sec Loss 6.6425 LearningRate 0.0877 Epoch: 1 Global Step: 
21230 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:39:54,140-Speed 5206.82 samples/sec Loss 6.6946 LearningRate 0.0877 Epoch: 1 Global Step: 21240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:39:56,110-Speed 5200.75 samples/sec Loss 6.6928 LearningRate 0.0877 Epoch: 1 Global Step: 21250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:39:58,077-Speed 5208.88 samples/sec Loss 6.7266 LearningRate 0.0877 Epoch: 1 Global Step: 21260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:00,043-Speed 5209.22 samples/sec Loss 6.7119 LearningRate 0.0877 Epoch: 1 Global Step: 21270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:02,011-Speed 5204.63 samples/sec Loss 6.7530 LearningRate 0.0877 Epoch: 1 Global Step: 21280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:03,988-Speed 5181.76 samples/sec Loss 6.6843 LearningRate 0.0877 Epoch: 1 Global Step: 21290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:05,969-Speed 5170.49 samples/sec Loss 6.7462 LearningRate 0.0876 Epoch: 1 Global Step: 21300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:07,937-Speed 5205.70 samples/sec Loss 6.6311 LearningRate 0.0876 Epoch: 1 Global Step: 21310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:40:09,904-Speed 5209.73 samples/sec Loss 6.7189 LearningRate 0.0876 Epoch: 1 Global Step: 21320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:40:11,887-Speed 5165.10 samples/sec Loss 6.6805 LearningRate 0.0876 Epoch: 1 Global Step: 21330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:40:13,867-Speed 5172.87 samples/sec Loss 6.7514 LearningRate 0.0876 Epoch: 1 Global Step: 21340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:40:15,844-Speed 5182.38 samples/sec Loss 6.7239 LearningRate 0.0876 Epoch: 1 Global Step: 21350 Fp16 Grad Scale: 131072 Required: 21 hours 
Training: 2022-04-11 00:40:17,816-Speed 5192.94 samples/sec Loss 6.6203 LearningRate 0.0876 Epoch: 1 Global Step: 21360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:40:19,788-Speed 5195.58 samples/sec Loss 6.6595 LearningRate 0.0876 Epoch: 1 Global Step: 21370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:40:21,751-Speed 5216.92 samples/sec Loss 6.7268 LearningRate 0.0876 Epoch: 1 Global Step: 21380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:23,718-Speed 5207.36 samples/sec Loss 6.7041 LearningRate 0.0876 Epoch: 1 Global Step: 21390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:25,689-Speed 5196.28 samples/sec Loss 6.5407 LearningRate 0.0876 Epoch: 1 Global Step: 21400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:27,658-Speed 5204.28 samples/sec Loss 6.6323 LearningRate 0.0876 Epoch: 1 Global Step: 21410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:29,642-Speed 5163.74 samples/sec Loss 6.6340 LearningRate 0.0876 Epoch: 1 Global Step: 21420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:31,621-Speed 5175.72 samples/sec Loss 6.7251 LearningRate 0.0876 Epoch: 1 Global Step: 21430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:33,589-Speed 5203.58 samples/sec Loss 6.5575 LearningRate 0.0876 Epoch: 1 Global Step: 21440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:35,554-Speed 5213.94 samples/sec Loss 6.6341 LearningRate 0.0876 Epoch: 1 Global Step: 21450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:37,529-Speed 5185.80 samples/sec Loss 6.6224 LearningRate 0.0876 Epoch: 1 Global Step: 21460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:39,523-Speed 5138.05 samples/sec Loss 6.7121 LearningRate 0.0876 Epoch: 1 Global Step: 21470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:41,495-Speed 5194.66 
samples/sec Loss 6.7635 LearningRate 0.0875 Epoch: 1 Global Step: 21480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:40:43,474-Speed 5173.92 samples/sec Loss 6.7130 LearningRate 0.0875 Epoch: 1 Global Step: 21490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:40:45,440-Speed 5211.75 samples/sec Loss 6.6190 LearningRate 0.0875 Epoch: 1 Global Step: 21500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:47,429-Speed 5149.30 samples/sec Loss 6.5240 LearningRate 0.0875 Epoch: 1 Global Step: 21510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:49,425-Speed 5134.25 samples/sec Loss 6.6594 LearningRate 0.0875 Epoch: 1 Global Step: 21520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:51,394-Speed 5201.05 samples/sec Loss 6.6352 LearningRate 0.0875 Epoch: 1 Global Step: 21530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:53,364-Speed 5200.14 samples/sec Loss 6.7270 LearningRate 0.0875 Epoch: 1 Global Step: 21540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:55,337-Speed 5192.51 samples/sec Loss 6.6695 LearningRate 0.0875 Epoch: 1 Global Step: 21550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:57,317-Speed 5173.01 samples/sec Loss 6.5473 LearningRate 0.0875 Epoch: 1 Global Step: 21560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:40:59,285-Speed 5205.80 samples/sec Loss 6.6689 LearningRate 0.0875 Epoch: 1 Global Step: 21570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:41:01,261-Speed 5182.47 samples/sec Loss 6.6122 LearningRate 0.0875 Epoch: 1 Global Step: 21580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:41:03,237-Speed 5183.29 samples/sec Loss 6.7151 LearningRate 0.0875 Epoch: 1 Global Step: 21590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:41:05,220-Speed 5165.07 samples/sec Loss 6.6303 LearningRate 0.0875 Epoch: 1 
Global Step: 21600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:41:07,195-Speed 5188.42 samples/sec Loss 6.6657 LearningRate 0.0875 Epoch: 1 Global Step: 21610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:41:09,164-Speed 5202.97 samples/sec Loss 6.6607 LearningRate 0.0875 Epoch: 1 Global Step: 21620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:41:11,140-Speed 5184.05 samples/sec Loss 6.7390 LearningRate 0.0875 Epoch: 1 Global Step: 21630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:41:13,123-Speed 5164.96 samples/sec Loss 6.6286 LearningRate 0.0875 Epoch: 1 Global Step: 21640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:41:15,096-Speed 5192.86 samples/sec Loss 6.7053 LearningRate 0.0874 Epoch: 1 Global Step: 21650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:41:17,065-Speed 5202.14 samples/sec Loss 6.6408 LearningRate 0.0874 Epoch: 1 Global Step: 21660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:41:19,026-Speed 5223.88 samples/sec Loss 6.6790 LearningRate 0.0874 Epoch: 1 Global Step: 21670 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:41:20,992-Speed 5208.48 samples/sec Loss 6.6175 LearningRate 0.0874 Epoch: 1 Global Step: 21680 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:41:22,969-Speed 5182.04 samples/sec Loss 6.6263 LearningRate 0.0874 Epoch: 1 Global Step: 21690 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:41:24,957-Speed 5151.22 samples/sec Loss 6.6553 LearningRate 0.0874 Epoch: 1 Global Step: 21700 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:41:26,957-Speed 5122.79 samples/sec Loss 6.6469 LearningRate 0.0874 Epoch: 1 Global Step: 21710 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:41:28,946-Speed 5149.83 samples/sec Loss 6.7090 LearningRate 0.0874 Epoch: 1 Global Step: 21720 Fp16 Grad Scale: 32768 
Required: 21 hours Training: 2022-04-11 00:41:30,929-Speed 5166.34 samples/sec Loss 6.6688 LearningRate 0.0874 Epoch: 1 Global Step: 21730 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:41:32,908-Speed 5174.95 samples/sec Loss 6.5412 LearningRate 0.0874 Epoch: 1 Global Step: 21740 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:41:34,876-Speed 5204.96 samples/sec Loss 6.6769 LearningRate 0.0874 Epoch: 1 Global Step: 21750 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:41:36,857-Speed 5171.80 samples/sec Loss 6.7031 LearningRate 0.0874 Epoch: 1 Global Step: 21760 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:41:38,830-Speed 5192.63 samples/sec Loss 6.6172 LearningRate 0.0874 Epoch: 1 Global Step: 21770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:41:40,801-Speed 5195.28 samples/sec Loss 6.6706 LearningRate 0.0874 Epoch: 1 Global Step: 21780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:41:42,775-Speed 5191.20 samples/sec Loss 6.6722 LearningRate 0.0874 Epoch: 1 Global Step: 21790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:41:44,745-Speed 5197.52 samples/sec Loss 6.6712 LearningRate 0.0874 Epoch: 1 Global Step: 21800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:41:46,725-Speed 5175.37 samples/sec Loss 6.6209 LearningRate 0.0874 Epoch: 1 Global Step: 21810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:41:48,694-Speed 5201.82 samples/sec Loss 6.6286 LearningRate 0.0874 Epoch: 1 Global Step: 21820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:41:50,673-Speed 5175.35 samples/sec Loss 6.5788 LearningRate 0.0873 Epoch: 1 Global Step: 21830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:41:52,660-Speed 5155.00 samples/sec Loss 6.6534 LearningRate 0.0873 Epoch: 1 Global Step: 21840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 
00:41:54,630-Speed 5200.03 samples/sec Loss 6.6084 LearningRate 0.0873 Epoch: 1 Global Step: 21850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:41:56,597-Speed 5208.07 samples/sec Loss 6.6649 LearningRate 0.0873 Epoch: 1 Global Step: 21860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:41:58,581-Speed 5162.28 samples/sec Loss 6.6970 LearningRate 0.0873 Epoch: 1 Global Step: 21870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:42:00,563-Speed 5168.58 samples/sec Loss 6.5577 LearningRate 0.0873 Epoch: 1 Global Step: 21880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:42:02,535-Speed 5194.23 samples/sec Loss 6.7334 LearningRate 0.0873 Epoch: 1 Global Step: 21890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:42:04,518-Speed 5164.87 samples/sec Loss 6.6849 LearningRate 0.0873 Epoch: 1 Global Step: 21900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:42:06,488-Speed 5199.65 samples/sec Loss 6.6444 LearningRate 0.0873 Epoch: 1 Global Step: 21910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:42:08,459-Speed 5199.23 samples/sec Loss 6.7120 LearningRate 0.0873 Epoch: 1 Global Step: 21920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:42:10,433-Speed 5189.09 samples/sec Loss 6.6074 LearningRate 0.0873 Epoch: 1 Global Step: 21930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:42:12,453-Speed 5071.16 samples/sec Loss 6.6170 LearningRate 0.0873 Epoch: 1 Global Step: 21940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:42:14,441-Speed 5153.08 samples/sec Loss 6.6050 LearningRate 0.0873 Epoch: 1 Global Step: 21950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:42:16,411-Speed 5198.37 samples/sec Loss 6.6402 LearningRate 0.0873 Epoch: 1 Global Step: 21960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:42:18,379-Speed 5206.57 samples/sec Loss 
6.5773 LearningRate 0.0873 Epoch: 1 Global Step: 21970 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:42:20,343-Speed 5213.84 samples/sec Loss 6.6540 LearningRate 0.0873 Epoch: 1 Global Step: 21980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:42:22,317-Speed 5190.69 samples/sec Loss 6.5379 LearningRate 0.0873 Epoch: 1 Global Step: 21990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:42:24,284-Speed 5205.47 samples/sec Loss 6.6160 LearningRate 0.0873 Epoch: 1 Global Step: 22000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:42:50,773-[lfw][22000]XNorm: 22.497452 Training: 2022-04-11 00:42:50,774-[lfw][22000]Accuracy-Flip: 0.99517+-0.00353 Training: 2022-04-11 00:42:50,774-[lfw][22000]Accuracy-Highest: 0.99683 Training: 2022-04-11 00:43:21,496-[cfp_fp][22000]XNorm: 20.317650 Training: 2022-04-11 00:43:21,496-[cfp_fp][22000]Accuracy-Flip: 0.96957+-0.00796 Training: 2022-04-11 00:43:21,497-[cfp_fp][22000]Accuracy-Highest: 0.96957 Training: 2022-04-11 00:43:48,139-[agedb_30][22000]XNorm: 22.127961 Training: 2022-04-11 00:43:48,139-[agedb_30][22000]Accuracy-Flip: 0.96700+-0.00752 Training: 2022-04-11 00:43:48,140-[agedb_30][22000]Accuracy-Highest: 0.96700 Training: 2022-04-11 00:43:50,123-Speed 119.30 samples/sec Loss 6.7597 LearningRate 0.0872 Epoch: 1 Global Step: 22010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:43:52,090-Speed 5206.11 samples/sec Loss 6.6885 LearningRate 0.0872 Epoch: 1 Global Step: 22020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:43:54,051-Speed 5222.47 samples/sec Loss 6.6144 LearningRate 0.0872 Epoch: 1 Global Step: 22030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:43:56,021-Speed 5201.84 samples/sec Loss 6.5397 LearningRate 0.0872 Epoch: 1 Global Step: 22040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:43:57,988-Speed 5206.11 samples/sec Loss 6.6605 LearningRate 0.0872 Epoch: 
1 Global Step: 22050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:43:59,964-Speed 5185.18 samples/sec Loss 6.5930 LearningRate 0.0872 Epoch: 1 Global Step: 22060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:44:01,931-Speed 5205.92 samples/sec Loss 6.6076 LearningRate 0.0872 Epoch: 1 Global Step: 22070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:44:03,895-Speed 5215.55 samples/sec Loss 6.6872 LearningRate 0.0872 Epoch: 1 Global Step: 22080 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:44:05,854-Speed 5230.11 samples/sec Loss 6.6985 LearningRate 0.0872 Epoch: 1 Global Step: 22090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:44:07,814-Speed 5226.15 samples/sec Loss 6.6444 LearningRate 0.0872 Epoch: 1 Global Step: 22100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:44:09,781-Speed 5206.88 samples/sec Loss 6.7260 LearningRate 0.0872 Epoch: 1 Global Step: 22110 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:44:11,749-Speed 5205.37 samples/sec Loss 6.6749 LearningRate 0.0872 Epoch: 1 Global Step: 22120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:44:13,720-Speed 5196.13 samples/sec Loss 6.7161 LearningRate 0.0872 Epoch: 1 Global Step: 22130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:44:15,687-Speed 5208.91 samples/sec Loss 6.6177 LearningRate 0.0872 Epoch: 1 Global Step: 22140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:44:17,672-Speed 5161.35 samples/sec Loss 6.6318 LearningRate 0.0872 Epoch: 1 Global Step: 22150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:44:19,643-Speed 5195.64 samples/sec Loss 6.6261 LearningRate 0.0872 Epoch: 1 Global Step: 22160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:44:21,619-Speed 5184.50 samples/sec Loss 6.5870 LearningRate 0.0872 Epoch: 1 Global Step: 22170 Fp16 Grad Scale: 65536 
Required: 21 hours Training: 2022-04-11 00:44:23,601-Speed 5167.76 samples/sec Loss 6.5550 LearningRate 0.0872 Epoch: 1 Global Step: 22180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:44:25,602-Speed 5118.35 samples/sec Loss 6.5644 LearningRate 0.0871 Epoch: 1 Global Step: 22190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:44:27,576-Speed 5191.00 samples/sec Loss 6.5271 LearningRate 0.0871 Epoch: 1 Global Step: 22200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:44:29,548-Speed 5193.34 samples/sec Loss 6.5586 LearningRate 0.0871 Epoch: 1 Global Step: 22210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:44:31,527-Speed 5177.03 samples/sec Loss 6.6542 LearningRate 0.0871 Epoch: 1 Global Step: 22220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:44:33,495-Speed 5203.98 samples/sec Loss 6.5176 LearningRate 0.0871 Epoch: 1 Global Step: 22230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:44:35,476-Speed 5172.50 samples/sec Loss 6.7250 LearningRate 0.0871 Epoch: 1 Global Step: 22240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:44:37,456-Speed 5172.12 samples/sec Loss 6.5828 LearningRate 0.0871 Epoch: 1 Global Step: 22250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:44:39,450-Speed 5137.12 samples/sec Loss 6.5317 LearningRate 0.0871 Epoch: 1 Global Step: 22260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:44:41,449-Speed 5123.78 samples/sec Loss 6.5714 LearningRate 0.0871 Epoch: 1 Global Step: 22270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:44:43,416-Speed 5207.68 samples/sec Loss 6.4875 LearningRate 0.0871 Epoch: 1 Global Step: 22280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:44:45,391-Speed 5187.65 samples/sec Loss 6.5586 LearningRate 0.0871 Epoch: 1 Global Step: 22290 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 
00:44:47,360-Speed 5203.51 samples/sec Loss 6.6397 LearningRate 0.0871 Epoch: 1 Global Step: 22300 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:44:49,323-Speed 5219.13 samples/sec Loss 6.5511 LearningRate 0.0871 Epoch: 1 Global Step: 22310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:44:51,291-Speed 5203.95 samples/sec Loss 6.4731 LearningRate 0.0871 Epoch: 1 Global Step: 22320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:44:53,258-Speed 5206.95 samples/sec Loss 6.5974 LearningRate 0.0871 Epoch: 1 Global Step: 22330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:44:55,228-Speed 5199.99 samples/sec Loss 6.6084 LearningRate 0.0871 Epoch: 1 Global Step: 22340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:44:57,193-Speed 5211.39 samples/sec Loss 6.5668 LearningRate 0.0871 Epoch: 1 Global Step: 22350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:44:59,163-Speed 5201.20 samples/sec Loss 6.5260 LearningRate 0.0871 Epoch: 1 Global Step: 22360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:01,139-Speed 5182.52 samples/sec Loss 6.5147 LearningRate 0.0870 Epoch: 1 Global Step: 22370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:03,108-Speed 5202.94 samples/sec Loss 6.5525 LearningRate 0.0870 Epoch: 1 Global Step: 22380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:05,092-Speed 5164.49 samples/sec Loss 6.6137 LearningRate 0.0870 Epoch: 1 Global Step: 22390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:07,066-Speed 5189.38 samples/sec Loss 6.5433 LearningRate 0.0870 Epoch: 1 Global Step: 22400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:09,048-Speed 5168.13 samples/sec Loss 6.4711 LearningRate 0.0870 Epoch: 1 Global Step: 22410 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:45:11,009-Speed 5222.75 samples/sec Loss 
6.5450 LearningRate 0.0870 Epoch: 1 Global Step: 22420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:12,984-Speed 5186.36 samples/sec Loss 6.5566 LearningRate 0.0870 Epoch: 1 Global Step: 22430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:14,954-Speed 5200.22 samples/sec Loss 6.5702 LearningRate 0.0870 Epoch: 1 Global Step: 22440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:16,936-Speed 5169.20 samples/sec Loss 6.6060 LearningRate 0.0870 Epoch: 1 Global Step: 22450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:18,892-Speed 5234.64 samples/sec Loss 6.6199 LearningRate 0.0870 Epoch: 1 Global Step: 22460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:45:20,858-Speed 5210.86 samples/sec Loss 6.5679 LearningRate 0.0870 Epoch: 1 Global Step: 22470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:45:22,825-Speed 5209.01 samples/sec Loss 6.5796 LearningRate 0.0870 Epoch: 1 Global Step: 22480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:45:24,799-Speed 5187.74 samples/sec Loss 6.6172 LearningRate 0.0870 Epoch: 1 Global Step: 22490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:45:26,789-Speed 5146.34 samples/sec Loss 6.5592 LearningRate 0.0870 Epoch: 1 Global Step: 22500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:45:28,772-Speed 5167.52 samples/sec Loss 6.6388 LearningRate 0.0870 Epoch: 1 Global Step: 22510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:45:30,757-Speed 5162.00 samples/sec Loss 6.5499 LearningRate 0.0870 Epoch: 1 Global Step: 22520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:45:32,723-Speed 5208.51 samples/sec Loss 6.5950 LearningRate 0.0870 Epoch: 1 Global Step: 22530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:45:34,691-Speed 5206.37 samples/sec Loss 6.6344 LearningRate 0.0870 Epoch: 1 Global Step: 
22540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:45:36,658-Speed 5207.07 samples/sec Loss 6.5540 LearningRate 0.0869 Epoch: 1 Global Step: 22550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:45:38,625-Speed 5205.99 samples/sec Loss 6.5300 LearningRate 0.0869 Epoch: 1 Global Step: 22560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:40,618-Speed 5139.75 samples/sec Loss 6.5406 LearningRate 0.0869 Epoch: 1 Global Step: 22570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:42,584-Speed 5211.13 samples/sec Loss 6.6584 LearningRate 0.0869 Epoch: 1 Global Step: 22580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:44,551-Speed 5208.72 samples/sec Loss 6.4710 LearningRate 0.0869 Epoch: 1 Global Step: 22590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:47,391-Speed 3606.08 samples/sec Loss 6.5325 LearningRate 0.0869 Epoch: 1 Global Step: 22600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:49,369-Speed 5179.47 samples/sec Loss 6.6743 LearningRate 0.0869 Epoch: 1 Global Step: 22610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:51,338-Speed 5201.19 samples/sec Loss 6.5251 LearningRate 0.0869 Epoch: 1 Global Step: 22620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:53,320-Speed 5168.59 samples/sec Loss 6.6127 LearningRate 0.0869 Epoch: 1 Global Step: 22630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:55,297-Speed 5180.07 samples/sec Loss 6.6209 LearningRate 0.0869 Epoch: 1 Global Step: 22640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:57,264-Speed 5209.72 samples/sec Loss 6.5241 LearningRate 0.0869 Epoch: 1 Global Step: 22650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:45:59,233-Speed 5201.36 samples/sec Loss 6.6206 LearningRate 0.0869 Epoch: 1 Global Step: 22660 Fp16 Grad Scale: 262144 Required: 21 
hours Training: 2022-04-11 00:46:01,191-Speed 5232.06 samples/sec Loss 6.4896 LearningRate 0.0869 Epoch: 1 Global Step: 22670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:46:03,156-Speed 5211.81 samples/sec Loss 6.5052 LearningRate 0.0869 Epoch: 1 Global Step: 22680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:46:05,126-Speed 5199.73 samples/sec Loss 6.5301 LearningRate 0.0869 Epoch: 1 Global Step: 22690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:46:07,094-Speed 5206.48 samples/sec Loss 6.5023 LearningRate 0.0869 Epoch: 1 Global Step: 22700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:46:09,062-Speed 5203.61 samples/sec Loss 6.4629 LearningRate 0.0869 Epoch: 1 Global Step: 22710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:46:11,033-Speed 5199.92 samples/sec Loss 6.5210 LearningRate 0.0869 Epoch: 1 Global Step: 22720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:46:13,006-Speed 5192.61 samples/sec Loss 6.5105 LearningRate 0.0868 Epoch: 1 Global Step: 22730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:46:14,981-Speed 5187.15 samples/sec Loss 6.4913 LearningRate 0.0868 Epoch: 1 Global Step: 22740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:46:16,965-Speed 5161.72 samples/sec Loss 6.5720 LearningRate 0.0868 Epoch: 1 Global Step: 22750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:46:18,943-Speed 5179.25 samples/sec Loss 6.5148 LearningRate 0.0868 Epoch: 1 Global Step: 22760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:46:20,912-Speed 5203.73 samples/sec Loss 6.4627 LearningRate 0.0868 Epoch: 1 Global Step: 22770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:46:22,887-Speed 5186.90 samples/sec Loss 6.5885 LearningRate 0.0868 Epoch: 1 Global Step: 22780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:46:24,852-Speed 
5210.97 samples/sec Loss 6.4598 LearningRate 0.0868 Epoch: 1 Global Step: 22790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:46:26,825-Speed 5193.10 samples/sec Loss 6.5475 LearningRate 0.0868 Epoch: 1 Global Step: 22800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:46:28,794-Speed 5201.71 samples/sec Loss 6.5523 LearningRate 0.0868 Epoch: 1 Global Step: 22810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:46:30,770-Speed 5183.37 samples/sec Loss 6.6247 LearningRate 0.0868 Epoch: 1 Global Step: 22820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:46:32,736-Speed 5209.34 samples/sec Loss 6.5277 LearningRate 0.0868 Epoch: 1 Global Step: 22830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:46:34,705-Speed 5204.42 samples/sec Loss 6.6407 LearningRate 0.0868 Epoch: 1 Global Step: 22840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:46:36,701-Speed 5131.14 samples/sec Loss 6.5454 LearningRate 0.0868 Epoch: 1 Global Step: 22850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:46:38,676-Speed 5185.19 samples/sec Loss 6.5520 LearningRate 0.0868 Epoch: 1 Global Step: 22860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:46:40,653-Speed 5184.30 samples/sec Loss 6.4237 LearningRate 0.0868 Epoch: 1 Global Step: 22870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:46:42,628-Speed 5185.66 samples/sec Loss 6.5754 LearningRate 0.0868 Epoch: 1 Global Step: 22880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:46:44,601-Speed 5192.07 samples/sec Loss 6.5887 LearningRate 0.0868 Epoch: 1 Global Step: 22890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:46:46,559-Speed 5230.26 samples/sec Loss 6.4838 LearningRate 0.0868 Epoch: 1 Global Step: 22900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:46:48,531-Speed 5194.31 samples/sec Loss 6.5926 LearningRate 
0.0867 Epoch: 1 Global Step: 22910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:46:50,537-Speed 5106.09 samples/sec Loss 6.6202 LearningRate 0.0867 Epoch: 1 Global Step: 22920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:46:52,507-Speed 5201.78 samples/sec Loss 6.4892 LearningRate 0.0867 Epoch: 1 Global Step: 22930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:46:54,480-Speed 5189.47 samples/sec Loss 6.5307 LearningRate 0.0867 Epoch: 1 Global Step: 22940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:46:56,451-Speed 5197.31 samples/sec Loss 6.5347 LearningRate 0.0867 Epoch: 1 Global Step: 22950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:46:58,442-Speed 5146.28 samples/sec Loss 6.6070 LearningRate 0.0867 Epoch: 1 Global Step: 22960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:00,423-Speed 5170.24 samples/sec Loss 6.4771 LearningRate 0.0867 Epoch: 1 Global Step: 22970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:02,392-Speed 5201.96 samples/sec Loss 6.4113 LearningRate 0.0867 Epoch: 1 Global Step: 22980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:04,372-Speed 5173.95 samples/sec Loss 6.4632 LearningRate 0.0867 Epoch: 1 Global Step: 22990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:06,340-Speed 5204.68 samples/sec Loss 6.5488 LearningRate 0.0867 Epoch: 1 Global Step: 23000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:47:08,308-Speed 5204.34 samples/sec Loss 6.4977 LearningRate 0.0867 Epoch: 1 Global Step: 23010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:47:10,279-Speed 5197.29 samples/sec Loss 6.6165 LearningRate 0.0867 Epoch: 1 Global Step: 23020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:47:12,251-Speed 5195.59 samples/sec Loss 6.5652 LearningRate 0.0867 Epoch: 1 Global Step: 23030 Fp16 Grad Scale: 
131072 Required: 21 hours Training: 2022-04-11 00:47:14,231-Speed 5173.62 samples/sec Loss 6.4656 LearningRate 0.0867 Epoch: 1 Global Step: 23040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:47:16,212-Speed 5171.67 samples/sec Loss 6.6578 LearningRate 0.0867 Epoch: 1 Global Step: 23050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:47:18,181-Speed 5202.15 samples/sec Loss 6.5216 LearningRate 0.0867 Epoch: 1 Global Step: 23060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:20,167-Speed 5159.75 samples/sec Loss 6.5050 LearningRate 0.0867 Epoch: 1 Global Step: 23070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:22,139-Speed 5195.21 samples/sec Loss 6.5384 LearningRate 0.0867 Epoch: 1 Global Step: 23080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:24,110-Speed 5197.18 samples/sec Loss 6.4713 LearningRate 0.0866 Epoch: 1 Global Step: 23090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:26,085-Speed 5185.84 samples/sec Loss 6.5055 LearningRate 0.0866 Epoch: 1 Global Step: 23100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:28,052-Speed 5207.25 samples/sec Loss 6.5085 LearningRate 0.0866 Epoch: 1 Global Step: 23110 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:30,021-Speed 5200.91 samples/sec Loss 6.5418 LearningRate 0.0866 Epoch: 1 Global Step: 23120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:31,995-Speed 5191.19 samples/sec Loss 6.5123 LearningRate 0.0866 Epoch: 1 Global Step: 23130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:33,970-Speed 5185.50 samples/sec Loss 6.4271 LearningRate 0.0866 Epoch: 1 Global Step: 23140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:35,954-Speed 5162.74 samples/sec Loss 6.5378 LearningRate 0.0866 Epoch: 1 Global Step: 23150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 
00:47:37,943-Speed 5149.03 samples/sec Loss 6.5416 LearningRate 0.0866 Epoch: 1 Global Step: 23160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:47:39,915-Speed 5195.17 samples/sec Loss 6.4897 LearningRate 0.0866 Epoch: 1 Global Step: 23170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:47:41,887-Speed 5195.33 samples/sec Loss 6.4264 LearningRate 0.0866 Epoch: 1 Global Step: 23180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:47:43,860-Speed 5192.76 samples/sec Loss 6.4406 LearningRate 0.0866 Epoch: 1 Global Step: 23190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:47:45,830-Speed 5199.36 samples/sec Loss 6.5566 LearningRate 0.0866 Epoch: 1 Global Step: 23200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:47,804-Speed 5189.66 samples/sec Loss 6.4492 LearningRate 0.0866 Epoch: 1 Global Step: 23210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:49,776-Speed 5192.48 samples/sec Loss 6.5095 LearningRate 0.0866 Epoch: 1 Global Step: 23220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:51,747-Speed 5200.02 samples/sec Loss 6.3803 LearningRate 0.0866 Epoch: 1 Global Step: 23230 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:53,753-Speed 5105.30 samples/sec Loss 6.5461 LearningRate 0.0866 Epoch: 1 Global Step: 23240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:55,722-Speed 5200.93 samples/sec Loss 6.4556 LearningRate 0.0866 Epoch: 1 Global Step: 23250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:57,692-Speed 5199.34 samples/sec Loss 6.4355 LearningRate 0.0865 Epoch: 1 Global Step: 23260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:47:59,669-Speed 5180.91 samples/sec Loss 6.4573 LearningRate 0.0865 Epoch: 1 Global Step: 23270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:48:01,666-Speed 5132.07 samples/sec Loss 6.5180 
LearningRate 0.0865 Epoch: 1 Global Step: 23280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:48:03,636-Speed 5198.30 samples/sec Loss 6.5025 LearningRate 0.0865 Epoch: 1 Global Step: 23290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:48:05,604-Speed 5206.14 samples/sec Loss 6.4253 LearningRate 0.0865 Epoch: 1 Global Step: 23300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:07,578-Speed 5189.96 samples/sec Loss 6.5130 LearningRate 0.0865 Epoch: 1 Global Step: 23310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:09,546-Speed 5204.25 samples/sec Loss 6.5293 LearningRate 0.0865 Epoch: 1 Global Step: 23320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:11,532-Speed 5157.11 samples/sec Loss 6.4131 LearningRate 0.0865 Epoch: 1 Global Step: 23330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:13,503-Speed 5196.71 samples/sec Loss 6.5010 LearningRate 0.0865 Epoch: 1 Global Step: 23340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:15,481-Speed 5178.33 samples/sec Loss 6.5170 LearningRate 0.0865 Epoch: 1 Global Step: 23350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:17,461-Speed 5174.45 samples/sec Loss 6.4949 LearningRate 0.0865 Epoch: 1 Global Step: 23360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:19,430-Speed 5201.09 samples/sec Loss 6.5335 LearningRate 0.0865 Epoch: 1 Global Step: 23370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:21,402-Speed 5194.96 samples/sec Loss 6.5636 LearningRate 0.0865 Epoch: 1 Global Step: 23380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:23,376-Speed 5191.48 samples/sec Loss 6.5170 LearningRate 0.0865 Epoch: 1 Global Step: 23390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:25,356-Speed 5173.29 samples/sec Loss 6.4757 LearningRate 0.0865 Epoch: 1 Global Step: 
23400 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:48:27,319-Speed 5218.65 samples/sec Loss 6.5050 LearningRate 0.0865 Epoch: 1 Global Step: 23410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:29,290-Speed 5196.14 samples/sec Loss 6.4524 LearningRate 0.0865 Epoch: 1 Global Step: 23420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:31,260-Speed 5198.54 samples/sec Loss 6.4171 LearningRate 0.0865 Epoch: 1 Global Step: 23430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:33,231-Speed 5196.43 samples/sec Loss 6.4155 LearningRate 0.0864 Epoch: 1 Global Step: 23440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:35,205-Speed 5190.64 samples/sec Loss 6.4834 LearningRate 0.0864 Epoch: 1 Global Step: 23450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:37,183-Speed 5179.20 samples/sec Loss 6.5470 LearningRate 0.0864 Epoch: 1 Global Step: 23460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:39,173-Speed 5145.33 samples/sec Loss 6.4032 LearningRate 0.0864 Epoch: 1 Global Step: 23470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:41,153-Speed 5174.88 samples/sec Loss 6.4064 LearningRate 0.0864 Epoch: 1 Global Step: 23480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:43,138-Speed 5158.71 samples/sec Loss 6.5390 LearningRate 0.0864 Epoch: 1 Global Step: 23490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:45,119-Speed 5172.16 samples/sec Loss 6.4927 LearningRate 0.0864 Epoch: 1 Global Step: 23500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:48:47,076-Speed 5235.21 samples/sec Loss 6.5964 LearningRate 0.0864 Epoch: 1 Global Step: 23510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:48:49,046-Speed 5198.70 samples/sec Loss 6.4181 LearningRate 0.0864 Epoch: 1 Global Step: 23520 Fp16 Grad Scale: 65536 Required: 21 
hours Training: 2022-04-11 00:48:51,021-Speed 5187.89 samples/sec Loss 6.5224 LearningRate 0.0864 Epoch: 1 Global Step: 23530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:48:53,006-Speed 5159.32 samples/sec Loss 6.4158 LearningRate 0.0864 Epoch: 1 Global Step: 23540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:48:54,977-Speed 5197.71 samples/sec Loss 6.4540 LearningRate 0.0864 Epoch: 1 Global Step: 23550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:48:56,946-Speed 5202.06 samples/sec Loss 6.4397 LearningRate 0.0864 Epoch: 1 Global Step: 23560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:48:58,926-Speed 5174.05 samples/sec Loss 6.5143 LearningRate 0.0864 Epoch: 1 Global Step: 23570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:49:00,906-Speed 5172.22 samples/sec Loss 6.5418 LearningRate 0.0864 Epoch: 1 Global Step: 23580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:49:02,886-Speed 5173.92 samples/sec Loss 6.3830 LearningRate 0.0864 Epoch: 1 Global Step: 23590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:49:04,883-Speed 5128.82 samples/sec Loss 6.4684 LearningRate 0.0864 Epoch: 1 Global Step: 23600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:49:06,862-Speed 5176.05 samples/sec Loss 6.5636 LearningRate 0.0864 Epoch: 1 Global Step: 23610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:08,836-Speed 5191.42 samples/sec Loss 6.3576 LearningRate 0.0863 Epoch: 1 Global Step: 23620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:10,808-Speed 5192.21 samples/sec Loss 6.4697 LearningRate 0.0863 Epoch: 1 Global Step: 23630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:12,782-Speed 5190.41 samples/sec Loss 6.3367 LearningRate 0.0863 Epoch: 1 Global Step: 23640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:14,770-Speed 
5152.68 samples/sec Loss 6.4126 LearningRate 0.0863 Epoch: 1 Global Step: 23650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:16,749-Speed 5174.42 samples/sec Loss 6.3996 LearningRate 0.0863 Epoch: 1 Global Step: 23660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:18,719-Speed 5201.70 samples/sec Loss 6.3616 LearningRate 0.0863 Epoch: 1 Global Step: 23670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:20,691-Speed 5194.16 samples/sec Loss 6.3403 LearningRate 0.0863 Epoch: 1 Global Step: 23680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:22,671-Speed 5174.22 samples/sec Loss 6.3823 LearningRate 0.0863 Epoch: 1 Global Step: 23690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:24,655-Speed 5161.50 samples/sec Loss 6.3963 LearningRate 0.0863 Epoch: 1 Global Step: 23700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:26,642-Speed 5157.19 samples/sec Loss 6.5074 LearningRate 0.0863 Epoch: 1 Global Step: 23710 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:49:28,606-Speed 5214.99 samples/sec Loss 6.3941 LearningRate 0.0863 Epoch: 1 Global Step: 23720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:30,578-Speed 5192.99 samples/sec Loss 6.5200 LearningRate 0.0863 Epoch: 1 Global Step: 23730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:32,553-Speed 5187.31 samples/sec Loss 6.4222 LearningRate 0.0863 Epoch: 1 Global Step: 23740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:34,530-Speed 5181.69 samples/sec Loss 6.3565 LearningRate 0.0863 Epoch: 1 Global Step: 23750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:36,500-Speed 5199.36 samples/sec Loss 6.4941 LearningRate 0.0863 Epoch: 1 Global Step: 23760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:38,476-Speed 5183.57 samples/sec Loss 6.4586 
LearningRate 0.0863 Epoch: 1 Global Step: 23770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:40,449-Speed 5191.43 samples/sec Loss 6.4136 LearningRate 0.0863 Epoch: 1 Global Step: 23780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:42,421-Speed 5195.16 samples/sec Loss 6.3864 LearningRate 0.0863 Epoch: 1 Global Step: 23790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:44,426-Speed 5110.50 samples/sec Loss 6.4890 LearningRate 0.0862 Epoch: 1 Global Step: 23800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:46,406-Speed 5172.51 samples/sec Loss 6.3491 LearningRate 0.0862 Epoch: 1 Global Step: 23810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:49:48,381-Speed 5186.34 samples/sec Loss 6.4403 LearningRate 0.0862 Epoch: 1 Global Step: 23820 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:49:50,355-Speed 5190.18 samples/sec Loss 6.5235 LearningRate 0.0862 Epoch: 1 Global Step: 23830 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:49:52,323-Speed 5205.15 samples/sec Loss 6.4406 LearningRate 0.0862 Epoch: 1 Global Step: 23840 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:49:54,284-Speed 5223.40 samples/sec Loss 6.4724 LearningRate 0.0862 Epoch: 1 Global Step: 23850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:49:56,252-Speed 5203.97 samples/sec Loss 6.4236 LearningRate 0.0862 Epoch: 1 Global Step: 23860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:49:58,226-Speed 5189.14 samples/sec Loss 6.3958 LearningRate 0.0862 Epoch: 1 Global Step: 23870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:50:00,197-Speed 5196.36 samples/sec Loss 6.4565 LearningRate 0.0862 Epoch: 1 Global Step: 23880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:50:02,167-Speed 5201.11 samples/sec Loss 6.4837 LearningRate 0.0862 Epoch: 1 Global Step: 23890 
Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:50:08,346-Speed 1657.48 samples/sec Loss 6.4266 LearningRate 0.0862 Epoch: 1 Global Step: 23900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:50:11,117-Speed 3696.01 samples/sec Loss 6.3045 LearningRate 0.0862 Epoch: 1 Global Step: 23910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:50:13,091-Speed 5189.96 samples/sec Loss 6.4474 LearningRate 0.0862 Epoch: 1 Global Step: 23920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:50:15,061-Speed 5199.95 samples/sec Loss 6.4593 LearningRate 0.0862 Epoch: 1 Global Step: 23930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:50:17,025-Speed 5214.30 samples/sec Loss 6.4694 LearningRate 0.0862 Epoch: 1 Global Step: 23940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:50:18,993-Speed 5206.12 samples/sec Loss 6.2868 LearningRate 0.0862 Epoch: 1 Global Step: 23950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:50:20,968-Speed 5186.02 samples/sec Loss 6.4019 LearningRate 0.0862 Epoch: 1 Global Step: 23960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:50:22,939-Speed 5197.27 samples/sec Loss 6.5876 LearningRate 0.0862 Epoch: 1 Global Step: 23970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:50:24,932-Speed 5139.36 samples/sec Loss 6.3395 LearningRate 0.0861 Epoch: 1 Global Step: 23980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:50:26,910-Speed 5179.19 samples/sec Loss 6.4100 LearningRate 0.0861 Epoch: 1 Global Step: 23990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:50:28,900-Speed 5148.36 samples/sec Loss 6.3991 LearningRate 0.0861 Epoch: 1 Global Step: 24000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:50:55,522-[lfw][24000]XNorm: 23.372204 Training: 2022-04-11 00:50:55,523-[lfw][24000]Accuracy-Flip: 0.99600+-0.00271 Training: 2022-04-11 
00:50:55,523-[lfw][24000]Accuracy-Highest: 0.99683 Training: 2022-04-11 00:51:26,293-[cfp_fp][24000]XNorm: 21.192709 Training: 2022-04-11 00:51:26,294-[cfp_fp][24000]Accuracy-Flip: 0.96814+-0.00620 Training: 2022-04-11 00:51:26,294-[cfp_fp][24000]Accuracy-Highest: 0.96957 Training: 2022-04-11 00:51:52,813-[agedb_30][24000]XNorm: 23.497013 Training: 2022-04-11 00:51:52,813-[agedb_30][24000]Accuracy-Flip: 0.96733+-0.00704 Training: 2022-04-11 00:51:52,814-[agedb_30][24000]Accuracy-Highest: 0.96733 Training: 2022-04-11 00:51:54,784-Speed 119.23 samples/sec Loss 6.4033 LearningRate 0.0861 Epoch: 1 Global Step: 24010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:51:56,748-Speed 5213.78 samples/sec Loss 6.3663 LearningRate 0.0861 Epoch: 1 Global Step: 24020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:51:58,717-Speed 5201.60 samples/sec Loss 6.4307 LearningRate 0.0861 Epoch: 1 Global Step: 24030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:00,692-Speed 5188.26 samples/sec Loss 6.4429 LearningRate 0.0861 Epoch: 1 Global Step: 24040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:02,657-Speed 5212.08 samples/sec Loss 6.4243 LearningRate 0.0861 Epoch: 1 Global Step: 24050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:04,622-Speed 5214.82 samples/sec Loss 6.3419 LearningRate 0.0861 Epoch: 1 Global Step: 24060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:06,593-Speed 5195.37 samples/sec Loss 6.3467 LearningRate 0.0861 Epoch: 1 Global Step: 24070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:08,569-Speed 5183.11 samples/sec Loss 6.4248 LearningRate 0.0861 Epoch: 1 Global Step: 24080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:10,537-Speed 5206.67 samples/sec Loss 6.4643 LearningRate 0.0861 Epoch: 1 Global Step: 24090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:12,504-Speed 
5205.83 samples/sec Loss 6.3908 LearningRate 0.0861 Epoch: 1 Global Step: 24100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:14,482-Speed 5179.81 samples/sec Loss 6.3914 LearningRate 0.0861 Epoch: 1 Global Step: 24110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:52:16,452-Speed 5198.50 samples/sec Loss 6.3600 LearningRate 0.0861 Epoch: 1 Global Step: 24120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:18,423-Speed 5197.91 samples/sec Loss 6.4507 LearningRate 0.0861 Epoch: 1 Global Step: 24130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:20,395-Speed 5195.91 samples/sec Loss 6.4525 LearningRate 0.0861 Epoch: 1 Global Step: 24140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:22,385-Speed 5146.77 samples/sec Loss 6.4594 LearningRate 0.0861 Epoch: 1 Global Step: 24150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:24,356-Speed 5198.25 samples/sec Loss 6.1916 LearningRate 0.0860 Epoch: 1 Global Step: 24160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:26,337-Speed 5171.36 samples/sec Loss 6.4772 LearningRate 0.0860 Epoch: 1 Global Step: 24170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:28,322-Speed 5158.29 samples/sec Loss 6.3257 LearningRate 0.0860 Epoch: 1 Global Step: 24180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:30,302-Speed 5172.94 samples/sec Loss 6.3768 LearningRate 0.0860 Epoch: 1 Global Step: 24190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:32,275-Speed 5193.05 samples/sec Loss 6.3704 LearningRate 0.0860 Epoch: 1 Global Step: 24200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:34,266-Speed 5142.91 samples/sec Loss 6.3863 LearningRate 0.0860 Epoch: 1 Global Step: 24210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:36,256-Speed 5149.78 samples/sec Loss 6.3405 LearningRate 0.0860 
Epoch: 1 Global Step: 24220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:52:38,235-Speed 5176.49 samples/sec Loss 6.2606 LearningRate 0.0860 Epoch: 1 Global Step: 24230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:52:40,234-Speed 5124.83 samples/sec Loss 6.3220 LearningRate 0.0860 Epoch: 1 Global Step: 24240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:52:42,215-Speed 5170.97 samples/sec Loss 6.3181 LearningRate 0.0860 Epoch: 1 Global Step: 24250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:52:44,188-Speed 5190.44 samples/sec Loss 6.4657 LearningRate 0.0860 Epoch: 1 Global Step: 24260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:52:46,152-Speed 5216.37 samples/sec Loss 6.3605 LearningRate 0.0860 Epoch: 1 Global Step: 24270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:48,134-Speed 5169.14 samples/sec Loss 6.3028 LearningRate 0.0860 Epoch: 1 Global Step: 24280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:50,100-Speed 5208.11 samples/sec Loss 6.3177 LearningRate 0.0860 Epoch: 1 Global Step: 24290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:52,071-Speed 5199.02 samples/sec Loss 6.4232 LearningRate 0.0860 Epoch: 1 Global Step: 24300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:54,046-Speed 5186.19 samples/sec Loss 6.3366 LearningRate 0.0860 Epoch: 1 Global Step: 24310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:56,027-Speed 5170.84 samples/sec Loss 6.4194 LearningRate 0.0860 Epoch: 1 Global Step: 24320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:58,010-Speed 5165.10 samples/sec Loss 6.2735 LearningRate 0.0860 Epoch: 1 Global Step: 24330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:52:59,989-Speed 5177.10 samples/sec Loss 6.3507 LearningRate 0.0859 Epoch: 1 Global Step: 24340 Fp16 Grad Scale: 65536 
Required: 21 hours Training: 2022-04-11 00:53:01,958-Speed 5201.87 samples/sec Loss 6.4767 LearningRate 0.0859 Epoch: 1 Global Step: 24350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:03,927-Speed 5202.14 samples/sec Loss 6.4340 LearningRate 0.0859 Epoch: 1 Global Step: 24360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:05,923-Speed 5131.49 samples/sec Loss 6.3017 LearningRate 0.0859 Epoch: 1 Global Step: 24370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:53:07,894-Speed 5197.54 samples/sec Loss 6.4741 LearningRate 0.0859 Epoch: 1 Global Step: 24380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:53:09,870-Speed 5183.74 samples/sec Loss 6.4036 LearningRate 0.0859 Epoch: 1 Global Step: 24390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:53:11,853-Speed 5165.15 samples/sec Loss 6.2999 LearningRate 0.0859 Epoch: 1 Global Step: 24400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:53:13,824-Speed 5196.44 samples/sec Loss 6.4403 LearningRate 0.0859 Epoch: 1 Global Step: 24410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:15,793-Speed 5202.33 samples/sec Loss 6.3679 LearningRate 0.0859 Epoch: 1 Global Step: 24420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:17,762-Speed 5202.52 samples/sec Loss 6.3389 LearningRate 0.0859 Epoch: 1 Global Step: 24430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:19,737-Speed 5186.20 samples/sec Loss 6.2878 LearningRate 0.0859 Epoch: 1 Global Step: 24440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:21,729-Speed 5144.17 samples/sec Loss 6.4003 LearningRate 0.0859 Epoch: 1 Global Step: 24450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:23,701-Speed 5194.10 samples/sec Loss 6.3678 LearningRate 0.0859 Epoch: 1 Global Step: 24460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 
00:53:25,672-Speed 5198.33 samples/sec Loss 6.2622 LearningRate 0.0859 Epoch: 1 Global Step: 24470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:27,640-Speed 5203.38 samples/sec Loss 6.5171 LearningRate 0.0859 Epoch: 1 Global Step: 24480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:29,608-Speed 5205.37 samples/sec Loss 6.2953 LearningRate 0.0859 Epoch: 1 Global Step: 24490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:31,576-Speed 5205.75 samples/sec Loss 6.3496 LearningRate 0.0859 Epoch: 1 Global Step: 24500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:33,569-Speed 5139.25 samples/sec Loss 6.2914 LearningRate 0.0859 Epoch: 1 Global Step: 24510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:53:35,584-Speed 5082.50 samples/sec Loss 6.4257 LearningRate 0.0858 Epoch: 1 Global Step: 24520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:53:37,551-Speed 5208.42 samples/sec Loss 6.3787 LearningRate 0.0858 Epoch: 1 Global Step: 24530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:39,529-Speed 5179.47 samples/sec Loss 6.3668 LearningRate 0.0858 Epoch: 1 Global Step: 24540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:41,501-Speed 5192.98 samples/sec Loss 6.4226 LearningRate 0.0858 Epoch: 1 Global Step: 24550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:43,469-Speed 5206.32 samples/sec Loss 6.5219 LearningRate 0.0858 Epoch: 1 Global Step: 24560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:45,435-Speed 5209.83 samples/sec Loss 6.4104 LearningRate 0.0858 Epoch: 1 Global Step: 24570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:47,414-Speed 5176.47 samples/sec Loss 6.3118 LearningRate 0.0858 Epoch: 1 Global Step: 24580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:49,390-Speed 5183.02 samples/sec Loss 6.3639 
LearningRate 0.0858 Epoch: 1 Global Step: 24590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:51,362-Speed 5196.17 samples/sec Loss 6.3552 LearningRate 0.0858 Epoch: 1 Global Step: 24600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:53,338-Speed 5181.66 samples/sec Loss 6.3167 LearningRate 0.0858 Epoch: 1 Global Step: 24610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:55,309-Speed 5199.17 samples/sec Loss 6.4093 LearningRate 0.0858 Epoch: 1 Global Step: 24620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:53:57,277-Speed 5204.44 samples/sec Loss 6.2791 LearningRate 0.0858 Epoch: 1 Global Step: 24630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:53:59,246-Speed 5201.07 samples/sec Loss 6.3329 LearningRate 0.0858 Epoch: 1 Global Step: 24640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:54:01,227-Speed 5170.49 samples/sec Loss 6.3264 LearningRate 0.0858 Epoch: 1 Global Step: 24650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:54:03,207-Speed 5172.76 samples/sec Loss 6.4099 LearningRate 0.0858 Epoch: 1 Global Step: 24660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:54:05,186-Speed 5177.45 samples/sec Loss 6.3080 LearningRate 0.0858 Epoch: 1 Global Step: 24670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:54:07,157-Speed 5198.02 samples/sec Loss 6.3413 LearningRate 0.0858 Epoch: 1 Global Step: 24680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:54:09,137-Speed 5172.07 samples/sec Loss 6.4207 LearningRate 0.0858 Epoch: 1 Global Step: 24690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:54:11,109-Speed 5194.55 samples/sec Loss 6.3599 LearningRate 0.0857 Epoch: 1 Global Step: 24700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:54:13,094-Speed 5161.22 samples/sec Loss 6.3285 LearningRate 0.0857 Epoch: 1 Global Step: 24710 
Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:54:15,062-Speed 5205.37 samples/sec Loss 6.2768 LearningRate 0.0857 Epoch: 1 Global Step: 24720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:54:17,032-Speed 5198.64 samples/sec Loss 6.3434 LearningRate 0.0857 Epoch: 1 Global Step: 24730 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-11 00:54:18,999-Speed 5207.30 samples/sec Loss 6.3667 LearningRate 0.0857 Epoch: 1 Global Step: 24740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:54:20,985-Speed 5159.03 samples/sec Loss 6.4757 LearningRate 0.0857 Epoch: 1 Global Step: 24750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:54:22,977-Speed 5141.69 samples/sec Loss 6.3534 LearningRate 0.0857 Epoch: 1 Global Step: 24760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:54:24,963-Speed 5158.85 samples/sec Loss 6.3307 LearningRate 0.0857 Epoch: 1 Global Step: 24770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:54:26,943-Speed 5172.36 samples/sec Loss 6.2867 LearningRate 0.0857 Epoch: 1 Global Step: 24780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:54:28,924-Speed 5170.89 samples/sec Loss 6.2830 LearningRate 0.0857 Epoch: 1 Global Step: 24790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:54:30,896-Speed 5194.51 samples/sec Loss 6.3659 LearningRate 0.0857 Epoch: 1 Global Step: 24800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:54:32,867-Speed 5196.52 samples/sec Loss 6.3116 LearningRate 0.0857 Epoch: 1 Global Step: 24810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:54:34,823-Speed 5238.70 samples/sec Loss 6.3315 LearningRate 0.0857 Epoch: 1 Global Step: 24820 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:54:36,800-Speed 5180.74 samples/sec Loss 6.3732 LearningRate 0.0857 Epoch: 1 Global Step: 24830 Fp16 Grad Scale: 32768 Required: 21 hours 
Training: 2022-04-11 00:54:38,781-Speed 5169.60 samples/sec Loss 6.3276 LearningRate 0.0857 Epoch: 1 Global Step: 24840 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:54:40,752-Speed 5198.10 samples/sec Loss 6.3299 LearningRate 0.0857 Epoch: 1 Global Step: 24850 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:54:42,730-Speed 5177.58 samples/sec Loss 6.2584 LearningRate 0.0857 Epoch: 1 Global Step: 24860 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:54:44,705-Speed 5186.70 samples/sec Loss 6.3079 LearningRate 0.0857 Epoch: 1 Global Step: 24870 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:54:46,691-Speed 5159.77 samples/sec Loss 6.3218 LearningRate 0.0856 Epoch: 1 Global Step: 24880 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:54:48,659-Speed 5203.68 samples/sec Loss 6.3315 LearningRate 0.0856 Epoch: 1 Global Step: 24890 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:54:50,629-Speed 5199.84 samples/sec Loss 6.2767 LearningRate 0.0856 Epoch: 1 Global Step: 24900 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:54:52,600-Speed 5198.80 samples/sec Loss 6.4558 LearningRate 0.0856 Epoch: 1 Global Step: 24910 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:54:54,578-Speed 5177.20 samples/sec Loss 6.3200 LearningRate 0.0856 Epoch: 1 Global Step: 24920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:54:56,565-Speed 5155.58 samples/sec Loss 6.2634 LearningRate 0.0856 Epoch: 1 Global Step: 24930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:54:58,546-Speed 5169.37 samples/sec Loss 6.3568 LearningRate 0.0856 Epoch: 1 Global Step: 24940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:00,516-Speed 5198.77 samples/sec Loss 6.2524 LearningRate 0.0856 Epoch: 1 Global Step: 24950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:02,488-Speed 5196.18 
samples/sec Loss 6.3719 LearningRate 0.0856 Epoch: 1 Global Step: 24960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:04,481-Speed 5139.83 samples/sec Loss 6.3975 LearningRate 0.0856 Epoch: 1 Global Step: 24970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:06,452-Speed 5196.47 samples/sec Loss 6.3925 LearningRate 0.0856 Epoch: 1 Global Step: 24980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:08,428-Speed 5184.44 samples/sec Loss 6.3869 LearningRate 0.0856 Epoch: 1 Global Step: 24990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:10,397-Speed 5202.24 samples/sec Loss 6.3370 LearningRate 0.0856 Epoch: 1 Global Step: 25000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:12,369-Speed 5194.74 samples/sec Loss 6.3146 LearningRate 0.0856 Epoch: 1 Global Step: 25010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:14,339-Speed 5200.94 samples/sec Loss 6.3356 LearningRate 0.0856 Epoch: 1 Global Step: 25020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:55:16,330-Speed 5144.60 samples/sec Loss 6.1881 LearningRate 0.0856 Epoch: 1 Global Step: 25030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:55:18,301-Speed 5196.11 samples/sec Loss 6.3098 LearningRate 0.0856 Epoch: 1 Global Step: 25040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:55:20,272-Speed 5197.29 samples/sec Loss 6.2976 LearningRate 0.0856 Epoch: 1 Global Step: 25050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:55:22,249-Speed 5180.80 samples/sec Loss 6.3139 LearningRate 0.0855 Epoch: 1 Global Step: 25060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:55:24,228-Speed 5176.79 samples/sec Loss 6.2547 LearningRate 0.0855 Epoch: 1 Global Step: 25070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:55:26,207-Speed 5174.86 samples/sec Loss 6.2976 LearningRate 0.0855 
Epoch: 1 Global Step: 25080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:55:28,199-Speed 5142.60 samples/sec Loss 6.1751 LearningRate 0.0855 Epoch: 1 Global Step: 25090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:55:30,164-Speed 5213.62 samples/sec Loss 6.2968 LearningRate 0.0855 Epoch: 1 Global Step: 25100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:32,135-Speed 5198.69 samples/sec Loss 6.2801 LearningRate 0.0855 Epoch: 1 Global Step: 25110 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:34,115-Speed 5173.04 samples/sec Loss 6.2379 LearningRate 0.0855 Epoch: 1 Global Step: 25120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:36,091-Speed 5182.89 samples/sec Loss 6.2276 LearningRate 0.0855 Epoch: 1 Global Step: 25130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:38,072-Speed 5169.92 samples/sec Loss 6.2155 LearningRate 0.0855 Epoch: 1 Global Step: 25140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:40,045-Speed 5193.60 samples/sec Loss 6.2118 LearningRate 0.0855 Epoch: 1 Global Step: 25150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:42,028-Speed 5165.81 samples/sec Loss 6.3233 LearningRate 0.0855 Epoch: 1 Global Step: 25160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:44,015-Speed 5155.80 samples/sec Loss 6.3354 LearningRate 0.0855 Epoch: 1 Global Step: 25170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:45,991-Speed 5182.12 samples/sec Loss 6.2254 LearningRate 0.0855 Epoch: 1 Global Step: 25180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:47,977-Speed 5157.48 samples/sec Loss 6.2023 LearningRate 0.0855 Epoch: 1 Global Step: 25190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:49,971-Speed 5138.37 samples/sec Loss 6.1800 LearningRate 0.0855 Epoch: 1 Global Step: 25200 Fp16 Grad Scale: 131072 
Required: 21 hours Training: 2022-04-11 00:55:51,938-Speed 5208.32 samples/sec Loss 6.2279 LearningRate 0.0855 Epoch: 1 Global Step: 25210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:53,913-Speed 5186.29 samples/sec Loss 6.2561 LearningRate 0.0855 Epoch: 1 Global Step: 25220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:55,881-Speed 5203.60 samples/sec Loss 6.3407 LearningRate 0.0855 Epoch: 1 Global Step: 25230 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:57,854-Speed 5192.48 samples/sec Loss 6.2394 LearningRate 0.0854 Epoch: 1 Global Step: 25240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:55:59,822-Speed 5204.10 samples/sec Loss 6.2364 LearningRate 0.0854 Epoch: 1 Global Step: 25250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:56:01,791-Speed 5202.98 samples/sec Loss 6.2816 LearningRate 0.0854 Epoch: 1 Global Step: 25260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:56:03,761-Speed 5199.44 samples/sec Loss 6.2801 LearningRate 0.0854 Epoch: 1 Global Step: 25270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:56:05,737-Speed 5184.39 samples/sec Loss 6.1879 LearningRate 0.0854 Epoch: 1 Global Step: 25280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:56:07,718-Speed 5170.41 samples/sec Loss 6.2808 LearningRate 0.0854 Epoch: 1 Global Step: 25290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:56:09,691-Speed 5191.18 samples/sec Loss 6.3782 LearningRate 0.0854 Epoch: 1 Global Step: 25300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:56:11,666-Speed 5188.59 samples/sec Loss 6.1238 LearningRate 0.0854 Epoch: 1 Global Step: 25310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:56:13,651-Speed 5160.97 samples/sec Loss 6.3930 LearningRate 0.0854 Epoch: 1 Global Step: 25320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 
00:56:15,632-Speed 5169.13 samples/sec Loss 6.3215 LearningRate 0.0854 Epoch: 1 Global Step: 25330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:56:17,601-Speed 5202.73 samples/sec Loss 6.2312 LearningRate 0.0854 Epoch: 1 Global Step: 25340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:56:19,571-Speed 5200.98 samples/sec Loss 6.4134 LearningRate 0.0854 Epoch: 1 Global Step: 25350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 00:56:21,547-Speed 5182.41 samples/sec Loss 6.3241 LearningRate 0.0854 Epoch: 1 Global Step: 25360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:56:23,518-Speed 5197.13 samples/sec Loss 6.2715 LearningRate 0.0854 Epoch: 1 Global Step: 25370 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:56:25,491-Speed 5191.52 samples/sec Loss 6.2655 LearningRate 0.0854 Epoch: 1 Global Step: 25380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:56:27,461-Speed 5199.84 samples/sec Loss 6.1966 LearningRate 0.0854 Epoch: 1 Global Step: 25390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:56:29,444-Speed 5164.56 samples/sec Loss 6.2743 LearningRate 0.0854 Epoch: 1 Global Step: 25400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:56:31,419-Speed 5188.61 samples/sec Loss 6.2190 LearningRate 0.0854 Epoch: 1 Global Step: 25410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:56:33,391-Speed 5192.92 samples/sec Loss 6.3305 LearningRate 0.0854 Epoch: 1 Global Step: 25420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:56:35,363-Speed 5196.93 samples/sec Loss 6.3621 LearningRate 0.0853 Epoch: 1 Global Step: 25430 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:56:37,340-Speed 5179.52 samples/sec Loss 6.2798 LearningRate 0.0853 Epoch: 1 Global Step: 25440 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:56:39,316-Speed 5185.54 samples/sec Loss 6.3418 
LearningRate 0.0853 Epoch: 1 Global Step: 25450 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:56:41,294-Speed 5176.57 samples/sec Loss 6.2860 LearningRate 0.0853 Epoch: 1 Global Step: 25460 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:56:43,265-Speed 5197.37 samples/sec Loss 6.3183 LearningRate 0.0853 Epoch: 1 Global Step: 25470 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-11 00:56:45,260-Speed 5135.99 samples/sec Loss 6.3504 LearningRate 0.0853 Epoch: 1 Global Step: 25480 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 00:56:47,240-Speed 5173.13 samples/sec Loss 6.3496 LearningRate 0.0853 Epoch: 1 Global Step: 25490 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 00:56:49,214-Speed 5188.85 samples/sec Loss 6.2286 LearningRate 0.0853 Epoch: 1 Global Step: 25500 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 00:56:51,184-Speed 5200.88 samples/sec Loss 6.2327 LearningRate 0.0853 Epoch: 1 Global Step: 25510 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 00:56:53,153-Speed 5201.00 samples/sec Loss 6.1540 LearningRate 0.0853 Epoch: 1 Global Step: 25520 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 00:56:55,124-Speed 5198.55 samples/sec Loss 6.1844 LearningRate 0.0853 Epoch: 1 Global Step: 25530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:56:57,105-Speed 5170.19 samples/sec Loss 6.3636 LearningRate 0.0853 Epoch: 1 Global Step: 25540 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:56:59,078-Speed 5191.90 samples/sec Loss 6.2832 LearningRate 0.0853 Epoch: 1 Global Step: 25550 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:57:01,065-Speed 5154.16 samples/sec Loss 6.3006 LearningRate 0.0853 Epoch: 1 Global Step: 25560 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:57:03,045-Speed 5172.75 samples/sec Loss 6.2271 LearningRate 0.0853 Epoch: 1 Global Step: 25570 Fp16 
Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:57:05,035-Speed 5151.15 samples/sec Loss 6.2083 LearningRate 0.0853 Epoch: 1 Global Step: 25580 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:57:07,005-Speed 5197.94 samples/sec Loss 6.2147 LearningRate 0.0853 Epoch: 1 Global Step: 25590 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:57:08,976-Speed 5196.89 samples/sec Loss 6.3157 LearningRate 0.0853 Epoch: 1 Global Step: 25600 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:57:10,949-Speed 5192.68 samples/sec Loss 6.1919 LearningRate 0.0852 Epoch: 1 Global Step: 25610 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:57:12,918-Speed 5202.65 samples/sec Loss 6.2617 LearningRate 0.0852 Epoch: 1 Global Step: 25620 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:57:14,888-Speed 5198.58 samples/sec Loss 6.2910 LearningRate 0.0852 Epoch: 1 Global Step: 25630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:57:16,863-Speed 5187.48 samples/sec Loss 6.3629 LearningRate 0.0852 Epoch: 1 Global Step: 25640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:57:18,844-Speed 5170.73 samples/sec Loss 6.2414 LearningRate 0.0852 Epoch: 1 Global Step: 25650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:57:20,815-Speed 5195.23 samples/sec Loss 6.2388 LearningRate 0.0852 Epoch: 1 Global Step: 25660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:57:22,799-Speed 5165.14 samples/sec Loss 6.2315 LearningRate 0.0852 Epoch: 1 Global Step: 25670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:57:24,792-Speed 5138.77 samples/sec Loss 6.2652 LearningRate 0.0852 Epoch: 1 Global Step: 25680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:57:26,775-Speed 5167.07 samples/sec Loss 6.2195 LearningRate 0.0852 Epoch: 1 Global Step: 25690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 
2022-04-11 00:57:28,761-Speed 5155.73 samples/sec Loss 6.3445 LearningRate 0.0852 Epoch: 1 Global Step: 25700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:57:30,731-Speed 5200.76 samples/sec Loss 6.1652 LearningRate 0.0852 Epoch: 1 Global Step: 25710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:57:32,702-Speed 5196.13 samples/sec Loss 6.2918 LearningRate 0.0852 Epoch: 1 Global Step: 25720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:57:34,670-Speed 5205.04 samples/sec Loss 6.3367 LearningRate 0.0852 Epoch: 1 Global Step: 25730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:57:36,649-Speed 5177.39 samples/sec Loss 6.2974 LearningRate 0.0852 Epoch: 1 Global Step: 25740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:57:38,617-Speed 5204.54 samples/sec Loss 6.1898 LearningRate 0.0852 Epoch: 1 Global Step: 25750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:57:40,609-Speed 5140.62 samples/sec Loss 6.2551 LearningRate 0.0852 Epoch: 1 Global Step: 25760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:57:42,573-Speed 5215.63 samples/sec Loss 6.1821 LearningRate 0.0852 Epoch: 1 Global Step: 25770 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:57:44,544-Speed 5197.72 samples/sec Loss 6.2565 LearningRate 0.0852 Epoch: 1 Global Step: 25780 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:57:46,521-Speed 5183.15 samples/sec Loss 6.2167 LearningRate 0.0851 Epoch: 1 Global Step: 25790 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:57:48,489-Speed 5203.58 samples/sec Loss 6.3106 LearningRate 0.0851 Epoch: 1 Global Step: 25800 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:57:50,466-Speed 5182.16 samples/sec Loss 6.2819 LearningRate 0.0851 Epoch: 1 Global Step: 25810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:57:52,439-Speed 5192.08 
samples/sec Loss 6.1826 LearningRate 0.0851 Epoch: 1 Global Step: 25820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:57:54,408-Speed 5203.87 samples/sec Loss 6.1921 LearningRate 0.0851 Epoch: 1 Global Step: 25830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:57:56,375-Speed 5205.89 samples/sec Loss 6.2607 LearningRate 0.0851 Epoch: 1 Global Step: 25840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:57:58,345-Speed 5198.96 samples/sec Loss 6.2225 LearningRate 0.0851 Epoch: 1 Global Step: 25850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:58:00,322-Speed 5182.20 samples/sec Loss 6.2937 LearningRate 0.0851 Epoch: 1 Global Step: 25860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 00:58:02,292-Speed 5200.71 samples/sec Loss 6.1941 LearningRate 0.0851 Epoch: 1 Global Step: 25870 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:58:04,276-Speed 5161.30 samples/sec Loss 6.1359 LearningRate 0.0851 Epoch: 1 Global Step: 25880 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:58:06,252-Speed 5185.51 samples/sec Loss 6.2238 LearningRate 0.0851 Epoch: 1 Global Step: 25890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:58:08,236-Speed 5162.32 samples/sec Loss 6.2271 LearningRate 0.0851 Epoch: 1 Global Step: 25900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:58:10,205-Speed 5204.60 samples/sec Loss 6.2508 LearningRate 0.0851 Epoch: 1 Global Step: 25910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:58:12,179-Speed 5188.38 samples/sec Loss 6.2054 LearningRate 0.0851 Epoch: 1 Global Step: 25920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:58:14,161-Speed 5167.18 samples/sec Loss 6.2539 LearningRate 0.0851 Epoch: 1 Global Step: 25930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 00:58:16,138-Speed 5181.19 samples/sec Loss 6.1853 LearningRate 0.0851 
Epoch: 1 Global Step: 25940 Fp16 Grad Scale: 65536 Required: 20 hours
Training: 2022-04-11 00:58:18,119-Speed 5170.83 samples/sec Loss 6.2238 LearningRate 0.0851 Epoch: 1 Global Step: 25950 Fp16 Grad Scale: 65536 Required: 20 hours
Training: 2022-04-11 00:58:20,087-Speed 5205.24 samples/sec Loss 6.1645 LearningRate 0.0851 Epoch: 1 Global Step: 25960 Fp16 Grad Scale: 65536 Required: 20 hours
Training: 2022-04-11 00:58:22,056-Speed 5203.26 samples/sec Loss 6.0948 LearningRate 0.0850 Epoch: 1 Global Step: 25970 Fp16 Grad Scale: 65536 Required: 20 hours
Training: 2022-04-11 00:58:24,027-Speed 5196.75 samples/sec Loss 6.2766 LearningRate 0.0850 Epoch: 1 Global Step: 25980 Fp16 Grad Scale: 65536 Required: 20 hours
Training: 2022-04-11 00:58:26,006-Speed 5175.85 samples/sec Loss 6.1133 LearningRate 0.0850 Epoch: 1 Global Step: 25990 Fp16 Grad Scale: 65536 Required: 20 hours
Training: 2022-04-11 00:58:27,984-Speed 5177.53 samples/sec Loss 6.2793 LearningRate 0.0850 Epoch: 1 Global Step: 26000 Fp16 Grad Scale: 65536 Required: 20 hours
Training: 2022-04-11 00:58:54,618-[lfw][26000]XNorm: 22.290360
Training: 2022-04-11 00:58:54,618-[lfw][26000]Accuracy-Flip: 0.99700+-0.00256
Training: 2022-04-11 00:58:54,619-[lfw][26000]Accuracy-Highest: 0.99700
Training: 2022-04-11 00:59:25,439-[cfp_fp][26000]XNorm: 19.943093
Training: 2022-04-11 00:59:25,440-[cfp_fp][26000]Accuracy-Flip: 0.97100+-0.00731
Training: 2022-04-11 00:59:25,440-[cfp_fp][26000]Accuracy-Highest: 0.97100
Training: 2022-04-11 00:59:52,049-[agedb_30][26000]XNorm: 22.431230
Training: 2022-04-11 00:59:52,050-[agedb_30][26000]Accuracy-Flip: 0.97033+-0.00682
Training: 2022-04-11 00:59:52,050-[agedb_30][26000]Accuracy-Highest: 0.97033
Training: 2022-04-11 00:59:54,038-Speed 119.00 samples/sec Loss 6.2170 LearningRate 0.0850 Epoch: 1 Global Step: 26010 Fp16 Grad Scale: 65536 Required: 21 hours
Training: 2022-04-11 00:59:55,997-Speed 5228.80 samples/sec Loss 6.0571 LearningRate 0.0850 Epoch: 1 Global Step: 26020 Fp16 Grad 
Scale: 65536 Required: 21 hours Training: 2022-04-11 00:59:57,959-Speed 5219.56 samples/sec Loss 6.2082 LearningRate 0.0850 Epoch: 1 Global Step: 26030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 00:59:59,922-Speed 5216.57 samples/sec Loss 6.2553 LearningRate 0.0850 Epoch: 1 Global Step: 26040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:01,884-Speed 5223.18 samples/sec Loss 6.2100 LearningRate 0.0850 Epoch: 1 Global Step: 26050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:03,848-Speed 5215.42 samples/sec Loss 6.1983 LearningRate 0.0850 Epoch: 1 Global Step: 26060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:05,814-Speed 5208.69 samples/sec Loss 6.2838 LearningRate 0.0850 Epoch: 1 Global Step: 26070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:07,777-Speed 5217.67 samples/sec Loss 6.1903 LearningRate 0.0850 Epoch: 1 Global Step: 26080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:09,744-Speed 5207.34 samples/sec Loss 6.3011 LearningRate 0.0850 Epoch: 1 Global Step: 26090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:11,716-Speed 5195.35 samples/sec Loss 6.1403 LearningRate 0.0850 Epoch: 1 Global Step: 26100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:13,696-Speed 5173.86 samples/sec Loss 6.2137 LearningRate 0.0850 Epoch: 1 Global Step: 26110 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:15,680-Speed 5163.31 samples/sec Loss 6.2318 LearningRate 0.0850 Epoch: 1 Global Step: 26120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:17,659-Speed 5174.26 samples/sec Loss 6.2068 LearningRate 0.0850 Epoch: 1 Global Step: 26130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:19,626-Speed 5208.58 samples/sec Loss 6.2122 LearningRate 0.0850 Epoch: 1 Global Step: 26140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 
01:00:21,596-Speed 5202.15 samples/sec Loss 6.3014 LearningRate 0.0849 Epoch: 1 Global Step: 26150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:00:23,571-Speed 5185.79 samples/sec Loss 6.3138 LearningRate 0.0849 Epoch: 1 Global Step: 26160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:00:25,536-Speed 5212.97 samples/sec Loss 6.1966 LearningRate 0.0849 Epoch: 1 Global Step: 26170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:27,517-Speed 5169.39 samples/sec Loss 6.2846 LearningRate 0.0849 Epoch: 1 Global Step: 26180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:29,487-Speed 5201.61 samples/sec Loss 6.1708 LearningRate 0.0849 Epoch: 1 Global Step: 26190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:31,468-Speed 5169.61 samples/sec Loss 6.1869 LearningRate 0.0849 Epoch: 1 Global Step: 26200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:33,437-Speed 5202.20 samples/sec Loss 6.1586 LearningRate 0.0849 Epoch: 1 Global Step: 26210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:35,416-Speed 5175.64 samples/sec Loss 6.1993 LearningRate 0.0849 Epoch: 1 Global Step: 26220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:37,395-Speed 5176.98 samples/sec Loss 6.2316 LearningRate 0.0849 Epoch: 1 Global Step: 26230 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:39,397-Speed 5115.54 samples/sec Loss 6.1888 LearningRate 0.0849 Epoch: 1 Global Step: 26240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:41,417-Speed 5070.42 samples/sec Loss 6.1270 LearningRate 0.0849 Epoch: 1 Global Step: 26250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:43,387-Speed 5199.97 samples/sec Loss 6.1816 LearningRate 0.0849 Epoch: 1 Global Step: 26260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:45,358-Speed 5197.69 samples/sec Loss 6.1785 
LearningRate 0.0849 Epoch: 1 Global Step: 26270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:00:47,327-Speed 5202.08 samples/sec Loss 6.1434 LearningRate 0.0849 Epoch: 1 Global Step: 26280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:00:49,294-Speed 5208.70 samples/sec Loss 6.2803 LearningRate 0.0849 Epoch: 1 Global Step: 26290 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:00:51,257-Speed 5217.73 samples/sec Loss 6.1864 LearningRate 0.0849 Epoch: 1 Global Step: 26300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:53,223-Speed 5211.38 samples/sec Loss 6.2030 LearningRate 0.0849 Epoch: 1 Global Step: 26310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:55,189-Speed 5209.95 samples/sec Loss 6.1767 LearningRate 0.0849 Epoch: 1 Global Step: 26320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:57,156-Speed 5207.74 samples/sec Loss 6.2796 LearningRate 0.0848 Epoch: 1 Global Step: 26330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:00:59,124-Speed 5204.40 samples/sec Loss 6.0654 LearningRate 0.0848 Epoch: 1 Global Step: 26340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:01,098-Speed 5189.49 samples/sec Loss 6.0728 LearningRate 0.0848 Epoch: 1 Global Step: 26350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:03,065-Speed 5208.71 samples/sec Loss 6.1441 LearningRate 0.0848 Epoch: 1 Global Step: 26360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:05,036-Speed 5196.37 samples/sec Loss 6.2191 LearningRate 0.0848 Epoch: 1 Global Step: 26370 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:07,001-Speed 5213.17 samples/sec Loss 6.0928 LearningRate 0.0848 Epoch: 1 Global Step: 26380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:08,966-Speed 5212.37 samples/sec Loss 6.1761 LearningRate 0.0848 Epoch: 1 Global Step: 26390 Fp16 
Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:10,945-Speed 5177.26 samples/sec Loss 6.1472 LearningRate 0.0848 Epoch: 1 Global Step: 26400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:12,924-Speed 5176.93 samples/sec Loss 6.1688 LearningRate 0.0848 Epoch: 1 Global Step: 26410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:14,894-Speed 5200.55 samples/sec Loss 6.1981 LearningRate 0.0848 Epoch: 1 Global Step: 26420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:16,860-Speed 5208.43 samples/sec Loss 6.0892 LearningRate 0.0848 Epoch: 1 Global Step: 26430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:18,826-Speed 5210.98 samples/sec Loss 6.2648 LearningRate 0.0848 Epoch: 1 Global Step: 26440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:20,792-Speed 5210.03 samples/sec Loss 6.1116 LearningRate 0.0848 Epoch: 1 Global Step: 26450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:22,769-Speed 5182.64 samples/sec Loss 6.2408 LearningRate 0.0848 Epoch: 1 Global Step: 26460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:24,751-Speed 5167.83 samples/sec Loss 6.2369 LearningRate 0.0848 Epoch: 1 Global Step: 26470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:26,727-Speed 5183.94 samples/sec Loss 6.0845 LearningRate 0.0848 Epoch: 1 Global Step: 26480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:28,694-Speed 5206.44 samples/sec Loss 6.2450 LearningRate 0.0848 Epoch: 1 Global Step: 26490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:30,668-Speed 5189.80 samples/sec Loss 6.1387 LearningRate 0.0848 Epoch: 1 Global Step: 26500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:01:32,634-Speed 5209.69 samples/sec Loss 6.2074 LearningRate 0.0847 Epoch: 1 Global Step: 26510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 
2022-04-11 01:01:34,616-Speed 5167.92 samples/sec Loss 6.1875 LearningRate 0.0847 Epoch: 1 Global Step: 26520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:01:36,586-Speed 5201.44 samples/sec Loss 6.1404 LearningRate 0.0847 Epoch: 1 Global Step: 26530 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:01:38,547-Speed 5223.77 samples/sec Loss 6.0725 LearningRate 0.0847 Epoch: 1 Global Step: 26540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:40,514-Speed 5208.23 samples/sec Loss 6.1301 LearningRate 0.0847 Epoch: 1 Global Step: 26550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:42,482-Speed 5203.83 samples/sec Loss 6.2066 LearningRate 0.0847 Epoch: 1 Global Step: 26560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:44,459-Speed 5181.76 samples/sec Loss 6.1707 LearningRate 0.0847 Epoch: 1 Global Step: 26570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:46,428-Speed 5202.99 samples/sec Loss 6.2204 LearningRate 0.0847 Epoch: 1 Global Step: 26580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:48,394-Speed 5209.36 samples/sec Loss 6.1681 LearningRate 0.0847 Epoch: 1 Global Step: 26590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:50,366-Speed 5194.62 samples/sec Loss 6.1901 LearningRate 0.0847 Epoch: 1 Global Step: 26600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:52,331-Speed 5211.36 samples/sec Loss 6.1980 LearningRate 0.0847 Epoch: 1 Global Step: 26610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:54,305-Speed 5191.96 samples/sec Loss 6.2260 LearningRate 0.0847 Epoch: 1 Global Step: 26620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:56,271-Speed 5210.90 samples/sec Loss 6.0674 LearningRate 0.0847 Epoch: 1 Global Step: 26630 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:01:58,242-Speed 5194.82 samples/sec 
Loss 6.1173 LearningRate 0.0847 Epoch: 1 Global Step: 26640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:02:00,207-Speed 5214.81 samples/sec Loss 6.0879 LearningRate 0.0847 Epoch: 1 Global Step: 26650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:02:02,170-Speed 5216.66 samples/sec Loss 6.1591 LearningRate 0.0847 Epoch: 1 Global Step: 26660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:02:04,146-Speed 5184.42 samples/sec Loss 6.2447 LearningRate 0.0847 Epoch: 1 Global Step: 26670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:02:06,114-Speed 5205.52 samples/sec Loss 6.1697 LearningRate 0.0847 Epoch: 1 Global Step: 26680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:02:08,083-Speed 5200.66 samples/sec Loss 6.1410 LearningRate 0.0846 Epoch: 1 Global Step: 26690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:02:10,062-Speed 5175.56 samples/sec Loss 6.1397 LearningRate 0.0846 Epoch: 1 Global Step: 26700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:02:12,052-Speed 5149.61 samples/sec Loss 6.1020 LearningRate 0.0846 Epoch: 1 Global Step: 26710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:02:14,042-Speed 5148.06 samples/sec Loss 6.1195 LearningRate 0.0846 Epoch: 1 Global Step: 26720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:02:16,013-Speed 5196.17 samples/sec Loss 6.0872 LearningRate 0.0846 Epoch: 1 Global Step: 26730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:02:17,977-Speed 5214.82 samples/sec Loss 6.1827 LearningRate 0.0846 Epoch: 1 Global Step: 26740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:02:19,947-Speed 5202.34 samples/sec Loss 6.1484 LearningRate 0.0846 Epoch: 1 Global Step: 26750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:02:21,922-Speed 5186.04 samples/sec Loss 6.1402 LearningRate 0.0846 Epoch: 1 
Global Step: 26760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:02:23,903-Speed 5170.08 samples/sec Loss 6.1594 LearningRate 0.0846 Epoch: 1 Global Step: 26770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:02:25,881-Speed 5179.39 samples/sec Loss 6.1174 LearningRate 0.0846 Epoch: 1 Global Step: 26780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:02:27,857-Speed 5183.57 samples/sec Loss 6.1917 LearningRate 0.0846 Epoch: 1 Global Step: 26790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:02:29,823-Speed 5209.83 samples/sec Loss 6.2770 LearningRate 0.0846 Epoch: 1 Global Step: 26800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:02:31,781-Speed 5231.73 samples/sec Loss 6.2619 LearningRate 0.0846 Epoch: 1 Global Step: 26810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-11 01:02:33,771-Speed 5145.97 samples/sec Loss 6.1523 LearningRate 0.0846 Epoch: 1 Global Step: 26820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:02:35,764-Speed 5140.84 samples/sec Loss 6.1402 LearningRate 0.0846 Epoch: 1 Global Step: 26830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:02:37,753-Speed 5150.11 samples/sec Loss 6.0523 LearningRate 0.0846 Epoch: 1 Global Step: 26840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:02:39,724-Speed 5197.00 samples/sec Loss 6.1044 LearningRate 0.0846 Epoch: 1 Global Step: 26850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:02:41,708-Speed 5165.13 samples/sec Loss 6.1556 LearningRate 0.0846 Epoch: 1 Global Step: 26860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:02:43,680-Speed 5193.23 samples/sec Loss 6.1261 LearningRate 0.0845 Epoch: 1 Global Step: 26870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:02:45,648-Speed 5204.31 samples/sec Loss 6.1833 LearningRate 0.0845 Epoch: 1 Global Step: 26880 Fp16 Grad Scale: 65536 
Required: 20 hours Training: 2022-04-11 01:02:47,615-Speed 5208.94 samples/sec Loss 6.1012 LearningRate 0.0845 Epoch: 1 Global Step: 26890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:02:49,583-Speed 5204.98 samples/sec Loss 6.1980 LearningRate 0.0845 Epoch: 1 Global Step: 26900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:02:51,570-Speed 5154.72 samples/sec Loss 6.0298 LearningRate 0.0845 Epoch: 1 Global Step: 26910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:02:53,545-Speed 5185.38 samples/sec Loss 6.0147 LearningRate 0.0845 Epoch: 1 Global Step: 26920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:02:55,523-Speed 5178.72 samples/sec Loss 6.0463 LearningRate 0.0845 Epoch: 1 Global Step: 26930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:02:57,504-Speed 5174.08 samples/sec Loss 6.1844 LearningRate 0.0845 Epoch: 1 Global Step: 26940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:02:59,479-Speed 5185.94 samples/sec Loss 6.0893 LearningRate 0.0845 Epoch: 1 Global Step: 26950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:03:01,461-Speed 5166.61 samples/sec Loss 6.0762 LearningRate 0.0845 Epoch: 1 Global Step: 26960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:03:03,445-Speed 5163.47 samples/sec Loss 6.1349 LearningRate 0.0845 Epoch: 1 Global Step: 26970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:03:05,419-Speed 5190.83 samples/sec Loss 6.2087 LearningRate 0.0845 Epoch: 1 Global Step: 26980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:03:07,396-Speed 5179.52 samples/sec Loss 6.1607 LearningRate 0.0845 Epoch: 1 Global Step: 26990 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:03:09,366-Speed 5199.24 samples/sec Loss 6.1964 LearningRate 0.0845 Epoch: 1 Global Step: 27000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 
01:03:11,338-Speed 5194.89 samples/sec Loss 6.0938 LearningRate 0.0845 Epoch: 1 Global Step: 27010 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:03:13,317-Speed 5175.39 samples/sec Loss 6.1234 LearningRate 0.0845 Epoch: 1 Global Step: 27020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:03:15,298-Speed 5171.82 samples/sec Loss 6.1552 LearningRate 0.0845 Epoch: 1 Global Step: 27030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:03:17,277-Speed 5176.02 samples/sec Loss 6.0711 LearningRate 0.0845 Epoch: 1 Global Step: 27040 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:03:19,258-Speed 5172.17 samples/sec Loss 6.0966 LearningRate 0.0845 Epoch: 1 Global Step: 27050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:03:21,228-Speed 5199.13 samples/sec Loss 6.2270 LearningRate 0.0844 Epoch: 1 Global Step: 27060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:03:23,200-Speed 5195.45 samples/sec Loss 6.1327 LearningRate 0.0844 Epoch: 1 Global Step: 27070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:03:25,184-Speed 5160.60 samples/sec Loss 6.0212 LearningRate 0.0844 Epoch: 1 Global Step: 27080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:03:27,154-Speed 5201.03 samples/sec Loss 6.1373 LearningRate 0.0844 Epoch: 1 Global Step: 27090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:03:29,124-Speed 5198.97 samples/sec Loss 6.1648 LearningRate 0.0844 Epoch: 1 Global Step: 27100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:03:31,094-Speed 5201.09 samples/sec Loss 6.1553 LearningRate 0.0844 Epoch: 1 Global Step: 27110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:03:33,072-Speed 5178.61 samples/sec Loss 6.1478 LearningRate 0.0844 Epoch: 1 Global Step: 27120 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:03:35,041-Speed 5201.70 samples/sec Loss 
6.1825 LearningRate 0.0844 Epoch: 1 Global Step: 27130 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:03:37,007-Speed 5208.72 samples/sec Loss 6.1497 LearningRate 0.0844 Epoch: 1 Global Step: 27140 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:03:38,976-Speed 5203.23 samples/sec Loss 5.9846 LearningRate 0.0844 Epoch: 1 Global Step: 27150 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:03:40,948-Speed 5194.70 samples/sec Loss 6.0365 LearningRate 0.0844 Epoch: 1 Global Step: 27160 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:03:42,932-Speed 5164.98 samples/sec Loss 6.1992 LearningRate 0.0844 Epoch: 1 Global Step: 27170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:03:44,913-Speed 5169.81 samples/sec Loss 6.1108 LearningRate 0.0844 Epoch: 1 Global Step: 27180 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:03:46,884-Speed 5195.78 samples/sec Loss 6.0872 LearningRate 0.0844 Epoch: 1 Global Step: 27190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:03:48,853-Speed 5203.18 samples/sec Loss 6.1440 LearningRate 0.0844 Epoch: 1 Global Step: 27200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:03:50,822-Speed 5201.39 samples/sec Loss 6.0210 LearningRate 0.0844 Epoch: 1 Global Step: 27210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:03:52,794-Speed 5194.29 samples/sec Loss 6.0623 LearningRate 0.0844 Epoch: 1 Global Step: 27220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:03:54,767-Speed 5193.88 samples/sec Loss 6.2023 LearningRate 0.0844 Epoch: 1 Global Step: 27230 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:03:56,737-Speed 5199.86 samples/sec Loss 6.1915 LearningRate 0.0843 Epoch: 1 Global Step: 27240 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:03:58,717-Speed 5171.17 samples/sec Loss 6.0901 LearningRate 0.0843 Epoch: 1 Global Step: 27250 
Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:00,702-Speed 5163.40 samples/sec Loss 6.1140 LearningRate 0.0843 Epoch: 1 Global Step: 27260 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:02,674-Speed 5192.19 samples/sec Loss 6.0993 LearningRate 0.0843 Epoch: 1 Global Step: 27270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:04,646-Speed 5194.22 samples/sec Loss 6.1253 LearningRate 0.0843 Epoch: 1 Global Step: 27280 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:04:06,615-Speed 5202.02 samples/sec Loss 6.1274 LearningRate 0.0843 Epoch: 1 Global Step: 27290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:04:08,586-Speed 5197.44 samples/sec Loss 6.0511 LearningRate 0.0843 Epoch: 1 Global Step: 27300 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:04:10,558-Speed 5196.29 samples/sec Loss 6.1812 LearningRate 0.0843 Epoch: 1 Global Step: 27310 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:04:12,530-Speed 5194.05 samples/sec Loss 6.0579 LearningRate 0.0843 Epoch: 1 Global Step: 27320 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:04:14,504-Speed 5189.41 samples/sec Loss 6.1048 LearningRate 0.0843 Epoch: 1 Global Step: 27330 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:04:16,474-Speed 5199.39 samples/sec Loss 6.0696 LearningRate 0.0843 Epoch: 1 Global Step: 27340 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:04:18,448-Speed 5187.13 samples/sec Loss 6.1378 LearningRate 0.0843 Epoch: 1 Global Step: 27350 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:04:20,414-Speed 5211.93 samples/sec Loss 6.1520 LearningRate 0.0843 Epoch: 1 Global Step: 27360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:22,393-Speed 5176.64 samples/sec Loss 6.1605 LearningRate 0.0843 Epoch: 1 Global Step: 27370 Fp16 Grad Scale: 65536 Required: 20 hours 
Training: 2022-04-11 01:04:24,369-Speed 5184.99 samples/sec Loss 6.1211 LearningRate 0.0843 Epoch: 1 Global Step: 27380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:26,342-Speed 5191.18 samples/sec Loss 6.0032 LearningRate 0.0843 Epoch: 1 Global Step: 27390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:28,344-Speed 5116.70 samples/sec Loss 6.0184 LearningRate 0.0843 Epoch: 1 Global Step: 27400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:30,314-Speed 5199.10 samples/sec Loss 5.9811 LearningRate 0.0843 Epoch: 1 Global Step: 27410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:32,285-Speed 5198.04 samples/sec Loss 6.0316 LearningRate 0.0842 Epoch: 1 Global Step: 27420 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:34,256-Speed 5195.99 samples/sec Loss 6.0431 LearningRate 0.0842 Epoch: 1 Global Step: 27430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:36,243-Speed 5155.65 samples/sec Loss 6.2500 LearningRate 0.0842 Epoch: 1 Global Step: 27440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:38,221-Speed 5179.56 samples/sec Loss 6.0694 LearningRate 0.0842 Epoch: 1 Global Step: 27450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:40,197-Speed 5181.32 samples/sec Loss 6.1333 LearningRate 0.0842 Epoch: 1 Global Step: 27460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:04:42,168-Speed 5199.69 samples/sec Loss 6.0478 LearningRate 0.0842 Epoch: 1 Global Step: 27470 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:04:44,133-Speed 5213.99 samples/sec Loss 6.1090 LearningRate 0.0842 Epoch: 1 Global Step: 27480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:46,103-Speed 5200.32 samples/sec Loss 6.1428 LearningRate 0.0842 Epoch: 1 Global Step: 27490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:48,077-Speed 5188.49 
samples/sec Loss 6.0607 LearningRate 0.0842 Epoch: 1 Global Step: 27500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:50,054-Speed 5180.74 samples/sec Loss 6.1043 LearningRate 0.0842 Epoch: 1 Global Step: 27510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:52,029-Speed 5185.66 samples/sec Loss 6.0630 LearningRate 0.0842 Epoch: 1 Global Step: 27520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:54,001-Speed 5195.24 samples/sec Loss 6.1134 LearningRate 0.0842 Epoch: 1 Global Step: 27530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:55,971-Speed 5198.29 samples/sec Loss 6.1528 LearningRate 0.0842 Epoch: 1 Global Step: 27540 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:57,948-Speed 5181.48 samples/sec Loss 6.1149 LearningRate 0.0842 Epoch: 1 Global Step: 27550 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:04:59,931-Speed 5165.81 samples/sec Loss 6.0508 LearningRate 0.0842 Epoch: 1 Global Step: 27560 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:05:01,909-Speed 5178.99 samples/sec Loss 6.1138 LearningRate 0.0842 Epoch: 1 Global Step: 27570 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:05:03,896-Speed 5156.52 samples/sec Loss 6.1277 LearningRate 0.0842 Epoch: 1 Global Step: 27580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:05,883-Speed 5156.07 samples/sec Loss 6.0560 LearningRate 0.0842 Epoch: 1 Global Step: 27590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:07,855-Speed 5193.04 samples/sec Loss 6.1404 LearningRate 0.0841 Epoch: 1 Global Step: 27600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:09,826-Speed 5195.80 samples/sec Loss 6.0498 LearningRate 0.0841 Epoch: 1 Global Step: 27610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:11,802-Speed 5185.58 samples/sec Loss 6.1457 LearningRate 0.0841 Epoch: 
1 Global Step: 27620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:13,774-Speed 5193.26 samples/sec Loss 5.9824 LearningRate 0.0841 Epoch: 1 Global Step: 27630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:15,752-Speed 5179.03 samples/sec Loss 5.9962 LearningRate 0.0841 Epoch: 1 Global Step: 27640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:17,724-Speed 5195.47 samples/sec Loss 6.0995 LearningRate 0.0841 Epoch: 1 Global Step: 27650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:19,701-Speed 5180.56 samples/sec Loss 6.0349 LearningRate 0.0841 Epoch: 1 Global Step: 27660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:21,688-Speed 5157.01 samples/sec Loss 6.0843 LearningRate 0.0841 Epoch: 1 Global Step: 27670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:23,649-Speed 5221.75 samples/sec Loss 6.1155 LearningRate 0.0841 Epoch: 1 Global Step: 27680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:25,627-Speed 5179.02 samples/sec Loss 5.9842 LearningRate 0.0841 Epoch: 1 Global Step: 27690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:27,599-Speed 5196.31 samples/sec Loss 6.1132 LearningRate 0.0841 Epoch: 1 Global Step: 27700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:29,570-Speed 5196.20 samples/sec Loss 6.0114 LearningRate 0.0841 Epoch: 1 Global Step: 27710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:31,542-Speed 5193.96 samples/sec Loss 6.0404 LearningRate 0.0841 Epoch: 1 Global Step: 27720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:33,518-Speed 5185.86 samples/sec Loss 5.9554 LearningRate 0.0841 Epoch: 1 Global Step: 27730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:35,489-Speed 5195.36 samples/sec Loss 6.0370 LearningRate 0.0841 Epoch: 1 Global Step: 27740 Fp16 Grad Scale: 
131072 Required: 20 hours Training: 2022-04-11 01:05:37,458-Speed 5202.38 samples/sec Loss 5.9856 LearningRate 0.0841 Epoch: 1 Global Step: 27750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:39,443-Speed 5160.32 samples/sec Loss 6.0871 LearningRate 0.0841 Epoch: 1 Global Step: 27760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:41,414-Speed 5197.93 samples/sec Loss 6.0625 LearningRate 0.0841 Epoch: 1 Global Step: 27770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:05:43,393-Speed 5176.00 samples/sec Loss 6.0715 LearningRate 0.0840 Epoch: 1 Global Step: 27780 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:05:45,363-Speed 5200.50 samples/sec Loss 6.0633 LearningRate 0.0840 Epoch: 1 Global Step: 27790 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:05:47,337-Speed 5189.77 samples/sec Loss 6.0057 LearningRate 0.0840 Epoch: 1 Global Step: 27800 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:05:49,325-Speed 5152.44 samples/sec Loss 6.0201 LearningRate 0.0840 Epoch: 1 Global Step: 27810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:05:51,309-Speed 5162.25 samples/sec Loss 6.0542 LearningRate 0.0840 Epoch: 1 Global Step: 27820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:05:53,279-Speed 5200.59 samples/sec Loss 6.0270 LearningRate 0.0840 Epoch: 1 Global Step: 27830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:05:55,257-Speed 5178.48 samples/sec Loss 6.0258 LearningRate 0.0840 Epoch: 1 Global Step: 27840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:05:57,227-Speed 5200.54 samples/sec Loss 6.1694 LearningRate 0.0840 Epoch: 1 Global Step: 27850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:05:59,207-Speed 5171.26 samples/sec Loss 6.0026 LearningRate 0.0840 Epoch: 1 Global Step: 27860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 
01:06:01,180-Speed 5193.86 samples/sec Loss 6.0411 LearningRate 0.0840 Epoch: 1 Global Step: 27870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:06:03,166-Speed 5155.79 samples/sec Loss 6.0112 LearningRate 0.0840 Epoch: 1 Global Step: 27880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:06:05,152-Speed 5159.60 samples/sec Loss 6.0396 LearningRate 0.0840 Epoch: 1 Global Step: 27890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:06:07,126-Speed 5189.60 samples/sec Loss 5.9946 LearningRate 0.0840 Epoch: 1 Global Step: 27900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:06:09,096-Speed 5198.54 samples/sec Loss 5.9905 LearningRate 0.0840 Epoch: 1 Global Step: 27910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:06:11,069-Speed 5192.46 samples/sec Loss 6.0275 LearningRate 0.0840 Epoch: 1 Global Step: 27920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:06:13,041-Speed 5192.91 samples/sec Loss 6.0515 LearningRate 0.0840 Epoch: 1 Global Step: 27930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:06:15,030-Speed 5151.12 samples/sec Loss 5.9783 LearningRate 0.0840 Epoch: 1 Global Step: 27940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:06:17,004-Speed 5189.14 samples/sec Loss 6.0733 LearningRate 0.0840 Epoch: 1 Global Step: 27950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:06:19,005-Speed 5118.53 samples/sec Loss 6.0536 LearningRate 0.0839 Epoch: 1 Global Step: 27960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:06:20,975-Speed 5200.47 samples/sec Loss 6.0608 LearningRate 0.0839 Epoch: 1 Global Step: 27970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:06:22,948-Speed 5189.81 samples/sec Loss 6.0416 LearningRate 0.0839 Epoch: 1 Global Step: 27980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:06:24,921-Speed 5193.31 samples/sec Loss 
6.0808 LearningRate 0.0839 Epoch: 1 Global Step: 27990 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:06:26,894-Speed 5191.56 samples/sec Loss 5.9578 LearningRate 0.0839 Epoch: 1 Global Step: 28000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:06:53,634-[lfw][28000]XNorm: 21.328614 Training: 2022-04-11 01:06:53,635-[lfw][28000]Accuracy-Flip: 0.99633+-0.00277 Training: 2022-04-11 01:06:53,635-[lfw][28000]Accuracy-Highest: 0.99700 Training: 2022-04-11 01:07:24,516-[cfp_fp][28000]XNorm: 19.478398 Training: 2022-04-11 01:07:24,516-[cfp_fp][28000]Accuracy-Flip: 0.96557+-0.00868 Training: 2022-04-11 01:07:24,516-[cfp_fp][28000]Accuracy-Highest: 0.97100 Training: 2022-04-11 01:07:51,160-[agedb_30][28000]XNorm: 21.133905 Training: 2022-04-11 01:07:51,161-[agedb_30][28000]Accuracy-Flip: 0.97067+-0.00761 Training: 2022-04-11 01:07:51,161-[agedb_30][28000]Accuracy-Highest: 0.97067 Training: 2022-04-11 01:07:53,142-Speed 118.73 samples/sec Loss 6.0774 LearningRate 0.0839 Epoch: 1 Global Step: 28010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:07:55,098-Speed 5236.72 samples/sec Loss 6.0234 LearningRate 0.0839 Epoch: 1 Global Step: 28020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:07:57,060-Speed 5221.36 samples/sec Loss 6.0350 LearningRate 0.0839 Epoch: 1 Global Step: 28030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:07:59,019-Speed 5227.47 samples/sec Loss 5.9991 LearningRate 0.0839 Epoch: 1 Global Step: 28040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:08:00,988-Speed 5203.90 samples/sec Loss 5.9540 LearningRate 0.0839 Epoch: 1 Global Step: 28050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:08:02,954-Speed 5210.39 samples/sec Loss 5.9814 LearningRate 0.0839 Epoch: 1 Global Step: 28060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:08:04,916-Speed 5220.96 samples/sec Loss 5.9246 LearningRate 0.0839 Epoch: 
1 Global Step: 28070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:08:06,875-Speed 5227.76 samples/sec Loss 5.9504 LearningRate 0.0839 Epoch: 1 Global Step: 28080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:08:08,840-Speed 5213.68 samples/sec Loss 5.9723 LearningRate 0.0839 Epoch: 1 Global Step: 28090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:08:10,796-Speed 5234.64 samples/sec Loss 6.0543 LearningRate 0.0839 Epoch: 1 Global Step: 28100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:08:12,765-Speed 5203.48 samples/sec Loss 5.9586 LearningRate 0.0839 Epoch: 1 Global Step: 28110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-11 01:08:14,733-Speed 5205.60 samples/sec Loss 5.9834 LearningRate 0.0839 Epoch: 1 Global Step: 28120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:08:16,697-Speed 5215.16 samples/sec Loss 5.9818 LearningRate 0.0839 Epoch: 1 Global Step: 28130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:08:18,661-Speed 5214.78 samples/sec Loss 6.0916 LearningRate 0.0839 Epoch: 1 Global Step: 28140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:08:20,626-Speed 5214.42 samples/sec Loss 5.8609 LearningRate 0.0838 Epoch: 1 Global Step: 28150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:08:22,600-Speed 5188.08 samples/sec Loss 5.9690 LearningRate 0.0838 Epoch: 1 Global Step: 28160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:08:24,567-Speed 5208.68 samples/sec Loss 6.0098 LearningRate 0.0838 Epoch: 1 Global Step: 28170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:08:26,544-Speed 5182.04 samples/sec Loss 5.9597 LearningRate 0.0838 Epoch: 1 Global Step: 28180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:08:28,509-Speed 5212.00 samples/sec Loss 5.9557 LearningRate 0.0838 Epoch: 1 Global Step: 28190 Fp16 Grad Scale: 
131072 Required: 20 hours Training: 2022-04-11 01:08:30,476-Speed 5207.46 samples/sec Loss 6.0075 LearningRate 0.0838 Epoch: 1 Global Step: 28200 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:08:32,449-Speed 5191.58 samples/sec Loss 5.9850 LearningRate 0.0838 Epoch: 1 Global Step: 28210 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:08:34,424-Speed 5186.56 samples/sec Loss 5.9903 LearningRate 0.0838 Epoch: 1 Global Step: 28220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:08:36,412-Speed 5152.90 samples/sec Loss 6.0027 LearningRate 0.0838 Epoch: 1 Global Step: 28230 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:08:38,392-Speed 5172.81 samples/sec Loss 6.1168 LearningRate 0.0838 Epoch: 1 Global Step: 28240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:08:40,363-Speed 5199.16 samples/sec Loss 5.9831 LearningRate 0.0838 Epoch: 1 Global Step: 28250 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:08:42,335-Speed 5193.99 samples/sec Loss 5.9252 LearningRate 0.0838 Epoch: 1 Global Step: 28260 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:08:44,295-Speed 5225.23 samples/sec Loss 5.9711 LearningRate 0.0838 Epoch: 1 Global Step: 28270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:08:46,276-Speed 5170.60 samples/sec Loss 6.0355 LearningRate 0.0838 Epoch: 1 Global Step: 28280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:08:48,256-Speed 5174.59 samples/sec Loss 5.9848 LearningRate 0.0838 Epoch: 1 Global Step: 28290 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:08:50,225-Speed 5201.59 samples/sec Loss 6.0821 LearningRate 0.0838 Epoch: 1 Global Step: 28300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:08:52,200-Speed 5187.52 samples/sec Loss 6.1205 LearningRate 0.0838 Epoch: 1 Global Step: 28310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 
01:08:54,166-Speed 5208.64 samples/sec Loss 6.1566 LearningRate 0.0838 Epoch: 1 Global Step: 28320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:08:56,130-Speed 5215.95 samples/sec Loss 5.9441 LearningRate 0.0837 Epoch: 1 Global Step: 28330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:08:58,097-Speed 5207.74 samples/sec Loss 5.9190 LearningRate 0.0837 Epoch: 1 Global Step: 28340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:09:00,077-Speed 5172.64 samples/sec Loss 5.9729 LearningRate 0.0837 Epoch: 1 Global Step: 28350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:09:02,066-Speed 5150.51 samples/sec Loss 5.9613 LearningRate 0.0837 Epoch: 1 Global Step: 28360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:09:04,033-Speed 5208.49 samples/sec Loss 6.0456 LearningRate 0.0837 Epoch: 1 Global Step: 28370 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:09:06,018-Speed 5159.06 samples/sec Loss 5.9567 LearningRate 0.0837 Epoch: 1 Global Step: 28380 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:09:07,990-Speed 5195.42 samples/sec Loss 6.0063 LearningRate 0.0837 Epoch: 1 Global Step: 28390 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:09:09,955-Speed 5212.66 samples/sec Loss 5.9655 LearningRate 0.0837 Epoch: 1 Global Step: 28400 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:09:11,922-Speed 5209.11 samples/sec Loss 6.0155 LearningRate 0.0837 Epoch: 1 Global Step: 28410 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:09:13,886-Speed 5214.48 samples/sec Loss 5.9861 LearningRate 0.0837 Epoch: 1 Global Step: 28420 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:09:15,855-Speed 5201.32 samples/sec Loss 6.0842 LearningRate 0.0837 Epoch: 1 Global Step: 28430 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:09:17,833-Speed 5180.57 samples/sec Loss 
5.9437 LearningRate 0.0837 Epoch: 1 Global Step: 28440 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:09:19,805-Speed 5191.66 samples/sec Loss 6.0175 LearningRate 0.0837 Epoch: 1 Global Step: 28450 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:09:21,794-Speed 5153.82 samples/sec Loss 5.9515 LearningRate 0.0837 Epoch: 1 Global Step: 28460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:09:23,777-Speed 5163.43 samples/sec Loss 6.0265 LearningRate 0.0837 Epoch: 1 Global Step: 28470 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:09:25,736-Speed 5229.67 samples/sec Loss 5.9212 LearningRate 0.0837 Epoch: 1 Global Step: 28480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:09:27,700-Speed 5215.76 samples/sec Loss 5.9215 LearningRate 0.0837 Epoch: 1 Global Step: 28490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:09:29,675-Speed 5186.79 samples/sec Loss 5.9163 LearningRate 0.0837 Epoch: 1 Global Step: 28500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:09:31,641-Speed 5209.53 samples/sec Loss 6.0291 LearningRate 0.0836 Epoch: 1 Global Step: 28510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:09:33,621-Speed 5174.95 samples/sec Loss 5.9253 LearningRate 0.0836 Epoch: 1 Global Step: 28520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:09:35,589-Speed 5203.86 samples/sec Loss 6.0793 LearningRate 0.0836 Epoch: 1 Global Step: 28530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:09:37,552-Speed 5217.52 samples/sec Loss 5.9296 LearningRate 0.0836 Epoch: 1 Global Step: 28540 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:09:39,523-Speed 5198.36 samples/sec Loss 6.0219 LearningRate 0.0836 Epoch: 1 Global Step: 28550 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:09:41,503-Speed 5173.72 samples/sec Loss 6.0000 LearningRate 0.0836 Epoch: 1 Global Step: 
28560 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:09:43,476-Speed 5192.07 samples/sec Loss 6.0246 LearningRate 0.0836 Epoch: 1 Global Step: 28570 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:09:45,442-Speed 5209.55 samples/sec Loss 5.9240 LearningRate 0.0836 Epoch: 1 Global Step: 28580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:09:47,423-Speed 5171.61 samples/sec Loss 5.8611 LearningRate 0.0836 Epoch: 1 Global Step: 28590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:09:49,398-Speed 5186.64 samples/sec Loss 5.9433 LearningRate 0.0836 Epoch: 1 Global Step: 28600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:09:51,366-Speed 5203.53 samples/sec Loss 6.0198 LearningRate 0.0836 Epoch: 1 Global Step: 28610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:09:53,332-Speed 5210.05 samples/sec Loss 5.9458 LearningRate 0.0836 Epoch: 1 Global Step: 28620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:09:55,297-Speed 5213.13 samples/sec Loss 5.9644 LearningRate 0.0836 Epoch: 1 Global Step: 28630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:09:57,269-Speed 5193.28 samples/sec Loss 6.0850 LearningRate 0.0836 Epoch: 1 Global Step: 28640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:09:59,238-Speed 5203.75 samples/sec Loss 6.0217 LearningRate 0.0836 Epoch: 1 Global Step: 28650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:10:01,219-Speed 5171.28 samples/sec Loss 6.0185 LearningRate 0.0836 Epoch: 1 Global Step: 28660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:10:03,184-Speed 5212.29 samples/sec Loss 5.9269 LearningRate 0.0836 Epoch: 1 Global Step: 28670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:10:05,150-Speed 5211.31 samples/sec Loss 5.9417 LearningRate 0.0836 Epoch: 1 Global Step: 28680 Fp16 Grad Scale: 262144 Required: 20 
hours Training: 2022-04-11 01:10:07,109-Speed 5228.49 samples/sec Loss 5.9375 LearningRate 0.0835 Epoch: 1 Global Step: 28690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:10:09,061-Speed 5249.11 samples/sec Loss 5.8843 LearningRate 0.0835 Epoch: 1 Global Step: 28700 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:10:11,025-Speed 5215.80 samples/sec Loss 5.9578 LearningRate 0.0835 Epoch: 1 Global Step: 28710 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:10:12,993-Speed 5204.09 samples/sec Loss 5.9025 LearningRate 0.0835 Epoch: 1 Global Step: 28720 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:10:14,957-Speed 5215.04 samples/sec Loss 6.0086 LearningRate 0.0835 Epoch: 1 Global Step: 28730 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:10:16,923-Speed 5209.09 samples/sec Loss 6.0225 LearningRate 0.0835 Epoch: 1 Global Step: 28740 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:10:18,892-Speed 5204.16 samples/sec Loss 5.8597 LearningRate 0.0835 Epoch: 1 Global Step: 28750 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:10:20,856-Speed 5215.51 samples/sec Loss 5.9727 LearningRate 0.0835 Epoch: 1 Global Step: 28760 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:10:22,835-Speed 5175.40 samples/sec Loss 5.9562 LearningRate 0.0835 Epoch: 1 Global Step: 28770 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:10:24,814-Speed 5175.62 samples/sec Loss 5.9403 LearningRate 0.0835 Epoch: 1 Global Step: 28780 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:10:26,780-Speed 5212.07 samples/sec Loss 5.9888 LearningRate 0.0835 Epoch: 1 Global Step: 28790 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:10:28,762-Speed 5167.65 samples/sec Loss 5.9392 LearningRate 0.0835 Epoch: 1 Global Step: 28800 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:10:30,731-Speed 5202.30 
samples/sec Loss 5.8787 LearningRate 0.0835 Epoch: 1 Global Step: 28810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:10:32,695-Speed 5215.83 samples/sec Loss 5.9517 LearningRate 0.0835 Epoch: 1 Global Step: 28820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:10:34,673-Speed 5177.28 samples/sec Loss 5.9107 LearningRate 0.0835 Epoch: 1 Global Step: 28830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:10:36,677-Speed 5111.33 samples/sec Loss 5.8788 LearningRate 0.0835 Epoch: 1 Global Step: 28840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:10:38,675-Speed 5127.71 samples/sec Loss 5.9537 LearningRate 0.0835 Epoch: 1 Global Step: 28850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:10:40,646-Speed 5198.18 samples/sec Loss 5.9350 LearningRate 0.0835 Epoch: 1 Global Step: 28860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:10:42,613-Speed 5205.85 samples/sec Loss 5.9488 LearningRate 0.0835 Epoch: 1 Global Step: 28870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:10:44,584-Speed 5197.52 samples/sec Loss 5.9945 LearningRate 0.0834 Epoch: 1 Global Step: 28880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:10:46,564-Speed 5173.93 samples/sec Loss 5.9864 LearningRate 0.0834 Epoch: 1 Global Step: 28890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:10:48,533-Speed 5201.56 samples/sec Loss 5.9150 LearningRate 0.0834 Epoch: 1 Global Step: 28900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:10:50,502-Speed 5203.65 samples/sec Loss 5.8571 LearningRate 0.0834 Epoch: 1 Global Step: 28910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:10:52,483-Speed 5169.81 samples/sec Loss 5.9766 LearningRate 0.0834 Epoch: 1 Global Step: 28920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:10:54,474-Speed 5145.76 samples/sec Loss 5.9438 LearningRate 0.0834 Epoch: 1 
Global Step: 28930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:10:56,444-Speed 5199.15 samples/sec Loss 5.9151 LearningRate 0.0834 Epoch: 1 Global Step: 28940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:10:58,411-Speed 5207.52 samples/sec Loss 5.9345 LearningRate 0.0834 Epoch: 1 Global Step: 28950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:11:00,374-Speed 5218.11 samples/sec Loss 5.9878 LearningRate 0.0834 Epoch: 1 Global Step: 28960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:11:02,347-Speed 5192.21 samples/sec Loss 5.8048 LearningRate 0.0834 Epoch: 1 Global Step: 28970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:11:04,314-Speed 5207.28 samples/sec Loss 6.0209 LearningRate 0.0834 Epoch: 1 Global Step: 28980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:11:06,282-Speed 5206.62 samples/sec Loss 5.8984 LearningRate 0.0834 Epoch: 1 Global Step: 28990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:11:08,250-Speed 5204.42 samples/sec Loss 5.8255 LearningRate 0.0834 Epoch: 1 Global Step: 29000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:11:10,221-Speed 5198.10 samples/sec Loss 5.9523 LearningRate 0.0834 Epoch: 1 Global Step: 29010 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:11:12,202-Speed 5169.74 samples/sec Loss 5.8784 LearningRate 0.0834 Epoch: 1 Global Step: 29020 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:11:14,167-Speed 5213.50 samples/sec Loss 5.8938 LearningRate 0.0834 Epoch: 1 Global Step: 29030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:11:16,134-Speed 5206.29 samples/sec Loss 6.0592 LearningRate 0.0834 Epoch: 1 Global Step: 29040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:11:18,101-Speed 5208.58 samples/sec Loss 5.9271 LearningRate 0.0834 Epoch: 1 Global Step: 29050 Fp16 Grad Scale: 65536 Required: 
20 hours Training: 2022-04-11 01:11:20,069-Speed 5203.57 samples/sec Loss 5.9148 LearningRate 0.0833 Epoch: 1 Global Step: 29060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:11:22,044-Speed 5186.54 samples/sec Loss 5.8619 LearningRate 0.0833 Epoch: 1 Global Step: 29070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:11:24,026-Speed 5169.88 samples/sec Loss 5.9938 LearningRate 0.0833 Epoch: 1 Global Step: 29080 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:11:26,000-Speed 5188.73 samples/sec Loss 5.8516 LearningRate 0.0833 Epoch: 1 Global Step: 29090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:11:27,975-Speed 5186.63 samples/sec Loss 5.8895 LearningRate 0.0833 Epoch: 1 Global Step: 29100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:11:29,964-Speed 5151.84 samples/sec Loss 5.8554 LearningRate 0.0833 Epoch: 1 Global Step: 29110 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:11:31,932-Speed 5204.15 samples/sec Loss 5.9125 LearningRate 0.0833 Epoch: 1 Global Step: 29120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:11:33,905-Speed 5191.40 samples/sec Loss 5.8213 LearningRate 0.0833 Epoch: 1 Global Step: 29130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:11:35,885-Speed 5173.39 samples/sec Loss 5.9349 LearningRate 0.0833 Epoch: 1 Global Step: 29140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:11:37,851-Speed 5211.06 samples/sec Loss 5.8795 LearningRate 0.0833 Epoch: 1 Global Step: 29150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:11:39,819-Speed 5206.21 samples/sec Loss 5.8877 LearningRate 0.0833 Epoch: 1 Global Step: 29160 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:11:41,786-Speed 5204.80 samples/sec Loss 5.9580 LearningRate 0.0833 Epoch: 1 Global Step: 29170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 
01:11:43,752-Speed 5212.37 samples/sec Loss 5.9079 LearningRate 0.0833 Epoch: 1 Global Step: 29180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:11:45,718-Speed 5207.85 samples/sec Loss 5.8993 LearningRate 0.0833 Epoch: 1 Global Step: 29190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:11:47,682-Speed 5216.88 samples/sec Loss 5.9319 LearningRate 0.0833 Epoch: 1 Global Step: 29200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:11:49,657-Speed 5186.53 samples/sec Loss 5.8988 LearningRate 0.0833 Epoch: 1 Global Step: 29210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:11:51,626-Speed 5204.41 samples/sec Loss 5.8919 LearningRate 0.0833 Epoch: 1 Global Step: 29220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:11:53,596-Speed 5198.22 samples/sec Loss 5.9226 LearningRate 0.0833 Epoch: 1 Global Step: 29230 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:11:55,556-Speed 5225.22 samples/sec Loss 5.9957 LearningRate 0.0832 Epoch: 1 Global Step: 29240 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:11:57,523-Speed 5208.11 samples/sec Loss 5.9381 LearningRate 0.0832 Epoch: 1 Global Step: 29250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:11:59,493-Speed 5202.03 samples/sec Loss 5.9282 LearningRate 0.0832 Epoch: 1 Global Step: 29260 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:12:01,460-Speed 5205.88 samples/sec Loss 5.9709 LearningRate 0.0832 Epoch: 1 Global Step: 29270 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:12:03,436-Speed 5184.42 samples/sec Loss 5.8939 LearningRate 0.0832 Epoch: 1 Global Step: 29280 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:12:05,411-Speed 5184.83 samples/sec Loss 5.9413 LearningRate 0.0832 Epoch: 1 Global Step: 29290 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:12:07,381-Speed 5201.67 samples/sec Loss 5.9901 
LearningRate 0.0832 Epoch: 1 Global Step: 29300 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:12:09,347-Speed 5208.32 samples/sec Loss 5.9916 LearningRate 0.0832 Epoch: 1 Global Step: 29310 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:12:11,314-Speed 5208.84 samples/sec Loss 5.9355 LearningRate 0.0832 Epoch: 1 Global Step: 29320 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:12:13,282-Speed 5205.04 samples/sec Loss 5.9386 LearningRate 0.0832 Epoch: 1 Global Step: 29330 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:12:15,251-Speed 5202.15 samples/sec Loss 5.7854 LearningRate 0.0832 Epoch: 1 Global Step: 29340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:12:17,218-Speed 5208.22 samples/sec Loss 5.9679 LearningRate 0.0832 Epoch: 1 Global Step: 29350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:12:19,183-Speed 5212.66 samples/sec Loss 5.9310 LearningRate 0.0832 Epoch: 1 Global Step: 29360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:12:21,153-Speed 5199.19 samples/sec Loss 5.9908 LearningRate 0.0832 Epoch: 1 Global Step: 29370 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:12:23,131-Speed 5179.67 samples/sec Loss 5.8532 LearningRate 0.0832 Epoch: 1 Global Step: 29380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:12:25,099-Speed 5203.92 samples/sec Loss 5.8975 LearningRate 0.0832 Epoch: 1 Global Step: 29390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:12:27,083-Speed 5164.66 samples/sec Loss 5.8568 LearningRate 0.0832 Epoch: 1 Global Step: 29400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:12:29,055-Speed 5193.83 samples/sec Loss 5.9635 LearningRate 0.0832 Epoch: 1 Global Step: 29410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:12:31,044-Speed 5148.68 samples/sec Loss 5.7685 LearningRate 0.0832 Epoch: 1 Global Step: 29420 Fp16 
Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:12:33,019-Speed 5187.03 samples/sec Loss 5.7840 LearningRate 0.0831 Epoch: 1 Global Step: 29430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:12:34,986-Speed 5208.72 samples/sec Loss 5.9083 LearningRate 0.0831 Epoch: 1 Global Step: 29440 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:12:36,979-Speed 5141.17 samples/sec Loss 5.8265 LearningRate 0.0831 Epoch: 1 Global Step: 29450 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:12:38,959-Speed 5173.39 samples/sec Loss 5.9214 LearningRate 0.0831 Epoch: 1 Global Step: 29460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:12:40,924-Speed 5213.39 samples/sec Loss 5.9250 LearningRate 0.0831 Epoch: 1 Global Step: 29470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:12:42,894-Speed 5196.91 samples/sec Loss 5.9189 LearningRate 0.0831 Epoch: 1 Global Step: 29480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:12:44,863-Speed 5203.06 samples/sec Loss 5.9849 LearningRate 0.0831 Epoch: 1 Global Step: 29490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:12:46,833-Speed 5199.23 samples/sec Loss 5.9355 LearningRate 0.0831 Epoch: 1 Global Step: 29500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:12:48,804-Speed 5198.62 samples/sec Loss 5.8230 LearningRate 0.0831 Epoch: 1 Global Step: 29510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:12:50,769-Speed 5210.54 samples/sec Loss 5.8679 LearningRate 0.0831 Epoch: 1 Global Step: 29520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:12:52,752-Speed 5166.32 samples/sec Loss 5.8842 LearningRate 0.0831 Epoch: 1 Global Step: 29530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:12:54,730-Speed 5178.72 samples/sec Loss 5.8432 LearningRate 0.0831 Epoch: 1 Global Step: 29540 Fp16 Grad Scale: 65536 Required: 20 hours Training: 
2022-04-11 01:12:56,699-Speed 5204.95 samples/sec Loss 5.9301 LearningRate 0.0831 Epoch: 1 Global Step: 29550 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:12:58,684-Speed 5158.92 samples/sec Loss 5.8482 LearningRate 0.0831 Epoch: 1 Global Step: 29560 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:13:00,650-Speed 5210.12 samples/sec Loss 6.0069 LearningRate 0.0831 Epoch: 1 Global Step: 29570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:13:02,617-Speed 5206.79 samples/sec Loss 5.8835 LearningRate 0.0831 Epoch: 1 Global Step: 29580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:13:04,597-Speed 5173.85 samples/sec Loss 5.8693 LearningRate 0.0831 Epoch: 1 Global Step: 29590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:13:06,564-Speed 5207.52 samples/sec Loss 5.8037 LearningRate 0.0831 Epoch: 1 Global Step: 29600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:13:08,544-Speed 5174.97 samples/sec Loss 5.8462 LearningRate 0.0830 Epoch: 1 Global Step: 29610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:13:10,533-Speed 5150.22 samples/sec Loss 6.0245 LearningRate 0.0830 Epoch: 1 Global Step: 29620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:13:12,524-Speed 5142.63 samples/sec Loss 5.9720 LearningRate 0.0830 Epoch: 1 Global Step: 29630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:13:14,496-Speed 5194.41 samples/sec Loss 5.9806 LearningRate 0.0830 Epoch: 1 Global Step: 29640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:13:16,470-Speed 5189.78 samples/sec Loss 5.8352 LearningRate 0.0830 Epoch: 1 Global Step: 29650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:13:18,441-Speed 5199.26 samples/sec Loss 5.8903 LearningRate 0.0830 Epoch: 1 Global Step: 29660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:13:20,409-Speed 5203.00 
samples/sec Loss 5.7968 LearningRate 0.0830 Epoch: 1 Global Step: 29670 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:13:22,388-Speed 5177.18 samples/sec Loss 5.7769 LearningRate 0.0830 Epoch: 1 Global Step: 29680 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:13:24,351-Speed 5218.21 samples/sec Loss 5.8547 LearningRate 0.0830 Epoch: 1 Global Step: 29690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:13:26,322-Speed 5198.28 samples/sec Loss 5.8937 LearningRate 0.0830 Epoch: 1 Global Step: 29700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:13:28,292-Speed 5198.48 samples/sec Loss 5.8478 LearningRate 0.0830 Epoch: 1 Global Step: 29710 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:13:30,258-Speed 5212.37 samples/sec Loss 5.8952 LearningRate 0.0830 Epoch: 1 Global Step: 29720 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:13:32,240-Speed 5165.86 samples/sec Loss 5.9809 LearningRate 0.0830 Epoch: 1 Global Step: 29730 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:13:34,226-Speed 5158.43 samples/sec Loss 5.8123 LearningRate 0.0830 Epoch: 1 Global Step: 29740 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:13:36,205-Speed 5177.32 samples/sec Loss 5.8215 LearningRate 0.0830 Epoch: 1 Global Step: 29750 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:13:38,179-Speed 5190.34 samples/sec Loss 5.7961 LearningRate 0.0830 Epoch: 1 Global Step: 29760 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:13:40,159-Speed 5172.27 samples/sec Loss 5.8654 LearningRate 0.0830 Epoch: 1 Global Step: 29770 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:13:42,128-Speed 5201.26 samples/sec Loss 5.8712 LearningRate 0.0830 Epoch: 1 Global Step: 29780 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:13:44,097-Speed 5203.84 samples/sec Loss 5.9445 LearningRate 0.0829 Epoch: 1 
Global Step: 29790 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:13:46,077-Speed 5173.04 samples/sec Loss 5.8314 LearningRate 0.0829 Epoch: 1 Global Step: 29800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:13:48,054-Speed 5181.10 samples/sec Loss 5.8851 LearningRate 0.0829 Epoch: 1 Global Step: 29810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:13:50,024-Speed 5199.98 samples/sec Loss 5.8170 LearningRate 0.0829 Epoch: 1 Global Step: 29820 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:13:52,001-Speed 5180.11 samples/sec Loss 5.8442 LearningRate 0.0829 Epoch: 1 Global Step: 29830 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:13:53,969-Speed 5205.11 samples/sec Loss 5.8233 LearningRate 0.0829 Epoch: 1 Global Step: 29840 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:13:55,938-Speed 5203.22 samples/sec Loss 5.9285 LearningRate 0.0829 Epoch: 1 Global Step: 29850 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:13:57,916-Speed 5179.31 samples/sec Loss 5.7944 LearningRate 0.0829 Epoch: 1 Global Step: 29860 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:13:59,885-Speed 5202.86 samples/sec Loss 6.0005 LearningRate 0.0829 Epoch: 1 Global Step: 29870 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:14:01,858-Speed 5191.47 samples/sec Loss 5.8631 LearningRate 0.0829 Epoch: 1 Global Step: 29880 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:14:03,827-Speed 5202.22 samples/sec Loss 5.8565 LearningRate 0.0829 Epoch: 1 Global Step: 29890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:14:05,789-Speed 5219.93 samples/sec Loss 5.7332 LearningRate 0.0829 Epoch: 1 Global Step: 29900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:14:07,758-Speed 5202.10 samples/sec Loss 5.8537 LearningRate 0.0829 Epoch: 1 Global Step: 29910 Fp16 Grad Scale: 65536 
Required: 20 hours Training: 2022-04-11 01:14:09,727-Speed 5202.67 samples/sec Loss 5.8451 LearningRate 0.0829 Epoch: 1 Global Step: 29920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:14:11,694-Speed 5207.16 samples/sec Loss 5.7673 LearningRate 0.0829 Epoch: 1 Global Step: 29930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:14:13,665-Speed 5197.87 samples/sec Loss 5.8419 LearningRate 0.0829 Epoch: 1 Global Step: 29940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:14:15,634-Speed 5201.86 samples/sec Loss 5.8661 LearningRate 0.0829 Epoch: 1 Global Step: 29950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:14:17,601-Speed 5206.89 samples/sec Loss 5.8276 LearningRate 0.0829 Epoch: 1 Global Step: 29960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:14:19,571-Speed 5199.72 samples/sec Loss 5.9115 LearningRate 0.0829 Epoch: 1 Global Step: 29970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:14:21,544-Speed 5194.74 samples/sec Loss 5.8155 LearningRate 0.0828 Epoch: 1 Global Step: 29980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:14:23,524-Speed 5173.20 samples/sec Loss 5.8871 LearningRate 0.0828 Epoch: 1 Global Step: 29990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:14:25,526-Speed 5114.51 samples/sec Loss 6.0216 LearningRate 0.0828 Epoch: 1 Global Step: 30000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:14:52,283-[lfw][30000]XNorm: 22.456418 Training: 2022-04-11 01:14:52,283-[lfw][30000]Accuracy-Flip: 0.99667+-0.00325 Training: 2022-04-11 01:14:52,284-[lfw][30000]Accuracy-Highest: 0.99700 Training: 2022-04-11 01:15:23,218-[cfp_fp][30000]XNorm: 20.503947 Training: 2022-04-11 01:15:23,219-[cfp_fp][30000]Accuracy-Flip: 0.97486+-0.00689 Training: 2022-04-11 01:15:23,219-[cfp_fp][30000]Accuracy-Highest: 0.97486 Training: 2022-04-11 01:15:49,988-[agedb_30][30000]XNorm: 22.566248 Training: 
2022-04-11 01:15:49,989-[agedb_30][30000]Accuracy-Flip: 0.97333+-0.00715 Training: 2022-04-11 01:15:49,989-[agedb_30][30000]Accuracy-Highest: 0.97333 Training: 2022-04-11 01:15:51,965-Speed 118.47 samples/sec Loss 5.8477 LearningRate 0.0828 Epoch: 1 Global Step: 30010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:15:53,917-Speed 5246.36 samples/sec Loss 5.8476 LearningRate 0.0828 Epoch: 1 Global Step: 30020 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:15:55,872-Speed 5238.69 samples/sec Loss 5.8441 LearningRate 0.0828 Epoch: 1 Global Step: 30030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:15:57,843-Speed 5197.56 samples/sec Loss 5.7843 LearningRate 0.0828 Epoch: 1 Global Step: 30040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:15:59,831-Speed 5152.57 samples/sec Loss 5.7654 LearningRate 0.0828 Epoch: 1 Global Step: 30050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:16:01,790-Speed 5229.63 samples/sec Loss 5.9176 LearningRate 0.0828 Epoch: 1 Global Step: 30060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:16:03,753-Speed 5218.16 samples/sec Loss 5.8767 LearningRate 0.0828 Epoch: 1 Global Step: 30070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:16:05,716-Speed 5219.14 samples/sec Loss 5.8345 LearningRate 0.0828 Epoch: 1 Global Step: 30080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:16:07,674-Speed 5232.02 samples/sec Loss 5.8296 LearningRate 0.0828 Epoch: 1 Global Step: 30090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:16:09,635-Speed 5224.52 samples/sec Loss 5.8327 LearningRate 0.0828 Epoch: 1 Global Step: 30100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:16:11,592-Speed 5233.15 samples/sec Loss 5.8624 LearningRate 0.0828 Epoch: 1 Global Step: 30110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:16:13,553-Speed 5222.75 samples/sec Loss 
5.7691 LearningRate 0.0828 Epoch: 1 Global Step: 30120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:16:15,535-Speed 5170.16 samples/sec Loss 5.9173 LearningRate 0.0828 Epoch: 1 Global Step: 30130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:16:17,493-Speed 5229.08 samples/sec Loss 5.9373 LearningRate 0.0828 Epoch: 1 Global Step: 30140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:16:19,454-Speed 5225.58 samples/sec Loss 5.8562 LearningRate 0.0828 Epoch: 1 Global Step: 30150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:16:21,415-Speed 5222.85 samples/sec Loss 5.7624 LearningRate 0.0827 Epoch: 1 Global Step: 30160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:16:23,388-Speed 5190.14 samples/sec Loss 5.8040 LearningRate 0.0827 Epoch: 1 Global Step: 30170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:16:25,363-Speed 5187.61 samples/sec Loss 5.9000 LearningRate 0.0827 Epoch: 1 Global Step: 30180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:16:27,340-Speed 5180.78 samples/sec Loss 5.7391 LearningRate 0.0827 Epoch: 1 Global Step: 30190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:16:29,312-Speed 5194.69 samples/sec Loss 5.7537 LearningRate 0.0827 Epoch: 1 Global Step: 30200 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:16:31,273-Speed 5225.03 samples/sec Loss 5.8093 LearningRate 0.0827 Epoch: 1 Global Step: 30210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:16:33,240-Speed 5206.73 samples/sec Loss 5.8412 LearningRate 0.0827 Epoch: 1 Global Step: 30220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:16:35,208-Speed 5204.99 samples/sec Loss 5.8057 LearningRate 0.0827 Epoch: 1 Global Step: 30230 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:16:37,189-Speed 5171.98 samples/sec Loss 5.8253 LearningRate 0.0827 Epoch: 1 Global 
Step: 30240 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:16:39,173-Speed 5161.02 samples/sec Loss 5.9147 LearningRate 0.0827 Epoch: 1 Global Step: 30250 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:16:41,146-Speed 5194.52 samples/sec Loss 5.9254 LearningRate 0.0827 Epoch: 1 Global Step: 30260 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:16:43,115-Speed 5200.31 samples/sec Loss 5.8008 LearningRate 0.0827 Epoch: 1 Global Step: 30270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:16:45,095-Speed 5173.08 samples/sec Loss 5.8049 LearningRate 0.0827 Epoch: 1 Global Step: 30280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:16:47,067-Speed 5195.82 samples/sec Loss 5.9172 LearningRate 0.0827 Epoch: 1 Global Step: 30290 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:16:49,062-Speed 5135.07 samples/sec Loss 5.8193 LearningRate 0.0827 Epoch: 1 Global Step: 30300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:16:51,039-Speed 5180.63 samples/sec Loss 5.8283 LearningRate 0.0827 Epoch: 1 Global Step: 30310 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:16:53,011-Speed 5196.49 samples/sec Loss 5.8084 LearningRate 0.0827 Epoch: 1 Global Step: 30320 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:16:54,978-Speed 5205.93 samples/sec Loss 5.7697 LearningRate 0.0827 Epoch: 1 Global Step: 30330 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:16:56,930-Speed 5247.02 samples/sec Loss 5.9072 LearningRate 0.0826 Epoch: 1 Global Step: 30340 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:16:58,897-Speed 5207.07 samples/sec Loss 5.8433 LearningRate 0.0826 Epoch: 1 Global Step: 30350 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:17:00,861-Speed 5215.48 samples/sec Loss 5.8005 LearningRate 0.0826 Epoch: 1 Global Step: 30360 Fp16 Grad Scale: 32768 Required: 20 
hours Training: 2022-04-11 01:17:02,826-Speed 5214.16 samples/sec Loss 5.7692 LearningRate 0.0826 Epoch: 1 Global Step: 30370 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:17:04,792-Speed 5210.62 samples/sec Loss 5.6901 LearningRate 0.0826 Epoch: 1 Global Step: 30380 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:17:06,755-Speed 5216.77 samples/sec Loss 5.7539 LearningRate 0.0826 Epoch: 1 Global Step: 30390 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:17:08,724-Speed 5203.01 samples/sec Loss 5.8331 LearningRate 0.0826 Epoch: 1 Global Step: 30400 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:17:10,714-Speed 5149.14 samples/sec Loss 5.8178 LearningRate 0.0826 Epoch: 1 Global Step: 30410 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:17:12,679-Speed 5212.37 samples/sec Loss 5.8215 LearningRate 0.0826 Epoch: 1 Global Step: 30420 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:17:14,642-Speed 5217.01 samples/sec Loss 5.7468 LearningRate 0.0826 Epoch: 1 Global Step: 30430 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:17:16,606-Speed 5216.78 samples/sec Loss 5.8343 LearningRate 0.0826 Epoch: 1 Global Step: 30440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:17:18,573-Speed 5206.77 samples/sec Loss 5.8960 LearningRate 0.0826 Epoch: 1 Global Step: 30450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:17:20,537-Speed 5217.28 samples/sec Loss 5.8173 LearningRate 0.0826 Epoch: 1 Global Step: 30460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:17:22,503-Speed 5210.16 samples/sec Loss 5.9380 LearningRate 0.0826 Epoch: 1 Global Step: 30470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:17:24,468-Speed 5212.52 samples/sec Loss 5.8628 LearningRate 0.0826 Epoch: 1 Global Step: 30480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:17:26,431-Speed 5216.41 
samples/sec Loss 5.8192 LearningRate 0.0826 Epoch: 1 Global Step: 30490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:17:28,394-Speed 5218.14 samples/sec Loss 5.7316 LearningRate 0.0826 Epoch: 1 Global Step: 30500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:17:30,365-Speed 5199.32 samples/sec Loss 5.7310 LearningRate 0.0826 Epoch: 1 Global Step: 30510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:17:32,329-Speed 5216.47 samples/sec Loss 5.7229 LearningRate 0.0826 Epoch: 1 Global Step: 30520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:17:34,294-Speed 5211.82 samples/sec Loss 5.8067 LearningRate 0.0825 Epoch: 1 Global Step: 30530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:17:36,255-Speed 5224.54 samples/sec Loss 5.8302 LearningRate 0.0825 Epoch: 1 Global Step: 30540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:17:38,216-Speed 5222.30 samples/sec Loss 5.7634 LearningRate 0.0825 Epoch: 1 Global Step: 30550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:17:40,176-Speed 5226.31 samples/sec Loss 5.7466 LearningRate 0.0825 Epoch: 1 Global Step: 30560 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:17:42,137-Speed 5224.92 samples/sec Loss 5.7697 LearningRate 0.0825 Epoch: 1 Global Step: 30570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:17:44,103-Speed 5208.20 samples/sec Loss 5.9191 LearningRate 0.0825 Epoch: 1 Global Step: 30580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:17:46,078-Speed 5187.76 samples/sec Loss 5.7587 LearningRate 0.0825 Epoch: 1 Global Step: 30590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:17:48,042-Speed 5216.16 samples/sec Loss 5.7632 LearningRate 0.0825 Epoch: 1 Global Step: 30600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:17:50,012-Speed 5199.08 samples/sec Loss 5.8234 LearningRate 0.0825 
Epoch: 1 Global Step: 30610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:17:51,991-Speed 5175.22 samples/sec Loss 5.7433 LearningRate 0.0825 Epoch: 1 Global Step: 30620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:17:53,956-Speed 5213.48 samples/sec Loss 5.7499 LearningRate 0.0825 Epoch: 1 Global Step: 30630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:17:55,918-Speed 5221.72 samples/sec Loss 5.7373 LearningRate 0.0825 Epoch: 1 Global Step: 30640 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:17:57,879-Speed 5223.76 samples/sec Loss 5.8187 LearningRate 0.0825 Epoch: 1 Global Step: 30650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:17:59,836-Speed 5232.98 samples/sec Loss 5.7464 LearningRate 0.0825 Epoch: 1 Global Step: 30660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:18:01,795-Speed 5228.86 samples/sec Loss 5.8366 LearningRate 0.0825 Epoch: 1 Global Step: 30670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:18:03,770-Speed 5187.21 samples/sec Loss 5.8100 LearningRate 0.0825 Epoch: 1 Global Step: 30680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:18:05,732-Speed 5220.59 samples/sec Loss 5.7524 LearningRate 0.0825 Epoch: 1 Global Step: 30690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:18:07,693-Speed 5224.48 samples/sec Loss 5.8346 LearningRate 0.0825 Epoch: 1 Global Step: 30700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:18:09,675-Speed 5166.21 samples/sec Loss 5.7526 LearningRate 0.0824 Epoch: 1 Global Step: 30710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:18:11,644-Speed 5203.12 samples/sec Loss 5.7925 LearningRate 0.0824 Epoch: 1 Global Step: 30720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:18:13,619-Speed 5187.10 samples/sec Loss 5.7640 LearningRate 0.0824 Epoch: 1 Global Step: 30730 Fp16 Grad 
Scale: 131072 Required: 20 hours Training: 2022-04-11 01:18:15,579-Speed 5226.55 samples/sec Loss 5.8747 LearningRate 0.0824 Epoch: 1 Global Step: 30740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:18:17,539-Speed 5228.08 samples/sec Loss 5.8801 LearningRate 0.0824 Epoch: 1 Global Step: 30750 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:18:19,500-Speed 5223.43 samples/sec Loss 5.7823 LearningRate 0.0824 Epoch: 1 Global Step: 30760 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:18:21,467-Speed 5207.38 samples/sec Loss 5.9389 LearningRate 0.0824 Epoch: 1 Global Step: 30770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:18:23,454-Speed 5154.06 samples/sec Loss 5.8815 LearningRate 0.0824 Epoch: 1 Global Step: 30780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:18:25,424-Speed 5199.22 samples/sec Loss 5.7664 LearningRate 0.0824 Epoch: 1 Global Step: 30790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:18:27,381-Speed 5234.11 samples/sec Loss 5.8318 LearningRate 0.0824 Epoch: 1 Global Step: 30800 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:18:29,348-Speed 5208.77 samples/sec Loss 5.8682 LearningRate 0.0824 Epoch: 1 Global Step: 30810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:18:31,307-Speed 5227.51 samples/sec Loss 5.6972 LearningRate 0.0824 Epoch: 1 Global Step: 30820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:18:33,280-Speed 5192.93 samples/sec Loss 5.7854 LearningRate 0.0824 Epoch: 1 Global Step: 30830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:18:35,249-Speed 5202.01 samples/sec Loss 5.8549 LearningRate 0.0824 Epoch: 1 Global Step: 30840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:18:37,214-Speed 5212.53 samples/sec Loss 5.7220 LearningRate 0.0824 Epoch: 1 Global Step: 30850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 
2022-04-11 01:18:39,194-Speed 5174.92 samples/sec Loss 5.8180 LearningRate 0.0824 Epoch: 1 Global Step: 30860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:18:41,160-Speed 5211.85 samples/sec Loss 5.8081 LearningRate 0.0824 Epoch: 1 Global Step: 30870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:18:43,123-Speed 5216.85 samples/sec Loss 5.7720 LearningRate 0.0824 Epoch: 1 Global Step: 30880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:18:45,094-Speed 5196.31 samples/sec Loss 5.7874 LearningRate 0.0823 Epoch: 1 Global Step: 30890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:18:47,061-Speed 5208.09 samples/sec Loss 5.7427 LearningRate 0.0823 Epoch: 1 Global Step: 30900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:18:49,023-Speed 5221.95 samples/sec Loss 5.7356 LearningRate 0.0823 Epoch: 1 Global Step: 30910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:18:50,984-Speed 5221.60 samples/sec Loss 5.8500 LearningRate 0.0823 Epoch: 1 Global Step: 30920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:18:52,951-Speed 5209.24 samples/sec Loss 5.7383 LearningRate 0.0823 Epoch: 1 Global Step: 30930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:18:54,927-Speed 5183.61 samples/sec Loss 5.7975 LearningRate 0.0823 Epoch: 1 Global Step: 30940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:18:56,891-Speed 5215.62 samples/sec Loss 5.8137 LearningRate 0.0823 Epoch: 1 Global Step: 30950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:18:58,863-Speed 5194.94 samples/sec Loss 5.7433 LearningRate 0.0823 Epoch: 1 Global Step: 30960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:19:00,824-Speed 5222.07 samples/sec Loss 5.7985 LearningRate 0.0823 Epoch: 1 Global Step: 30970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:19:02,818-Speed 5139.27 
samples/sec Loss 5.7179 LearningRate 0.0823 Epoch: 1 Global Step: 30980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:19:04,790-Speed 5193.21 samples/sec Loss 5.7163 LearningRate 0.0823 Epoch: 1 Global Step: 30990 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:19:06,769-Speed 5176.78 samples/sec Loss 5.8563 LearningRate 0.0823 Epoch: 1 Global Step: 31000 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:19:08,726-Speed 5232.09 samples/sec Loss 5.7072 LearningRate 0.0823 Epoch: 1 Global Step: 31010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:19:10,683-Speed 5235.23 samples/sec Loss 5.8477 LearningRate 0.0823 Epoch: 1 Global Step: 31020 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:19:12,648-Speed 5214.38 samples/sec Loss 5.7191 LearningRate 0.0823 Epoch: 1 Global Step: 31030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:19:14,616-Speed 5204.00 samples/sec Loss 5.8303 LearningRate 0.0823 Epoch: 1 Global Step: 31040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:19:16,600-Speed 5163.15 samples/sec Loss 5.7217 LearningRate 0.0823 Epoch: 1 Global Step: 31050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:19:18,565-Speed 5212.47 samples/sec Loss 5.6942 LearningRate 0.0823 Epoch: 1 Global Step: 31060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:19:20,526-Speed 5224.60 samples/sec Loss 5.6453 LearningRate 0.0823 Epoch: 1 Global Step: 31070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:19:22,490-Speed 5214.37 samples/sec Loss 5.8146 LearningRate 0.0822 Epoch: 1 Global Step: 31080 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:19:24,453-Speed 5219.54 samples/sec Loss 5.7422 LearningRate 0.0822 Epoch: 1 Global Step: 31090 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:19:26,418-Speed 5212.26 samples/sec Loss 5.7935 LearningRate 0.0822 Epoch: 
1 Global Step: 31100 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:19:28,398-Speed 5172.28 samples/sec Loss 5.7086 LearningRate 0.0822 Epoch: 1 Global Step: 31110 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:19:30,374-Speed 5183.48 samples/sec Loss 5.7039 LearningRate 0.0822 Epoch: 1 Global Step: 31120 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:19:32,353-Speed 5176.59 samples/sec Loss 5.6655 LearningRate 0.0822 Epoch: 1 Global Step: 31130 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:19:34,329-Speed 5186.13 samples/sec Loss 5.6912 LearningRate 0.0822 Epoch: 1 Global Step: 31140 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:19:36,311-Speed 5168.88 samples/sec Loss 5.7225 LearningRate 0.0822 Epoch: 1 Global Step: 31150 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:19:38,308-Speed 5128.30 samples/sec Loss 5.7636 LearningRate 0.0822 Epoch: 1 Global Step: 31160 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:19:40,271-Speed 5218.17 samples/sec Loss 5.6798 LearningRate 0.0822 Epoch: 1 Global Step: 31170 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:19:42,238-Speed 5207.84 samples/sec Loss 5.7907 LearningRate 0.0822 Epoch: 1 Global Step: 31180 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:19:44,213-Speed 5185.38 samples/sec Loss 5.7572 LearningRate 0.0822 Epoch: 1 Global Step: 31190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:19:46,178-Speed 5213.91 samples/sec Loss 5.7626 LearningRate 0.0822 Epoch: 1 Global Step: 31200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:19:48,141-Speed 5218.09 samples/sec Loss 5.7107 LearningRate 0.0822 Epoch: 1 Global Step: 31210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:19:50,122-Speed 5170.90 samples/sec Loss 5.6736 LearningRate 0.0822 Epoch: 1 Global Step: 31220 Fp16 Grad Scale: 65536 Required: 
20 hours Training: 2022-04-11 01:19:52,084-Speed 5219.88 samples/sec Loss 5.7873 LearningRate 0.0822 Epoch: 1 Global Step: 31230 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:19:54,051-Speed 5207.61 samples/sec Loss 5.7628 LearningRate 0.0822 Epoch: 1 Global Step: 31240 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:19:56,031-Speed 5173.80 samples/sec Loss 5.8755 LearningRate 0.0822 Epoch: 1 Global Step: 31250 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:19:58,008-Speed 5182.47 samples/sec Loss 5.8403 LearningRate 0.0821 Epoch: 1 Global Step: 31260 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:19:59,985-Speed 5179.78 samples/sec Loss 5.7972 LearningRate 0.0821 Epoch: 1 Global Step: 31270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:20:01,962-Speed 5182.21 samples/sec Loss 5.6790 LearningRate 0.0821 Epoch: 1 Global Step: 31280 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:20:03,928-Speed 5209.91 samples/sec Loss 5.7602 LearningRate 0.0821 Epoch: 1 Global Step: 31290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:20:05,898-Speed 5201.24 samples/sec Loss 5.7507 LearningRate 0.0821 Epoch: 1 Global Step: 31300 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:20:07,870-Speed 5193.72 samples/sec Loss 5.8094 LearningRate 0.0821 Epoch: 1 Global Step: 31310 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:20:09,845-Speed 5185.45 samples/sec Loss 5.7070 LearningRate 0.0821 Epoch: 1 Global Step: 31320 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:20:11,824-Speed 5177.58 samples/sec Loss 5.6759 LearningRate 0.0821 Epoch: 1 Global Step: 31330 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:20:13,805-Speed 5170.39 samples/sec Loss 5.7176 LearningRate 0.0821 Epoch: 1 Global Step: 31340 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:20:15,783-Speed 
5179.75 samples/sec Loss 5.7668 LearningRate 0.0821 Epoch: 1 Global Step: 31350 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:20:17,768-Speed 5158.41 samples/sec Loss 5.6580 LearningRate 0.0821 Epoch: 1 Global Step: 31360 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:20:19,732-Speed 5217.56 samples/sec Loss 5.6427 LearningRate 0.0821 Epoch: 1 Global Step: 31370 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:20:21,701-Speed 5202.44 samples/sec Loss 5.7369 LearningRate 0.0821 Epoch: 1 Global Step: 31380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:20:23,674-Speed 5190.08 samples/sec Loss 5.8134 LearningRate 0.0821 Epoch: 1 Global Step: 31390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:20:25,642-Speed 5205.31 samples/sec Loss 5.7585 LearningRate 0.0821 Epoch: 1 Global Step: 31400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:20:27,614-Speed 5195.64 samples/sec Loss 5.7301 LearningRate 0.0821 Epoch: 1 Global Step: 31410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:20:29,582-Speed 5205.90 samples/sec Loss 5.7314 LearningRate 0.0821 Epoch: 1 Global Step: 31420 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:20:31,547-Speed 5211.83 samples/sec Loss 5.7920 LearningRate 0.0821 Epoch: 1 Global Step: 31430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:20:33,509-Speed 5221.66 samples/sec Loss 5.6639 LearningRate 0.0821 Epoch: 1 Global Step: 31440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:20:35,492-Speed 5165.06 samples/sec Loss 5.7150 LearningRate 0.0820 Epoch: 1 Global Step: 31450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:20:37,469-Speed 5180.18 samples/sec Loss 5.8293 LearningRate 0.0820 Epoch: 1 Global Step: 31460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:20:39,442-Speed 5193.44 samples/sec Loss 5.6509 LearningRate 0.0820 
Epoch: 1 Global Step: 31470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:20:41,413-Speed 5196.24 samples/sec Loss 5.7639 LearningRate 0.0820 Epoch: 1 Global Step: 31480 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:20:43,391-Speed 5179.72 samples/sec Loss 5.7279 LearningRate 0.0820 Epoch: 1 Global Step: 31490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:20:45,354-Speed 5216.84 samples/sec Loss 5.7740 LearningRate 0.0820 Epoch: 1 Global Step: 31500 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:20:47,323-Speed 5202.10 samples/sec Loss 5.6910 LearningRate 0.0820 Epoch: 1 Global Step: 31510 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:20:49,299-Speed 5184.96 samples/sec Loss 5.6367 LearningRate 0.0820 Epoch: 1 Global Step: 31520 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:20:51,257-Speed 5232.24 samples/sec Loss 5.7234 LearningRate 0.0820 Epoch: 1 Global Step: 31530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:20:53,223-Speed 5209.44 samples/sec Loss 5.7566 LearningRate 0.0820 Epoch: 1 Global Step: 31540 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:20:55,186-Speed 5218.12 samples/sec Loss 5.6630 LearningRate 0.0820 Epoch: 1 Global Step: 31550 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:20:57,157-Speed 5196.02 samples/sec Loss 5.6780 LearningRate 0.0820 Epoch: 1 Global Step: 31560 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:20:59,131-Speed 5190.63 samples/sec Loss 5.5798 LearningRate 0.0820 Epoch: 1 Global Step: 31570 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:21:01,106-Speed 5186.05 samples/sec Loss 5.6942 LearningRate 0.0820 Epoch: 1 Global Step: 31580 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:21:03,082-Speed 5184.84 samples/sec Loss 5.7736 LearningRate 0.0820 Epoch: 1 Global Step: 31590 Fp16 Grad Scale: 65536 
Required: 20 hours Training: 2022-04-11 01:21:05,053-Speed 5196.80 samples/sec Loss 5.7788 LearningRate 0.0820 Epoch: 1 Global Step: 31600 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:21:07,030-Speed 5180.81 samples/sec Loss 5.7167 LearningRate 0.0820 Epoch: 1 Global Step: 31610 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:21:09,005-Speed 5187.56 samples/sec Loss 5.7424 LearningRate 0.0820 Epoch: 1 Global Step: 31620 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:21:10,989-Speed 5163.60 samples/sec Loss 5.6844 LearningRate 0.0819 Epoch: 1 Global Step: 31630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:21:12,957-Speed 5205.83 samples/sec Loss 5.5990 LearningRate 0.0819 Epoch: 1 Global Step: 31640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:21:14,935-Speed 5177.13 samples/sec Loss 5.8277 LearningRate 0.0819 Epoch: 1 Global Step: 31650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:21:16,900-Speed 5213.26 samples/sec Loss 5.7093 LearningRate 0.0819 Epoch: 1 Global Step: 31660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:21:18,868-Speed 5206.01 samples/sec Loss 5.5947 LearningRate 0.0819 Epoch: 1 Global Step: 31670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:21:20,833-Speed 5213.19 samples/sec Loss 5.6718 LearningRate 0.0819 Epoch: 1 Global Step: 31680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:21:22,802-Speed 5201.52 samples/sec Loss 5.7863 LearningRate 0.0819 Epoch: 1 Global Step: 31690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:21:24,766-Speed 5215.03 samples/sec Loss 5.7072 LearningRate 0.0819 Epoch: 1 Global Step: 31700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:21:26,732-Speed 5209.50 samples/sec Loss 5.7308 LearningRate 0.0819 Epoch: 1 Global Step: 31710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 
01:21:28,710-Speed 5179.72 samples/sec Loss 5.6060 LearningRate 0.0819 Epoch: 1 Global Step: 31720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:21:30,668-Speed 5230.23 samples/sec Loss 5.7520 LearningRate 0.0819 Epoch: 1 Global Step: 31730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:21:32,634-Speed 5212.54 samples/sec Loss 5.6583 LearningRate 0.0819 Epoch: 1 Global Step: 31740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:21:34,596-Speed 5219.38 samples/sec Loss 5.8109 LearningRate 0.0819 Epoch: 1 Global Step: 31750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:21:36,565-Speed 5202.49 samples/sec Loss 5.7310 LearningRate 0.0819 Epoch: 1 Global Step: 31760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:21:38,540-Speed 5188.36 samples/sec Loss 5.7289 LearningRate 0.0819 Epoch: 1 Global Step: 31770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:21:40,505-Speed 5213.24 samples/sec Loss 5.6230 LearningRate 0.0819 Epoch: 1 Global Step: 31780 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:21:42,482-Speed 5181.25 samples/sec Loss 5.6427 LearningRate 0.0819 Epoch: 1 Global Step: 31790 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:21:44,454-Speed 5193.59 samples/sec Loss 5.6450 LearningRate 0.0819 Epoch: 1 Global Step: 31800 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:21:46,427-Speed 5191.46 samples/sec Loss 5.6388 LearningRate 0.0818 Epoch: 1 Global Step: 31810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:21:48,402-Speed 5186.50 samples/sec Loss 5.7175 LearningRate 0.0818 Epoch: 1 Global Step: 31820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:21:50,365-Speed 5218.63 samples/sec Loss 5.6485 LearningRate 0.0818 Epoch: 1 Global Step: 31830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:21:52,335-Speed 5199.66 samples/sec Loss 
5.7577 LearningRate 0.0818 Epoch: 1 Global Step: 31840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:21:54,301-Speed 5211.05 samples/sec Loss 5.7197 LearningRate 0.0818 Epoch: 1 Global Step: 31850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:21:56,265-Speed 5214.39 samples/sec Loss 5.7018 LearningRate 0.0818 Epoch: 1 Global Step: 31860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:21:58,230-Speed 5214.41 samples/sec Loss 5.7018 LearningRate 0.0818 Epoch: 1 Global Step: 31870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:22:00,198-Speed 5202.99 samples/sec Loss 5.7801 LearningRate 0.0818 Epoch: 1 Global Step: 31880 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:22:02,166-Speed 5207.09 samples/sec Loss 5.7253 LearningRate 0.0818 Epoch: 1 Global Step: 31890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:22:04,139-Speed 5191.15 samples/sec Loss 5.6988 LearningRate 0.0818 Epoch: 1 Global Step: 31900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:22:06,105-Speed 5209.16 samples/sec Loss 5.6186 LearningRate 0.0818 Epoch: 1 Global Step: 31910 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:22:08,069-Speed 5216.73 samples/sec Loss 5.7827 LearningRate 0.0818 Epoch: 1 Global Step: 31920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:22:10,039-Speed 5197.50 samples/sec Loss 5.7150 LearningRate 0.0818 Epoch: 1 Global Step: 31930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:22:12,008-Speed 5202.87 samples/sec Loss 5.8144 LearningRate 0.0818 Epoch: 1 Global Step: 31940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:22:13,976-Speed 5205.51 samples/sec Loss 5.7127 LearningRate 0.0818 Epoch: 1 Global Step: 31950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:22:15,949-Speed 5193.64 samples/sec Loss 5.7421 LearningRate 0.0818 Epoch: 1 Global Step: 
31960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:22:17,925-Speed 5184.49 samples/sec Loss 5.6935 LearningRate 0.0818 Epoch: 1 Global Step: 31970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:22:19,889-Speed 5215.41 samples/sec Loss 5.7098 LearningRate 0.0818 Epoch: 1 Global Step: 31980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:22:21,858-Speed 5201.97 samples/sec Loss 5.6316 LearningRate 0.0818 Epoch: 1 Global Step: 31990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:22:23,832-Speed 5188.67 samples/sec Loss 5.6559 LearningRate 0.0817 Epoch: 1 Global Step: 32000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:22:50,560-[lfw][32000]XNorm: 23.020224 Training: 2022-04-11 01:22:50,561-[lfw][32000]Accuracy-Flip: 0.99717+-0.00279 Training: 2022-04-11 01:22:50,561-[lfw][32000]Accuracy-Highest: 0.99717 Training: 2022-04-11 01:23:21,364-[cfp_fp][32000]XNorm: 20.306396 Training: 2022-04-11 01:23:21,365-[cfp_fp][32000]Accuracy-Flip: 0.97286+-0.00550 Training: 2022-04-11 01:23:21,365-[cfp_fp][32000]Accuracy-Highest: 0.97486 Training: 2022-04-11 01:23:47,842-[agedb_30][32000]XNorm: 22.464172 Training: 2022-04-11 01:23:47,843-[agedb_30][32000]Accuracy-Flip: 0.97117+-0.00789 Training: 2022-04-11 01:23:47,843-[agedb_30][32000]Accuracy-Highest: 0.97333 Training: 2022-04-11 01:23:49,819-Speed 119.09 samples/sec Loss 5.6980 LearningRate 0.0817 Epoch: 1 Global Step: 32010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:23:51,774-Speed 5238.67 samples/sec Loss 5.7409 LearningRate 0.0817 Epoch: 1 Global Step: 32020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:23:53,741-Speed 5208.18 samples/sec Loss 5.6439 LearningRate 0.0817 Epoch: 1 Global Step: 32030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:23:55,700-Speed 5228.70 samples/sec Loss 5.6729 LearningRate 0.0817 Epoch: 1 Global Step: 32040 Fp16 Grad Scale: 131072 
Required: 20 hours Training: 2022-04-11 01:23:57,661-Speed 5224.51 samples/sec Loss 5.7164 LearningRate 0.0817 Epoch: 1 Global Step: 32050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:23:59,618-Speed 5233.22 samples/sec Loss 5.7320 LearningRate 0.0817 Epoch: 1 Global Step: 32060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:01,580-Speed 5220.55 samples/sec Loss 5.6292 LearningRate 0.0817 Epoch: 1 Global Step: 32070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:03,546-Speed 5210.12 samples/sec Loss 5.7152 LearningRate 0.0817 Epoch: 1 Global Step: 32080 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:05,509-Speed 5220.24 samples/sec Loss 5.6641 LearningRate 0.0817 Epoch: 1 Global Step: 32090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:07,464-Speed 5239.50 samples/sec Loss 5.7746 LearningRate 0.0817 Epoch: 1 Global Step: 32100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:24:09,424-Speed 5224.46 samples/sec Loss 5.6913 LearningRate 0.0817 Epoch: 1 Global Step: 32110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:24:11,390-Speed 5211.02 samples/sec Loss 5.5886 LearningRate 0.0817 Epoch: 1 Global Step: 32120 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:24:13,354-Speed 5214.46 samples/sec Loss 5.7507 LearningRate 0.0817 Epoch: 1 Global Step: 32130 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:24:15,333-Speed 5177.01 samples/sec Loss 5.6734 LearningRate 0.0817 Epoch: 1 Global Step: 32140 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:24:17,294-Speed 5224.02 samples/sec Loss 5.6403 LearningRate 0.0817 Epoch: 1 Global Step: 32150 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:24:19,257-Speed 5218.63 samples/sec Loss 5.6364 LearningRate 0.0817 Epoch: 1 Global Step: 32160 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 
01:24:21,234-Speed 5181.08 samples/sec Loss 5.6344 LearningRate 0.0817 Epoch: 1 Global Step: 32170 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:24:23,222-Speed 5154.11 samples/sec Loss 5.5874 LearningRate 0.0816 Epoch: 1 Global Step: 32180 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:24:25,198-Speed 5182.81 samples/sec Loss 5.6615 LearningRate 0.0816 Epoch: 1 Global Step: 32190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:24:27,187-Speed 5149.82 samples/sec Loss 5.6178 LearningRate 0.0816 Epoch: 1 Global Step: 32200 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:29,156-Speed 5203.32 samples/sec Loss 5.6472 LearningRate 0.0816 Epoch: 1 Global Step: 32210 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:31,122-Speed 5209.50 samples/sec Loss 5.7124 LearningRate 0.0816 Epoch: 1 Global Step: 32220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:33,095-Speed 5192.90 samples/sec Loss 5.7196 LearningRate 0.0816 Epoch: 1 Global Step: 32230 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:35,064-Speed 5203.70 samples/sec Loss 5.7677 LearningRate 0.0816 Epoch: 1 Global Step: 32240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:37,033-Speed 5200.73 samples/sec Loss 5.6569 LearningRate 0.0816 Epoch: 1 Global Step: 32250 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:39,009-Speed 5183.15 samples/sec Loss 5.6236 LearningRate 0.0816 Epoch: 1 Global Step: 32260 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:40,979-Speed 5200.38 samples/sec Loss 5.7113 LearningRate 0.0816 Epoch: 1 Global Step: 32270 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:42,964-Speed 5161.08 samples/sec Loss 5.6589 LearningRate 0.0816 Epoch: 1 Global Step: 32280 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:44,936-Speed 5195.97 samples/sec Loss 
5.5511 LearningRate 0.0816 Epoch: 1 Global Step: 32290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:46,933-Speed 5127.95 samples/sec Loss 5.6443 LearningRate 0.0816 Epoch: 1 Global Step: 32300 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:48,905-Speed 5196.50 samples/sec Loss 5.6242 LearningRate 0.0816 Epoch: 1 Global Step: 32310 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:50,875-Speed 5199.39 samples/sec Loss 5.5838 LearningRate 0.0816 Epoch: 1 Global Step: 32320 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:52,859-Speed 5161.58 samples/sec Loss 5.5248 LearningRate 0.0816 Epoch: 1 Global Step: 32330 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:54,831-Speed 5193.35 samples/sec Loss 5.6407 LearningRate 0.0816 Epoch: 1 Global Step: 32340 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:56,800-Speed 5204.88 samples/sec Loss 5.6288 LearningRate 0.0816 Epoch: 1 Global Step: 32350 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:24:58,778-Speed 5177.93 samples/sec Loss 5.7088 LearningRate 0.0816 Epoch: 1 Global Step: 32360 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:00,767-Speed 5149.40 samples/sec Loss 5.7197 LearningRate 0.0815 Epoch: 1 Global Step: 32370 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:02,751-Speed 5163.48 samples/sec Loss 5.6175 LearningRate 0.0815 Epoch: 1 Global Step: 32380 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:04,719-Speed 5205.77 samples/sec Loss 5.7374 LearningRate 0.0815 Epoch: 1 Global Step: 32390 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:06,687-Speed 5204.97 samples/sec Loss 5.6273 LearningRate 0.0815 Epoch: 1 Global Step: 32400 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:08,657-Speed 5199.44 samples/sec Loss 5.5969 LearningRate 0.0815 Epoch: 1 Global 
Step: 32410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:25:10,629-Speed 5193.21 samples/sec Loss 5.6443 LearningRate 0.0815 Epoch: 1 Global Step: 32420 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:25:12,604-Speed 5188.33 samples/sec Loss 5.6743 LearningRate 0.0815 Epoch: 1 Global Step: 32430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:25:14,605-Speed 5118.27 samples/sec Loss 5.6646 LearningRate 0.0815 Epoch: 1 Global Step: 32440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:25:16,586-Speed 5170.48 samples/sec Loss 5.7763 LearningRate 0.0815 Epoch: 1 Global Step: 32450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:25:18,555-Speed 5203.40 samples/sec Loss 5.6848 LearningRate 0.0815 Epoch: 1 Global Step: 32460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:25:20,523-Speed 5204.76 samples/sec Loss 5.6480 LearningRate 0.0815 Epoch: 1 Global Step: 32470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:25:22,489-Speed 5209.28 samples/sec Loss 5.7842 LearningRate 0.0815 Epoch: 1 Global Step: 32480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:25:24,468-Speed 5176.80 samples/sec Loss 5.6626 LearningRate 0.0815 Epoch: 1 Global Step: 32490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:25:26,435-Speed 5208.77 samples/sec Loss 5.7569 LearningRate 0.0815 Epoch: 1 Global Step: 32500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:25:28,399-Speed 5213.97 samples/sec Loss 5.6878 LearningRate 0.0815 Epoch: 1 Global Step: 32510 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:30,362-Speed 5220.07 samples/sec Loss 5.6450 LearningRate 0.0815 Epoch: 1 Global Step: 32520 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:32,328-Speed 5209.43 samples/sec Loss 5.6833 LearningRate 0.0815 Epoch: 1 Global Step: 32530 Fp16 Grad Scale: 131072 Required: 20 
hours Training: 2022-04-11 01:25:34,293-Speed 5211.93 samples/sec Loss 5.6584 LearningRate 0.0815 Epoch: 1 Global Step: 32540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:36,259-Speed 5210.37 samples/sec Loss 5.6585 LearningRate 0.0814 Epoch: 1 Global Step: 32550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:38,230-Speed 5198.59 samples/sec Loss 5.5530 LearningRate 0.0814 Epoch: 1 Global Step: 32560 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:40,206-Speed 5182.25 samples/sec Loss 5.6543 LearningRate 0.0814 Epoch: 1 Global Step: 32570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:42,177-Speed 5196.37 samples/sec Loss 5.5838 LearningRate 0.0814 Epoch: 1 Global Step: 32580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:44,144-Speed 5208.28 samples/sec Loss 5.6924 LearningRate 0.0814 Epoch: 1 Global Step: 32590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:46,112-Speed 5205.77 samples/sec Loss 5.5985 LearningRate 0.0814 Epoch: 1 Global Step: 32600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:48,072-Speed 5225.91 samples/sec Loss 5.5237 LearningRate 0.0814 Epoch: 1 Global Step: 32610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:50,037-Speed 5212.67 samples/sec Loss 5.6530 LearningRate 0.0814 Epoch: 1 Global Step: 32620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:52,004-Speed 5209.84 samples/sec Loss 5.5866 LearningRate 0.0814 Epoch: 1 Global Step: 32630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:53,974-Speed 5197.10 samples/sec Loss 5.6047 LearningRate 0.0814 Epoch: 1 Global Step: 32640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:55,938-Speed 5216.89 samples/sec Loss 5.5890 LearningRate 0.0814 Epoch: 1 Global Step: 32650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 
01:25:57,914-Speed 5182.67 samples/sec Loss 5.7378 LearningRate 0.0814 Epoch: 1 Global Step: 32660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:25:59,894-Speed 5173.96 samples/sec Loss 5.5276 LearningRate 0.0814 Epoch: 1 Global Step: 32670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:26:01,898-Speed 5111.08 samples/sec Loss 5.6316 LearningRate 0.0814 Epoch: 1 Global Step: 32680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:26:03,878-Speed 5174.38 samples/sec Loss 5.7982 LearningRate 0.0814 Epoch: 1 Global Step: 32690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:26:05,870-Speed 5141.62 samples/sec Loss 5.6400 LearningRate 0.0814 Epoch: 1 Global Step: 32700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:26:07,831-Speed 5224.64 samples/sec Loss 5.6575 LearningRate 0.0814 Epoch: 1 Global Step: 32710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:26:09,792-Speed 5222.54 samples/sec Loss 5.6769 LearningRate 0.0814 Epoch: 1 Global Step: 32720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:26:11,755-Speed 5220.15 samples/sec Loss 5.6371 LearningRate 0.0814 Epoch: 1 Global Step: 32730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:26:13,717-Speed 5220.27 samples/sec Loss 5.6128 LearningRate 0.0813 Epoch: 1 Global Step: 32740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:26:15,681-Speed 5214.46 samples/sec Loss 5.5658 LearningRate 0.0813 Epoch: 1 Global Step: 32750 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:26:17,654-Speed 5191.52 samples/sec Loss 5.4811 LearningRate 0.0813 Epoch: 1 Global Step: 32760 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:26:19,617-Speed 5218.04 samples/sec Loss 5.6488 LearningRate 0.0813 Epoch: 1 Global Step: 32770 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:26:21,579-Speed 5222.50 samples/sec Loss 
5.6328 LearningRate 0.0813 Epoch: 1 Global Step: 32780 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:26:23,565-Speed 5156.36 samples/sec Loss 5.5345 LearningRate 0.0813 Epoch: 1 Global Step: 32790 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:26:25,550-Speed 5160.27 samples/sec Loss 5.6344 LearningRate 0.0813 Epoch: 1 Global Step: 32800 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:26:27,533-Speed 5167.50 samples/sec Loss 5.5604 LearningRate 0.0813 Epoch: 1 Global Step: 32810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:26:29,505-Speed 5193.39 samples/sec Loss 5.6135 LearningRate 0.0813 Epoch: 1 Global Step: 32820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:26:31,468-Speed 5218.83 samples/sec Loss 5.5680 LearningRate 0.0813 Epoch: 1 Global Step: 32830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:26:33,427-Speed 5229.23 samples/sec Loss 5.6104 LearningRate 0.0813 Epoch: 1 Global Step: 32840 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:26:35,388-Speed 5224.18 samples/sec Loss 5.5956 LearningRate 0.0813 Epoch: 1 Global Step: 32850 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:26:37,356-Speed 5204.56 samples/sec Loss 5.5382 LearningRate 0.0813 Epoch: 1 Global Step: 32860 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:26:39,325-Speed 5202.89 samples/sec Loss 5.5501 LearningRate 0.0813 Epoch: 1 Global Step: 32870 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:26:41,299-Speed 5187.35 samples/sec Loss 5.7033 LearningRate 0.0813 Epoch: 1 Global Step: 32880 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:26:43,282-Speed 5166.85 samples/sec Loss 5.5543 LearningRate 0.0813 Epoch: 1 Global Step: 32890 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:26:45,273-Speed 5143.13 samples/sec Loss 5.5615 LearningRate 0.0813 Epoch: 1 Global Step: 32900 
Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:26:47,248-Speed 5186.38 samples/sec Loss 5.5153 LearningRate 0.0813 Epoch: 1 Global Step: 32910 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:26:49,223-Speed 5187.60 samples/sec Loss 5.5832 LearningRate 0.0812 Epoch: 1 Global Step: 32920 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:26:51,189-Speed 5210.75 samples/sec Loss 5.6634 LearningRate 0.0812 Epoch: 1 Global Step: 32930 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:26:53,168-Speed 5175.05 samples/sec Loss 5.5929 LearningRate 0.0812 Epoch: 1 Global Step: 32940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:26:55,130-Speed 5221.86 samples/sec Loss 5.5424 LearningRate 0.0812 Epoch: 1 Global Step: 32950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:26:57,100-Speed 5200.50 samples/sec Loss 5.5299 LearningRate 0.0812 Epoch: 1 Global Step: 32960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:26:59,064-Speed 5214.84 samples/sec Loss 5.5476 LearningRate 0.0812 Epoch: 1 Global Step: 32970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:01,035-Speed 5198.03 samples/sec Loss 5.5145 LearningRate 0.0812 Epoch: 1 Global Step: 32980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:03,004-Speed 5201.85 samples/sec Loss 5.6330 LearningRate 0.0812 Epoch: 1 Global Step: 32990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:04,984-Speed 5174.12 samples/sec Loss 5.5589 LearningRate 0.0812 Epoch: 1 Global Step: 33000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:06,957-Speed 5190.43 samples/sec Loss 5.5647 LearningRate 0.0812 Epoch: 1 Global Step: 33010 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:08,936-Speed 5176.03 samples/sec Loss 5.5567 LearningRate 0.0812 Epoch: 1 Global Step: 33020 Fp16 Grad Scale: 65536 Required: 20 hours Training: 
2022-04-11 01:27:10,926-Speed 5148.90 samples/sec Loss 5.5299 LearningRate 0.0812 Epoch: 1 Global Step: 33030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:12,887-Speed 5223.63 samples/sec Loss 5.5879 LearningRate 0.0812 Epoch: 1 Global Step: 33040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:14,861-Speed 5188.80 samples/sec Loss 5.6265 LearningRate 0.0812 Epoch: 1 Global Step: 33050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:16,824-Speed 5217.07 samples/sec Loss 5.4901 LearningRate 0.0812 Epoch: 1 Global Step: 33060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:18,806-Speed 5169.24 samples/sec Loss 5.5162 LearningRate 0.0812 Epoch: 1 Global Step: 33070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:20,777-Speed 5195.91 samples/sec Loss 5.6096 LearningRate 0.0812 Epoch: 1 Global Step: 33080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:22,752-Speed 5187.84 samples/sec Loss 5.5197 LearningRate 0.0812 Epoch: 1 Global Step: 33090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:24,718-Speed 5210.74 samples/sec Loss 5.5350 LearningRate 0.0812 Epoch: 1 Global Step: 33100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:26,687-Speed 5201.48 samples/sec Loss 5.5833 LearningRate 0.0811 Epoch: 1 Global Step: 33110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:28,661-Speed 5188.29 samples/sec Loss 5.6580 LearningRate 0.0811 Epoch: 1 Global Step: 33120 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:30,632-Speed 5198.35 samples/sec Loss 5.5386 LearningRate 0.0811 Epoch: 1 Global Step: 33130 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:32,595-Speed 5218.05 samples/sec Loss 5.5300 LearningRate 0.0811 Epoch: 1 Global Step: 33140 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:34,563-Speed 5205.45 samples/sec Loss 
5.5016 LearningRate 0.0811 Epoch: 1 Global Step: 33150 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:36,553-Speed 5147.33 samples/sec Loss 5.5675 LearningRate 0.0811 Epoch: 1 Global Step: 33160 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:38,518-Speed 5213.02 samples/sec Loss 5.5957 LearningRate 0.0811 Epoch: 1 Global Step: 33170 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:40,502-Speed 5162.52 samples/sec Loss 5.6183 LearningRate 0.0811 Epoch: 1 Global Step: 33180 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:42,467-Speed 5211.64 samples/sec Loss 5.6011 LearningRate 0.0811 Epoch: 1 Global Step: 33190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:44,432-Speed 5213.03 samples/sec Loss 5.6933 LearningRate 0.0811 Epoch: 1 Global Step: 33200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:46,411-Speed 5176.56 samples/sec Loss 5.5859 LearningRate 0.0811 Epoch: 1 Global Step: 33210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:48,398-Speed 5154.08 samples/sec Loss 5.6529 LearningRate 0.0811 Epoch: 1 Global Step: 33220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:50,363-Speed 5213.65 samples/sec Loss 5.6284 LearningRate 0.0811 Epoch: 1 Global Step: 33230 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:27:52,348-Speed 5160.63 samples/sec Loss 5.6420 LearningRate 0.0811 Epoch: 1 Global Step: 33240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:27:54,328-Speed 5174.11 samples/sec Loss 5.5518 LearningRate 0.0811 Epoch: 1 Global Step: 33250 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:27:56,317-Speed 5150.27 samples/sec Loss 5.6385 LearningRate 0.0811 Epoch: 1 Global Step: 33260 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:27:58,303-Speed 5157.93 samples/sec Loss 5.5916 LearningRate 0.0811 Epoch: 1 Global Step: 
33270 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:28:00,266-Speed 5217.60 samples/sec Loss 5.6161 LearningRate 0.0811 Epoch: 1 Global Step: 33280 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:28:02,241-Speed 5187.24 samples/sec Loss 5.6488 LearningRate 0.0810 Epoch: 1 Global Step: 33290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:28:04,226-Speed 5159.67 samples/sec Loss 5.5948 LearningRate 0.0810 Epoch: 1 Global Step: 33300 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:28:06,195-Speed 5202.05 samples/sec Loss 5.6099 LearningRate 0.0810 Epoch: 1 Global Step: 33310 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:28:08,159-Speed 5216.70 samples/sec Loss 5.5245 LearningRate 0.0810 Epoch: 1 Global Step: 33320 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:28:10,129-Speed 5199.90 samples/sec Loss 5.6421 LearningRate 0.0810 Epoch: 1 Global Step: 33330 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:28:12,098-Speed 5200.43 samples/sec Loss 5.5126 LearningRate 0.0810 Epoch: 1 Global Step: 33340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:28:14,088-Speed 5148.48 samples/sec Loss 5.5484 LearningRate 0.0810 Epoch: 1 Global Step: 33350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:28:16,057-Speed 5203.59 samples/sec Loss 5.5283 LearningRate 0.0810 Epoch: 1 Global Step: 33360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:28:18,023-Speed 5208.82 samples/sec Loss 5.5295 LearningRate 0.0810 Epoch: 1 Global Step: 33370 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:28:20,208-Speed 4689.24 samples/sec Loss 5.5483 LearningRate 0.0810 Epoch: 1 Global Step: 33380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:28:50,829-Speed 334.42 samples/sec Loss 5.0302 LearningRate 0.0810 Epoch: 2 Global Step: 33390 Fp16 Grad Scale: 65536 Required: 20 hours 
Training: 2022-04-11 01:28:52,804-Speed 5186.94 samples/sec Loss 4.9065 LearningRate 0.0810 Epoch: 2 Global Step: 33400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:28:54,770-Speed 5211.28 samples/sec Loss 4.8667 LearningRate 0.0810 Epoch: 2 Global Step: 33410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:28:56,732-Speed 5221.95 samples/sec Loss 4.9351 LearningRate 0.0810 Epoch: 2 Global Step: 33420 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:28:58,849-Speed 4839.57 samples/sec Loss 4.8247 LearningRate 0.0810 Epoch: 2 Global Step: 33430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:01,087-Speed 4575.81 samples/sec Loss 4.8276 LearningRate 0.0810 Epoch: 2 Global Step: 33440 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:29:03,052-Speed 5212.30 samples/sec Loss 4.8099 LearningRate 0.0810 Epoch: 2 Global Step: 33450 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:29:05,026-Speed 5190.83 samples/sec Loss 4.9348 LearningRate 0.0810 Epoch: 2 Global Step: 33460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:29:07,003-Speed 5182.12 samples/sec Loss 4.8939 LearningRate 0.0810 Epoch: 2 Global Step: 33470 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:29:08,981-Speed 5177.68 samples/sec Loss 4.9385 LearningRate 0.0809 Epoch: 2 Global Step: 33480 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:29:10,967-Speed 5158.27 samples/sec Loss 5.0259 LearningRate 0.0809 Epoch: 2 Global Step: 33490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:29:12,972-Speed 5109.54 samples/sec Loss 4.9137 LearningRate 0.0809 Epoch: 2 Global Step: 33500 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:29:14,943-Speed 5196.32 samples/sec Loss 4.9353 LearningRate 0.0809 Epoch: 2 Global Step: 33510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:16,908-Speed 5213.23 
samples/sec Loss 4.9804 LearningRate 0.0809 Epoch: 2 Global Step: 33520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:18,873-Speed 5212.18 samples/sec Loss 4.9439 LearningRate 0.0809 Epoch: 2 Global Step: 33530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:20,846-Speed 5193.46 samples/sec Loss 4.9171 LearningRate 0.0809 Epoch: 2 Global Step: 33540 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:22,830-Speed 5161.81 samples/sec Loss 4.8707 LearningRate 0.0809 Epoch: 2 Global Step: 33550 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:24,795-Speed 5212.60 samples/sec Loss 4.9578 LearningRate 0.0809 Epoch: 2 Global Step: 33560 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:26,769-Speed 5189.29 samples/sec Loss 4.9101 LearningRate 0.0809 Epoch: 2 Global Step: 33570 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:28,743-Speed 5191.36 samples/sec Loss 4.9907 LearningRate 0.0809 Epoch: 2 Global Step: 33580 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:30,718-Speed 5184.87 samples/sec Loss 4.8662 LearningRate 0.0809 Epoch: 2 Global Step: 33590 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:32,696-Speed 5180.20 samples/sec Loss 4.9054 LearningRate 0.0809 Epoch: 2 Global Step: 33600 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:34,665-Speed 5201.42 samples/sec Loss 4.9066 LearningRate 0.0809 Epoch: 2 Global Step: 33610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:29:36,619-Speed 5241.65 samples/sec Loss 4.9427 LearningRate 0.0809 Epoch: 2 Global Step: 33620 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:38,587-Speed 5204.79 samples/sec Loss 4.9884 LearningRate 0.0809 Epoch: 2 Global Step: 33630 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:40,551-Speed 5216.31 samples/sec Loss 4.9846 LearningRate 0.0809 Epoch: 2 
Global Step: 33640 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:42,526-Speed 5187.56 samples/sec Loss 4.9434 LearningRate 0.0809 Epoch: 2 Global Step: 33650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:44,518-Speed 5140.87 samples/sec Loss 4.9459 LearningRate 0.0809 Epoch: 2 Global Step: 33660 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:46,529-Speed 5093.34 samples/sec Loss 4.8759 LearningRate 0.0808 Epoch: 2 Global Step: 33670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:48,493-Speed 5217.09 samples/sec Loss 4.8498 LearningRate 0.0808 Epoch: 2 Global Step: 33680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:50,457-Speed 5215.55 samples/sec Loss 4.8305 LearningRate 0.0808 Epoch: 2 Global Step: 33690 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:52,425-Speed 5204.01 samples/sec Loss 4.9352 LearningRate 0.0808 Epoch: 2 Global Step: 33700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:54,421-Speed 5132.29 samples/sec Loss 4.9348 LearningRate 0.0808 Epoch: 2 Global Step: 33710 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:29:56,389-Speed 5206.02 samples/sec Loss 4.8035 LearningRate 0.0808 Epoch: 2 Global Step: 33720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:29:58,382-Speed 5140.09 samples/sec Loss 5.0224 LearningRate 0.0808 Epoch: 2 Global Step: 33730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:30:00,377-Speed 5132.69 samples/sec Loss 4.8286 LearningRate 0.0808 Epoch: 2 Global Step: 33740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:30:02,362-Speed 5161.79 samples/sec Loss 4.9727 LearningRate 0.0808 Epoch: 2 Global Step: 33750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:30:04,366-Speed 5110.86 samples/sec Loss 4.8778 LearningRate 0.0808 Epoch: 2 Global Step: 33760 Fp16 Grad Scale: 131072 
Required: 20 hours Training: 2022-04-11 01:30:06,342-Speed 5184.92 samples/sec Loss 4.9433 LearningRate 0.0808 Epoch: 2 Global Step: 33770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:30:08,312-Speed 5200.10 samples/sec Loss 4.9328 LearningRate 0.0808 Epoch: 2 Global Step: 33780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:30:10,299-Speed 5154.64 samples/sec Loss 4.9992 LearningRate 0.0808 Epoch: 2 Global Step: 33790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:30:12,284-Speed 5161.19 samples/sec Loss 4.9640 LearningRate 0.0808 Epoch: 2 Global Step: 33800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:30:14,257-Speed 5190.41 samples/sec Loss 4.9729 LearningRate 0.0808 Epoch: 2 Global Step: 33810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:30:16,229-Speed 5194.66 samples/sec Loss 4.9963 LearningRate 0.0808 Epoch: 2 Global Step: 33820 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:30:18,193-Speed 5214.90 samples/sec Loss 4.9800 LearningRate 0.0808 Epoch: 2 Global Step: 33830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:30:20,168-Speed 5187.67 samples/sec Loss 4.9319 LearningRate 0.0808 Epoch: 2 Global Step: 33840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:30:22,154-Speed 5156.57 samples/sec Loss 4.9770 LearningRate 0.0807 Epoch: 2 Global Step: 33850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:30:24,135-Speed 5170.51 samples/sec Loss 4.9205 LearningRate 0.0807 Epoch: 2 Global Step: 33860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:30:26,200-Speed 4961.90 samples/sec Loss 5.0182 LearningRate 0.0807 Epoch: 2 Global Step: 33870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:30:28,218-Speed 5075.59 samples/sec Loss 4.9922 LearningRate 0.0807 Epoch: 2 Global Step: 33880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 
01:30:30,189-Speed 5196.96 samples/sec Loss 5.0116 LearningRate 0.0807 Epoch: 2 Global Step: 33890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:30:32,168-Speed 5177.45 samples/sec Loss 4.9253 LearningRate 0.0807 Epoch: 2 Global Step: 33900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:30:34,141-Speed 5192.11 samples/sec Loss 5.0002 LearningRate 0.0807 Epoch: 2 Global Step: 33910 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:30:36,115-Speed 5188.82 samples/sec Loss 5.0075 LearningRate 0.0807 Epoch: 2 Global Step: 33920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:30:38,098-Speed 5163.66 samples/sec Loss 4.9909 LearningRate 0.0807 Epoch: 2 Global Step: 33930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:30:40,225-Speed 4816.31 samples/sec Loss 5.0732 LearningRate 0.0807 Epoch: 2 Global Step: 33940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:30:42,220-Speed 5135.38 samples/sec Loss 4.8863 LearningRate 0.0807 Epoch: 2 Global Step: 33950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:30:44,194-Speed 5188.00 samples/sec Loss 4.9139 LearningRate 0.0807 Epoch: 2 Global Step: 33960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:30:46,186-Speed 5142.44 samples/sec Loss 5.0330 LearningRate 0.0807 Epoch: 2 Global Step: 33970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:30:48,173-Speed 5155.16 samples/sec Loss 4.9631 LearningRate 0.0807 Epoch: 2 Global Step: 33980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:30:50,166-Speed 5138.30 samples/sec Loss 4.9373 LearningRate 0.0807 Epoch: 2 Global Step: 33990 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:30:52,149-Speed 5168.60 samples/sec Loss 5.0729 LearningRate 0.0807 Epoch: 2 Global Step: 34000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:31:18,775-[lfw][34000]XNorm: 22.384825 
Training: 2022-04-11 01:31:18,775-[lfw][34000]Accuracy-Flip: 0.99650+-0.00398 Training: 2022-04-11 01:31:18,776-[lfw][34000]Accuracy-Highest: 0.99717 Training: 2022-04-11 01:31:49,500-[cfp_fp][34000]XNorm: 20.290635 Training: 2022-04-11 01:31:49,501-[cfp_fp][34000]Accuracy-Flip: 0.97329+-0.00783 Training: 2022-04-11 01:31:49,501-[cfp_fp][34000]Accuracy-Highest: 0.97486 Training: 2022-04-11 01:32:15,953-[agedb_30][34000]XNorm: 22.168511 Training: 2022-04-11 01:32:15,953-[agedb_30][34000]Accuracy-Flip: 0.97400+-0.00879 Training: 2022-04-11 01:32:15,954-[agedb_30][34000]Accuracy-Highest: 0.97400 Training: 2022-04-11 01:32:17,939-Speed 119.36 samples/sec Loss 5.0779 LearningRate 0.0807 Epoch: 2 Global Step: 34010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:32:19,890-Speed 5249.31 samples/sec Loss 5.0297 LearningRate 0.0807 Epoch: 2 Global Step: 34020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:32:21,837-Speed 5262.03 samples/sec Loss 4.9967 LearningRate 0.0807 Epoch: 2 Global Step: 34030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:32:23,804-Speed 5206.00 samples/sec Loss 5.0119 LearningRate 0.0806 Epoch: 2 Global Step: 34040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:32:25,772-Speed 5206.09 samples/sec Loss 4.9801 LearningRate 0.0806 Epoch: 2 Global Step: 34050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:32:27,748-Speed 5184.35 samples/sec Loss 5.0387 LearningRate 0.0806 Epoch: 2 Global Step: 34060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:32:29,712-Speed 5215.05 samples/sec Loss 4.9991 LearningRate 0.0806 Epoch: 2 Global Step: 34070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:32:31,673-Speed 5221.46 samples/sec Loss 4.9517 LearningRate 0.0806 Epoch: 2 Global Step: 34080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:32:33,665-Speed 5143.40 samples/sec Loss 5.0463 LearningRate 0.0806 Epoch: 2 
Global Step: 34090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:32:35,649-Speed 5163.22 samples/sec Loss 4.9333 LearningRate 0.0806 Epoch: 2 Global Step: 34100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:32:37,623-Speed 5188.54 samples/sec Loss 5.0341 LearningRate 0.0806 Epoch: 2 Global Step: 34110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:32:39,613-Speed 5146.93 samples/sec Loss 4.9564 LearningRate 0.0806 Epoch: 2 Global Step: 34120 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:32:41,599-Speed 5159.50 samples/sec Loss 4.9236 LearningRate 0.0806 Epoch: 2 Global Step: 34130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:32:43,557-Speed 5232.29 samples/sec Loss 4.9421 LearningRate 0.0806 Epoch: 2 Global Step: 34140 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:32:45,550-Speed 5140.91 samples/sec Loss 4.8982 LearningRate 0.0806 Epoch: 2 Global Step: 34150 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:32:47,526-Speed 5182.27 samples/sec Loss 5.0178 LearningRate 0.0806 Epoch: 2 Global Step: 34160 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:32:49,510-Speed 5163.74 samples/sec Loss 5.0854 LearningRate 0.0806 Epoch: 2 Global Step: 34170 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:32:51,478-Speed 5203.81 samples/sec Loss 5.0111 LearningRate 0.0806 Epoch: 2 Global Step: 34180 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:32:53,462-Speed 5162.04 samples/sec Loss 4.9477 LearningRate 0.0806 Epoch: 2 Global Step: 34190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:32:55,433-Speed 5199.48 samples/sec Loss 5.0104 LearningRate 0.0806 Epoch: 2 Global Step: 34200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:32:57,410-Speed 5180.83 samples/sec Loss 4.9114 LearningRate 0.0806 Epoch: 2 Global Step: 34210 Fp16 Grad Scale: 65536 Required: 20 
hours Training: 2022-04-11 01:32:59,388-Speed 5178.28 samples/sec Loss 5.0790 LearningRate 0.0805 Epoch: 2 Global Step: 34220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:01,352-Speed 5213.52 samples/sec Loss 5.0307 LearningRate 0.0805 Epoch: 2 Global Step: 34230 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:03,329-Speed 5184.43 samples/sec Loss 4.9983 LearningRate 0.0805 Epoch: 2 Global Step: 34240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:33:05,295-Speed 5210.14 samples/sec Loss 5.0545 LearningRate 0.0805 Epoch: 2 Global Step: 34250 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:33:07,262-Speed 5207.09 samples/sec Loss 4.9774 LearningRate 0.0805 Epoch: 2 Global Step: 34260 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:33:09,256-Speed 5136.99 samples/sec Loss 4.9567 LearningRate 0.0805 Epoch: 2 Global Step: 34270 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:33:11,223-Speed 5210.70 samples/sec Loss 5.1472 LearningRate 0.0805 Epoch: 2 Global Step: 34280 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:33:13,193-Speed 5201.23 samples/sec Loss 5.0395 LearningRate 0.0805 Epoch: 2 Global Step: 34290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:33:15,167-Speed 5188.33 samples/sec Loss 4.9517 LearningRate 0.0805 Epoch: 2 Global Step: 34300 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:33:17,137-Speed 5200.04 samples/sec Loss 5.0573 LearningRate 0.0805 Epoch: 2 Global Step: 34310 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:33:19,097-Speed 5225.49 samples/sec Loss 5.0674 LearningRate 0.0805 Epoch: 2 Global Step: 34320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:21,065-Speed 5204.85 samples/sec Loss 4.9043 LearningRate 0.0805 Epoch: 2 Global Step: 34330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:23,039-Speed 
5187.92 samples/sec Loss 5.0829 LearningRate 0.0805 Epoch: 2 Global Step: 34340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:25,010-Speed 5198.36 samples/sec Loss 5.0516 LearningRate 0.0805 Epoch: 2 Global Step: 34350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:26,985-Speed 5188.40 samples/sec Loss 4.9434 LearningRate 0.0805 Epoch: 2 Global Step: 34360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:28,961-Speed 5183.37 samples/sec Loss 4.9036 LearningRate 0.0805 Epoch: 2 Global Step: 34370 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:30,922-Speed 5223.76 samples/sec Loss 4.9729 LearningRate 0.0805 Epoch: 2 Global Step: 34380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:32,888-Speed 5210.28 samples/sec Loss 5.0791 LearningRate 0.0805 Epoch: 2 Global Step: 34390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:34,864-Speed 5183.87 samples/sec Loss 5.0926 LearningRate 0.0805 Epoch: 2 Global Step: 34400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:36,842-Speed 5177.38 samples/sec Loss 5.0982 LearningRate 0.0804 Epoch: 2 Global Step: 34410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:38,806-Speed 5215.27 samples/sec Loss 5.1352 LearningRate 0.0804 Epoch: 2 Global Step: 34420 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:33:40,788-Speed 5168.42 samples/sec Loss 5.0343 LearningRate 0.0804 Epoch: 2 Global Step: 34430 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:33:42,743-Speed 5239.33 samples/sec Loss 5.0652 LearningRate 0.0804 Epoch: 2 Global Step: 34440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:44,710-Speed 5208.04 samples/sec Loss 5.0969 LearningRate 0.0804 Epoch: 2 Global Step: 34450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:46,686-Speed 5183.92 samples/sec Loss 5.0480 LearningRate 0.0804 
Epoch: 2 Global Step: 34460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:48,650-Speed 5217.34 samples/sec Loss 5.0358 LearningRate 0.0804 Epoch: 2 Global Step: 34470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:50,615-Speed 5213.28 samples/sec Loss 5.1764 LearningRate 0.0804 Epoch: 2 Global Step: 34480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:52,579-Speed 5214.62 samples/sec Loss 5.1539 LearningRate 0.0804 Epoch: 2 Global Step: 34490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:54,570-Speed 5145.82 samples/sec Loss 5.0823 LearningRate 0.0804 Epoch: 2 Global Step: 34500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:56,533-Speed 5217.90 samples/sec Loss 5.0661 LearningRate 0.0804 Epoch: 2 Global Step: 34510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:33:58,498-Speed 5212.03 samples/sec Loss 5.1933 LearningRate 0.0804 Epoch: 2 Global Step: 34520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:34:00,476-Speed 5179.55 samples/sec Loss 5.1187 LearningRate 0.0804 Epoch: 2 Global Step: 34530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:34:02,468-Speed 5140.84 samples/sec Loss 5.1129 LearningRate 0.0804 Epoch: 2 Global Step: 34540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:04,444-Speed 5185.00 samples/sec Loss 5.1167 LearningRate 0.0804 Epoch: 2 Global Step: 34550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:06,412-Speed 5206.73 samples/sec Loss 5.0757 LearningRate 0.0804 Epoch: 2 Global Step: 34560 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:08,376-Speed 5214.63 samples/sec Loss 5.1746 LearningRate 0.0804 Epoch: 2 Global Step: 34570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:10,347-Speed 5198.79 samples/sec Loss 5.0124 LearningRate 0.0804 Epoch: 2 Global Step: 34580 Fp16 Grad Scale: 131072 
Required: 20 hours Training: 2022-04-11 01:34:12,309-Speed 5218.62 samples/sec Loss 5.0961 LearningRate 0.0803 Epoch: 2 Global Step: 34590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:14,277-Speed 5206.95 samples/sec Loss 5.1192 LearningRate 0.0803 Epoch: 2 Global Step: 34600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:16,245-Speed 5203.93 samples/sec Loss 5.0373 LearningRate 0.0803 Epoch: 2 Global Step: 34610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:18,209-Speed 5215.17 samples/sec Loss 5.1301 LearningRate 0.0803 Epoch: 2 Global Step: 34620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:20,175-Speed 5209.94 samples/sec Loss 5.0783 LearningRate 0.0803 Epoch: 2 Global Step: 34630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:22,164-Speed 5149.62 samples/sec Loss 5.0755 LearningRate 0.0803 Epoch: 2 Global Step: 34640 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:34:24,126-Speed 5222.69 samples/sec Loss 5.0202 LearningRate 0.0803 Epoch: 2 Global Step: 34650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:26,112-Speed 5155.33 samples/sec Loss 4.9969 LearningRate 0.0803 Epoch: 2 Global Step: 34660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:28,086-Speed 5190.46 samples/sec Loss 5.1073 LearningRate 0.0803 Epoch: 2 Global Step: 34670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:30,063-Speed 5182.42 samples/sec Loss 5.1413 LearningRate 0.0803 Epoch: 2 Global Step: 34680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:32,030-Speed 5208.98 samples/sec Loss 5.0261 LearningRate 0.0803 Epoch: 2 Global Step: 34690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:33,995-Speed 5212.24 samples/sec Loss 5.0387 LearningRate 0.0803 Epoch: 2 Global Step: 34700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 
01:34:35,961-Speed 5210.12 samples/sec Loss 5.1107 LearningRate 0.0803 Epoch: 2 Global Step: 34710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:37,925-Speed 5213.83 samples/sec Loss 5.0238 LearningRate 0.0803 Epoch: 2 Global Step: 34720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:39,887-Speed 5221.76 samples/sec Loss 5.1580 LearningRate 0.0803 Epoch: 2 Global Step: 34730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:41,855-Speed 5205.68 samples/sec Loss 5.1009 LearningRate 0.0803 Epoch: 2 Global Step: 34740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:43,818-Speed 5217.48 samples/sec Loss 5.1738 LearningRate 0.0803 Epoch: 2 Global Step: 34750 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:34:45,788-Speed 5199.36 samples/sec Loss 5.1182 LearningRate 0.0803 Epoch: 2 Global Step: 34760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:47,771-Speed 5167.15 samples/sec Loss 5.1560 LearningRate 0.0803 Epoch: 2 Global Step: 34770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:49,744-Speed 5192.57 samples/sec Loss 5.1139 LearningRate 0.0802 Epoch: 2 Global Step: 34780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:51,718-Speed 5187.90 samples/sec Loss 5.0715 LearningRate 0.0802 Epoch: 2 Global Step: 34790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:53,700-Speed 5167.67 samples/sec Loss 5.0992 LearningRate 0.0802 Epoch: 2 Global Step: 34800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:55,672-Speed 5195.02 samples/sec Loss 5.1346 LearningRate 0.0802 Epoch: 2 Global Step: 34810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:34:57,634-Speed 5221.95 samples/sec Loss 5.0886 LearningRate 0.0802 Epoch: 2 Global Step: 34820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:34:59,603-Speed 5200.56 samples/sec Loss 
5.0457 LearningRate 0.0802 Epoch: 2 Global Step: 34830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:35:01,570-Speed 5208.23 samples/sec Loss 5.2115 LearningRate 0.0802 Epoch: 2 Global Step: 34840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:35:03,534-Speed 5216.55 samples/sec Loss 5.0858 LearningRate 0.0802 Epoch: 2 Global Step: 34850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:35:05,498-Speed 5214.47 samples/sec Loss 5.1706 LearningRate 0.0802 Epoch: 2 Global Step: 34860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:35:07,461-Speed 5217.71 samples/sec Loss 5.2209 LearningRate 0.0802 Epoch: 2 Global Step: 34870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:35:09,440-Speed 5177.07 samples/sec Loss 5.1517 LearningRate 0.0802 Epoch: 2 Global Step: 34880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:35:11,405-Speed 5213.88 samples/sec Loss 5.0771 LearningRate 0.0802 Epoch: 2 Global Step: 34890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:35:13,376-Speed 5194.92 samples/sec Loss 5.0237 LearningRate 0.0802 Epoch: 2 Global Step: 34900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:35:15,365-Speed 5151.84 samples/sec Loss 5.1022 LearningRate 0.0802 Epoch: 2 Global Step: 34910 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:35:17,328-Speed 5218.81 samples/sec Loss 5.1126 LearningRate 0.0802 Epoch: 2 Global Step: 34920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:35:19,292-Speed 5214.12 samples/sec Loss 5.1488 LearningRate 0.0802 Epoch: 2 Global Step: 34930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:35:21,262-Speed 5201.05 samples/sec Loss 5.0955 LearningRate 0.0802 Epoch: 2 Global Step: 34940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:35:23,234-Speed 5192.84 samples/sec Loss 5.0409 LearningRate 0.0802 Epoch: 2 Global Step: 
34950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:35:25,203-Speed 5201.45 samples/sec Loss 5.0816 LearningRate 0.0802 Epoch: 2 Global Step: 34960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:35:27,173-Speed 5200.14 samples/sec Loss 5.1765 LearningRate 0.0801 Epoch: 2 Global Step: 34970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:35:29,139-Speed 5211.61 samples/sec Loss 5.1569 LearningRate 0.0801 Epoch: 2 Global Step: 34980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:35:31,104-Speed 5213.56 samples/sec Loss 5.0939 LearningRate 0.0801 Epoch: 2 Global Step: 34990 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:35:33,075-Speed 5196.90 samples/sec Loss 5.0903 LearningRate 0.0801 Epoch: 2 Global Step: 35000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:35:35,052-Speed 5180.92 samples/sec Loss 5.0530 LearningRate 0.0801 Epoch: 2 Global Step: 35010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:35:37,016-Speed 5216.78 samples/sec Loss 5.1115 LearningRate 0.0801 Epoch: 2 Global Step: 35020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:35:38,988-Speed 5193.09 samples/sec Loss 5.0042 LearningRate 0.0801 Epoch: 2 Global Step: 35030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:35:40,962-Speed 5190.18 samples/sec Loss 5.0770 LearningRate 0.0801 Epoch: 2 Global Step: 35040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:35:42,931-Speed 5201.88 samples/sec Loss 5.1439 LearningRate 0.0801 Epoch: 2 Global Step: 35050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:35:44,912-Speed 5169.52 samples/sec Loss 5.0947 LearningRate 0.0801 Epoch: 2 Global Step: 35060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:35:46,877-Speed 5212.96 samples/sec Loss 5.1518 LearningRate 0.0801 Epoch: 2 Global Step: 35070 Fp16 Grad Scale: 65536 Required: 20 hours 
Training: 2022-04-11 01:35:48,842-Speed 5214.02 samples/sec Loss 5.1298 LearningRate 0.0801 Epoch: 2 Global Step: 35080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:35:50,808-Speed 5209.54 samples/sec Loss 5.0939 LearningRate 0.0801 Epoch: 2 Global Step: 35090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:35:52,787-Speed 5177.44 samples/sec Loss 5.1813 LearningRate 0.0801 Epoch: 2 Global Step: 35100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:35:54,757-Speed 5198.79 samples/sec Loss 5.1589 LearningRate 0.0801 Epoch: 2 Global Step: 35110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:35:56,725-Speed 5206.11 samples/sec Loss 5.1170 LearningRate 0.0801 Epoch: 2 Global Step: 35120 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:35:58,700-Speed 5185.02 samples/sec Loss 5.1575 LearningRate 0.0801 Epoch: 2 Global Step: 35130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:36:00,675-Speed 5187.15 samples/sec Loss 5.1369 LearningRate 0.0801 Epoch: 2 Global Step: 35140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:36:02,654-Speed 5175.10 samples/sec Loss 5.1784 LearningRate 0.0800 Epoch: 2 Global Step: 35150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:36:04,625-Speed 5199.01 samples/sec Loss 5.0666 LearningRate 0.0800 Epoch: 2 Global Step: 35160 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:36:06,597-Speed 5192.56 samples/sec Loss 5.1863 LearningRate 0.0800 Epoch: 2 Global Step: 35170 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:36:08,573-Speed 5186.07 samples/sec Loss 5.1587 LearningRate 0.0800 Epoch: 2 Global Step: 35180 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:36:10,549-Speed 5182.42 samples/sec Loss 5.0664 LearningRate 0.0800 Epoch: 2 Global Step: 35190 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:36:12,535-Speed 5157.43 
samples/sec Loss 5.0793 LearningRate 0.0800 Epoch: 2 Global Step: 35200 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:36:14,525-Speed 5148.27 samples/sec Loss 4.9757 LearningRate 0.0800 Epoch: 2 Global Step: 35210 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:36:16,503-Speed 5178.58 samples/sec Loss 5.0426 LearningRate 0.0800 Epoch: 2 Global Step: 35220 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:36:18,472-Speed 5202.64 samples/sec Loss 4.9885 LearningRate 0.0800 Epoch: 2 Global Step: 35230 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:36:20,437-Speed 5211.88 samples/sec Loss 5.0858 LearningRate 0.0800 Epoch: 2 Global Step: 35240 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:36:22,412-Speed 5187.82 samples/sec Loss 5.1529 LearningRate 0.0800 Epoch: 2 Global Step: 35250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:36:24,398-Speed 5157.73 samples/sec Loss 5.0943 LearningRate 0.0800 Epoch: 2 Global Step: 35260 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:36:26,365-Speed 5207.55 samples/sec Loss 5.1330 LearningRate 0.0800 Epoch: 2 Global Step: 35270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:36:28,334-Speed 5201.16 samples/sec Loss 5.0726 LearningRate 0.0800 Epoch: 2 Global Step: 35280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:36:30,308-Speed 5190.45 samples/sec Loss 5.0927 LearningRate 0.0800 Epoch: 2 Global Step: 35290 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:36:32,301-Speed 5140.88 samples/sec Loss 5.1515 LearningRate 0.0800 Epoch: 2 Global Step: 35300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:36:34,275-Speed 5188.92 samples/sec Loss 5.1352 LearningRate 0.0800 Epoch: 2 Global Step: 35310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:36:36,256-Speed 5170.39 samples/sec Loss 5.1285 LearningRate 0.0800 Epoch: 2 
Global Step: 35320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:36:38,238-Speed 5167.19 samples/sec Loss 5.1066 LearningRate 0.0800 Epoch: 2 Global Step: 35330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:36:40,226-Speed 5154.15 samples/sec Loss 5.1810 LearningRate 0.0799 Epoch: 2 Global Step: 35340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:36:42,200-Speed 5189.18 samples/sec Loss 5.1175 LearningRate 0.0799 Epoch: 2 Global Step: 35350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:36:44,177-Speed 5179.57 samples/sec Loss 5.2000 LearningRate 0.0799 Epoch: 2 Global Step: 35360 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:36:46,150-Speed 5193.19 samples/sec Loss 5.0822 LearningRate 0.0799 Epoch: 2 Global Step: 35370 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:36:48,125-Speed 5186.84 samples/sec Loss 5.2846 LearningRate 0.0799 Epoch: 2 Global Step: 35380 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:36:50,096-Speed 5196.71 samples/sec Loss 5.1391 LearningRate 0.0799 Epoch: 2 Global Step: 35390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:36:52,072-Speed 5182.89 samples/sec Loss 5.1559 LearningRate 0.0799 Epoch: 2 Global Step: 35400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:36:54,049-Speed 5181.57 samples/sec Loss 5.1469 LearningRate 0.0799 Epoch: 2 Global Step: 35410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:36:56,045-Speed 5133.41 samples/sec Loss 5.1586 LearningRate 0.0799 Epoch: 2 Global Step: 35420 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:36:58,020-Speed 5184.91 samples/sec Loss 5.1522 LearningRate 0.0799 Epoch: 2 Global Step: 35430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:00,004-Speed 5163.19 samples/sec Loss 5.2060 LearningRate 0.0799 Epoch: 2 Global Step: 35440 Fp16 Grad Scale: 65536 Required: 
20 hours Training: 2022-04-11 01:37:01,971-Speed 5207.29 samples/sec Loss 5.1444 LearningRate 0.0799 Epoch: 2 Global Step: 35450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:03,962-Speed 5144.47 samples/sec Loss 5.1732 LearningRate 0.0799 Epoch: 2 Global Step: 35460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:05,932-Speed 5200.99 samples/sec Loss 5.1293 LearningRate 0.0799 Epoch: 2 Global Step: 35470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:07,900-Speed 5204.45 samples/sec Loss 5.2010 LearningRate 0.0799 Epoch: 2 Global Step: 35480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:09,883-Speed 5165.09 samples/sec Loss 5.1969 LearningRate 0.0799 Epoch: 2 Global Step: 35490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:37:11,863-Speed 5173.82 samples/sec Loss 5.2002 LearningRate 0.0799 Epoch: 2 Global Step: 35500 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:37:13,848-Speed 5160.33 samples/sec Loss 5.2627 LearningRate 0.0799 Epoch: 2 Global Step: 35510 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:37:15,836-Speed 5154.87 samples/sec Loss 5.2145 LearningRate 0.0799 Epoch: 2 Global Step: 35520 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:37:17,819-Speed 5164.96 samples/sec Loss 5.1799 LearningRate 0.0798 Epoch: 2 Global Step: 35530 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:37:19,787-Speed 5204.20 samples/sec Loss 5.2115 LearningRate 0.0798 Epoch: 2 Global Step: 35540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:37:21,768-Speed 5172.12 samples/sec Loss 5.1969 LearningRate 0.0798 Epoch: 2 Global Step: 35550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:37:23,734-Speed 5210.34 samples/sec Loss 5.2192 LearningRate 0.0798 Epoch: 2 Global Step: 35560 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:25,738-Speed 
5111.31 samples/sec Loss 5.1081 LearningRate 0.0798 Epoch: 2 Global Step: 35570 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:27,735-Speed 5129.56 samples/sec Loss 5.2507 LearningRate 0.0798 Epoch: 2 Global Step: 35580 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:29,712-Speed 5181.35 samples/sec Loss 5.2141 LearningRate 0.0798 Epoch: 2 Global Step: 35590 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:31,682-Speed 5197.67 samples/sec Loss 5.2037 LearningRate 0.0798 Epoch: 2 Global Step: 35600 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:33,681-Speed 5124.37 samples/sec Loss 5.1990 LearningRate 0.0798 Epoch: 2 Global Step: 35610 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:35,669-Speed 5154.31 samples/sec Loss 5.1341 LearningRate 0.0798 Epoch: 2 Global Step: 35620 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:37,664-Speed 5134.94 samples/sec Loss 5.0623 LearningRate 0.0798 Epoch: 2 Global Step: 35630 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:39,648-Speed 5162.69 samples/sec Loss 5.0870 LearningRate 0.0798 Epoch: 2 Global Step: 35640 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:41,623-Speed 5186.15 samples/sec Loss 5.2050 LearningRate 0.0798 Epoch: 2 Global Step: 35650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:43,585-Speed 5221.54 samples/sec Loss 5.2780 LearningRate 0.0798 Epoch: 2 Global Step: 35660 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:45,572-Speed 5155.22 samples/sec Loss 5.1662 LearningRate 0.0798 Epoch: 2 Global Step: 35670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:47,555-Speed 5165.70 samples/sec Loss 5.1430 LearningRate 0.0798 Epoch: 2 Global Step: 35680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:49,525-Speed 5198.09 samples/sec Loss 5.1404 LearningRate 0.0798 
Epoch: 2 Global Step: 35690 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:51,501-Speed 5185.19 samples/sec Loss 5.2687 LearningRate 0.0798 Epoch: 2 Global Step: 35700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:53,486-Speed 5159.59 samples/sec Loss 5.1299 LearningRate 0.0797 Epoch: 2 Global Step: 35710 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:55,466-Speed 5173.12 samples/sec Loss 5.0973 LearningRate 0.0797 Epoch: 2 Global Step: 35720 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:57,441-Speed 5186.36 samples/sec Loss 5.1570 LearningRate 0.0797 Epoch: 2 Global Step: 35730 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:37:59,430-Speed 5150.94 samples/sec Loss 5.1852 LearningRate 0.0797 Epoch: 2 Global Step: 35740 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:01,411-Speed 5171.53 samples/sec Loss 5.0705 LearningRate 0.0797 Epoch: 2 Global Step: 35750 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:03,388-Speed 5181.03 samples/sec Loss 5.1588 LearningRate 0.0797 Epoch: 2 Global Step: 35760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:38:05,377-Speed 5150.68 samples/sec Loss 5.1767 LearningRate 0.0797 Epoch: 2 Global Step: 35770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:38:07,352-Speed 5184.83 samples/sec Loss 5.1103 LearningRate 0.0797 Epoch: 2 Global Step: 35780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:38:09,316-Speed 5217.13 samples/sec Loss 5.0673 LearningRate 0.0797 Epoch: 2 Global Step: 35790 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:11,289-Speed 5189.75 samples/sec Loss 5.0993 LearningRate 0.0797 Epoch: 2 Global Step: 35800 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:13,262-Speed 5191.36 samples/sec Loss 5.2435 LearningRate 0.0797 Epoch: 2 Global Step: 35810 Fp16 Grad Scale: 65536 
Required: 20 hours Training: 2022-04-11 01:38:15,234-Speed 5194.78 samples/sec Loss 5.1242 LearningRate 0.0797 Epoch: 2 Global Step: 35820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:17,206-Speed 5195.35 samples/sec Loss 5.1232 LearningRate 0.0797 Epoch: 2 Global Step: 35830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:19,181-Speed 5186.44 samples/sec Loss 5.1114 LearningRate 0.0797 Epoch: 2 Global Step: 35840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:21,158-Speed 5182.82 samples/sec Loss 5.1247 LearningRate 0.0797 Epoch: 2 Global Step: 35850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:23,157-Speed 5122.83 samples/sec Loss 5.2158 LearningRate 0.0797 Epoch: 2 Global Step: 35860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:25,144-Speed 5157.14 samples/sec Loss 5.1406 LearningRate 0.0797 Epoch: 2 Global Step: 35870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:27,123-Speed 5174.39 samples/sec Loss 5.1215 LearningRate 0.0797 Epoch: 2 Global Step: 35880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:29,105-Speed 5169.68 samples/sec Loss 5.2163 LearningRate 0.0797 Epoch: 2 Global Step: 35890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:38:31,071-Speed 5208.44 samples/sec Loss 5.1387 LearningRate 0.0796 Epoch: 2 Global Step: 35900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:38:33,047-Speed 5183.57 samples/sec Loss 5.1324 LearningRate 0.0796 Epoch: 2 Global Step: 35910 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:35,031-Speed 5165.12 samples/sec Loss 5.2208 LearningRate 0.0796 Epoch: 2 Global Step: 35920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:37,013-Speed 5165.49 samples/sec Loss 5.2358 LearningRate 0.0796 Epoch: 2 Global Step: 35930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 
01:38:39,009-Speed 5133.67 samples/sec Loss 5.2835 LearningRate 0.0796 Epoch: 2 Global Step: 35940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:40,979-Speed 5199.84 samples/sec Loss 5.0913 LearningRate 0.0796 Epoch: 2 Global Step: 35950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:42,964-Speed 5162.71 samples/sec Loss 5.2406 LearningRate 0.0796 Epoch: 2 Global Step: 35960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:44,950-Speed 5155.19 samples/sec Loss 5.1247 LearningRate 0.0796 Epoch: 2 Global Step: 35970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:46,939-Speed 5150.93 samples/sec Loss 5.1903 LearningRate 0.0796 Epoch: 2 Global Step: 35980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:48,914-Speed 5186.68 samples/sec Loss 5.2255 LearningRate 0.0796 Epoch: 2 Global Step: 35990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:38:50,911-Speed 5130.76 samples/sec Loss 5.2309 LearningRate 0.0796 Epoch: 2 Global Step: 36000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:39:17,630-[lfw][36000]XNorm: 21.791859 Training: 2022-04-11 01:39:17,631-[lfw][36000]Accuracy-Flip: 0.99717+-0.00279 Training: 2022-04-11 01:39:17,631-[lfw][36000]Accuracy-Highest: 0.99717 Training: 2022-04-11 01:39:48,402-[cfp_fp][36000]XNorm: 19.592480 Training: 2022-04-11 01:39:48,403-[cfp_fp][36000]Accuracy-Flip: 0.96971+-0.00641 Training: 2022-04-11 01:39:48,403-[cfp_fp][36000]Accuracy-Highest: 0.97486 Training: 2022-04-11 01:40:14,848-[agedb_30][36000]XNorm: 21.764029 Training: 2022-04-11 01:40:14,849-[agedb_30][36000]Accuracy-Flip: 0.97333+-0.00937 Training: 2022-04-11 01:40:14,849-[agedb_30][36000]Accuracy-Highest: 0.97400 Training: 2022-04-11 01:40:16,822-Speed 119.19 samples/sec Loss 5.2254 LearningRate 0.0796 Epoch: 2 Global Step: 36010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:40:18,783-Speed 5221.32 samples/sec 
Loss 5.1410 LearningRate 0.0796 Epoch: 2 Global Step: 36020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:40:20,744-Speed 5224.95 samples/sec Loss 5.1159 LearningRate 0.0796 Epoch: 2 Global Step: 36030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:40:22,702-Speed 5232.07 samples/sec Loss 5.2166 LearningRate 0.0796 Epoch: 2 Global Step: 36040 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:40:24,659-Speed 5233.63 samples/sec Loss 5.1556 LearningRate 0.0796 Epoch: 2 Global Step: 36050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:40:26,637-Speed 5177.50 samples/sec Loss 5.1160 LearningRate 0.0796 Epoch: 2 Global Step: 36060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:40:28,598-Speed 5225.38 samples/sec Loss 5.2651 LearningRate 0.0796 Epoch: 2 Global Step: 36070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:40:30,559-Speed 5222.59 samples/sec Loss 5.1577 LearningRate 0.0796 Epoch: 2 Global Step: 36080 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:40:32,535-Speed 5182.85 samples/sec Loss 5.2437 LearningRate 0.0795 Epoch: 2 Global Step: 36090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:40:34,507-Speed 5194.35 samples/sec Loss 5.1667 LearningRate 0.0795 Epoch: 2 Global Step: 36100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:40:36,471-Speed 5215.48 samples/sec Loss 5.1360 LearningRate 0.0795 Epoch: 2 Global Step: 36110 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:40:38,441-Speed 5200.74 samples/sec Loss 5.1755 LearningRate 0.0795 Epoch: 2 Global Step: 36120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:40:40,413-Speed 5194.07 samples/sec Loss 5.2438 LearningRate 0.0795 Epoch: 2 Global Step: 36130 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:40:42,395-Speed 5167.99 samples/sec Loss 5.1695 LearningRate 0.0795 Epoch: 2 
Global Step: 36140 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:40:44,359-Speed 5217.92 samples/sec Loss 5.1571 LearningRate 0.0795 Epoch: 2 Global Step: 36150 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:40:46,361-Speed 5116.05 samples/sec Loss 5.1926 LearningRate 0.0795 Epoch: 2 Global Step: 36160 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:40:48,403-Speed 5016.28 samples/sec Loss 5.2341 LearningRate 0.0795 Epoch: 2 Global Step: 36170 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:40:50,392-Speed 5151.84 samples/sec Loss 5.2332 LearningRate 0.0795 Epoch: 2 Global Step: 36180 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:40:52,374-Speed 5166.61 samples/sec Loss 5.2453 LearningRate 0.0795 Epoch: 2 Global Step: 36190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:40:54,343-Speed 5202.69 samples/sec Loss 5.2267 LearningRate 0.0795 Epoch: 2 Global Step: 36200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:40:56,311-Speed 5203.62 samples/sec Loss 5.2136 LearningRate 0.0795 Epoch: 2 Global Step: 36210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:40:58,281-Speed 5199.31 samples/sec Loss 5.3288 LearningRate 0.0795 Epoch: 2 Global Step: 36220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:41:00,249-Speed 5205.08 samples/sec Loss 5.1544 LearningRate 0.0795 Epoch: 2 Global Step: 36230 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:02,227-Speed 5180.27 samples/sec Loss 5.1302 LearningRate 0.0795 Epoch: 2 Global Step: 36240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:04,200-Speed 5192.96 samples/sec Loss 5.2745 LearningRate 0.0795 Epoch: 2 Global Step: 36250 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:06,163-Speed 5216.05 samples/sec Loss 5.2477 LearningRate 0.0795 Epoch: 2 Global Step: 36260 Fp16 Grad Scale: 131072 Required: 
20 hours Training: 2022-04-11 01:41:08,143-Speed 5175.60 samples/sec Loss 5.2666 LearningRate 0.0795 Epoch: 2 Global Step: 36270 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:10,128-Speed 5159.43 samples/sec Loss 5.2175 LearningRate 0.0794 Epoch: 2 Global Step: 36280 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:12,118-Speed 5145.58 samples/sec Loss 5.2019 LearningRate 0.0794 Epoch: 2 Global Step: 36290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:14,125-Speed 5105.56 samples/sec Loss 5.2518 LearningRate 0.0794 Epoch: 2 Global Step: 36300 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:16,106-Speed 5172.02 samples/sec Loss 5.1522 LearningRate 0.0794 Epoch: 2 Global Step: 36310 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:18,073-Speed 5207.52 samples/sec Loss 5.1615 LearningRate 0.0794 Epoch: 2 Global Step: 36320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:41:20,034-Speed 5222.85 samples/sec Loss 5.1766 LearningRate 0.0794 Epoch: 2 Global Step: 36330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:41:22,014-Speed 5172.57 samples/sec Loss 5.2421 LearningRate 0.0794 Epoch: 2 Global Step: 36340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:41:23,997-Speed 5166.43 samples/sec Loss 5.2207 LearningRate 0.0794 Epoch: 2 Global Step: 36350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:41:25,967-Speed 5200.50 samples/sec Loss 5.1881 LearningRate 0.0794 Epoch: 2 Global Step: 36360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:41:27,951-Speed 5162.31 samples/sec Loss 5.1935 LearningRate 0.0794 Epoch: 2 Global Step: 36370 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:41:29,923-Speed 5196.36 samples/sec Loss 5.1297 LearningRate 0.0794 Epoch: 2 Global Step: 36380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:41:31,886-Speed 
5216.23 samples/sec Loss 5.0989 LearningRate 0.0794 Epoch: 2 Global Step: 36390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:41:33,858-Speed 5194.48 samples/sec Loss 5.1821 LearningRate 0.0794 Epoch: 2 Global Step: 36400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:41:35,846-Speed 5153.01 samples/sec Loss 5.1762 LearningRate 0.0794 Epoch: 2 Global Step: 36410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:41:37,811-Speed 5212.43 samples/sec Loss 5.2020 LearningRate 0.0794 Epoch: 2 Global Step: 36420 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:39,783-Speed 5195.21 samples/sec Loss 5.1384 LearningRate 0.0794 Epoch: 2 Global Step: 36430 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:41,743-Speed 5226.43 samples/sec Loss 5.2623 LearningRate 0.0794 Epoch: 2 Global Step: 36440 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:43,704-Speed 5224.42 samples/sec Loss 5.1859 LearningRate 0.0794 Epoch: 2 Global Step: 36450 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:45,684-Speed 5174.04 samples/sec Loss 5.1722 LearningRate 0.0793 Epoch: 2 Global Step: 36460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:47,647-Speed 5217.01 samples/sec Loss 5.2569 LearningRate 0.0793 Epoch: 2 Global Step: 36470 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:49,641-Speed 5137.47 samples/sec Loss 5.1541 LearningRate 0.0793 Epoch: 2 Global Step: 36480 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:51,604-Speed 5217.70 samples/sec Loss 5.2552 LearningRate 0.0793 Epoch: 2 Global Step: 36490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:53,565-Speed 5225.53 samples/sec Loss 5.2154 LearningRate 0.0793 Epoch: 2 Global Step: 36500 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:55,529-Speed 5214.70 samples/sec Loss 5.2204 LearningRate 
0.0793 Epoch: 2 Global Step: 36510 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:57,488-Speed 5229.19 samples/sec Loss 5.0221 LearningRate 0.0793 Epoch: 2 Global Step: 36520 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:41:59,480-Speed 5140.81 samples/sec Loss 5.1184 LearningRate 0.0793 Epoch: 2 Global Step: 36530 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:42:01,460-Speed 5174.13 samples/sec Loss 5.2183 LearningRate 0.0793 Epoch: 2 Global Step: 36540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:42:03,433-Speed 5190.19 samples/sec Loss 5.0881 LearningRate 0.0793 Epoch: 2 Global Step: 36550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:42:05,423-Speed 5148.88 samples/sec Loss 5.2769 LearningRate 0.0793 Epoch: 2 Global Step: 36560 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:42:07,390-Speed 5207.78 samples/sec Loss 5.1700 LearningRate 0.0793 Epoch: 2 Global Step: 36570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:42:09,368-Speed 5180.43 samples/sec Loss 5.1361 LearningRate 0.0793 Epoch: 2 Global Step: 36580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:42:11,331-Speed 5218.21 samples/sec Loss 5.1506 LearningRate 0.0793 Epoch: 2 Global Step: 36590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:42:13,330-Speed 5122.85 samples/sec Loss 5.2428 LearningRate 0.0793 Epoch: 2 Global Step: 36600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:42:15,303-Speed 5193.14 samples/sec Loss 5.2858 LearningRate 0.0793 Epoch: 2 Global Step: 36610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:42:17,261-Speed 5231.35 samples/sec Loss 5.1822 LearningRate 0.0793 Epoch: 2 Global Step: 36620 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:42:19,225-Speed 5215.39 samples/sec Loss 5.2133 LearningRate 0.0793 Epoch: 2 Global Step: 36630 Fp16 Grad 
Scale: 65536 Required: 20 hours Training: 2022-04-11 01:42:21,195-Speed 5199.73 samples/sec Loss 5.2972 LearningRate 0.0793 Epoch: 2 Global Step: 36640 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:42:23,175-Speed 5173.84 samples/sec Loss 5.2505 LearningRate 0.0792 Epoch: 2 Global Step: 36650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:42:25,144-Speed 5201.79 samples/sec Loss 5.2283 LearningRate 0.0792 Epoch: 2 Global Step: 36660 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:42:27,134-Speed 5146.70 samples/sec Loss 5.2698 LearningRate 0.0792 Epoch: 2 Global Step: 36670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:42:29,118-Speed 5165.45 samples/sec Loss 5.2422 LearningRate 0.0792 Epoch: 2 Global Step: 36680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:42:31,095-Speed 5181.43 samples/sec Loss 5.1071 LearningRate 0.0792 Epoch: 2 Global Step: 36690 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:42:33,062-Speed 5206.27 samples/sec Loss 5.2468 LearningRate 0.0792 Epoch: 2 Global Step: 36700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:42:35,027-Speed 5211.84 samples/sec Loss 5.1909 LearningRate 0.0792 Epoch: 2 Global Step: 36710 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:42:36,997-Speed 5201.10 samples/sec Loss 5.2876 LearningRate 0.0792 Epoch: 2 Global Step: 36720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:42:38,959-Speed 5220.34 samples/sec Loss 5.2503 LearningRate 0.0792 Epoch: 2 Global Step: 36730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:42:40,921-Speed 5221.78 samples/sec Loss 5.2294 LearningRate 0.0792 Epoch: 2 Global Step: 36740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:42:42,891-Speed 5197.79 samples/sec Loss 5.2142 LearningRate 0.0792 Epoch: 2 Global Step: 36750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 
2022-04-11 01:42:44,884-Speed 5141.28 samples/sec Loss 5.1857 LearningRate 0.0792 Epoch: 2 Global Step: 36760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:42:46,866-Speed 5168.51 samples/sec Loss 5.2232 LearningRate 0.0792 Epoch: 2 Global Step: 36770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:42:48,828-Speed 5219.07 samples/sec Loss 5.2643 LearningRate 0.0792 Epoch: 2 Global Step: 36780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:42:50,806-Speed 5179.64 samples/sec Loss 5.2827 LearningRate 0.0792 Epoch: 2 Global Step: 36790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:42:52,771-Speed 5213.94 samples/sec Loss 5.1653 LearningRate 0.0792 Epoch: 2 Global Step: 36800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:42:54,732-Speed 5223.71 samples/sec Loss 5.2689 LearningRate 0.0792 Epoch: 2 Global Step: 36810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:42:56,701-Speed 5203.54 samples/sec Loss 5.2218 LearningRate 0.0792 Epoch: 2 Global Step: 36820 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:42:58,663-Speed 5219.98 samples/sec Loss 5.1416 LearningRate 0.0792 Epoch: 2 Global Step: 36830 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:43:00,647-Speed 5161.57 samples/sec Loss 5.2312 LearningRate 0.0791 Epoch: 2 Global Step: 36840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:02,624-Speed 5181.70 samples/sec Loss 5.1339 LearningRate 0.0791 Epoch: 2 Global Step: 36850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:04,598-Speed 5189.39 samples/sec Loss 5.2136 LearningRate 0.0791 Epoch: 2 Global Step: 36860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:06,558-Speed 5225.19 samples/sec Loss 5.2140 LearningRate 0.0791 Epoch: 2 Global Step: 36870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:08,525-Speed 5208.18 
samples/sec Loss 5.2696 LearningRate 0.0791 Epoch: 2 Global Step: 36880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:10,491-Speed 5209.64 samples/sec Loss 5.2483 LearningRate 0.0791 Epoch: 2 Global Step: 36890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:12,465-Speed 5190.28 samples/sec Loss 5.2247 LearningRate 0.0791 Epoch: 2 Global Step: 36900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:14,437-Speed 5193.75 samples/sec Loss 5.1798 LearningRate 0.0791 Epoch: 2 Global Step: 36910 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:16,406-Speed 5202.94 samples/sec Loss 5.2719 LearningRate 0.0791 Epoch: 2 Global Step: 36920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:18,366-Speed 5225.96 samples/sec Loss 5.1463 LearningRate 0.0791 Epoch: 2 Global Step: 36930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:20,338-Speed 5194.28 samples/sec Loss 5.2206 LearningRate 0.0791 Epoch: 2 Global Step: 36940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:43:22,316-Speed 5179.75 samples/sec Loss 5.2446 LearningRate 0.0791 Epoch: 2 Global Step: 36950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:43:24,289-Speed 5190.73 samples/sec Loss 5.2246 LearningRate 0.0791 Epoch: 2 Global Step: 36960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:43:26,281-Speed 5142.98 samples/sec Loss 5.1190 LearningRate 0.0791 Epoch: 2 Global Step: 36970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:28,255-Speed 5190.69 samples/sec Loss 5.1865 LearningRate 0.0791 Epoch: 2 Global Step: 36980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:30,221-Speed 5209.75 samples/sec Loss 5.2625 LearningRate 0.0791 Epoch: 2 Global Step: 36990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:32,185-Speed 5216.40 samples/sec Loss 5.1577 LearningRate 0.0791 Epoch: 2 
Global Step: 37000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:34,146-Speed 5223.70 samples/sec Loss 5.1570 LearningRate 0.0791 Epoch: 2 Global Step: 37010 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:36,129-Speed 5165.59 samples/sec Loss 5.2016 LearningRate 0.0791 Epoch: 2 Global Step: 37020 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:38,118-Speed 5149.21 samples/sec Loss 5.2727 LearningRate 0.0790 Epoch: 2 Global Step: 37030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:40,083-Speed 5213.91 samples/sec Loss 5.2172 LearningRate 0.0790 Epoch: 2 Global Step: 37040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:42,051-Speed 5203.39 samples/sec Loss 5.1754 LearningRate 0.0790 Epoch: 2 Global Step: 37050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:44,018-Speed 5208.84 samples/sec Loss 5.1604 LearningRate 0.0790 Epoch: 2 Global Step: 37060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:45,988-Speed 5198.59 samples/sec Loss 5.2067 LearningRate 0.0790 Epoch: 2 Global Step: 37070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:43:47,952-Speed 5215.87 samples/sec Loss 5.2253 LearningRate 0.0790 Epoch: 2 Global Step: 37080 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:43:49,939-Speed 5155.39 samples/sec Loss 5.2080 LearningRate 0.0790 Epoch: 2 Global Step: 37090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:43:51,925-Speed 5159.31 samples/sec Loss 5.1452 LearningRate 0.0790 Epoch: 2 Global Step: 37100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:43:53,926-Speed 5118.04 samples/sec Loss 5.1944 LearningRate 0.0790 Epoch: 2 Global Step: 37110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:55,897-Speed 5196.91 samples/sec Loss 5.2179 LearningRate 0.0790 Epoch: 2 Global Step: 37120 Fp16 Grad Scale: 65536 Required: 
20 hours Training: 2022-04-11 01:43:57,885-Speed 5154.11 samples/sec Loss 5.2131 LearningRate 0.0790 Epoch: 2 Global Step: 37130 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:43:59,853-Speed 5203.83 samples/sec Loss 5.2230 LearningRate 0.0790 Epoch: 2 Global Step: 37140 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:44:01,825-Speed 5193.99 samples/sec Loss 5.0404 LearningRate 0.0790 Epoch: 2 Global Step: 37150 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:44:03,801-Speed 5185.85 samples/sec Loss 5.2049 LearningRate 0.0790 Epoch: 2 Global Step: 37160 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:44:05,772-Speed 5194.77 samples/sec Loss 5.3120 LearningRate 0.0790 Epoch: 2 Global Step: 37170 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:44:07,737-Speed 5213.21 samples/sec Loss 5.2889 LearningRate 0.0790 Epoch: 2 Global Step: 37180 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:44:09,701-Speed 5216.05 samples/sec Loss 5.2632 LearningRate 0.0790 Epoch: 2 Global Step: 37190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:44:11,674-Speed 5191.14 samples/sec Loss 5.2419 LearningRate 0.0790 Epoch: 2 Global Step: 37200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:44:13,650-Speed 5185.31 samples/sec Loss 5.1778 LearningRate 0.0789 Epoch: 2 Global Step: 37210 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:44:15,621-Speed 5196.85 samples/sec Loss 5.2001 LearningRate 0.0789 Epoch: 2 Global Step: 37220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:44:17,590-Speed 5201.03 samples/sec Loss 5.1836 LearningRate 0.0789 Epoch: 2 Global Step: 37230 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:44:19,563-Speed 5193.26 samples/sec Loss 5.2936 LearningRate 0.0789 Epoch: 2 Global Step: 37240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:44:21,523-Speed 
5228.08 samples/sec Loss 5.1899 LearningRate 0.0789 Epoch: 2 Global Step: 37250 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:44:23,493-Speed 5199.25 samples/sec Loss 5.2713 LearningRate 0.0789 Epoch: 2 Global Step: 37260 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:44:25,488-Speed 5133.97 samples/sec Loss 5.1807 LearningRate 0.0789 Epoch: 2 Global Step: 37270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:44:27,476-Speed 5153.09 samples/sec Loss 5.2572 LearningRate 0.0789 Epoch: 2 Global Step: 37280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:44:29,441-Speed 5211.32 samples/sec Loss 5.2358 LearningRate 0.0789 Epoch: 2 Global Step: 37290 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:44:31,404-Speed 5219.36 samples/sec Loss 5.1559 LearningRate 0.0789 Epoch: 2 Global Step: 37300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:44:33,378-Speed 5188.45 samples/sec Loss 5.2114 LearningRate 0.0789 Epoch: 2 Global Step: 37310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:44:35,352-Speed 5187.80 samples/sec Loss 5.2182 LearningRate 0.0789 Epoch: 2 Global Step: 37320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:44:37,323-Speed 5197.41 samples/sec Loss 5.1928 LearningRate 0.0789 Epoch: 2 Global Step: 37330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:44:39,293-Speed 5201.35 samples/sec Loss 5.1254 LearningRate 0.0789 Epoch: 2 Global Step: 37340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:44:41,261-Speed 5204.87 samples/sec Loss 5.2470 LearningRate 0.0789 Epoch: 2 Global Step: 37350 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:44:43,250-Speed 5149.51 samples/sec Loss 5.2434 LearningRate 0.0789 Epoch: 2 Global Step: 37360 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:44:45,231-Speed 5171.69 samples/sec Loss 5.1506 LearningRate 0.0789 
Epoch: 2 Global Step: 37370 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:44:47,233-Speed 5115.20 samples/sec Loss 5.2036 LearningRate 0.0789 Epoch: 2 Global Step: 37380 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:44:49,202-Speed 5203.26 samples/sec Loss 5.2000 LearningRate 0.0789 Epoch: 2 Global Step: 37390 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:44:51,165-Speed 5217.91 samples/sec Loss 5.2461 LearningRate 0.0788 Epoch: 2 Global Step: 37400 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:44:53,146-Speed 5172.23 samples/sec Loss 5.2910 LearningRate 0.0788 Epoch: 2 Global Step: 37410 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:44:55,117-Speed 5196.34 samples/sec Loss 5.2091 LearningRate 0.0788 Epoch: 2 Global Step: 37420 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:44:57,081-Speed 5214.95 samples/sec Loss 5.1902 LearningRate 0.0788 Epoch: 2 Global Step: 37430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:44:59,084-Speed 5114.98 samples/sec Loss 5.1900 LearningRate 0.0788 Epoch: 2 Global Step: 37440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:45:01,052-Speed 5205.42 samples/sec Loss 5.2105 LearningRate 0.0788 Epoch: 2 Global Step: 37450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:45:03,035-Speed 5165.48 samples/sec Loss 5.2517 LearningRate 0.0788 Epoch: 2 Global Step: 37460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:45:05,006-Speed 5196.25 samples/sec Loss 5.2458 LearningRate 0.0788 Epoch: 2 Global Step: 37470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:45:06,983-Speed 5182.71 samples/sec Loss 5.2865 LearningRate 0.0788 Epoch: 2 Global Step: 37480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:45:08,967-Speed 5161.72 samples/sec Loss 5.2424 LearningRate 0.0788 Epoch: 2 Global Step: 37490 Fp16 Grad Scale: 
65536 Required: 20 hours Training: 2022-04-11 01:45:10,941-Speed 5188.91 samples/sec Loss 5.2380 LearningRate 0.0788 Epoch: 2 Global Step: 37500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:45:12,916-Speed 5188.51 samples/sec Loss 5.2317 LearningRate 0.0788 Epoch: 2 Global Step: 37510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:45:14,906-Speed 5145.35 samples/sec Loss 5.1684 LearningRate 0.0788 Epoch: 2 Global Step: 37520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:45:16,877-Speed 5197.39 samples/sec Loss 5.1762 LearningRate 0.0788 Epoch: 2 Global Step: 37530 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:45:18,849-Speed 5196.25 samples/sec Loss 5.1858 LearningRate 0.0788 Epoch: 2 Global Step: 37540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:45:20,813-Speed 5214.05 samples/sec Loss 5.2340 LearningRate 0.0788 Epoch: 2 Global Step: 37550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:45:22,782-Speed 5201.69 samples/sec Loss 5.1659 LearningRate 0.0788 Epoch: 2 Global Step: 37560 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:45:24,772-Speed 5147.36 samples/sec Loss 5.2126 LearningRate 0.0788 Epoch: 2 Global Step: 37570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:45:26,737-Speed 5215.21 samples/sec Loss 5.3151 LearningRate 0.0788 Epoch: 2 Global Step: 37580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:45:28,721-Speed 5162.32 samples/sec Loss 5.1580 LearningRate 0.0787 Epoch: 2 Global Step: 37590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:45:30,686-Speed 5213.07 samples/sec Loss 5.0913 LearningRate 0.0787 Epoch: 2 Global Step: 37600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:45:32,678-Speed 5143.14 samples/sec Loss 5.1845 LearningRate 0.0787 Epoch: 2 Global Step: 37610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 
01:45:34,646-Speed 5202.84 samples/sec Loss 5.1662 LearningRate 0.0787 Epoch: 2 Global Step: 37620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:45:36,632-Speed 5158.44 samples/sec Loss 5.2623 LearningRate 0.0787 Epoch: 2 Global Step: 37630 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:45:38,593-Speed 5223.18 samples/sec Loss 5.1541 LearningRate 0.0787 Epoch: 2 Global Step: 37640 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:45:40,565-Speed 5195.04 samples/sec Loss 5.0734 LearningRate 0.0787 Epoch: 2 Global Step: 37650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:45:42,544-Speed 5175.99 samples/sec Loss 5.2510 LearningRate 0.0787 Epoch: 2 Global Step: 37660 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:45:44,524-Speed 5173.99 samples/sec Loss 5.2380 LearningRate 0.0787 Epoch: 2 Global Step: 37670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:45:46,499-Speed 5184.88 samples/sec Loss 5.1838 LearningRate 0.0787 Epoch: 2 Global Step: 37680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:45:48,506-Speed 5105.29 samples/sec Loss 5.0484 LearningRate 0.0787 Epoch: 2 Global Step: 37690 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:45:50,487-Speed 5171.78 samples/sec Loss 5.1345 LearningRate 0.0787 Epoch: 2 Global Step: 37700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:45:52,457-Speed 5198.33 samples/sec Loss 5.2778 LearningRate 0.0787 Epoch: 2 Global Step: 37710 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:45:54,422-Speed 5213.17 samples/sec Loss 5.2175 LearningRate 0.0787 Epoch: 2 Global Step: 37720 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:45:56,405-Speed 5166.36 samples/sec Loss 5.2129 LearningRate 0.0787 Epoch: 2 Global Step: 37730 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:45:58,395-Speed 5147.18 samples/sec Loss 5.1235 
LearningRate 0.0787 Epoch: 2 Global Step: 37740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:46:00,369-Speed 5190.02 samples/sec Loss 5.1509 LearningRate 0.0787 Epoch: 2 Global Step: 37750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:46:02,371-Speed 5114.66 samples/sec Loss 5.2405 LearningRate 0.0787 Epoch: 2 Global Step: 37760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:46:04,344-Speed 5192.20 samples/sec Loss 5.1208 LearningRate 0.0787 Epoch: 2 Global Step: 37770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:46:06,309-Speed 5211.91 samples/sec Loss 5.2231 LearningRate 0.0786 Epoch: 2 Global Step: 37780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:46:08,268-Speed 5230.37 samples/sec Loss 5.2480 LearningRate 0.0786 Epoch: 2 Global Step: 37790 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:46:10,232-Speed 5214.28 samples/sec Loss 5.2648 LearningRate 0.0786 Epoch: 2 Global Step: 37800 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:46:12,207-Speed 5188.16 samples/sec Loss 5.1817 LearningRate 0.0786 Epoch: 2 Global Step: 37810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:46:14,182-Speed 5187.11 samples/sec Loss 5.1664 LearningRate 0.0786 Epoch: 2 Global Step: 37820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:46:16,148-Speed 5210.84 samples/sec Loss 5.1672 LearningRate 0.0786 Epoch: 2 Global Step: 37830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:46:18,114-Speed 5209.51 samples/sec Loss 5.0871 LearningRate 0.0786 Epoch: 2 Global Step: 37840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:46:20,079-Speed 5212.30 samples/sec Loss 5.1599 LearningRate 0.0786 Epoch: 2 Global Step: 37850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:46:22,055-Speed 5184.78 samples/sec Loss 5.2539 LearningRate 0.0786 Epoch: 2 Global Step: 37860 
Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:46:24,035-Speed 5172.13 samples/sec Loss 5.2052 LearningRate 0.0786 Epoch: 2 Global Step: 37870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:46:26,017-Speed 5168.36 samples/sec Loss 5.2491 LearningRate 0.0786 Epoch: 2 Global Step: 37880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:46:27,980-Speed 5219.97 samples/sec Loss 5.2117 LearningRate 0.0786 Epoch: 2 Global Step: 37890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:46:29,960-Speed 5172.20 samples/sec Loss 5.2030 LearningRate 0.0786 Epoch: 2 Global Step: 37900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:46:31,914-Speed 5241.27 samples/sec Loss 5.2216 LearningRate 0.0786 Epoch: 2 Global Step: 37910 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:46:33,887-Speed 5191.35 samples/sec Loss 5.1259 LearningRate 0.0786 Epoch: 2 Global Step: 37920 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:46:35,863-Speed 5186.10 samples/sec Loss 5.1930 LearningRate 0.0786 Epoch: 2 Global Step: 37930 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:46:37,832-Speed 5202.96 samples/sec Loss 5.2501 LearningRate 0.0786 Epoch: 2 Global Step: 37940 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:46:39,809-Speed 5181.13 samples/sec Loss 5.1719 LearningRate 0.0786 Epoch: 2 Global Step: 37950 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:46:41,800-Speed 5145.02 samples/sec Loss 5.1852 LearningRate 0.0786 Epoch: 2 Global Step: 37960 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:46:43,762-Speed 5220.78 samples/sec Loss 5.2476 LearningRate 0.0785 Epoch: 2 Global Step: 37970 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:46:45,724-Speed 5219.49 samples/sec Loss 5.1256 LearningRate 0.0785 Epoch: 2 Global Step: 37980 Fp16 Grad Scale: 32768 Required: 20 hours Training: 
2022-04-11 01:46:47,717-Speed 5140.19 samples/sec Loss 5.1790 LearningRate 0.0785 Epoch: 2 Global Step: 37990 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:46:49,698-Speed 5171.46 samples/sec Loss 5.1037 LearningRate 0.0785 Epoch: 2 Global Step: 38000 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-11 01:47:16,210-[lfw][38000]XNorm: 22.068708 Training: 2022-04-11 01:47:16,211-[lfw][38000]Accuracy-Flip: 0.99683+-0.00229 Training: 2022-04-11 01:47:16,211-[lfw][38000]Accuracy-Highest: 0.99717 Training: 2022-04-11 01:47:46,880-[cfp_fp][38000]XNorm: 19.929096 Training: 2022-04-11 01:47:46,881-[cfp_fp][38000]Accuracy-Flip: 0.97414+-0.00505 Training: 2022-04-11 01:47:46,881-[cfp_fp][38000]Accuracy-Highest: 0.97486 Training: 2022-04-11 01:48:13,402-[agedb_30][38000]XNorm: 21.741211 Training: 2022-04-11 01:48:13,403-[agedb_30][38000]Accuracy-Flip: 0.97550+-0.00695 Training: 2022-04-11 01:48:13,403-[agedb_30][38000]Accuracy-Highest: 0.97550 Training: 2022-04-11 01:48:15,384-Speed 119.51 samples/sec Loss 5.1568 LearningRate 0.0785 Epoch: 2 Global Step: 38010 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:48:17,337-Speed 5244.56 samples/sec Loss 5.1907 LearningRate 0.0785 Epoch: 2 Global Step: 38020 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:48:19,293-Speed 5237.50 samples/sec Loss 5.1383 LearningRate 0.0785 Epoch: 2 Global Step: 38030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:48:21,248-Speed 5240.57 samples/sec Loss 5.1900 LearningRate 0.0785 Epoch: 2 Global Step: 38040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:48:23,237-Speed 5149.50 samples/sec Loss 5.1606 LearningRate 0.0785 Epoch: 2 Global Step: 38050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:48:25,215-Speed 5178.92 samples/sec Loss 5.1862 LearningRate 0.0785 Epoch: 2 Global Step: 38060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:48:27,174-Speed 5228.81 
samples/sec Loss 5.1907 LearningRate 0.0785 Epoch: 2 Global Step: 38070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:48:29,135-Speed 5223.72 samples/sec Loss 5.1786 LearningRate 0.0785 Epoch: 2 Global Step: 38080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:48:31,091-Speed 5236.28 samples/sec Loss 5.1820 LearningRate 0.0785 Epoch: 2 Global Step: 38090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:48:33,074-Speed 5166.88 samples/sec Loss 5.3045 LearningRate 0.0785 Epoch: 2 Global Step: 38100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:48:35,049-Speed 5186.46 samples/sec Loss 5.1994 LearningRate 0.0785 Epoch: 2 Global Step: 38110 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:48:37,027-Speed 5177.95 samples/sec Loss 5.2323 LearningRate 0.0785 Epoch: 2 Global Step: 38120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:48:38,991-Speed 5214.50 samples/sec Loss 5.2749 LearningRate 0.0785 Epoch: 2 Global Step: 38130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:48:40,970-Speed 5178.05 samples/sec Loss 5.1188 LearningRate 0.0785 Epoch: 2 Global Step: 38140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:48:42,954-Speed 5160.98 samples/sec Loss 5.0217 LearningRate 0.0784 Epoch: 2 Global Step: 38150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:48:44,925-Speed 5198.61 samples/sec Loss 5.3111 LearningRate 0.0784 Epoch: 2 Global Step: 38160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:48:46,897-Speed 5192.90 samples/sec Loss 5.2433 LearningRate 0.0784 Epoch: 2 Global Step: 38170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:48:48,939-Speed 5020.20 samples/sec Loss 5.2500 LearningRate 0.0784 Epoch: 2 Global Step: 38180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:48:50,959-Speed 5072.32 samples/sec Loss 5.2175 LearningRate 0.0784 
Epoch: 2 Global Step: 38190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:48:52,940-Speed 5170.74 samples/sec Loss 5.1174 LearningRate 0.0784 Epoch: 2 Global Step: 38200 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:48:54,906-Speed 5210.00 samples/sec Loss 5.2072 LearningRate 0.0784 Epoch: 2 Global Step: 38210 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:48:56,879-Speed 5190.46 samples/sec Loss 5.1836 LearningRate 0.0784 Epoch: 2 Global Step: 38220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:48:58,848-Speed 5203.00 samples/sec Loss 5.0999 LearningRate 0.0784 Epoch: 2 Global Step: 38230 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:49:00,848-Speed 5120.56 samples/sec Loss 5.1720 LearningRate 0.0784 Epoch: 2 Global Step: 38240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:49:02,834-Speed 5159.86 samples/sec Loss 5.1978 LearningRate 0.0784 Epoch: 2 Global Step: 38250 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:49:04,811-Speed 5180.24 samples/sec Loss 5.2380 LearningRate 0.0784 Epoch: 2 Global Step: 38260 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:49:06,773-Speed 5221.11 samples/sec Loss 5.1845 LearningRate 0.0784 Epoch: 2 Global Step: 38270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:08,767-Speed 5137.77 samples/sec Loss 5.1593 LearningRate 0.0784 Epoch: 2 Global Step: 38280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:10,747-Speed 5174.19 samples/sec Loss 5.2354 LearningRate 0.0784 Epoch: 2 Global Step: 38290 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:12,725-Speed 5178.26 samples/sec Loss 5.1350 LearningRate 0.0784 Epoch: 2 Global Step: 38300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:14,702-Speed 5179.52 samples/sec Loss 5.2384 LearningRate 0.0784 Epoch: 2 Global Step: 38310 Fp16 Grad Scale: 
65536 Required: 20 hours Training: 2022-04-11 01:49:16,670-Speed 5205.43 samples/sec Loss 5.1789 LearningRate 0.0784 Epoch: 2 Global Step: 38320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:18,645-Speed 5186.26 samples/sec Loss 5.2429 LearningRate 0.0784 Epoch: 2 Global Step: 38330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:20,619-Speed 5189.96 samples/sec Loss 5.1878 LearningRate 0.0783 Epoch: 2 Global Step: 38340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:22,599-Speed 5174.00 samples/sec Loss 5.2237 LearningRate 0.0783 Epoch: 2 Global Step: 38350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:24,575-Speed 5182.81 samples/sec Loss 5.2461 LearningRate 0.0783 Epoch: 2 Global Step: 38360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:26,564-Speed 5150.82 samples/sec Loss 5.0936 LearningRate 0.0783 Epoch: 2 Global Step: 38370 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:49:28,558-Speed 5137.58 samples/sec Loss 5.2211 LearningRate 0.0783 Epoch: 2 Global Step: 38380 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:49:30,527-Speed 5201.66 samples/sec Loss 5.2227 LearningRate 0.0783 Epoch: 2 Global Step: 38390 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:49:32,495-Speed 5205.89 samples/sec Loss 5.2637 LearningRate 0.0783 Epoch: 2 Global Step: 38400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:34,484-Speed 5149.09 samples/sec Loss 5.1500 LearningRate 0.0783 Epoch: 2 Global Step: 38410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:36,454-Speed 5201.43 samples/sec Loss 5.2106 LearningRate 0.0783 Epoch: 2 Global Step: 38420 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:38,418-Speed 5215.21 samples/sec Loss 5.2165 LearningRate 0.0783 Epoch: 2 Global Step: 38430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 
01:49:40,380-Speed 5218.93 samples/sec Loss 5.3017 LearningRate 0.0783 Epoch: 2 Global Step: 38440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:42,346-Speed 5211.13 samples/sec Loss 5.2488 LearningRate 0.0783 Epoch: 2 Global Step: 38450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:44,326-Speed 5173.96 samples/sec Loss 5.2669 LearningRate 0.0783 Epoch: 2 Global Step: 38460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:46,299-Speed 5192.43 samples/sec Loss 5.1696 LearningRate 0.0783 Epoch: 2 Global Step: 38470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:48,335-Speed 5030.40 samples/sec Loss 5.2593 LearningRate 0.0783 Epoch: 2 Global Step: 38480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:50,316-Speed 5170.46 samples/sec Loss 5.1941 LearningRate 0.0783 Epoch: 2 Global Step: 38490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:52,283-Speed 5208.88 samples/sec Loss 5.1581 LearningRate 0.0783 Epoch: 2 Global Step: 38500 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:49:54,248-Speed 5212.85 samples/sec Loss 5.1076 LearningRate 0.0783 Epoch: 2 Global Step: 38510 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:49:56,203-Speed 5240.62 samples/sec Loss 5.1898 LearningRate 0.0783 Epoch: 2 Global Step: 38520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:49:58,173-Speed 5197.46 samples/sec Loss 5.2148 LearningRate 0.0782 Epoch: 2 Global Step: 38530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:50:00,139-Speed 5210.14 samples/sec Loss 5.2399 LearningRate 0.0782 Epoch: 2 Global Step: 38540 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:50:02,120-Speed 5171.47 samples/sec Loss 5.1669 LearningRate 0.0782 Epoch: 2 Global Step: 38550 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:50:04,090-Speed 5198.80 samples/sec Loss 5.2350 
LearningRate 0.0782 Epoch: 2 Global Step: 38560 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:50:06,051-Speed 5224.71 samples/sec Loss 5.1451 LearningRate 0.0782 Epoch: 2 Global Step: 38570 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:50:08,031-Speed 5173.55 samples/sec Loss 5.1337 LearningRate 0.0782 Epoch: 2 Global Step: 38580 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:50:10,005-Speed 5186.95 samples/sec Loss 5.1247 LearningRate 0.0782 Epoch: 2 Global Step: 38590 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:50:11,997-Speed 5144.53 samples/sec Loss 5.1825 LearningRate 0.0782 Epoch: 2 Global Step: 38600 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:50:13,986-Speed 5150.79 samples/sec Loss 5.1202 LearningRate 0.0782 Epoch: 2 Global Step: 38610 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:50:15,949-Speed 5218.50 samples/sec Loss 5.1902 LearningRate 0.0782 Epoch: 2 Global Step: 38620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:17,915-Speed 5209.69 samples/sec Loss 5.1107 LearningRate 0.0782 Epoch: 2 Global Step: 38630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:19,892-Speed 5180.08 samples/sec Loss 5.2471 LearningRate 0.0782 Epoch: 2 Global Step: 38640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:21,869-Speed 5182.72 samples/sec Loss 5.2243 LearningRate 0.0782 Epoch: 2 Global Step: 38650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:23,838-Speed 5202.64 samples/sec Loss 5.3022 LearningRate 0.0782 Epoch: 2 Global Step: 38660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:25,805-Speed 5205.75 samples/sec Loss 5.1962 LearningRate 0.0782 Epoch: 2 Global Step: 38670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:27,788-Speed 5166.54 samples/sec Loss 5.1792 LearningRate 0.0782 Epoch: 2 Global Step: 38680 
Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:29,775-Speed 5155.27 samples/sec Loss 5.2878 LearningRate 0.0782 Epoch: 2 Global Step: 38690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:31,761-Speed 5157.53 samples/sec Loss 5.1619 LearningRate 0.0782 Epoch: 2 Global Step: 38700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:33,721-Speed 5228.47 samples/sec Loss 5.3096 LearningRate 0.0782 Epoch: 2 Global Step: 38710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:35,683-Speed 5220.49 samples/sec Loss 5.2325 LearningRate 0.0781 Epoch: 2 Global Step: 38720 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:50:37,639-Speed 5237.06 samples/sec Loss 5.1951 LearningRate 0.0781 Epoch: 2 Global Step: 38730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:39,602-Speed 5216.23 samples/sec Loss 5.1922 LearningRate 0.0781 Epoch: 2 Global Step: 38740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:41,570-Speed 5206.42 samples/sec Loss 5.2323 LearningRate 0.0781 Epoch: 2 Global Step: 38750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:43,533-Speed 5218.18 samples/sec Loss 5.1725 LearningRate 0.0781 Epoch: 2 Global Step: 38760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:45,502-Speed 5202.37 samples/sec Loss 5.1563 LearningRate 0.0781 Epoch: 2 Global Step: 38770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:47,476-Speed 5188.46 samples/sec Loss 5.2311 LearningRate 0.0781 Epoch: 2 Global Step: 38780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:49,444-Speed 5204.83 samples/sec Loss 5.1731 LearningRate 0.0781 Epoch: 2 Global Step: 38790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:51,426-Speed 5167.55 samples/sec Loss 5.1878 LearningRate 0.0781 Epoch: 2 Global Step: 38800 Fp16 Grad Scale: 131072 Required: 20 hours 
Training: 2022-04-11 01:50:53,399-Speed 5193.00 samples/sec Loss 5.1809 LearningRate 0.0781 Epoch: 2 Global Step: 38810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:55,362-Speed 5219.86 samples/sec Loss 5.1692 LearningRate 0.0781 Epoch: 2 Global Step: 38820 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:50:57,345-Speed 5164.45 samples/sec Loss 5.2601 LearningRate 0.0781 Epoch: 2 Global Step: 38830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:50:59,312-Speed 5207.16 samples/sec Loss 5.0888 LearningRate 0.0781 Epoch: 2 Global Step: 38840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:01,276-Speed 5215.62 samples/sec Loss 5.2749 LearningRate 0.0781 Epoch: 2 Global Step: 38850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:03,247-Speed 5198.19 samples/sec Loss 5.0936 LearningRate 0.0781 Epoch: 2 Global Step: 38860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:05,217-Speed 5199.51 samples/sec Loss 5.1151 LearningRate 0.0781 Epoch: 2 Global Step: 38870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:07,195-Speed 5177.42 samples/sec Loss 5.2301 LearningRate 0.0781 Epoch: 2 Global Step: 38880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:09,156-Speed 5222.68 samples/sec Loss 5.1281 LearningRate 0.0781 Epoch: 2 Global Step: 38890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:11,149-Speed 5139.76 samples/sec Loss 5.0963 LearningRate 0.0781 Epoch: 2 Global Step: 38900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:13,122-Speed 5191.90 samples/sec Loss 5.1197 LearningRate 0.0780 Epoch: 2 Global Step: 38910 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:15,123-Speed 5121.35 samples/sec Loss 5.0960 LearningRate 0.0780 Epoch: 2 Global Step: 38920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:17,100-Speed 5181.69 
samples/sec Loss 5.1566 LearningRate 0.0780 Epoch: 2 Global Step: 38930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:51:19,062-Speed 5219.87 samples/sec Loss 5.1671 LearningRate 0.0780 Epoch: 2 Global Step: 38940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:51:21,022-Speed 5225.41 samples/sec Loss 5.1721 LearningRate 0.0780 Epoch: 2 Global Step: 38950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:22,988-Speed 5211.72 samples/sec Loss 5.1565 LearningRate 0.0780 Epoch: 2 Global Step: 38960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:24,955-Speed 5208.79 samples/sec Loss 5.1431 LearningRate 0.0780 Epoch: 2 Global Step: 38970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:26,942-Speed 5155.46 samples/sec Loss 5.2100 LearningRate 0.0780 Epoch: 2 Global Step: 38980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:28,915-Speed 5190.66 samples/sec Loss 5.2205 LearningRate 0.0780 Epoch: 2 Global Step: 38990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:30,884-Speed 5201.38 samples/sec Loss 5.2316 LearningRate 0.0780 Epoch: 2 Global Step: 39000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:32,876-Speed 5142.21 samples/sec Loss 5.1560 LearningRate 0.0780 Epoch: 2 Global Step: 39010 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:34,858-Speed 5167.67 samples/sec Loss 5.2152 LearningRate 0.0780 Epoch: 2 Global Step: 39020 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:36,837-Speed 5177.02 samples/sec Loss 5.2385 LearningRate 0.0780 Epoch: 2 Global Step: 39030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:38,845-Speed 5103.24 samples/sec Loss 5.2355 LearningRate 0.0780 Epoch: 2 Global Step: 39040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:51:40,810-Speed 5213.24 samples/sec Loss 5.2121 LearningRate 0.0780 Epoch: 2 
Global Step: 39050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:51:42,781-Speed 5195.42 samples/sec Loss 5.2307 LearningRate 0.0780 Epoch: 2 Global Step: 39060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:51:44,758-Speed 5181.83 samples/sec Loss 5.2463 LearningRate 0.0780 Epoch: 2 Global Step: 39070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:51:46,727-Speed 5200.95 samples/sec Loss 5.1593 LearningRate 0.0780 Epoch: 2 Global Step: 39080 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:51:48,718-Speed 5145.48 samples/sec Loss 5.1426 LearningRate 0.0780 Epoch: 2 Global Step: 39090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:51:50,684-Speed 5209.91 samples/sec Loss 5.2308 LearningRate 0.0779 Epoch: 2 Global Step: 39100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:51:52,666-Speed 5168.05 samples/sec Loss 5.2416 LearningRate 0.0779 Epoch: 2 Global Step: 39110 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:51:54,629-Speed 5217.93 samples/sec Loss 5.2016 LearningRate 0.0779 Epoch: 2 Global Step: 39120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:51:56,599-Speed 5200.89 samples/sec Loss 5.1462 LearningRate 0.0779 Epoch: 2 Global Step: 39130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:51:58,574-Speed 5186.43 samples/sec Loss 5.0794 LearningRate 0.0779 Epoch: 2 Global Step: 39140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:52:00,551-Speed 5182.80 samples/sec Loss 5.0836 LearningRate 0.0779 Epoch: 2 Global Step: 39150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:52:02,552-Speed 5119.33 samples/sec Loss 5.1816 LearningRate 0.0779 Epoch: 2 Global Step: 39160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:52:04,551-Speed 5123.70 samples/sec Loss 5.2210 LearningRate 0.0779 Epoch: 2 Global Step: 39170 Fp16 Grad Scale: 65536 
Required: 20 hours Training: 2022-04-11 01:52:06,516-Speed 5211.77 samples/sec Loss 5.1484 LearningRate 0.0779 Epoch: 2 Global Step: 39180 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:08,488-Speed 5194.59 samples/sec Loss 5.2137 LearningRate 0.0779 Epoch: 2 Global Step: 39190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:10,477-Speed 5151.23 samples/sec Loss 5.1666 LearningRate 0.0779 Epoch: 2 Global Step: 39200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:12,440-Speed 5216.14 samples/sec Loss 5.1878 LearningRate 0.0779 Epoch: 2 Global Step: 39210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:14,424-Speed 5164.93 samples/sec Loss 5.2022 LearningRate 0.0779 Epoch: 2 Global Step: 39220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:16,415-Speed 5143.25 samples/sec Loss 5.1784 LearningRate 0.0779 Epoch: 2 Global Step: 39230 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:18,394-Speed 5178.87 samples/sec Loss 5.2661 LearningRate 0.0779 Epoch: 2 Global Step: 39240 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:20,362-Speed 5203.32 samples/sec Loss 5.2305 LearningRate 0.0779 Epoch: 2 Global Step: 39250 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:22,329-Speed 5207.09 samples/sec Loss 5.1641 LearningRate 0.0779 Epoch: 2 Global Step: 39260 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:24,298-Speed 5205.38 samples/sec Loss 5.1411 LearningRate 0.0779 Epoch: 2 Global Step: 39270 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:52:26,264-Speed 5209.19 samples/sec Loss 5.2838 LearningRate 0.0779 Epoch: 2 Global Step: 39280 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:52:28,235-Speed 5196.11 samples/sec Loss 5.2083 LearningRate 0.0778 Epoch: 2 Global Step: 39290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 
01:52:30,203-Speed 5204.45 samples/sec Loss 5.1923 LearningRate 0.0778 Epoch: 2 Global Step: 39300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:32,170-Speed 5209.31 samples/sec Loss 5.0804 LearningRate 0.0778 Epoch: 2 Global Step: 39310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:34,167-Speed 5127.77 samples/sec Loss 5.1778 LearningRate 0.0778 Epoch: 2 Global Step: 39320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:36,147-Speed 5173.10 samples/sec Loss 5.2298 LearningRate 0.0778 Epoch: 2 Global Step: 39330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:38,138-Speed 5146.25 samples/sec Loss 5.2294 LearningRate 0.0778 Epoch: 2 Global Step: 39340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:40,115-Speed 5181.46 samples/sec Loss 5.2496 LearningRate 0.0778 Epoch: 2 Global Step: 39350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:42,091-Speed 5182.97 samples/sec Loss 5.1415 LearningRate 0.0778 Epoch: 2 Global Step: 39360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:44,069-Speed 5181.04 samples/sec Loss 5.1170 LearningRate 0.0778 Epoch: 2 Global Step: 39370 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:46,055-Speed 5156.52 samples/sec Loss 5.1898 LearningRate 0.0778 Epoch: 2 Global Step: 39380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:48,046-Speed 5145.00 samples/sec Loss 5.1465 LearningRate 0.0778 Epoch: 2 Global Step: 39390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:52:50,021-Speed 5185.22 samples/sec Loss 5.1807 LearningRate 0.0778 Epoch: 2 Global Step: 39400 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:52:51,992-Speed 5197.67 samples/sec Loss 5.1428 LearningRate 0.0778 Epoch: 2 Global Step: 39410 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:52:53,968-Speed 5184.45 samples/sec Loss 5.1406 
LearningRate 0.0778 Epoch: 2 Global Step: 39420 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:52:55,944-Speed 5183.54 samples/sec Loss 5.1844 LearningRate 0.0778 Epoch: 2 Global Step: 39430 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:52:57,924-Speed 5174.75 samples/sec Loss 5.2423 LearningRate 0.0778 Epoch: 2 Global Step: 39440 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:52:59,900-Speed 5183.92 samples/sec Loss 5.2420 LearningRate 0.0778 Epoch: 2 Global Step: 39450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:01,877-Speed 5181.06 samples/sec Loss 5.1893 LearningRate 0.0778 Epoch: 2 Global Step: 39460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:03,864-Speed 5155.17 samples/sec Loss 5.1847 LearningRate 0.0778 Epoch: 2 Global Step: 39470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:05,835-Speed 5196.55 samples/sec Loss 5.1269 LearningRate 0.0777 Epoch: 2 Global Step: 39480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:07,797-Speed 5221.26 samples/sec Loss 5.1161 LearningRate 0.0777 Epoch: 2 Global Step: 39490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:09,766-Speed 5203.44 samples/sec Loss 5.1409 LearningRate 0.0777 Epoch: 2 Global Step: 39500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:11,759-Speed 5138.05 samples/sec Loss 5.2289 LearningRate 0.0777 Epoch: 2 Global Step: 39510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:13,722-Speed 5217.79 samples/sec Loss 5.1028 LearningRate 0.0777 Epoch: 2 Global Step: 39520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:15,693-Speed 5197.62 samples/sec Loss 5.1480 LearningRate 0.0777 Epoch: 2 Global Step: 39530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:17,660-Speed 5207.23 samples/sec Loss 5.1112 LearningRate 0.0777 Epoch: 2 Global Step: 39540 Fp16 
Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:19,624-Speed 5216.80 samples/sec Loss 5.1585 LearningRate 0.0777 Epoch: 2 Global Step: 39550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:53:21,593-Speed 5202.05 samples/sec Loss 5.1775 LearningRate 0.0777 Epoch: 2 Global Step: 39560 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:53:23,558-Speed 5212.16 samples/sec Loss 5.1880 LearningRate 0.0777 Epoch: 2 Global Step: 39570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:53:25,524-Speed 5212.01 samples/sec Loss 5.1031 LearningRate 0.0777 Epoch: 2 Global Step: 39580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:53:27,492-Speed 5203.65 samples/sec Loss 5.1568 LearningRate 0.0777 Epoch: 2 Global Step: 39590 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:29,492-Speed 5121.95 samples/sec Loss 5.1791 LearningRate 0.0777 Epoch: 2 Global Step: 39600 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:31,460-Speed 5204.28 samples/sec Loss 5.0781 LearningRate 0.0777 Epoch: 2 Global Step: 39610 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:33,454-Speed 5138.43 samples/sec Loss 5.2825 LearningRate 0.0777 Epoch: 2 Global Step: 39620 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:35,423-Speed 5203.13 samples/sec Loss 5.2145 LearningRate 0.0777 Epoch: 2 Global Step: 39630 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:37,399-Speed 5183.43 samples/sec Loss 5.1117 LearningRate 0.0777 Epoch: 2 Global Step: 39640 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:39,370-Speed 5196.70 samples/sec Loss 5.1198 LearningRate 0.0777 Epoch: 2 Global Step: 39650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:41,341-Speed 5197.42 samples/sec Loss 5.1188 LearningRate 0.0777 Epoch: 2 Global Step: 39660 Fp16 Grad Scale: 65536 Required: 20 hours Training: 
2022-04-11 01:53:43,316-Speed 5185.61 samples/sec Loss 5.0813 LearningRate 0.0776 Epoch: 2 Global Step: 39670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:45,287-Speed 5197.01 samples/sec Loss 5.1023 LearningRate 0.0776 Epoch: 2 Global Step: 39680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:53:47,254-Speed 5209.24 samples/sec Loss 5.1470 LearningRate 0.0776 Epoch: 2 Global Step: 39690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:53:49,222-Speed 5205.06 samples/sec Loss 5.0860 LearningRate 0.0776 Epoch: 2 Global Step: 39700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:53:51,188-Speed 5209.45 samples/sec Loss 5.1742 LearningRate 0.0776 Epoch: 2 Global Step: 39710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:53:53,159-Speed 5196.10 samples/sec Loss 5.1315 LearningRate 0.0776 Epoch: 2 Global Step: 39720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:53:55,130-Speed 5198.93 samples/sec Loss 5.1493 LearningRate 0.0776 Epoch: 2 Global Step: 39730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:53:57,104-Speed 5189.21 samples/sec Loss 5.0952 LearningRate 0.0776 Epoch: 2 Global Step: 39740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:53:59,076-Speed 5193.65 samples/sec Loss 5.1822 LearningRate 0.0776 Epoch: 2 Global Step: 39750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:54:01,074-Speed 5125.94 samples/sec Loss 5.1125 LearningRate 0.0776 Epoch: 2 Global Step: 39760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:54:03,070-Speed 5133.20 samples/sec Loss 5.0786 LearningRate 0.0776 Epoch: 2 Global Step: 39770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:54:05,060-Speed 5148.49 samples/sec Loss 5.1524 LearningRate 0.0776 Epoch: 2 Global Step: 39780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:54:07,039-Speed 5176.33 
samples/sec Loss 5.0528 LearningRate 0.0776 Epoch: 2 Global Step: 39790 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 01:54:09,001-Speed 5220.00 samples/sec Loss 5.1918 LearningRate 0.0776 Epoch: 2 Global Step: 39800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:54:11,001-Speed 5122.19 samples/sec Loss 5.1349 LearningRate 0.0776 Epoch: 2 Global Step: 39810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:54:12,973-Speed 5192.27 samples/sec Loss 5.2670 LearningRate 0.0776 Epoch: 2 Global Step: 39820 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:54:14,957-Speed 5162.76 samples/sec Loss 5.1450 LearningRate 0.0776 Epoch: 2 Global Step: 39830 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:54:16,948-Speed 5146.77 samples/sec Loss 5.1887 LearningRate 0.0776 Epoch: 2 Global Step: 39840 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:54:18,919-Speed 5195.38 samples/sec Loss 5.1640 LearningRate 0.0775 Epoch: 2 Global Step: 39850 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:54:20,897-Speed 5180.20 samples/sec Loss 5.1933 LearningRate 0.0775 Epoch: 2 Global Step: 39860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:54:22,867-Speed 5198.34 samples/sec Loss 5.2096 LearningRate 0.0775 Epoch: 2 Global Step: 39870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:54:24,845-Speed 5180.46 samples/sec Loss 5.1688 LearningRate 0.0775 Epoch: 2 Global Step: 39880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:54:26,810-Speed 5213.07 samples/sec Loss 5.1094 LearningRate 0.0775 Epoch: 2 Global Step: 39890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:54:28,796-Speed 5156.35 samples/sec Loss 5.2081 LearningRate 0.0775 Epoch: 2 Global Step: 39900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:54:30,767-Speed 5197.04 samples/sec Loss 5.2606 LearningRate 0.0775 
Epoch: 2 Global Step: 39910 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:54:32,737-Speed 5200.38 samples/sec Loss 5.2151 LearningRate 0.0775 Epoch: 2 Global Step: 39920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:54:34,733-Speed 5132.99 samples/sec Loss 5.1563 LearningRate 0.0775 Epoch: 2 Global Step: 39930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:54:36,710-Speed 5180.70 samples/sec Loss 5.0424 LearningRate 0.0775 Epoch: 2 Global Step: 39940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:54:38,694-Speed 5161.74 samples/sec Loss 5.0941 LearningRate 0.0775 Epoch: 2 Global Step: 39950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:54:40,682-Speed 5153.27 samples/sec Loss 5.0607 LearningRate 0.0775 Epoch: 2 Global Step: 39960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:54:42,671-Speed 5149.81 samples/sec Loss 5.2291 LearningRate 0.0775 Epoch: 2 Global Step: 39970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:54:44,659-Speed 5151.69 samples/sec Loss 5.1579 LearningRate 0.0775 Epoch: 2 Global Step: 39980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:54:46,634-Speed 5189.77 samples/sec Loss 5.0583 LearningRate 0.0775 Epoch: 2 Global Step: 39990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:54:48,616-Speed 5166.27 samples/sec Loss 5.1614 LearningRate 0.0775 Epoch: 2 Global Step: 40000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:55:15,162-[lfw][40000]XNorm: 23.089730 Training: 2022-04-11 01:55:15,163-[lfw][40000]Accuracy-Flip: 0.99733+-0.00309 Training: 2022-04-11 01:55:15,163-[lfw][40000]Accuracy-Highest: 0.99733 Training: 2022-04-11 01:55:45,803-[cfp_fp][40000]XNorm: 20.758932 Training: 2022-04-11 01:55:45,804-[cfp_fp][40000]Accuracy-Flip: 0.97486+-0.00806 Training: 2022-04-11 01:55:45,804-[cfp_fp][40000]Accuracy-Highest: 0.97486 Training: 2022-04-11 
01:56:12,210-[agedb_30][40000]XNorm: 22.522756 Training: 2022-04-11 01:56:12,211-[agedb_30][40000]Accuracy-Flip: 0.97433+-0.00841 Training: 2022-04-11 01:56:12,211-[agedb_30][40000]Accuracy-Highest: 0.97550 Training: 2022-04-11 01:56:14,212-Speed 119.63 samples/sec Loss 5.0914 LearningRate 0.0775 Epoch: 2 Global Step: 40010 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:56:16,170-Speed 5233.37 samples/sec Loss 5.1139 LearningRate 0.0775 Epoch: 2 Global Step: 40020 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:56:18,143-Speed 5190.17 samples/sec Loss 5.1557 LearningRate 0.0775 Epoch: 2 Global Step: 40030 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:56:20,110-Speed 5208.15 samples/sec Loss 5.1548 LearningRate 0.0774 Epoch: 2 Global Step: 40040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:56:22,088-Speed 5178.22 samples/sec Loss 5.1917 LearningRate 0.0774 Epoch: 2 Global Step: 40050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:56:24,085-Speed 5129.72 samples/sec Loss 5.1829 LearningRate 0.0774 Epoch: 2 Global Step: 40060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:56:26,072-Speed 5154.44 samples/sec Loss 5.2544 LearningRate 0.0774 Epoch: 2 Global Step: 40070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:56:28,056-Speed 5162.70 samples/sec Loss 5.1850 LearningRate 0.0774 Epoch: 2 Global Step: 40080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:56:30,030-Speed 5190.64 samples/sec Loss 5.1787 LearningRate 0.0774 Epoch: 2 Global Step: 40090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:56:31,996-Speed 5209.47 samples/sec Loss 5.1955 LearningRate 0.0774 Epoch: 2 Global Step: 40100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:56:33,978-Speed 5168.70 samples/sec Loss 5.2026 LearningRate 0.0774 Epoch: 2 Global Step: 40110 Fp16 Grad Scale: 131072 Required: 20 hours 
Training: 2022-04-11 01:56:35,950-Speed 5193.13 samples/sec Loss 5.0923 LearningRate 0.0774 Epoch: 2 Global Step: 40120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:56:37,922-Speed 5195.79 samples/sec Loss 5.2037 LearningRate 0.0774 Epoch: 2 Global Step: 40130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:56:39,908-Speed 5159.03 samples/sec Loss 5.1217 LearningRate 0.0774 Epoch: 2 Global Step: 40140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:56:41,893-Speed 5159.09 samples/sec Loss 5.1593 LearningRate 0.0774 Epoch: 2 Global Step: 40150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:56:43,863-Speed 5199.41 samples/sec Loss 5.1029 LearningRate 0.0774 Epoch: 2 Global Step: 40160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:56:45,835-Speed 5193.66 samples/sec Loss 5.0654 LearningRate 0.0774 Epoch: 2 Global Step: 40170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:56:47,836-Speed 5120.79 samples/sec Loss 5.0860 LearningRate 0.0774 Epoch: 2 Global Step: 40180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:56:49,815-Speed 5175.00 samples/sec Loss 5.1290 LearningRate 0.0774 Epoch: 2 Global Step: 40190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:56:51,824-Speed 5098.77 samples/sec Loss 5.1385 LearningRate 0.0774 Epoch: 2 Global Step: 40200 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:56:53,799-Speed 5186.74 samples/sec Loss 5.0241 LearningRate 0.0774 Epoch: 2 Global Step: 40210 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:56:55,779-Speed 5174.29 samples/sec Loss 5.1795 LearningRate 0.0774 Epoch: 2 Global Step: 40220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:56:57,744-Speed 5212.92 samples/sec Loss 5.1616 LearningRate 0.0773 Epoch: 2 Global Step: 40230 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:56:59,722-Speed 
5179.66 samples/sec Loss 5.1007 LearningRate 0.0773 Epoch: 2 Global Step: 40240 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:57:01,701-Speed 5175.12 samples/sec Loss 5.1520 LearningRate 0.0773 Epoch: 2 Global Step: 40250 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:57:03,688-Speed 5155.61 samples/sec Loss 5.1857 LearningRate 0.0773 Epoch: 2 Global Step: 40260 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:57:05,674-Speed 5157.06 samples/sec Loss 5.0710 LearningRate 0.0773 Epoch: 2 Global Step: 40270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:57:07,655-Speed 5172.54 samples/sec Loss 5.1660 LearningRate 0.0773 Epoch: 2 Global Step: 40280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:57:09,642-Speed 5153.86 samples/sec Loss 5.0489 LearningRate 0.0773 Epoch: 2 Global Step: 40290 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:57:11,628-Speed 5158.19 samples/sec Loss 5.0271 LearningRate 0.0773 Epoch: 2 Global Step: 40300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:57:13,605-Speed 5179.23 samples/sec Loss 5.1313 LearningRate 0.0773 Epoch: 2 Global Step: 40310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:57:15,601-Speed 5131.83 samples/sec Loss 5.0985 LearningRate 0.0773 Epoch: 2 Global Step: 40320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:57:17,589-Speed 5155.33 samples/sec Loss 5.1246 LearningRate 0.0773 Epoch: 2 Global Step: 40330 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:57:19,575-Speed 5156.93 samples/sec Loss 5.1100 LearningRate 0.0773 Epoch: 2 Global Step: 40340 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:57:21,575-Speed 5121.61 samples/sec Loss 5.1363 LearningRate 0.0773 Epoch: 2 Global Step: 40350 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:57:23,554-Speed 5176.18 samples/sec Loss 5.1377 LearningRate 0.0773 
Epoch: 2 Global Step: 40360 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:57:25,529-Speed 5187.23 samples/sec Loss 5.1690 LearningRate 0.0773 Epoch: 2 Global Step: 40370 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:57:27,515-Speed 5158.32 samples/sec Loss 5.1770 LearningRate 0.0773 Epoch: 2 Global Step: 40380 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:57:29,487-Speed 5194.26 samples/sec Loss 5.1716 LearningRate 0.0773 Epoch: 2 Global Step: 40390 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:57:31,475-Speed 5152.31 samples/sec Loss 5.0893 LearningRate 0.0773 Epoch: 2 Global Step: 40400 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:57:33,442-Speed 5206.56 samples/sec Loss 5.2071 LearningRate 0.0773 Epoch: 2 Global Step: 40410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:57:35,445-Speed 5115.21 samples/sec Loss 5.1748 LearningRate 0.0772 Epoch: 2 Global Step: 40420 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:57:37,445-Speed 5120.62 samples/sec Loss 5.1963 LearningRate 0.0772 Epoch: 2 Global Step: 40430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:57:39,442-Speed 5129.90 samples/sec Loss 5.0770 LearningRate 0.0772 Epoch: 2 Global Step: 40440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:57:41,422-Speed 5172.83 samples/sec Loss 5.0827 LearningRate 0.0772 Epoch: 2 Global Step: 40450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:57:43,408-Speed 5159.10 samples/sec Loss 5.0931 LearningRate 0.0772 Epoch: 2 Global Step: 40460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:57:45,392-Speed 5163.38 samples/sec Loss 5.1556 LearningRate 0.0772 Epoch: 2 Global Step: 40470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:57:47,378-Speed 5158.46 samples/sec Loss 5.2509 LearningRate 0.0772 Epoch: 2 Global Step: 40480 Fp16 Grad Scale: 65536 
Required: 20 hours Training: 2022-04-11 01:57:49,347-Speed 5202.12 samples/sec Loss 5.1964 LearningRate 0.0772 Epoch: 2 Global Step: 40490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:57:51,322-Speed 5186.01 samples/sec Loss 5.1228 LearningRate 0.0772 Epoch: 2 Global Step: 40500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:57:53,293-Speed 5194.97 samples/sec Loss 5.0942 LearningRate 0.0772 Epoch: 2 Global Step: 40510 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:57:55,263-Speed 5201.12 samples/sec Loss 5.1014 LearningRate 0.0772 Epoch: 2 Global Step: 40520 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:57:57,233-Speed 5199.18 samples/sec Loss 5.0875 LearningRate 0.0772 Epoch: 2 Global Step: 40530 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:57:59,216-Speed 5165.71 samples/sec Loss 5.1977 LearningRate 0.0772 Epoch: 2 Global Step: 40540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:58:01,208-Speed 5142.63 samples/sec Loss 5.1310 LearningRate 0.0772 Epoch: 2 Global Step: 40550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:58:03,190-Speed 5169.96 samples/sec Loss 5.1662 LearningRate 0.0772 Epoch: 2 Global Step: 40560 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:58:05,172-Speed 5167.16 samples/sec Loss 5.1281 LearningRate 0.0772 Epoch: 2 Global Step: 40570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:58:07,129-Speed 5234.25 samples/sec Loss 5.0796 LearningRate 0.0772 Epoch: 2 Global Step: 40580 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:58:09,106-Speed 5180.52 samples/sec Loss 5.0983 LearningRate 0.0772 Epoch: 2 Global Step: 40590 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:58:11,092-Speed 5159.23 samples/sec Loss 5.1598 LearningRate 0.0772 Epoch: 2 Global Step: 40600 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 
01:58:13,063-Speed 5196.91 samples/sec Loss 5.0800 LearningRate 0.0771 Epoch: 2 Global Step: 40610 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:58:15,033-Speed 5197.83 samples/sec Loss 5.1586 LearningRate 0.0771 Epoch: 2 Global Step: 40620 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:58:17,010-Speed 5182.14 samples/sec Loss 5.2423 LearningRate 0.0771 Epoch: 2 Global Step: 40630 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:58:18,989-Speed 5176.29 samples/sec Loss 5.1920 LearningRate 0.0771 Epoch: 2 Global Step: 40640 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:58:20,967-Speed 5179.47 samples/sec Loss 5.1451 LearningRate 0.0771 Epoch: 2 Global Step: 40650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:58:22,954-Speed 5153.91 samples/sec Loss 5.0344 LearningRate 0.0771 Epoch: 2 Global Step: 40660 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:58:24,941-Speed 5156.98 samples/sec Loss 5.1950 LearningRate 0.0771 Epoch: 2 Global Step: 40670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:58:26,905-Speed 5216.09 samples/sec Loss 5.2475 LearningRate 0.0771 Epoch: 2 Global Step: 40680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:58:28,875-Speed 5197.77 samples/sec Loss 5.1218 LearningRate 0.0771 Epoch: 2 Global Step: 40690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:58:30,839-Speed 5216.30 samples/sec Loss 5.0615 LearningRate 0.0771 Epoch: 2 Global Step: 40700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:58:32,819-Speed 5173.73 samples/sec Loss 5.1540 LearningRate 0.0771 Epoch: 2 Global Step: 40710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:58:34,800-Speed 5172.07 samples/sec Loss 5.1557 LearningRate 0.0771 Epoch: 2 Global Step: 40720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:58:36,803-Speed 5111.99 samples/sec Loss 5.1373 
LearningRate 0.0771 Epoch: 2 Global Step: 40730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:58:38,793-Speed 5148.18 samples/sec Loss 5.1938 LearningRate 0.0771 Epoch: 2 Global Step: 40740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:58:40,781-Speed 5152.72 samples/sec Loss 5.1118 LearningRate 0.0771 Epoch: 2 Global Step: 40750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:58:42,747-Speed 5208.98 samples/sec Loss 5.2104 LearningRate 0.0771 Epoch: 2 Global Step: 40760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:58:44,729-Speed 5168.48 samples/sec Loss 5.0424 LearningRate 0.0771 Epoch: 2 Global Step: 40770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:58:46,701-Speed 5194.75 samples/sec Loss 5.1051 LearningRate 0.0771 Epoch: 2 Global Step: 40780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:58:48,665-Speed 5216.93 samples/sec Loss 5.1153 LearningRate 0.0771 Epoch: 2 Global Step: 40790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:58:50,668-Speed 5114.21 samples/sec Loss 5.0413 LearningRate 0.0770 Epoch: 2 Global Step: 40800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:58:52,637-Speed 5201.83 samples/sec Loss 5.1173 LearningRate 0.0770 Epoch: 2 Global Step: 40810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:58:54,605-Speed 5204.31 samples/sec Loss 5.1345 LearningRate 0.0770 Epoch: 2 Global Step: 40820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:58:56,574-Speed 5202.67 samples/sec Loss 5.1405 LearningRate 0.0770 Epoch: 2 Global Step: 40830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:58:58,540-Speed 5212.23 samples/sec Loss 5.1385 LearningRate 0.0770 Epoch: 2 Global Step: 40840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:59:00,513-Speed 5190.19 samples/sec Loss 5.1632 LearningRate 0.0770 Epoch: 2 Global Step: 40850 
Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:59:02,500-Speed 5156.56 samples/sec Loss 5.1443 LearningRate 0.0770 Epoch: 2 Global Step: 40860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:59:04,469-Speed 5201.04 samples/sec Loss 5.0918 LearningRate 0.0770 Epoch: 2 Global Step: 40870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:59:06,454-Speed 5160.84 samples/sec Loss 5.1379 LearningRate 0.0770 Epoch: 2 Global Step: 40880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:59:08,458-Speed 5110.46 samples/sec Loss 5.1611 LearningRate 0.0770 Epoch: 2 Global Step: 40890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:59:10,427-Speed 5203.36 samples/sec Loss 5.1425 LearningRate 0.0770 Epoch: 2 Global Step: 40900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:59:12,397-Speed 5200.55 samples/sec Loss 5.2559 LearningRate 0.0770 Epoch: 2 Global Step: 40910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:59:14,417-Speed 5070.81 samples/sec Loss 5.1619 LearningRate 0.0770 Epoch: 2 Global Step: 40920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:59:16,398-Speed 5169.71 samples/sec Loss 5.1452 LearningRate 0.0770 Epoch: 2 Global Step: 40930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:59:18,366-Speed 5204.98 samples/sec Loss 5.1057 LearningRate 0.0770 Epoch: 2 Global Step: 40940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:59:20,335-Speed 5204.20 samples/sec Loss 5.0155 LearningRate 0.0770 Epoch: 2 Global Step: 40950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:59:22,298-Speed 5218.22 samples/sec Loss 5.1358 LearningRate 0.0770 Epoch: 2 Global Step: 40960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:59:24,268-Speed 5198.55 samples/sec Loss 5.1596 LearningRate 0.0770 Epoch: 2 Global Step: 40970 Fp16 Grad Scale: 131072 Required: 20 hours 
Training: 2022-04-11 01:59:26,246-Speed 5177.84 samples/sec Loss 4.9944 LearningRate 0.0770 Epoch: 2 Global Step: 40980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:59:28,211-Speed 5212.79 samples/sec Loss 5.1421 LearningRate 0.0769 Epoch: 2 Global Step: 40990 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:59:30,200-Speed 5152.61 samples/sec Loss 5.1180 LearningRate 0.0769 Epoch: 2 Global Step: 41000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:59:32,168-Speed 5204.90 samples/sec Loss 5.0424 LearningRate 0.0769 Epoch: 2 Global Step: 41010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:59:34,135-Speed 5206.99 samples/sec Loss 5.1272 LearningRate 0.0769 Epoch: 2 Global Step: 41020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:59:36,112-Speed 5182.76 samples/sec Loss 5.2190 LearningRate 0.0769 Epoch: 2 Global Step: 41030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:59:38,077-Speed 5211.44 samples/sec Loss 5.1422 LearningRate 0.0769 Epoch: 2 Global Step: 41040 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:59:40,056-Speed 5176.24 samples/sec Loss 5.0400 LearningRate 0.0769 Epoch: 2 Global Step: 41050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:59:42,036-Speed 5174.15 samples/sec Loss 5.0703 LearningRate 0.0769 Epoch: 2 Global Step: 41060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:59:44,003-Speed 5205.95 samples/sec Loss 5.1427 LearningRate 0.0769 Epoch: 2 Global Step: 41070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:59:45,980-Speed 5180.79 samples/sec Loss 5.1898 LearningRate 0.0769 Epoch: 2 Global Step: 41080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:59:47,945-Speed 5214.11 samples/sec Loss 5.1206 LearningRate 0.0769 Epoch: 2 Global Step: 41090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:59:49,920-Speed 5187.24 
samples/sec Loss 5.0741 LearningRate 0.0769 Epoch: 2 Global Step: 41100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:59:51,912-Speed 5142.38 samples/sec Loss 5.1304 LearningRate 0.0769 Epoch: 2 Global Step: 41110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:59:53,883-Speed 5195.51 samples/sec Loss 5.1131 LearningRate 0.0769 Epoch: 2 Global Step: 41120 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:59:55,847-Speed 5218.13 samples/sec Loss 5.0783 LearningRate 0.0769 Epoch: 2 Global Step: 41130 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 01:59:57,834-Speed 5155.12 samples/sec Loss 5.0634 LearningRate 0.0769 Epoch: 2 Global Step: 41140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 01:59:59,814-Speed 5173.03 samples/sec Loss 5.0695 LearningRate 0.0769 Epoch: 2 Global Step: 41150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:01,798-Speed 5161.76 samples/sec Loss 5.1267 LearningRate 0.0769 Epoch: 2 Global Step: 41160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:03,765-Speed 5207.07 samples/sec Loss 5.0808 LearningRate 0.0769 Epoch: 2 Global Step: 41170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:05,744-Speed 5177.12 samples/sec Loss 5.1008 LearningRate 0.0768 Epoch: 2 Global Step: 41180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:07,713-Speed 5203.24 samples/sec Loss 5.2183 LearningRate 0.0768 Epoch: 2 Global Step: 41190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:09,729-Speed 5080.49 samples/sec Loss 5.1038 LearningRate 0.0768 Epoch: 2 Global Step: 41200 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:11,748-Speed 5074.89 samples/sec Loss 5.2230 LearningRate 0.0768 Epoch: 2 Global Step: 41210 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:13,722-Speed 5189.29 samples/sec Loss 5.1509 LearningRate 0.0768 
Epoch: 2 Global Step: 41220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:15,707-Speed 5160.35 samples/sec Loss 5.1032 LearningRate 0.0768 Epoch: 2 Global Step: 41230 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:17,713-Speed 5105.66 samples/sec Loss 5.1623 LearningRate 0.0768 Epoch: 2 Global Step: 41240 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:00:19,679-Speed 5210.22 samples/sec Loss 4.9982 LearningRate 0.0768 Epoch: 2 Global Step: 41250 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:00:21,674-Speed 5133.76 samples/sec Loss 5.0592 LearningRate 0.0768 Epoch: 2 Global Step: 41260 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:00:23,662-Speed 5152.19 samples/sec Loss 5.1200 LearningRate 0.0768 Epoch: 2 Global Step: 41270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:00:25,651-Speed 5152.04 samples/sec Loss 5.1611 LearningRate 0.0768 Epoch: 2 Global Step: 41280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:00:27,651-Speed 5119.79 samples/sec Loss 5.1986 LearningRate 0.0768 Epoch: 2 Global Step: 41290 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:00:29,617-Speed 5210.79 samples/sec Loss 5.1541 LearningRate 0.0768 Epoch: 2 Global Step: 41300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:00:31,581-Speed 5214.76 samples/sec Loss 5.0854 LearningRate 0.0768 Epoch: 2 Global Step: 41310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:00:33,548-Speed 5210.06 samples/sec Loss 5.0402 LearningRate 0.0768 Epoch: 2 Global Step: 41320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:00:35,514-Speed 5209.67 samples/sec Loss 5.1066 LearningRate 0.0768 Epoch: 2 Global Step: 41330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:00:37,525-Speed 5094.96 samples/sec Loss 5.0773 LearningRate 0.0768 Epoch: 2 Global Step: 41340 Fp16 Grad Scale: 131072 
Required: 20 hours Training: 2022-04-11 02:00:39,491-Speed 5209.14 samples/sec Loss 5.1736 LearningRate 0.0768 Epoch: 2 Global Step: 41350 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:41,462-Speed 5196.29 samples/sec Loss 5.1121 LearningRate 0.0768 Epoch: 2 Global Step: 41360 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:43,433-Speed 5196.87 samples/sec Loss 5.0840 LearningRate 0.0768 Epoch: 2 Global Step: 41370 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:45,418-Speed 5159.98 samples/sec Loss 5.1284 LearningRate 0.0767 Epoch: 2 Global Step: 41380 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:47,399-Speed 5170.59 samples/sec Loss 5.1630 LearningRate 0.0767 Epoch: 2 Global Step: 41390 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:49,375-Speed 5184.71 samples/sec Loss 5.1435 LearningRate 0.0767 Epoch: 2 Global Step: 41400 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:51,347-Speed 5195.61 samples/sec Loss 5.1552 LearningRate 0.0767 Epoch: 2 Global Step: 41410 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:53,335-Speed 5150.61 samples/sec Loss 5.0388 LearningRate 0.0767 Epoch: 2 Global Step: 41420 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:55,319-Speed 5163.64 samples/sec Loss 5.0519 LearningRate 0.0767 Epoch: 2 Global Step: 41430 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:57,306-Speed 5157.19 samples/sec Loss 5.0855 LearningRate 0.0767 Epoch: 2 Global Step: 41440 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:00:59,320-Speed 5085.66 samples/sec Loss 5.0348 LearningRate 0.0767 Epoch: 2 Global Step: 41450 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:01:01,313-Speed 5139.95 samples/sec Loss 5.0750 LearningRate 0.0767 Epoch: 2 Global Step: 41460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 
02:01:03,292-Speed 5175.62 samples/sec Loss 5.1588 LearningRate 0.0767 Epoch: 2 Global Step: 41470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:01:05,260-Speed 5205.29 samples/sec Loss 5.0589 LearningRate 0.0767 Epoch: 2 Global Step: 41480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:01:07,227-Speed 5207.29 samples/sec Loss 5.1108 LearningRate 0.0767 Epoch: 2 Global Step: 41490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:01:09,216-Speed 5149.12 samples/sec Loss 5.0997 LearningRate 0.0767 Epoch: 2 Global Step: 41500 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:01:11,192-Speed 5182.94 samples/sec Loss 5.0473 LearningRate 0.0767 Epoch: 2 Global Step: 41510 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:01:13,163-Speed 5196.93 samples/sec Loss 5.1635 LearningRate 0.0767 Epoch: 2 Global Step: 41520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:01:15,132-Speed 5202.75 samples/sec Loss 5.1875 LearningRate 0.0767 Epoch: 2 Global Step: 41530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:01:17,100-Speed 5207.87 samples/sec Loss 5.1311 LearningRate 0.0767 Epoch: 2 Global Step: 41540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:01:19,068-Speed 5204.22 samples/sec Loss 5.0707 LearningRate 0.0767 Epoch: 2 Global Step: 41550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:01:21,037-Speed 5201.33 samples/sec Loss 5.1405 LearningRate 0.0767 Epoch: 2 Global Step: 41560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:01:23,014-Speed 5181.59 samples/sec Loss 5.1533 LearningRate 0.0766 Epoch: 2 Global Step: 41570 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:01:24,995-Speed 5172.57 samples/sec Loss 5.0362 LearningRate 0.0766 Epoch: 2 Global Step: 41580 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:01:26,962-Speed 5206.47 samples/sec Loss 5.1848 
LearningRate 0.0766 Epoch: 2 Global Step: 41590 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:01:28,942-Speed 5172.84 samples/sec Loss 5.1232 LearningRate 0.0766 Epoch: 2 Global Step: 41600 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:01:30,908-Speed 5209.91 samples/sec Loss 5.1445 LearningRate 0.0766 Epoch: 2 Global Step: 41610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:01:32,916-Speed 5103.27 samples/sec Loss 5.1701 LearningRate 0.0766 Epoch: 2 Global Step: 41620 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:01:34,885-Speed 5199.69 samples/sec Loss 5.0016 LearningRate 0.0766 Epoch: 2 Global Step: 41630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:01:36,870-Speed 5161.96 samples/sec Loss 4.9996 LearningRate 0.0766 Epoch: 2 Global Step: 41640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:01:38,843-Speed 5191.27 samples/sec Loss 5.0502 LearningRate 0.0766 Epoch: 2 Global Step: 41650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:01:40,813-Speed 5199.30 samples/sec Loss 5.0125 LearningRate 0.0766 Epoch: 2 Global Step: 41660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:01:42,792-Speed 5177.86 samples/sec Loss 5.2854 LearningRate 0.0766 Epoch: 2 Global Step: 41670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:01:44,774-Speed 5166.95 samples/sec Loss 5.1154 LearningRate 0.0766 Epoch: 2 Global Step: 41680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:01:46,753-Speed 5176.88 samples/sec Loss 5.0825 LearningRate 0.0766 Epoch: 2 Global Step: 41690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:01:48,731-Speed 5177.30 samples/sec Loss 5.1389 LearningRate 0.0766 Epoch: 2 Global Step: 41700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:01:50,699-Speed 5205.13 samples/sec Loss 5.1034 LearningRate 0.0766 Epoch: 2 Global Step: 
41710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:01:52,674-Speed 5186.14 samples/sec Loss 5.0951 LearningRate 0.0766 Epoch: 2 Global Step: 41720 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 02:01:54,641-Speed 5207.78 samples/sec Loss 5.0747 LearningRate 0.0766 Epoch: 2 Global Step: 41730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:01:56,621-Speed 5174.00 samples/sec Loss 5.1382 LearningRate 0.0766 Epoch: 2 Global Step: 41740 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:01:58,602-Speed 5171.96 samples/sec Loss 5.0540 LearningRate 0.0766 Epoch: 2 Global Step: 41750 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:00,570-Speed 5205.84 samples/sec Loss 5.0658 LearningRate 0.0765 Epoch: 2 Global Step: 41760 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:02,545-Speed 5185.23 samples/sec Loss 5.0938 LearningRate 0.0765 Epoch: 2 Global Step: 41770 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:04,516-Speed 5199.09 samples/sec Loss 5.0190 LearningRate 0.0765 Epoch: 2 Global Step: 41780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:06,485-Speed 5201.74 samples/sec Loss 5.0869 LearningRate 0.0765 Epoch: 2 Global Step: 41790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:08,466-Speed 5169.57 samples/sec Loss 5.0302 LearningRate 0.0765 Epoch: 2 Global Step: 41800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:10,459-Speed 5140.86 samples/sec Loss 5.0521 LearningRate 0.0765 Epoch: 2 Global Step: 41810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:12,429-Speed 5199.87 samples/sec Loss 5.1103 LearningRate 0.0765 Epoch: 2 Global Step: 41820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:14,402-Speed 5192.15 samples/sec Loss 4.9997 LearningRate 0.0765 Epoch: 2 Global Step: 41830 Fp16 Grad Scale: 262144 Required: 19 
hours Training: 2022-04-11 02:02:16,371-Speed 5201.97 samples/sec Loss 5.0628 LearningRate 0.0765 Epoch: 2 Global Step: 41840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:18,351-Speed 5174.00 samples/sec Loss 5.0928 LearningRate 0.0765 Epoch: 2 Global Step: 41850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:20,334-Speed 5164.22 samples/sec Loss 5.1570 LearningRate 0.0765 Epoch: 2 Global Step: 41860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:22,314-Speed 5175.44 samples/sec Loss 5.1371 LearningRate 0.0765 Epoch: 2 Global Step: 41870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:24,297-Speed 5165.22 samples/sec Loss 5.0523 LearningRate 0.0765 Epoch: 2 Global Step: 41880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:26,278-Speed 5171.62 samples/sec Loss 5.1374 LearningRate 0.0765 Epoch: 2 Global Step: 41890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:28,280-Speed 5114.55 samples/sec Loss 5.1371 LearningRate 0.0765 Epoch: 2 Global Step: 41900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:30,255-Speed 5187.67 samples/sec Loss 5.0281 LearningRate 0.0765 Epoch: 2 Global Step: 41910 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:32,225-Speed 5198.64 samples/sec Loss 5.0789 LearningRate 0.0765 Epoch: 2 Global Step: 41920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:34,200-Speed 5186.98 samples/sec Loss 5.2371 LearningRate 0.0765 Epoch: 2 Global Step: 41930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:36,194-Speed 5138.60 samples/sec Loss 5.1017 LearningRate 0.0765 Epoch: 2 Global Step: 41940 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 02:02:38,173-Speed 5175.68 samples/sec Loss 5.1157 LearningRate 0.0764 Epoch: 2 Global Step: 41950 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 
02:02:40,146-Speed 5190.85 samples/sec Loss 5.0843 LearningRate 0.0764 Epoch: 2 Global Step: 41960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:42,122-Speed 5184.53 samples/sec Loss 4.9711 LearningRate 0.0764 Epoch: 2 Global Step: 41970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:44,096-Speed 5188.18 samples/sec Loss 5.1104 LearningRate 0.0764 Epoch: 2 Global Step: 41980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:46,077-Speed 5172.42 samples/sec Loss 5.1661 LearningRate 0.0764 Epoch: 2 Global Step: 41990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:02:48,059-Speed 5169.00 samples/sec Loss 5.0395 LearningRate 0.0764 Epoch: 2 Global Step: 42000 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:03:14,630-[lfw][42000]XNorm: 21.319479 Training: 2022-04-11 02:03:14,631-[lfw][42000]Accuracy-Flip: 0.99717+-0.00269 Training: 2022-04-11 02:03:14,631-[lfw][42000]Accuracy-Highest: 0.99733 Training: 2022-04-11 02:03:45,435-[cfp_fp][42000]XNorm: 18.921378 Training: 2022-04-11 02:03:45,436-[cfp_fp][42000]Accuracy-Flip: 0.97400+-0.00845 Training: 2022-04-11 02:03:45,436-[cfp_fp][42000]Accuracy-Highest: 0.97486 Training: 2022-04-11 02:04:11,987-[agedb_30][42000]XNorm: 20.921961 Training: 2022-04-11 02:04:11,988-[agedb_30][42000]Accuracy-Flip: 0.97550+-0.00789 Training: 2022-04-11 02:04:11,989-[agedb_30][42000]Accuracy-Highest: 0.97550 Training: 2022-04-11 02:04:13,981-Speed 119.18 samples/sec Loss 5.1213 LearningRate 0.0764 Epoch: 2 Global Step: 42010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:15,954-Speed 5189.28 samples/sec Loss 5.1093 LearningRate 0.0764 Epoch: 2 Global Step: 42020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:17,946-Speed 5141.94 samples/sec Loss 5.0316 LearningRate 0.0764 Epoch: 2 Global Step: 42030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:19,923-Speed 5183.57 
samples/sec Loss 5.0317 LearningRate 0.0764 Epoch: 2 Global Step: 42040 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:21,888-Speed 5211.93 samples/sec Loss 5.1141 LearningRate 0.0764 Epoch: 2 Global Step: 42050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:23,850-Speed 5221.25 samples/sec Loss 5.2054 LearningRate 0.0764 Epoch: 2 Global Step: 42060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:25,839-Speed 5148.50 samples/sec Loss 5.0758 LearningRate 0.0764 Epoch: 2 Global Step: 42070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:27,839-Speed 5123.50 samples/sec Loss 5.0844 LearningRate 0.0764 Epoch: 2 Global Step: 42080 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:29,804-Speed 5214.42 samples/sec Loss 5.1684 LearningRate 0.0764 Epoch: 2 Global Step: 42090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:31,771-Speed 5206.52 samples/sec Loss 5.0486 LearningRate 0.0764 Epoch: 2 Global Step: 42100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:33,738-Speed 5207.82 samples/sec Loss 5.1756 LearningRate 0.0764 Epoch: 2 Global Step: 42110 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:35,708-Speed 5200.21 samples/sec Loss 5.0153 LearningRate 0.0764 Epoch: 2 Global Step: 42120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:37,694-Speed 5157.63 samples/sec Loss 5.1895 LearningRate 0.0764 Epoch: 2 Global Step: 42130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:39,673-Speed 5175.29 samples/sec Loss 5.1614 LearningRate 0.0763 Epoch: 2 Global Step: 42140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:41,665-Speed 5141.31 samples/sec Loss 5.1732 LearningRate 0.0763 Epoch: 2 Global Step: 42150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:43,631-Speed 5212.52 samples/sec Loss 5.0085 LearningRate 0.0763 
Epoch: 2 Global Step: 42160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:45,605-Speed 5187.66 samples/sec Loss 5.1850 LearningRate 0.0763 Epoch: 2 Global Step: 42170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:47,588-Speed 5165.30 samples/sec Loss 5.0882 LearningRate 0.0763 Epoch: 2 Global Step: 42180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:49,563-Speed 5187.64 samples/sec Loss 5.0724 LearningRate 0.0763 Epoch: 2 Global Step: 42190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:51,549-Speed 5157.13 samples/sec Loss 5.0972 LearningRate 0.0763 Epoch: 2 Global Step: 42200 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:53,530-Speed 5172.73 samples/sec Loss 5.0146 LearningRate 0.0763 Epoch: 2 Global Step: 42210 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:55,513-Speed 5165.33 samples/sec Loss 5.0239 LearningRate 0.0763 Epoch: 2 Global Step: 42220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:57,487-Speed 5189.33 samples/sec Loss 5.1130 LearningRate 0.0763 Epoch: 2 Global Step: 42230 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:04:59,458-Speed 5197.45 samples/sec Loss 5.1098 LearningRate 0.0763 Epoch: 2 Global Step: 42240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:05:01,435-Speed 5179.30 samples/sec Loss 4.9886 LearningRate 0.0763 Epoch: 2 Global Step: 42250 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:05:03,414-Speed 5175.90 samples/sec Loss 5.0819 LearningRate 0.0763 Epoch: 2 Global Step: 42260 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-11 02:05:05,394-Speed 5174.15 samples/sec Loss 5.0646 LearningRate 0.0763 Epoch: 2 Global Step: 42270 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:05:07,373-Speed 5176.00 samples/sec Loss 4.9947 LearningRate 0.0763 Epoch: 2 Global Step: 42280 Fp16 Grad 
Scale: 131072 Required: 20 hours Training: 2022-04-11 02:05:09,349-Speed 5183.92 samples/sec Loss 5.0488 LearningRate 0.0763 Epoch: 2 Global Step: 42290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:05:11,354-Speed 5108.68 samples/sec Loss 5.1718 LearningRate 0.0763 Epoch: 2 Global Step: 42300 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:05:13,329-Speed 5187.84 samples/sec Loss 4.9963 LearningRate 0.0763 Epoch: 2 Global Step: 42310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:15,319-Speed 5147.95 samples/sec Loss 5.0398 LearningRate 0.0763 Epoch: 2 Global Step: 42320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:17,299-Speed 5171.31 samples/sec Loss 4.9526 LearningRate 0.0762 Epoch: 2 Global Step: 42330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:19,280-Speed 5172.01 samples/sec Loss 5.0688 LearningRate 0.0762 Epoch: 2 Global Step: 42340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:21,278-Speed 5126.62 samples/sec Loss 4.9885 LearningRate 0.0762 Epoch: 2 Global Step: 42350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:23,273-Speed 5134.37 samples/sec Loss 5.0977 LearningRate 0.0762 Epoch: 2 Global Step: 42360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:25,255-Speed 5168.96 samples/sec Loss 5.1657 LearningRate 0.0762 Epoch: 2 Global Step: 42370 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:27,248-Speed 5138.58 samples/sec Loss 5.0348 LearningRate 0.0762 Epoch: 2 Global Step: 42380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:29,229-Speed 5171.69 samples/sec Loss 5.1687 LearningRate 0.0762 Epoch: 2 Global Step: 42390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:31,208-Speed 5175.43 samples/sec Loss 5.1071 LearningRate 0.0762 Epoch: 2 Global Step: 42400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 
02:05:33,196-Speed 5153.21 samples/sec Loss 5.0726 LearningRate 0.0762 Epoch: 2 Global Step: 42410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:35,173-Speed 5181.70 samples/sec Loss 4.9710 LearningRate 0.0762 Epoch: 2 Global Step: 42420 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:37,151-Speed 5178.43 samples/sec Loss 5.0579 LearningRate 0.0762 Epoch: 2 Global Step: 42430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:39,127-Speed 5184.63 samples/sec Loss 5.0753 LearningRate 0.0762 Epoch: 2 Global Step: 42440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:41,097-Speed 5199.58 samples/sec Loss 5.0393 LearningRate 0.0762 Epoch: 2 Global Step: 42450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:43,076-Speed 5175.46 samples/sec Loss 4.9998 LearningRate 0.0762 Epoch: 2 Global Step: 42460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:45,049-Speed 5190.95 samples/sec Loss 5.1817 LearningRate 0.0762 Epoch: 2 Global Step: 42470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:47,056-Speed 5103.49 samples/sec Loss 5.0041 LearningRate 0.0762 Epoch: 2 Global Step: 42480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:49,028-Speed 5195.25 samples/sec Loss 4.9222 LearningRate 0.0762 Epoch: 2 Global Step: 42490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:51,028-Speed 5122.34 samples/sec Loss 5.0794 LearningRate 0.0762 Epoch: 2 Global Step: 42500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-11 02:05:53,000-Speed 5194.04 samples/sec Loss 5.0088 LearningRate 0.0762 Epoch: 2 Global Step: 42510 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:05:54,979-Speed 5177.10 samples/sec Loss 5.1015 LearningRate 0.0761 Epoch: 2 Global Step: 42520 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:05:56,959-Speed 5171.86 samples/sec Loss 5.0928 
LearningRate 0.0761 Epoch: 2 Global Step: 42530 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-11 02:05:58,927-Speed 5206.13 samples/sec Loss 5.0465 LearningRate 0.0761 Epoch: 2 Global Step: 42540 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:00,901-Speed 5189.02 samples/sec Loss 5.0296 LearningRate 0.0761 Epoch: 2 Global Step: 42550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:02,890-Speed 5151.35 samples/sec Loss 5.0345 LearningRate 0.0761 Epoch: 2 Global Step: 42560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:04,858-Speed 5203.79 samples/sec Loss 5.0494 LearningRate 0.0761 Epoch: 2 Global Step: 42570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:06,825-Speed 5207.65 samples/sec Loss 5.0934 LearningRate 0.0761 Epoch: 2 Global Step: 42580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:08,792-Speed 5206.96 samples/sec Loss 5.1087 LearningRate 0.0761 Epoch: 2 Global Step: 42590 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:10,765-Speed 5190.78 samples/sec Loss 4.9910 LearningRate 0.0761 Epoch: 2 Global Step: 42600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:12,753-Speed 5153.93 samples/sec Loss 5.1062 LearningRate 0.0761 Epoch: 2 Global Step: 42610 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 02:06:14,733-Speed 5173.26 samples/sec Loss 5.1356 LearningRate 0.0761 Epoch: 2 Global Step: 42620 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:16,717-Speed 5164.98 samples/sec Loss 5.0175 LearningRate 0.0761 Epoch: 2 Global Step: 42630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:18,679-Speed 5219.78 samples/sec Loss 5.0903 LearningRate 0.0761 Epoch: 2 Global Step: 42640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:20,643-Speed 5215.29 samples/sec Loss 5.0720 LearningRate 0.0761 Epoch: 2 Global Step: 
42650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:22,678-Speed 5034.32 samples/sec Loss 5.0681 LearningRate 0.0761 Epoch: 2 Global Step: 42660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:24,658-Speed 5171.88 samples/sec Loss 5.0687 LearningRate 0.0761 Epoch: 2 Global Step: 42670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:26,647-Speed 5149.55 samples/sec Loss 5.0363 LearningRate 0.0761 Epoch: 2 Global Step: 42680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:28,621-Speed 5191.31 samples/sec Loss 5.0864 LearningRate 0.0761 Epoch: 2 Global Step: 42690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:30,585-Speed 5215.13 samples/sec Loss 5.0016 LearningRate 0.0761 Epoch: 2 Global Step: 42700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:32,568-Speed 5166.26 samples/sec Loss 4.9808 LearningRate 0.0760 Epoch: 2 Global Step: 42710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:34,538-Speed 5197.82 samples/sec Loss 5.0782 LearningRate 0.0760 Epoch: 2 Global Step: 42720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:36,521-Speed 5166.19 samples/sec Loss 5.0335 LearningRate 0.0760 Epoch: 2 Global Step: 42730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:38,495-Speed 5189.72 samples/sec Loss 5.0042 LearningRate 0.0760 Epoch: 2 Global Step: 42740 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:40,460-Speed 5211.95 samples/sec Loss 5.0397 LearningRate 0.0760 Epoch: 2 Global Step: 42750 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:42,431-Speed 5198.75 samples/sec Loss 5.0073 LearningRate 0.0760 Epoch: 2 Global Step: 42760 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:44,399-Speed 5205.59 samples/sec Loss 5.0588 LearningRate 0.0760 Epoch: 2 Global Step: 42770 Fp16 Grad Scale: 131072 Required: 19 
hours Training: 2022-04-11 02:06:46,366-Speed 5207.53 samples/sec Loss 5.0987 LearningRate 0.0760 Epoch: 2 Global Step: 42780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:48,336-Speed 5198.94 samples/sec Loss 5.0636 LearningRate 0.0760 Epoch: 2 Global Step: 42790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:50,313-Speed 5181.23 samples/sec Loss 4.9414 LearningRate 0.0760 Epoch: 2 Global Step: 42800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:06:52,273-Speed 5225.71 samples/sec Loss 5.0869 LearningRate 0.0760 Epoch: 2 Global Step: 42810 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:06:54,237-Speed 5214.95 samples/sec Loss 5.0463 LearningRate 0.0760 Epoch: 2 Global Step: 42820 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:06:56,218-Speed 5171.72 samples/sec Loss 5.0726 LearningRate 0.0760 Epoch: 2 Global Step: 42830 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:06:58,194-Speed 5182.62 samples/sec Loss 4.9837 LearningRate 0.0760 Epoch: 2 Global Step: 42840 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:07:00,170-Speed 5185.34 samples/sec Loss 5.0054 LearningRate 0.0760 Epoch: 2 Global Step: 42850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:07:02,149-Speed 5176.60 samples/sec Loss 4.9830 LearningRate 0.0760 Epoch: 2 Global Step: 42860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:07:04,113-Speed 5215.10 samples/sec Loss 4.9824 LearningRate 0.0760 Epoch: 2 Global Step: 42870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:07:06,091-Speed 5177.11 samples/sec Loss 5.1414 LearningRate 0.0760 Epoch: 2 Global Step: 42880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:07:08,061-Speed 5201.81 samples/sec Loss 5.0420 LearningRate 0.0760 Epoch: 2 Global Step: 42890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:07:10,050-Speed 
5148.89 samples/sec Loss 5.0318 LearningRate 0.0759 Epoch: 2 Global Step: 42900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:07:12,029-Speed 5175.21 samples/sec Loss 5.1087 LearningRate 0.0759 Epoch: 2 Global Step: 42910 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:07:14,009-Speed 5177.15 samples/sec Loss 5.0109 LearningRate 0.0759 Epoch: 2 Global Step: 42920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:07:15,991-Speed 5168.27 samples/sec Loss 5.1071 LearningRate 0.0759 Epoch: 2 Global Step: 42930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:07:17,985-Speed 5135.82 samples/sec Loss 5.0225 LearningRate 0.0759 Epoch: 2 Global Step: 42940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:07:19,953-Speed 5205.45 samples/sec Loss 5.0446 LearningRate 0.0759 Epoch: 2 Global Step: 42950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:07:21,922-Speed 5202.58 samples/sec Loss 5.0102 LearningRate 0.0759 Epoch: 2 Global Step: 42960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:07:23,896-Speed 5189.00 samples/sec Loss 5.1277 LearningRate 0.0759 Epoch: 2 Global Step: 42970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:07:25,879-Speed 5165.73 samples/sec Loss 5.0931 LearningRate 0.0759 Epoch: 2 Global Step: 42980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:07:27,861-Speed 5167.48 samples/sec Loss 5.0276 LearningRate 0.0759 Epoch: 2 Global Step: 42990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:07:29,826-Speed 5215.27 samples/sec Loss 5.0108 LearningRate 0.0759 Epoch: 2 Global Step: 43000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:07:31,789-Speed 5218.32 samples/sec Loss 5.0301 LearningRate 0.0759 Epoch: 2 Global Step: 43010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:07:33,756-Speed 5206.25 samples/sec Loss 5.1118 LearningRate 
0.0759 Epoch: 2 Global Step: 43020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:07:35,735-Speed 5176.25 samples/sec Loss 5.1226 LearningRate 0.0759 Epoch: 2 Global Step: 43030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:07:37,714-Speed 5176.07 samples/sec Loss 5.0565 LearningRate 0.0759 Epoch: 2 Global Step: 43040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:07:39,769-Speed 4984.70 samples/sec Loss 5.0692 LearningRate 0.0759 Epoch: 2 Global Step: 43050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:07:41,745-Speed 5185.28 samples/sec Loss 5.0475 LearningRate 0.0759 Epoch: 2 Global Step: 43060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:07:43,712-Speed 5207.78 samples/sec Loss 5.0062 LearningRate 0.0759 Epoch: 2 Global Step: 43070 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:07:45,681-Speed 5202.22 samples/sec Loss 4.9510 LearningRate 0.0759 Epoch: 2 Global Step: 43080 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:07:47,667-Speed 5156.44 samples/sec Loss 5.0535 LearningRate 0.0758 Epoch: 2 Global Step: 43090 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:07:49,649-Speed 5170.36 samples/sec Loss 4.9814 LearningRate 0.0758 Epoch: 2 Global Step: 43100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:07:51,621-Speed 5194.12 samples/sec Loss 5.0984 LearningRate 0.0758 Epoch: 2 Global Step: 43110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:07:53,589-Speed 5203.09 samples/sec Loss 4.9892 LearningRate 0.0758 Epoch: 2 Global Step: 43120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:07:55,557-Speed 5206.71 samples/sec Loss 4.9263 LearningRate 0.0758 Epoch: 2 Global Step: 43130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:07:57,534-Speed 5181.90 samples/sec Loss 5.0174 LearningRate 0.0758 Epoch: 2 Global Step: 43140 Fp16 Grad Scale: 
131072 Required: 19 hours Training: 2022-04-11 02:07:59,517-Speed 5164.48 samples/sec Loss 5.0947 LearningRate 0.0758 Epoch: 2 Global Step: 43150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:01,500-Speed 5166.96 samples/sec Loss 4.9970 LearningRate 0.0758 Epoch: 2 Global Step: 43160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:03,491-Speed 5144.09 samples/sec Loss 4.9937 LearningRate 0.0758 Epoch: 2 Global Step: 43170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:05,477-Speed 5158.42 samples/sec Loss 5.0129 LearningRate 0.0758 Epoch: 2 Global Step: 43180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:07,460-Speed 5165.50 samples/sec Loss 5.1075 LearningRate 0.0758 Epoch: 2 Global Step: 43190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:09,440-Speed 5173.55 samples/sec Loss 5.0684 LearningRate 0.0758 Epoch: 2 Global Step: 43200 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 02:08:11,403-Speed 5217.69 samples/sec Loss 5.1136 LearningRate 0.0758 Epoch: 2 Global Step: 43210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:13,374-Speed 5195.74 samples/sec Loss 5.0808 LearningRate 0.0758 Epoch: 2 Global Step: 43220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:15,342-Speed 5204.79 samples/sec Loss 5.1451 LearningRate 0.0758 Epoch: 2 Global Step: 43230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:17,313-Speed 5198.25 samples/sec Loss 4.9209 LearningRate 0.0758 Epoch: 2 Global Step: 43240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:19,282-Speed 5203.12 samples/sec Loss 5.0587 LearningRate 0.0758 Epoch: 2 Global Step: 43250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:21,259-Speed 5180.13 samples/sec Loss 4.9467 LearningRate 0.0758 Epoch: 2 Global Step: 43260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 
2022-04-11 02:08:23,243-Speed 5162.29 samples/sec Loss 4.9812 LearningRate 0.0758 Epoch: 2 Global Step: 43270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:25,228-Speed 5160.91 samples/sec Loss 5.0563 LearningRate 0.0758 Epoch: 2 Global Step: 43280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:27,221-Speed 5141.19 samples/sec Loss 4.9921 LearningRate 0.0757 Epoch: 2 Global Step: 43290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:29,195-Speed 5189.73 samples/sec Loss 4.9768 LearningRate 0.0757 Epoch: 2 Global Step: 43300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:31,156-Speed 5222.56 samples/sec Loss 5.1201 LearningRate 0.0757 Epoch: 2 Global Step: 43310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:33,137-Speed 5169.25 samples/sec Loss 4.9632 LearningRate 0.0757 Epoch: 2 Global Step: 43320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:35,108-Speed 5199.60 samples/sec Loss 5.1388 LearningRate 0.0757 Epoch: 2 Global Step: 43330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:37,076-Speed 5205.04 samples/sec Loss 4.9909 LearningRate 0.0757 Epoch: 2 Global Step: 43340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:39,060-Speed 5162.31 samples/sec Loss 5.1499 LearningRate 0.0757 Epoch: 2 Global Step: 43350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:41,033-Speed 5191.22 samples/sec Loss 5.1134 LearningRate 0.0757 Epoch: 2 Global Step: 43360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:43,004-Speed 5196.86 samples/sec Loss 5.0566 LearningRate 0.0757 Epoch: 2 Global Step: 43370 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:44,976-Speed 5193.74 samples/sec Loss 5.0311 LearningRate 0.0757 Epoch: 2 Global Step: 43380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:46,951-Speed 5187.12 
samples/sec Loss 4.9869 LearningRate 0.0757 Epoch: 2 Global Step: 43390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:48,923-Speed 5195.37 samples/sec Loss 5.0937 LearningRate 0.0757 Epoch: 2 Global Step: 43400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:50,885-Speed 5220.96 samples/sec Loss 5.0424 LearningRate 0.0757 Epoch: 2 Global Step: 43410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:52,850-Speed 5212.90 samples/sec Loss 5.0306 LearningRate 0.0757 Epoch: 2 Global Step: 43420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:54,823-Speed 5191.50 samples/sec Loss 4.9609 LearningRate 0.0757 Epoch: 2 Global Step: 43430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:56,811-Speed 5151.49 samples/sec Loss 5.0013 LearningRate 0.0757 Epoch: 2 Global Step: 43440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:08:58,797-Speed 5158.43 samples/sec Loss 5.0393 LearningRate 0.0757 Epoch: 2 Global Step: 43450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:00,795-Speed 5126.84 samples/sec Loss 4.9657 LearningRate 0.0757 Epoch: 2 Global Step: 43460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:02,795-Speed 5121.71 samples/sec Loss 4.9828 LearningRate 0.0757 Epoch: 2 Global Step: 43470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:04,785-Speed 5148.07 samples/sec Loss 5.0751 LearningRate 0.0756 Epoch: 2 Global Step: 43480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:06,768-Speed 5165.29 samples/sec Loss 5.0447 LearningRate 0.0756 Epoch: 2 Global Step: 43490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:08,743-Speed 5187.67 samples/sec Loss 5.0401 LearningRate 0.0756 Epoch: 2 Global Step: 43500 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:10,728-Speed 5159.82 samples/sec Loss 5.0337 LearningRate 0.0756 
Epoch: 2 Global Step: 43510 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:12,694-Speed 5211.20 samples/sec Loss 5.0437 LearningRate 0.0756 Epoch: 2 Global Step: 43520 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:14,678-Speed 5162.62 samples/sec Loss 5.1543 LearningRate 0.0756 Epoch: 2 Global Step: 43530 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:16,663-Speed 5159.62 samples/sec Loss 5.0639 LearningRate 0.0756 Epoch: 2 Global Step: 43540 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:18,643-Speed 5174.40 samples/sec Loss 4.9137 LearningRate 0.0756 Epoch: 2 Global Step: 43550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:20,609-Speed 5208.15 samples/sec Loss 4.9315 LearningRate 0.0756 Epoch: 2 Global Step: 43560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:22,583-Speed 5189.33 samples/sec Loss 5.0071 LearningRate 0.0756 Epoch: 2 Global Step: 43570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:24,583-Speed 5121.55 samples/sec Loss 4.9790 LearningRate 0.0756 Epoch: 2 Global Step: 43580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:26,565-Speed 5169.35 samples/sec Loss 5.1111 LearningRate 0.0756 Epoch: 2 Global Step: 43590 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:28,558-Speed 5141.11 samples/sec Loss 5.0747 LearningRate 0.0756 Epoch: 2 Global Step: 43600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:30,526-Speed 5205.11 samples/sec Loss 4.9843 LearningRate 0.0756 Epoch: 2 Global Step: 43610 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 02:09:32,491-Speed 5211.83 samples/sec Loss 5.0437 LearningRate 0.0756 Epoch: 2 Global Step: 43620 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:34,461-Speed 5199.82 samples/sec Loss 5.0786 LearningRate 0.0756 Epoch: 2 Global Step: 43630 Fp16 Grad 
Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:36,447-Speed 5156.93 samples/sec Loss 5.1086 LearningRate 0.0756 Epoch: 2 Global Step: 43640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:38,427-Speed 5174.62 samples/sec Loss 5.0051 LearningRate 0.0756 Epoch: 2 Global Step: 43650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:40,429-Speed 5116.50 samples/sec Loss 4.9633 LearningRate 0.0756 Epoch: 2 Global Step: 43660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:42,396-Speed 5206.79 samples/sec Loss 5.2259 LearningRate 0.0755 Epoch: 2 Global Step: 43670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:44,391-Speed 5133.27 samples/sec Loss 5.1144 LearningRate 0.0755 Epoch: 2 Global Step: 43680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:46,371-Speed 5174.86 samples/sec Loss 4.9498 LearningRate 0.0755 Epoch: 2 Global Step: 43690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:48,354-Speed 5166.18 samples/sec Loss 5.1001 LearningRate 0.0755 Epoch: 2 Global Step: 43700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:50,323-Speed 5202.51 samples/sec Loss 4.9510 LearningRate 0.0755 Epoch: 2 Global Step: 43710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:09:52,275-Speed 5247.65 samples/sec Loss 5.0129 LearningRate 0.0755 Epoch: 2 Global Step: 43720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:09:54,243-Speed 5206.06 samples/sec Loss 4.9505 LearningRate 0.0755 Epoch: 2 Global Step: 43730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:09:56,208-Speed 5212.38 samples/sec Loss 5.1075 LearningRate 0.0755 Epoch: 2 Global Step: 43740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:09:58,179-Speed 5196.01 samples/sec Loss 4.9324 LearningRate 0.0755 Epoch: 2 Global Step: 43750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 
2022-04-11 02:10:00,159-Speed 5174.68 samples/sec Loss 4.9610 LearningRate 0.0755 Epoch: 2 Global Step: 43760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:10:02,132-Speed 5190.18 samples/sec Loss 4.9895 LearningRate 0.0755 Epoch: 2 Global Step: 43770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:10:04,101-Speed 5204.26 samples/sec Loss 4.9856 LearningRate 0.0755 Epoch: 2 Global Step: 43780 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:10:06,071-Speed 5201.00 samples/sec Loss 5.0693 LearningRate 0.0755 Epoch: 2 Global Step: 43790 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:10:08,035-Speed 5215.26 samples/sec Loss 4.9259 LearningRate 0.0755 Epoch: 2 Global Step: 43800 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:10:10,003-Speed 5204.87 samples/sec Loss 5.0802 LearningRate 0.0755 Epoch: 2 Global Step: 43810 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:10:11,968-Speed 5211.30 samples/sec Loss 5.0006 LearningRate 0.0755 Epoch: 2 Global Step: 43820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:10:13,946-Speed 5179.81 samples/sec Loss 5.1068 LearningRate 0.0755 Epoch: 2 Global Step: 43830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:10:15,913-Speed 5206.12 samples/sec Loss 4.9583 LearningRate 0.0755 Epoch: 2 Global Step: 43840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:10:17,880-Speed 5209.83 samples/sec Loss 4.9650 LearningRate 0.0755 Epoch: 2 Global Step: 43850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:10:19,848-Speed 5204.37 samples/sec Loss 5.0062 LearningRate 0.0754 Epoch: 2 Global Step: 43860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:10:21,821-Speed 5190.50 samples/sec Loss 4.9944 LearningRate 0.0754 Epoch: 2 Global Step: 43870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:10:23,813-Speed 5143.60 samples/sec 
Loss 4.9498 LearningRate 0.0754 Epoch: 2 Global Step: 43880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:10:25,807-Speed 5136.27 samples/sec Loss 5.0843 LearningRate 0.0754 Epoch: 2 Global Step: 43890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:10:27,777-Speed 5200.12 samples/sec Loss 5.0088 LearningRate 0.0754 Epoch: 2 Global Step: 43900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:10:29,754-Speed 5183.13 samples/sec Loss 4.9753 LearningRate 0.0754 Epoch: 2 Global Step: 43910 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:10:31,714-Speed 5225.08 samples/sec Loss 4.9847 LearningRate 0.0754 Epoch: 2 Global Step: 43920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:10:33,704-Speed 5147.54 samples/sec Loss 4.9600 LearningRate 0.0754 Epoch: 2 Global Step: 43930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:10:35,671-Speed 5206.89 samples/sec Loss 5.0218 LearningRate 0.0754 Epoch: 2 Global Step: 43940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:10:37,671-Speed 5122.58 samples/sec Loss 5.0052 LearningRate 0.0754 Epoch: 2 Global Step: 43950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:10:39,663-Speed 5142.74 samples/sec Loss 4.9562 LearningRate 0.0754 Epoch: 2 Global Step: 43960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:10:41,647-Speed 5162.49 samples/sec Loss 5.0337 LearningRate 0.0754 Epoch: 2 Global Step: 43970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:10:43,628-Speed 5170.71 samples/sec Loss 4.9547 LearningRate 0.0754 Epoch: 2 Global Step: 43980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:10:45,608-Speed 5171.83 samples/sec Loss 4.9623 LearningRate 0.0754 Epoch: 2 Global Step: 43990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:10:47,578-Speed 5199.87 samples/sec Loss 4.9034 LearningRate 0.0754 Epoch: 2 Global 
Step: 44000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:11:14,029-[lfw][44000]XNorm: 22.702494 Training: 2022-04-11 02:11:14,030-[lfw][44000]Accuracy-Flip: 0.99783+-0.00259 Training: 2022-04-11 02:11:14,030-[lfw][44000]Accuracy-Highest: 0.99783 Training: 2022-04-11 02:11:44,830-[cfp_fp][44000]XNorm: 20.503973 Training: 2022-04-11 02:11:44,830-[cfp_fp][44000]Accuracy-Flip: 0.97500+-0.00475 Training: 2022-04-11 02:11:44,831-[cfp_fp][44000]Accuracy-Highest: 0.97500 Training: 2022-04-11 02:12:11,411-[agedb_30][44000]XNorm: 22.325265 Training: 2022-04-11 02:12:11,417-[agedb_30][44000]Accuracy-Flip: 0.97350+-0.00838 Training: 2022-04-11 02:12:11,418-[agedb_30][44000]Accuracy-Highest: 0.97550 Training: 2022-04-11 02:12:13,396-Speed 119.32 samples/sec Loss 5.0162 LearningRate 0.0754 Epoch: 2 Global Step: 44010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:12:15,388-Speed 5143.58 samples/sec Loss 4.9383 LearningRate 0.0754 Epoch: 2 Global Step: 44020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:12:17,351-Speed 5218.70 samples/sec Loss 5.0310 LearningRate 0.0754 Epoch: 2 Global Step: 44030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:12:19,321-Speed 5199.07 samples/sec Loss 4.9908 LearningRate 0.0754 Epoch: 2 Global Step: 44040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:12:21,307-Speed 5157.65 samples/sec Loss 4.9456 LearningRate 0.0753 Epoch: 2 Global Step: 44050 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:12:23,282-Speed 5186.63 samples/sec Loss 5.0002 LearningRate 0.0753 Epoch: 2 Global Step: 44060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:12:25,250-Speed 5204.37 samples/sec Loss 5.0573 LearningRate 0.0753 Epoch: 2 Global Step: 44070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:12:27,234-Speed 5162.78 samples/sec Loss 4.9793 LearningRate 0.0753 Epoch: 2 Global Step: 44080 Fp16 Grad Scale: 131072 
Required: 19 hours Training: 2022-04-11 02:12:29,214-Speed 5175.08 samples/sec Loss 4.9689 LearningRate 0.0753 Epoch: 2 Global Step: 44090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:12:31,191-Speed 5181.04 samples/sec Loss 5.0417 LearningRate 0.0753 Epoch: 2 Global Step: 44100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:12:33,168-Speed 5179.45 samples/sec Loss 5.0828 LearningRate 0.0753 Epoch: 2 Global Step: 44110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:12:35,152-Speed 5163.41 samples/sec Loss 4.9311 LearningRate 0.0753 Epoch: 2 Global Step: 44120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:12:37,121-Speed 5204.25 samples/sec Loss 5.0376 LearningRate 0.0753 Epoch: 2 Global Step: 44130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:12:39,118-Speed 5127.24 samples/sec Loss 5.0785 LearningRate 0.0753 Epoch: 2 Global Step: 44140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:12:41,128-Speed 5099.42 samples/sec Loss 4.9815 LearningRate 0.0753 Epoch: 2 Global Step: 44150 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 02:12:43,091-Speed 5217.78 samples/sec Loss 4.9719 LearningRate 0.0753 Epoch: 2 Global Step: 44160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:12:45,072-Speed 5170.05 samples/sec Loss 5.0383 LearningRate 0.0753 Epoch: 2 Global Step: 44170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:12:47,061-Speed 5150.27 samples/sec Loss 4.9476 LearningRate 0.0753 Epoch: 2 Global Step: 44180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:12:49,048-Speed 5155.96 samples/sec Loss 4.9206 LearningRate 0.0753 Epoch: 2 Global Step: 44190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:12:51,035-Speed 5154.80 samples/sec Loss 4.9533 LearningRate 0.0753 Epoch: 2 Global Step: 44200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 
02:12:53,019-Speed 5161.45 samples/sec Loss 4.9646 LearningRate 0.0753 Epoch: 2 Global Step: 44210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:12:55,000-Speed 5171.69 samples/sec Loss 5.0140 LearningRate 0.0753 Epoch: 2 Global Step: 44220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:12:56,990-Speed 5148.32 samples/sec Loss 4.9464 LearningRate 0.0753 Epoch: 2 Global Step: 44230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:12:58,989-Speed 5123.15 samples/sec Loss 4.9670 LearningRate 0.0753 Epoch: 2 Global Step: 44240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:13:00,970-Speed 5171.47 samples/sec Loss 4.9981 LearningRate 0.0752 Epoch: 2 Global Step: 44250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:13:02,962-Speed 5142.13 samples/sec Loss 4.9574 LearningRate 0.0752 Epoch: 2 Global Step: 44260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:13:04,948-Speed 5158.79 samples/sec Loss 5.0763 LearningRate 0.0752 Epoch: 2 Global Step: 44270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:13:06,921-Speed 5190.90 samples/sec Loss 5.0327 LearningRate 0.0752 Epoch: 2 Global Step: 44280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:13:08,896-Speed 5187.77 samples/sec Loss 5.0981 LearningRate 0.0752 Epoch: 2 Global Step: 44290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:13:10,893-Speed 5128.57 samples/sec Loss 5.0289 LearningRate 0.0752 Epoch: 2 Global Step: 44300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:13:12,870-Speed 5181.84 samples/sec Loss 4.9868 LearningRate 0.0752 Epoch: 2 Global Step: 44310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:13:14,855-Speed 5159.81 samples/sec Loss 5.0549 LearningRate 0.0752 Epoch: 2 Global Step: 44320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:13:16,844-Speed 5147.81 samples/sec Loss 
5.1016 LearningRate 0.0752 Epoch: 2 Global Step: 44330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:13:18,827-Speed 5166.04 samples/sec Loss 4.9936 LearningRate 0.0752 Epoch: 2 Global Step: 44340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:13:20,814-Speed 5155.66 samples/sec Loss 5.0399 LearningRate 0.0752 Epoch: 2 Global Step: 44350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:13:22,810-Speed 5132.20 samples/sec Loss 5.0446 LearningRate 0.0752 Epoch: 2 Global Step: 44360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:13:24,797-Speed 5157.68 samples/sec Loss 5.0056 LearningRate 0.0752 Epoch: 2 Global Step: 44370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:13:26,787-Speed 5145.87 samples/sec Loss 4.9579 LearningRate 0.0752 Epoch: 2 Global Step: 44380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:13:28,767-Speed 5175.07 samples/sec Loss 4.9490 LearningRate 0.0752 Epoch: 2 Global Step: 44390 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:13:30,761-Speed 5135.28 samples/sec Loss 5.0342 LearningRate 0.0752 Epoch: 2 Global Step: 44400 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:13:32,742-Speed 5172.57 samples/sec Loss 5.0838 LearningRate 0.0752 Epoch: 2 Global Step: 44410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:13:34,729-Speed 5153.85 samples/sec Loss 5.0087 LearningRate 0.0752 Epoch: 2 Global Step: 44420 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:13:36,735-Speed 5107.46 samples/sec Loss 5.1179 LearningRate 0.0752 Epoch: 2 Global Step: 44430 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:13:38,721-Speed 5156.29 samples/sec Loss 4.9044 LearningRate 0.0751 Epoch: 2 Global Step: 44440 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:13:40,717-Speed 5132.94 samples/sec Loss 5.0208 LearningRate 0.0751 Epoch: 2 Global Step: 
44450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:13:42,707-Speed 5147.73 samples/sec Loss 5.0042 LearningRate 0.0751 Epoch: 2 Global Step: 44460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:13:44,688-Speed 5171.94 samples/sec Loss 4.9712 LearningRate 0.0751 Epoch: 2 Global Step: 44470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:13:46,670-Speed 5167.10 samples/sec Loss 4.9483 LearningRate 0.0751 Epoch: 2 Global Step: 44480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:13:48,647-Speed 5182.23 samples/sec Loss 4.8926 LearningRate 0.0751 Epoch: 2 Global Step: 44490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:13:50,622-Speed 5185.75 samples/sec Loss 4.9408 LearningRate 0.0751 Epoch: 2 Global Step: 44500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:13:52,596-Speed 5188.18 samples/sec Loss 5.0686 LearningRate 0.0751 Epoch: 2 Global Step: 44510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:13:54,575-Speed 5177.12 samples/sec Loss 4.9431 LearningRate 0.0751 Epoch: 2 Global Step: 44520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:13:56,547-Speed 5194.00 samples/sec Loss 5.0324 LearningRate 0.0751 Epoch: 2 Global Step: 44530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:13:58,521-Speed 5189.71 samples/sec Loss 4.9670 LearningRate 0.0751 Epoch: 2 Global Step: 44540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:14:00,505-Speed 5162.01 samples/sec Loss 4.9483 LearningRate 0.0751 Epoch: 2 Global Step: 44550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:14:02,489-Speed 5162.85 samples/sec Loss 4.9950 LearningRate 0.0751 Epoch: 2 Global Step: 44560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:14:04,484-Speed 5134.58 samples/sec Loss 4.8322 LearningRate 0.0751 Epoch: 2 Global Step: 44570 Fp16 Grad Scale: 65536 Required: 19 hours 
Training: 2022-04-11 02:14:06,455-Speed 5199.36 samples/sec Loss 4.9560 LearningRate 0.0751 Epoch: 2 Global Step: 44580 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:14:08,423-Speed 5204.11 samples/sec Loss 4.9932 LearningRate 0.0751 Epoch: 2 Global Step: 44590 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:14:10,393-Speed 5198.55 samples/sec Loss 5.0889 LearningRate 0.0751 Epoch: 2 Global Step: 44600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:14:12,365-Speed 5196.79 samples/sec Loss 4.9196 LearningRate 0.0751 Epoch: 2 Global Step: 44610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:14:14,331-Speed 5209.38 samples/sec Loss 4.9700 LearningRate 0.0751 Epoch: 2 Global Step: 44620 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:14:16,300-Speed 5203.38 samples/sec Loss 5.0016 LearningRate 0.0750 Epoch: 2 Global Step: 44630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:14:18,300-Speed 5119.98 samples/sec Loss 5.0112 LearningRate 0.0750 Epoch: 2 Global Step: 44640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:14:20,267-Speed 5207.07 samples/sec Loss 4.9316 LearningRate 0.0750 Epoch: 2 Global Step: 44650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:14:22,265-Speed 5128.03 samples/sec Loss 5.0138 LearningRate 0.0750 Epoch: 2 Global Step: 44660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:14:24,236-Speed 5196.91 samples/sec Loss 4.9013 LearningRate 0.0750 Epoch: 2 Global Step: 44670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:14:26,216-Speed 5174.03 samples/sec Loss 4.9826 LearningRate 0.0750 Epoch: 2 Global Step: 44680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:14:28,198-Speed 5167.97 samples/sec Loss 5.0939 LearningRate 0.0750 Epoch: 2 Global Step: 44690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:14:30,167-Speed 
5202.94 samples/sec Loss 5.0481 LearningRate 0.0750 Epoch: 2 Global Step: 44700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:14:32,135-Speed 5204.63 samples/sec Loss 4.9340 LearningRate 0.0750 Epoch: 2 Global Step: 44710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:14:34,114-Speed 5175.97 samples/sec Loss 4.9813 LearningRate 0.0750 Epoch: 2 Global Step: 44720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:14:36,094-Speed 5174.47 samples/sec Loss 4.9620 LearningRate 0.0750 Epoch: 2 Global Step: 44730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:14:38,067-Speed 5190.30 samples/sec Loss 4.9202 LearningRate 0.0750 Epoch: 2 Global Step: 44740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:14:40,069-Speed 5118.41 samples/sec Loss 4.9804 LearningRate 0.0750 Epoch: 2 Global Step: 44750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:14:42,046-Speed 5179.12 samples/sec Loss 4.8662 LearningRate 0.0750 Epoch: 2 Global Step: 44760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:14:44,026-Speed 5174.54 samples/sec Loss 4.9832 LearningRate 0.0750 Epoch: 2 Global Step: 44770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:14:46,015-Speed 5151.65 samples/sec Loss 4.9371 LearningRate 0.0750 Epoch: 2 Global Step: 44780 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:14:47,993-Speed 5177.90 samples/sec Loss 5.0001 LearningRate 0.0750 Epoch: 2 Global Step: 44790 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:14:49,979-Speed 5157.61 samples/sec Loss 4.8594 LearningRate 0.0750 Epoch: 2 Global Step: 44800 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:14:51,950-Speed 5196.70 samples/sec Loss 4.9845 LearningRate 0.0750 Epoch: 2 Global Step: 44810 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:14:53,936-Speed 5158.23 samples/sec Loss 4.9918 LearningRate 0.0749 
Epoch: 2 Global Step: 44820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:14:55,899-Speed 5216.74 samples/sec Loss 5.0330 LearningRate 0.0749 Epoch: 2 Global Step: 44830 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:14:57,868-Speed 5203.81 samples/sec Loss 4.8929 LearningRate 0.0749 Epoch: 2 Global Step: 44840 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:14:59,845-Speed 5180.68 samples/sec Loss 5.0429 LearningRate 0.0749 Epoch: 2 Global Step: 44850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:15:01,824-Speed 5177.11 samples/sec Loss 4.8902 LearningRate 0.0749 Epoch: 2 Global Step: 44860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:15:03,805-Speed 5169.37 samples/sec Loss 5.0245 LearningRate 0.0749 Epoch: 2 Global Step: 44870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:15:05,812-Speed 5102.71 samples/sec Loss 4.8848 LearningRate 0.0749 Epoch: 2 Global Step: 44880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:15:07,783-Speed 5199.92 samples/sec Loss 4.9407 LearningRate 0.0749 Epoch: 2 Global Step: 44890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:15:09,765-Speed 5166.56 samples/sec Loss 4.9312 LearningRate 0.0749 Epoch: 2 Global Step: 44900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:15:11,734-Speed 5204.13 samples/sec Loss 4.8740 LearningRate 0.0749 Epoch: 2 Global Step: 44910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:15:13,706-Speed 5192.33 samples/sec Loss 4.9141 LearningRate 0.0749 Epoch: 2 Global Step: 44920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:15:15,683-Speed 5183.39 samples/sec Loss 4.9907 LearningRate 0.0749 Epoch: 2 Global Step: 44930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:17,654-Speed 5196.61 samples/sec Loss 4.9748 LearningRate 0.0749 Epoch: 2 Global Step: 44940 Fp16 Grad Scale: 131072 
Required: 19 hours Training: 2022-04-11 02:15:19,624-Speed 5198.42 samples/sec Loss 4.9732 LearningRate 0.0749 Epoch: 2 Global Step: 44950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:21,595-Speed 5196.81 samples/sec Loss 4.9880 LearningRate 0.0749 Epoch: 2 Global Step: 44960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:23,578-Speed 5166.89 samples/sec Loss 5.0119 LearningRate 0.0749 Epoch: 2 Global Step: 44970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:25,557-Speed 5176.65 samples/sec Loss 5.0301 LearningRate 0.0749 Epoch: 2 Global Step: 44980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:27,527-Speed 5198.68 samples/sec Loss 4.9045 LearningRate 0.0749 Epoch: 2 Global Step: 44990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:29,509-Speed 5170.07 samples/sec Loss 4.9672 LearningRate 0.0749 Epoch: 2 Global Step: 45000 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:31,482-Speed 5189.79 samples/sec Loss 4.9946 LearningRate 0.0749 Epoch: 2 Global Step: 45010 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:33,466-Speed 5164.63 samples/sec Loss 4.8960 LearningRate 0.0748 Epoch: 2 Global Step: 45020 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:35,436-Speed 5197.73 samples/sec Loss 4.9569 LearningRate 0.0748 Epoch: 2 Global Step: 45030 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:37,424-Speed 5154.73 samples/sec Loss 4.9769 LearningRate 0.0748 Epoch: 2 Global Step: 45040 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:39,395-Speed 5197.03 samples/sec Loss 4.9276 LearningRate 0.0748 Epoch: 2 Global Step: 45050 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:41,366-Speed 5195.16 samples/sec Loss 4.9532 LearningRate 0.0748 Epoch: 2 Global Step: 45060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 
02:15:43,332-Speed 5210.08 samples/sec Loss 4.9456 LearningRate 0.0748 Epoch: 2 Global Step: 45070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:45,339-Speed 5103.73 samples/sec Loss 4.9235 LearningRate 0.0748 Epoch: 2 Global Step: 45080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:47,318-Speed 5177.24 samples/sec Loss 4.9064 LearningRate 0.0748 Epoch: 2 Global Step: 45090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:49,296-Speed 5179.34 samples/sec Loss 4.9489 LearningRate 0.0748 Epoch: 2 Global Step: 45100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:51,266-Speed 5198.28 samples/sec Loss 4.9847 LearningRate 0.0748 Epoch: 2 Global Step: 45110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:53,237-Speed 5198.32 samples/sec Loss 4.9006 LearningRate 0.0748 Epoch: 2 Global Step: 45120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:55,202-Speed 5211.34 samples/sec Loss 4.9642 LearningRate 0.0748 Epoch: 2 Global Step: 45130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:57,180-Speed 5179.58 samples/sec Loss 4.8676 LearningRate 0.0748 Epoch: 2 Global Step: 45140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:15:59,176-Speed 5156.18 samples/sec Loss 4.9226 LearningRate 0.0748 Epoch: 2 Global Step: 45150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:16:01,163-Speed 5156.30 samples/sec Loss 4.8785 LearningRate 0.0748 Epoch: 2 Global Step: 45160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:16:03,129-Speed 5207.74 samples/sec Loss 4.9896 LearningRate 0.0748 Epoch: 2 Global Step: 45170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:16:05,112-Speed 5165.37 samples/sec Loss 4.9955 LearningRate 0.0748 Epoch: 2 Global Step: 45180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:16:07,101-Speed 5152.34 samples/sec Loss 
4.9685 LearningRate 0.0748 Epoch: 2 Global Step: 45190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:16:09,086-Speed 5158.44 samples/sec Loss 4.9945 LearningRate 0.0748 Epoch: 2 Global Step: 45200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:16:11,066-Speed 5175.49 samples/sec Loss 5.0028 LearningRate 0.0747 Epoch: 2 Global Step: 45210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:16:13,040-Speed 5188.40 samples/sec Loss 5.0251 LearningRate 0.0747 Epoch: 2 Global Step: 45220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:16:15,014-Speed 5191.40 samples/sec Loss 4.8874 LearningRate 0.0747 Epoch: 2 Global Step: 45230 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 02:16:16,996-Speed 5168.07 samples/sec Loss 5.0074 LearningRate 0.0747 Epoch: 2 Global Step: 45240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:16:18,978-Speed 5166.41 samples/sec Loss 4.9414 LearningRate 0.0747 Epoch: 2 Global Step: 45250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:16:20,947-Speed 5203.39 samples/sec Loss 4.8797 LearningRate 0.0747 Epoch: 2 Global Step: 45260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:16:22,925-Speed 5178.00 samples/sec Loss 4.9363 LearningRate 0.0747 Epoch: 2 Global Step: 45270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:16:24,906-Speed 5170.16 samples/sec Loss 4.9499 LearningRate 0.0747 Epoch: 2 Global Step: 45280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:16:26,896-Speed 5147.21 samples/sec Loss 4.9666 LearningRate 0.0747 Epoch: 2 Global Step: 45290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:16:28,870-Speed 5189.14 samples/sec Loss 4.8822 LearningRate 0.0747 Epoch: 2 Global Step: 45300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:16:30,857-Speed 5154.12 samples/sec Loss 4.9949 LearningRate 0.0747 Epoch: 2 Global 
Step: 45310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:16:32,848-Speed 5146.36 samples/sec Loss 4.8901 LearningRate 0.0747 Epoch: 2 Global Step: 45320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:16:34,817-Speed 5203.86 samples/sec Loss 4.9698 LearningRate 0.0747 Epoch: 2 Global Step: 45330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:16:36,806-Speed 5149.90 samples/sec Loss 4.8197 LearningRate 0.0747 Epoch: 2 Global Step: 45340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:16:38,822-Speed 5081.63 samples/sec Loss 4.9958 LearningRate 0.0747 Epoch: 2 Global Step: 45350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:16:40,793-Speed 5196.24 samples/sec Loss 5.0036 LearningRate 0.0747 Epoch: 2 Global Step: 45360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:16:42,763-Speed 5199.92 samples/sec Loss 4.9471 LearningRate 0.0747 Epoch: 2 Global Step: 45370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:16:44,729-Speed 5209.81 samples/sec Loss 4.9147 LearningRate 0.0747 Epoch: 2 Global Step: 45380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:16:46,703-Speed 5189.32 samples/sec Loss 5.0039 LearningRate 0.0747 Epoch: 2 Global Step: 45390 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:16:48,685-Speed 5168.36 samples/sec Loss 4.9429 LearningRate 0.0746 Epoch: 2 Global Step: 45400 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:16:50,699-Speed 5085.79 samples/sec Loss 5.0349 LearningRate 0.0746 Epoch: 2 Global Step: 45410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:16:52,687-Speed 5151.15 samples/sec Loss 4.8432 LearningRate 0.0746 Epoch: 2 Global Step: 45420 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:16:54,683-Speed 5133.66 samples/sec Loss 4.9127 LearningRate 0.0746 Epoch: 2 Global Step: 45430 Fp16 Grad Scale: 65536 Required: 19 
hours Training: 2022-04-11 02:16:56,650-Speed 5207.80 samples/sec Loss 4.9181 LearningRate 0.0746 Epoch: 2 Global Step: 45440 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:16:58,631-Speed 5170.18 samples/sec Loss 4.9675 LearningRate 0.0746 Epoch: 2 Global Step: 45450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:00,600-Speed 5204.36 samples/sec Loss 4.9544 LearningRate 0.0746 Epoch: 2 Global Step: 45460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:02,583-Speed 5164.84 samples/sec Loss 4.9734 LearningRate 0.0746 Epoch: 2 Global Step: 45470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:04,569-Speed 5158.02 samples/sec Loss 4.9401 LearningRate 0.0746 Epoch: 2 Global Step: 45480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:06,536-Speed 5207.00 samples/sec Loss 4.8936 LearningRate 0.0746 Epoch: 2 Global Step: 45490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:08,509-Speed 5191.95 samples/sec Loss 4.8317 LearningRate 0.0746 Epoch: 2 Global Step: 45500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:10,495-Speed 5158.27 samples/sec Loss 5.0933 LearningRate 0.0746 Epoch: 2 Global Step: 45510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:12,479-Speed 5162.07 samples/sec Loss 4.9239 LearningRate 0.0746 Epoch: 2 Global Step: 45520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:14,479-Speed 5121.57 samples/sec Loss 4.8994 LearningRate 0.0746 Epoch: 2 Global Step: 45530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:16,460-Speed 5171.18 samples/sec Loss 5.0116 LearningRate 0.0746 Epoch: 2 Global Step: 45540 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:17:18,430-Speed 5200.99 samples/sec Loss 4.9812 LearningRate 0.0746 Epoch: 2 Global Step: 45550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:17:20,402-Speed 5194.24 
samples/sec Loss 5.0100 LearningRate 0.0746 Epoch: 2 Global Step: 45560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:17:22,382-Speed 5173.17 samples/sec Loss 4.9629 LearningRate 0.0746 Epoch: 2 Global Step: 45570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:17:24,348-Speed 5208.36 samples/sec Loss 5.0219 LearningRate 0.0746 Epoch: 2 Global Step: 45580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:17:26,333-Speed 5161.60 samples/sec Loss 4.9260 LearningRate 0.0746 Epoch: 2 Global Step: 45590 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:17:28,305-Speed 5194.28 samples/sec Loss 4.8922 LearningRate 0.0745 Epoch: 2 Global Step: 45600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:17:30,274-Speed 5202.00 samples/sec Loss 4.9103 LearningRate 0.0745 Epoch: 2 Global Step: 45610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:17:32,249-Speed 5187.44 samples/sec Loss 4.9765 LearningRate 0.0745 Epoch: 2 Global Step: 45620 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:17:34,249-Speed 5121.70 samples/sec Loss 4.8831 LearningRate 0.0745 Epoch: 2 Global Step: 45630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:17:36,237-Speed 5152.36 samples/sec Loss 4.9456 LearningRate 0.0745 Epoch: 2 Global Step: 45640 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 02:17:38,198-Speed 5224.28 samples/sec Loss 4.8364 LearningRate 0.0745 Epoch: 2 Global Step: 45650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:17:40,161-Speed 5217.71 samples/sec Loss 4.8922 LearningRate 0.0745 Epoch: 2 Global Step: 45660 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:42,144-Speed 5165.44 samples/sec Loss 4.9651 LearningRate 0.0745 Epoch: 2 Global Step: 45670 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:44,110-Speed 5209.52 samples/sec Loss 4.9258 LearningRate 0.0745 
Epoch: 2 Global Step: 45680 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:46,106-Speed 5133.85 samples/sec Loss 4.9841 LearningRate 0.0745 Epoch: 2 Global Step: 45690 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:48,082-Speed 5182.84 samples/sec Loss 4.9877 LearningRate 0.0745 Epoch: 2 Global Step: 45700 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:50,050-Speed 5205.68 samples/sec Loss 4.9420 LearningRate 0.0745 Epoch: 2 Global Step: 45710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:52,018-Speed 5205.39 samples/sec Loss 4.8162 LearningRate 0.0745 Epoch: 2 Global Step: 45720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:53,986-Speed 5204.92 samples/sec Loss 4.8553 LearningRate 0.0745 Epoch: 2 Global Step: 45730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:55,978-Speed 5141.01 samples/sec Loss 4.9475 LearningRate 0.0745 Epoch: 2 Global Step: 45740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:57,965-Speed 5156.57 samples/sec Loss 4.9628 LearningRate 0.0745 Epoch: 2 Global Step: 45750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:17:59,944-Speed 5176.91 samples/sec Loss 4.8701 LearningRate 0.0745 Epoch: 2 Global Step: 45760 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:18:01,923-Speed 5175.50 samples/sec Loss 5.0185 LearningRate 0.0745 Epoch: 2 Global Step: 45770 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:18:03,905-Speed 5166.92 samples/sec Loss 4.9095 LearningRate 0.0745 Epoch: 2 Global Step: 45780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:18:05,874-Speed 5202.30 samples/sec Loss 4.8594 LearningRate 0.0744 Epoch: 2 Global Step: 45790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:18:07,842-Speed 5206.23 samples/sec Loss 4.8631 LearningRate 0.0744 Epoch: 2 Global Step: 45800 Fp16 Grad Scale: 131072 
Required: 19 hours Training: 2022-04-11 02:18:09,821-Speed 5175.78 samples/sec Loss 4.9330 LearningRate 0.0744 Epoch: 2 Global Step: 45810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:18:11,808-Speed 5154.64 samples/sec Loss 4.9278 LearningRate 0.0744 Epoch: 2 Global Step: 45820 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:18:13,775-Speed 5207.91 samples/sec Loss 4.9399 LearningRate 0.0744 Epoch: 2 Global Step: 45830 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:18:15,765-Speed 5146.79 samples/sec Loss 4.8375 LearningRate 0.0744 Epoch: 2 Global Step: 45840 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:18:17,752-Speed 5154.34 samples/sec Loss 4.9634 LearningRate 0.0744 Epoch: 2 Global Step: 45850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:18:19,727-Speed 5187.20 samples/sec Loss 4.8401 LearningRate 0.0744 Epoch: 2 Global Step: 45860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:18:21,705-Speed 5179.72 samples/sec Loss 4.9489 LearningRate 0.0744 Epoch: 2 Global Step: 45870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:18:23,681-Speed 5184.95 samples/sec Loss 4.9076 LearningRate 0.0744 Epoch: 2 Global Step: 45880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:18:25,668-Speed 5154.85 samples/sec Loss 4.9639 LearningRate 0.0744 Epoch: 2 Global Step: 45890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:18:27,659-Speed 5144.57 samples/sec Loss 4.8834 LearningRate 0.0744 Epoch: 2 Global Step: 45900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:18:29,651-Speed 5141.40 samples/sec Loss 4.9562 LearningRate 0.0744 Epoch: 2 Global Step: 45910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:18:31,623-Speed 5194.93 samples/sec Loss 4.9957 LearningRate 0.0744 Epoch: 2 Global Step: 45920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 
02:18:33,610-Speed 5156.45 samples/sec Loss 4.8662 LearningRate 0.0744 Epoch: 2 Global Step: 45930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:18:35,590-Speed 5172.62 samples/sec Loss 4.9567 LearningRate 0.0744 Epoch: 2 Global Step: 45940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:18:37,579-Speed 5149.59 samples/sec Loss 5.0147 LearningRate 0.0744 Epoch: 2 Global Step: 45950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:18:39,560-Speed 5171.22 samples/sec Loss 4.9251 LearningRate 0.0744 Epoch: 2 Global Step: 45960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:18:41,542-Speed 5168.68 samples/sec Loss 4.9172 LearningRate 0.0744 Epoch: 2 Global Step: 45970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:18:43,540-Speed 5127.69 samples/sec Loss 4.9240 LearningRate 0.0743 Epoch: 2 Global Step: 45980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:18:45,523-Speed 5164.12 samples/sec Loss 4.9963 LearningRate 0.0743 Epoch: 2 Global Step: 45990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:18:47,506-Speed 5167.43 samples/sec Loss 4.9671 LearningRate 0.0743 Epoch: 2 Global Step: 46000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:19:14,289-[lfw][46000]XNorm: 22.325820 Training: 2022-04-11 02:19:14,289-[lfw][46000]Accuracy-Flip: 0.99767+-0.00309 Training: 2022-04-11 02:19:14,290-[lfw][46000]Accuracy-Highest: 0.99783 Training: 2022-04-11 02:19:45,364-[cfp_fp][46000]XNorm: 19.882292 Training: 2022-04-11 02:19:45,364-[cfp_fp][46000]Accuracy-Flip: 0.97871+-0.00594 Training: 2022-04-11 02:19:45,365-[cfp_fp][46000]Accuracy-Highest: 0.97871 Training: 2022-04-11 02:20:11,834-[agedb_30][46000]XNorm: 22.215084 Training: 2022-04-11 02:20:11,834-[agedb_30][46000]Accuracy-Flip: 0.97583+-0.00583 Training: 2022-04-11 02:20:11,835-[agedb_30][46000]Accuracy-Highest: 0.97583 Training: 2022-04-11 02:20:13,812-Speed 118.65 
samples/sec Loss 4.8634 LearningRate 0.0743 Epoch: 2 Global Step: 46010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:20:15,785-Speed 5191.39 samples/sec Loss 5.0115 LearningRate 0.0743 Epoch: 2 Global Step: 46020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:20:17,776-Speed 5144.20 samples/sec Loss 4.7964 LearningRate 0.0743 Epoch: 2 Global Step: 46030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:20:19,750-Speed 5189.63 samples/sec Loss 4.8456 LearningRate 0.0743 Epoch: 2 Global Step: 46040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:20:21,743-Speed 5138.43 samples/sec Loss 4.9034 LearningRate 0.0743 Epoch: 2 Global Step: 46050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:20:23,720-Speed 5182.81 samples/sec Loss 4.8616 LearningRate 0.0743 Epoch: 2 Global Step: 46060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:20:25,700-Speed 5172.96 samples/sec Loss 4.9411 LearningRate 0.0743 Epoch: 2 Global Step: 46070 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:20:27,697-Speed 5128.17 samples/sec Loss 4.8895 LearningRate 0.0743 Epoch: 2 Global Step: 46080 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:20:29,682-Speed 5161.21 samples/sec Loss 5.0089 LearningRate 0.0743 Epoch: 2 Global Step: 46090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:20:31,658-Speed 5183.92 samples/sec Loss 4.8132 LearningRate 0.0743 Epoch: 2 Global Step: 46100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:20:33,644-Speed 5159.38 samples/sec Loss 4.9002 LearningRate 0.0743 Epoch: 2 Global Step: 46110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:20:35,612-Speed 5205.03 samples/sec Loss 4.8974 LearningRate 0.0743 Epoch: 2 Global Step: 46120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:20:37,595-Speed 5165.91 samples/sec Loss 4.8399 LearningRate 0.0743 Epoch: 
2 Global Step: 46130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:20:39,582-Speed 5153.88 samples/sec Loss 4.9196 LearningRate 0.0743 Epoch: 2 Global Step: 46140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:20:41,548-Speed 5210.47 samples/sec Loss 4.8894 LearningRate 0.0743 Epoch: 2 Global Step: 46150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:20:43,518-Speed 5200.14 samples/sec Loss 4.9261 LearningRate 0.0743 Epoch: 2 Global Step: 46160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:20:45,505-Speed 5154.24 samples/sec Loss 4.9229 LearningRate 0.0743 Epoch: 2 Global Step: 46170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:20:47,484-Speed 5175.73 samples/sec Loss 4.9704 LearningRate 0.0742 Epoch: 2 Global Step: 46180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:20:49,458-Speed 5188.97 samples/sec Loss 4.9400 LearningRate 0.0742 Epoch: 2 Global Step: 46190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:20:51,441-Speed 5166.82 samples/sec Loss 4.9666 LearningRate 0.0742 Epoch: 2 Global Step: 46200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:20:53,421-Speed 5173.93 samples/sec Loss 4.8793 LearningRate 0.0742 Epoch: 2 Global Step: 46210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:20:55,395-Speed 5188.58 samples/sec Loss 4.9133 LearningRate 0.0742 Epoch: 2 Global Step: 46220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:20:57,357-Speed 5221.74 samples/sec Loss 4.9608 LearningRate 0.0742 Epoch: 2 Global Step: 46230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:20:59,329-Speed 5194.23 samples/sec Loss 4.8757 LearningRate 0.0742 Epoch: 2 Global Step: 46240 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:21:01,300-Speed 5198.28 samples/sec Loss 4.9381 LearningRate 0.0742 Epoch: 2 Global Step: 46250 Fp16 Grad Scale: 65536 
Required: 19 hours Training: 2022-04-11 02:21:03,274-Speed 5188.78 samples/sec Loss 4.8636 LearningRate 0.0742 Epoch: 2 Global Step: 46260 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:21:05,266-Speed 5142.06 samples/sec Loss 4.9167 LearningRate 0.0742 Epoch: 2 Global Step: 46270 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:21:07,253-Speed 5154.41 samples/sec Loss 4.9380 LearningRate 0.0742 Epoch: 2 Global Step: 46280 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:21:09,230-Speed 5180.03 samples/sec Loss 4.9733 LearningRate 0.0742 Epoch: 2 Global Step: 46290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:21:11,218-Speed 5152.16 samples/sec Loss 4.9338 LearningRate 0.0742 Epoch: 2 Global Step: 46300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:21:13,202-Speed 5163.83 samples/sec Loss 4.9153 LearningRate 0.0742 Epoch: 2 Global Step: 46310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:21:15,197-Speed 5135.37 samples/sec Loss 4.8441 LearningRate 0.0742 Epoch: 2 Global Step: 46320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:21:17,174-Speed 5182.23 samples/sec Loss 4.9529 LearningRate 0.0742 Epoch: 2 Global Step: 46330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:19,149-Speed 5187.20 samples/sec Loss 4.9193 LearningRate 0.0742 Epoch: 2 Global Step: 46340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:21,159-Speed 5093.90 samples/sec Loss 4.8012 LearningRate 0.0742 Epoch: 2 Global Step: 46350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:23,134-Speed 5188.65 samples/sec Loss 4.7876 LearningRate 0.0742 Epoch: 2 Global Step: 46360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:25,114-Speed 5172.40 samples/sec Loss 4.8161 LearningRate 0.0741 Epoch: 2 Global Step: 46370 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 
02:21:27,098-Speed 5164.28 samples/sec Loss 4.8757 LearningRate 0.0741 Epoch: 2 Global Step: 46380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:29,078-Speed 5173.02 samples/sec Loss 4.8909 LearningRate 0.0741 Epoch: 2 Global Step: 46390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:31,072-Speed 5136.16 samples/sec Loss 4.8768 LearningRate 0.0741 Epoch: 2 Global Step: 46400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:33,057-Speed 5161.86 samples/sec Loss 4.9510 LearningRate 0.0741 Epoch: 2 Global Step: 46410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:35,029-Speed 5194.49 samples/sec Loss 4.8901 LearningRate 0.0741 Epoch: 2 Global Step: 46420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:37,009-Speed 5173.25 samples/sec Loss 4.7431 LearningRate 0.0741 Epoch: 2 Global Step: 46430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:38,998-Speed 5148.69 samples/sec Loss 4.8992 LearningRate 0.0741 Epoch: 2 Global Step: 46440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:40,969-Speed 5198.64 samples/sec Loss 4.9286 LearningRate 0.0741 Epoch: 2 Global Step: 46450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:42,937-Speed 5204.39 samples/sec Loss 4.7929 LearningRate 0.0741 Epoch: 2 Global Step: 46460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:44,916-Speed 5175.51 samples/sec Loss 4.9014 LearningRate 0.0741 Epoch: 2 Global Step: 46470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:46,889-Speed 5191.38 samples/sec Loss 4.9300 LearningRate 0.0741 Epoch: 2 Global Step: 46480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:48,861-Speed 5196.43 samples/sec Loss 4.8552 LearningRate 0.0741 Epoch: 2 Global Step: 46490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:50,829-Speed 5204.88 samples/sec Loss 
4.8629 LearningRate 0.0741 Epoch: 2 Global Step: 46500 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:52,806-Speed 5180.05 samples/sec Loss 4.8862 LearningRate 0.0741 Epoch: 2 Global Step: 46510 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:54,786-Speed 5174.76 samples/sec Loss 4.9024 LearningRate 0.0741 Epoch: 2 Global Step: 46520 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:21:56,757-Speed 5196.96 samples/sec Loss 4.9168 LearningRate 0.0741 Epoch: 2 Global Step: 46530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:21:58,753-Speed 5132.47 samples/sec Loss 4.8998 LearningRate 0.0741 Epoch: 2 Global Step: 46540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:22:00,721-Speed 5202.51 samples/sec Loss 4.9620 LearningRate 0.0741 Epoch: 2 Global Step: 46550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:22:02,699-Speed 5178.76 samples/sec Loss 4.8756 LearningRate 0.0741 Epoch: 2 Global Step: 46560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:22:04,670-Speed 5198.33 samples/sec Loss 4.9921 LearningRate 0.0740 Epoch: 2 Global Step: 46570 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:22:06,647-Speed 5179.32 samples/sec Loss 4.8472 LearningRate 0.0740 Epoch: 2 Global Step: 46580 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:22:08,626-Speed 5177.80 samples/sec Loss 4.8294 LearningRate 0.0740 Epoch: 2 Global Step: 46590 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:22:10,599-Speed 5192.64 samples/sec Loss 4.7964 LearningRate 0.0740 Epoch: 2 Global Step: 46600 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:22:12,589-Speed 5147.11 samples/sec Loss 4.8692 LearningRate 0.0740 Epoch: 2 Global Step: 46610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:22:14,564-Speed 5185.62 samples/sec Loss 4.9694 LearningRate 0.0740 Epoch: 2 Global Step: 
46620 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:22:16,537-Speed 5190.66 samples/sec Loss 4.8682 LearningRate 0.0740 Epoch: 2 Global Step: 46630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:18,506-Speed 5204.22 samples/sec Loss 4.8396 LearningRate 0.0740 Epoch: 2 Global Step: 46640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:20,489-Speed 5166.40 samples/sec Loss 4.9638 LearningRate 0.0740 Epoch: 2 Global Step: 46650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:22,459-Speed 5199.28 samples/sec Loss 4.7392 LearningRate 0.0740 Epoch: 2 Global Step: 46660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:24,426-Speed 5207.02 samples/sec Loss 4.8521 LearningRate 0.0740 Epoch: 2 Global Step: 46670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:26,405-Speed 5175.79 samples/sec Loss 4.8791 LearningRate 0.0740 Epoch: 2 Global Step: 46680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:28,380-Speed 5186.57 samples/sec Loss 4.9421 LearningRate 0.0740 Epoch: 2 Global Step: 46690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:30,370-Speed 5148.48 samples/sec Loss 4.9893 LearningRate 0.0740 Epoch: 2 Global Step: 46700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:32,355-Speed 5158.81 samples/sec Loss 4.9200 LearningRate 0.0740 Epoch: 2 Global Step: 46710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:34,332-Speed 5181.26 samples/sec Loss 4.9131 LearningRate 0.0740 Epoch: 2 Global Step: 46720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:36,301-Speed 5203.76 samples/sec Loss 4.7842 LearningRate 0.0740 Epoch: 2 Global Step: 46730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:38,289-Speed 5152.39 samples/sec Loss 4.8330 LearningRate 0.0740 Epoch: 2 Global Step: 46740 Fp16 Grad Scale: 131072 Required: 19 
hours Training: 2022-04-11 02:22:40,258-Speed 5203.60 samples/sec Loss 4.8617 LearningRate 0.0740 Epoch: 2 Global Step: 46750 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:42,246-Speed 5151.78 samples/sec Loss 4.9763 LearningRate 0.0739 Epoch: 2 Global Step: 46760 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:44,217-Speed 5196.72 samples/sec Loss 4.9234 LearningRate 0.0739 Epoch: 2 Global Step: 46770 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:46,185-Speed 5205.75 samples/sec Loss 4.8984 LearningRate 0.0739 Epoch: 2 Global Step: 46780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:48,168-Speed 5165.78 samples/sec Loss 4.9161 LearningRate 0.0739 Epoch: 2 Global Step: 46790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:50,161-Speed 5138.28 samples/sec Loss 4.9574 LearningRate 0.0739 Epoch: 2 Global Step: 46800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:52,161-Speed 5122.77 samples/sec Loss 4.7994 LearningRate 0.0739 Epoch: 2 Global Step: 46810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:54,156-Speed 5134.02 samples/sec Loss 4.9052 LearningRate 0.0739 Epoch: 2 Global Step: 46820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:22:56,124-Speed 5206.24 samples/sec Loss 4.9320 LearningRate 0.0739 Epoch: 2 Global Step: 46830 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 02:22:58,092-Speed 5203.28 samples/sec Loss 4.8821 LearningRate 0.0739 Epoch: 2 Global Step: 46840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:23:00,067-Speed 5188.88 samples/sec Loss 4.9013 LearningRate 0.0739 Epoch: 2 Global Step: 46850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:23:02,064-Speed 5127.12 samples/sec Loss 4.7218 LearningRate 0.0739 Epoch: 2 Global Step: 46860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 
02:23:04,035-Speed 5197.62 samples/sec Loss 4.8609 LearningRate 0.0739 Epoch: 2 Global Step: 46870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:23:06,005-Speed 5199.35 samples/sec Loss 4.8689 LearningRate 0.0739 Epoch: 2 Global Step: 46880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:23:07,976-Speed 5196.94 samples/sec Loss 4.8115 LearningRate 0.0739 Epoch: 2 Global Step: 46890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:23:09,953-Speed 5183.91 samples/sec Loss 4.9378 LearningRate 0.0739 Epoch: 2 Global Step: 46900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:23:11,941-Speed 5149.84 samples/sec Loss 4.9770 LearningRate 0.0739 Epoch: 2 Global Step: 46910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:13,912-Speed 5199.26 samples/sec Loss 4.8906 LearningRate 0.0739 Epoch: 2 Global Step: 46920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:15,896-Speed 5160.54 samples/sec Loss 4.8547 LearningRate 0.0739 Epoch: 2 Global Step: 46930 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:17,893-Speed 5132.21 samples/sec Loss 4.9023 LearningRate 0.0739 Epoch: 2 Global Step: 46940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:19,863-Speed 5199.85 samples/sec Loss 4.8203 LearningRate 0.0738 Epoch: 2 Global Step: 46950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:21,832-Speed 5202.15 samples/sec Loss 4.8510 LearningRate 0.0738 Epoch: 2 Global Step: 46960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:23,809-Speed 5179.55 samples/sec Loss 4.8374 LearningRate 0.0738 Epoch: 2 Global Step: 46970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:25,777-Speed 5207.11 samples/sec Loss 4.8404 LearningRate 0.0738 Epoch: 2 Global Step: 46980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:27,753-Speed 5182.42 samples/sec Loss 4.9172 
LearningRate 0.0738 Epoch: 2 Global Step: 46990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:29,721-Speed 5204.68 samples/sec Loss 4.8512 LearningRate 0.0738 Epoch: 2 Global Step: 47000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:31,688-Speed 5208.26 samples/sec Loss 4.8911 LearningRate 0.0738 Epoch: 2 Global Step: 47010 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:23:33,664-Speed 5184.24 samples/sec Loss 4.9563 LearningRate 0.0738 Epoch: 2 Global Step: 47020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:35,638-Speed 5189.31 samples/sec Loss 4.8034 LearningRate 0.0738 Epoch: 2 Global Step: 47030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:37,607-Speed 5200.24 samples/sec Loss 4.9110 LearningRate 0.0738 Epoch: 2 Global Step: 47040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:39,591-Speed 5164.32 samples/sec Loss 4.9050 LearningRate 0.0738 Epoch: 2 Global Step: 47050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:41,560-Speed 5203.09 samples/sec Loss 4.8312 LearningRate 0.0738 Epoch: 2 Global Step: 47060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:43,530-Speed 5200.04 samples/sec Loss 4.8259 LearningRate 0.0738 Epoch: 2 Global Step: 47070 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:45,512-Speed 5167.61 samples/sec Loss 4.8657 LearningRate 0.0738 Epoch: 2 Global Step: 47080 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:47,490-Speed 5180.24 samples/sec Loss 4.9556 LearningRate 0.0738 Epoch: 2 Global Step: 47090 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:49,457-Speed 5205.88 samples/sec Loss 4.8458 LearningRate 0.0738 Epoch: 2 Global Step: 47100 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:51,428-Speed 5196.65 samples/sec Loss 4.9594 LearningRate 0.0738 Epoch: 2 Global Step: 47110 Fp16 
Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:23:53,412-Speed 5162.70 samples/sec Loss 4.8857 LearningRate 0.0738 Epoch: 2 Global Step: 47120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:23:55,411-Speed 5124.90 samples/sec Loss 4.8042 LearningRate 0.0738 Epoch: 2 Global Step: 47130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:23:57,385-Speed 5188.75 samples/sec Loss 4.8836 LearningRate 0.0738 Epoch: 2 Global Step: 47140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:23:59,370-Speed 5160.12 samples/sec Loss 4.8729 LearningRate 0.0737 Epoch: 2 Global Step: 47150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:24:01,370-Speed 5123.97 samples/sec Loss 4.8961 LearningRate 0.0737 Epoch: 2 Global Step: 47160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:24:03,343-Speed 5192.09 samples/sec Loss 4.8680 LearningRate 0.0737 Epoch: 2 Global Step: 47170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:24:05,333-Speed 5145.90 samples/sec Loss 4.7656 LearningRate 0.0737 Epoch: 2 Global Step: 47180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:24:07,301-Speed 5205.78 samples/sec Loss 4.8882 LearningRate 0.0737 Epoch: 2 Global Step: 47190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:24:09,287-Speed 5158.13 samples/sec Loss 4.7867 LearningRate 0.0737 Epoch: 2 Global Step: 47200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:24:11,285-Speed 5126.54 samples/sec Loss 4.8630 LearningRate 0.0737 Epoch: 2 Global Step: 47210 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:13,272-Speed 5156.70 samples/sec Loss 4.9306 LearningRate 0.0737 Epoch: 2 Global Step: 47220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:15,250-Speed 5176.01 samples/sec Loss 4.8703 LearningRate 0.0737 Epoch: 2 Global Step: 47230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 
2022-04-11 02:24:17,233-Speed 5167.88 samples/sec Loss 4.9368 LearningRate 0.0737 Epoch: 2 Global Step: 47240 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:19,203-Speed 5197.98 samples/sec Loss 4.8611 LearningRate 0.0737 Epoch: 2 Global Step: 47250 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:21,181-Speed 5180.94 samples/sec Loss 4.9446 LearningRate 0.0737 Epoch: 2 Global Step: 47260 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:23,184-Speed 5115.47 samples/sec Loss 4.9006 LearningRate 0.0737 Epoch: 2 Global Step: 47270 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:25,160-Speed 5183.13 samples/sec Loss 4.8292 LearningRate 0.0737 Epoch: 2 Global Step: 47280 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:27,136-Speed 5183.19 samples/sec Loss 4.9280 LearningRate 0.0737 Epoch: 2 Global Step: 47290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:29,104-Speed 5203.72 samples/sec Loss 4.7778 LearningRate 0.0737 Epoch: 2 Global Step: 47300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:31,080-Speed 5185.97 samples/sec Loss 4.9283 LearningRate 0.0737 Epoch: 2 Global Step: 47310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:24:33,057-Speed 5181.69 samples/sec Loss 4.9331 LearningRate 0.0737 Epoch: 2 Global Step: 47320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:24:35,036-Speed 5175.66 samples/sec Loss 4.8781 LearningRate 0.0737 Epoch: 2 Global Step: 47330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:37,041-Speed 5106.86 samples/sec Loss 4.9703 LearningRate 0.0736 Epoch: 2 Global Step: 47340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:39,030-Speed 5150.38 samples/sec Loss 4.9651 LearningRate 0.0736 Epoch: 2 Global Step: 47350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:41,035-Speed 5110.51 samples/sec 
Loss 4.8840 LearningRate 0.0736 Epoch: 2 Global Step: 47360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:43,024-Speed 5149.92 samples/sec Loss 4.8261 LearningRate 0.0736 Epoch: 2 Global Step: 47370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:45,007-Speed 5167.12 samples/sec Loss 4.9016 LearningRate 0.0736 Epoch: 2 Global Step: 47380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:46,982-Speed 5186.63 samples/sec Loss 4.8820 LearningRate 0.0736 Epoch: 2 Global Step: 47390 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:48,968-Speed 5155.91 samples/sec Loss 4.9592 LearningRate 0.0736 Epoch: 2 Global Step: 47400 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:50,970-Speed 5116.43 samples/sec Loss 4.8319 LearningRate 0.0736 Epoch: 2 Global Step: 47410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:52,986-Speed 5082.77 samples/sec Loss 4.8324 LearningRate 0.0736 Epoch: 2 Global Step: 47420 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:24:54,961-Speed 5185.96 samples/sec Loss 4.8368 LearningRate 0.0736 Epoch: 2 Global Step: 47430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:24:56,936-Speed 5186.91 samples/sec Loss 4.7919 LearningRate 0.0736 Epoch: 2 Global Step: 47440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:24:58,897-Speed 5222.45 samples/sec Loss 4.9404 LearningRate 0.0736 Epoch: 2 Global Step: 47450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:00,868-Speed 5196.49 samples/sec Loss 4.8575 LearningRate 0.0736 Epoch: 2 Global Step: 47460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:02,852-Speed 5165.12 samples/sec Loss 4.9126 LearningRate 0.0736 Epoch: 2 Global Step: 47470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:04,834-Speed 5166.58 samples/sec Loss 4.8402 LearningRate 0.0736 Epoch: 2 Global Step: 
47480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:06,817-Speed 5166.06 samples/sec Loss 4.8762 LearningRate 0.0736 Epoch: 2 Global Step: 47490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:08,787-Speed 5200.21 samples/sec Loss 4.8070 LearningRate 0.0736 Epoch: 2 Global Step: 47500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:10,766-Speed 5176.80 samples/sec Loss 4.9077 LearningRate 0.0736 Epoch: 2 Global Step: 47510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:12,752-Speed 5156.34 samples/sec Loss 4.8551 LearningRate 0.0736 Epoch: 2 Global Step: 47520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:14,732-Speed 5175.31 samples/sec Loss 4.8388 LearningRate 0.0736 Epoch: 2 Global Step: 47530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:16,710-Speed 5177.00 samples/sec Loss 4.8707 LearningRate 0.0735 Epoch: 2 Global Step: 47540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:18,676-Speed 5209.30 samples/sec Loss 4.8197 LearningRate 0.0735 Epoch: 2 Global Step: 47550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:20,648-Speed 5196.72 samples/sec Loss 4.8490 LearningRate 0.0735 Epoch: 2 Global Step: 47560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:22,622-Speed 5189.36 samples/sec Loss 4.7844 LearningRate 0.0735 Epoch: 2 Global Step: 47570 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:24,612-Speed 5148.15 samples/sec Loss 4.8451 LearningRate 0.0735 Epoch: 2 Global Step: 47580 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:26,588-Speed 5184.05 samples/sec Loss 4.8280 LearningRate 0.0735 Epoch: 2 Global Step: 47590 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:28,576-Speed 5151.73 samples/sec Loss 4.9582 LearningRate 0.0735 Epoch: 2 Global Step: 47600 Fp16 Grad Scale: 65536 Required: 19 hours 
Training: 2022-04-11 02:25:30,554-Speed 5178.92 samples/sec Loss 4.8672 LearningRate 0.0735 Epoch: 2 Global Step: 47610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:32,523-Speed 5203.49 samples/sec Loss 4.8620 LearningRate 0.0735 Epoch: 2 Global Step: 47620 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:34,496-Speed 5189.57 samples/sec Loss 4.7966 LearningRate 0.0735 Epoch: 2 Global Step: 47630 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:36,478-Speed 5169.27 samples/sec Loss 4.9031 LearningRate 0.0735 Epoch: 2 Global Step: 47640 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:38,484-Speed 5105.94 samples/sec Loss 4.9442 LearningRate 0.0735 Epoch: 2 Global Step: 47650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:25:40,463-Speed 5177.06 samples/sec Loss 4.8500 LearningRate 0.0735 Epoch: 2 Global Step: 47660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:25:42,434-Speed 5195.04 samples/sec Loss 4.7707 LearningRate 0.0735 Epoch: 2 Global Step: 47670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:25:44,408-Speed 5189.40 samples/sec Loss 4.8876 LearningRate 0.0735 Epoch: 2 Global Step: 47680 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:46,382-Speed 5191.55 samples/sec Loss 4.8880 LearningRate 0.0735 Epoch: 2 Global Step: 47690 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:48,356-Speed 5189.34 samples/sec Loss 4.8619 LearningRate 0.0735 Epoch: 2 Global Step: 47700 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:50,328-Speed 5192.30 samples/sec Loss 4.7479 LearningRate 0.0735 Epoch: 2 Global Step: 47710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:52,314-Speed 5159.62 samples/sec Loss 4.8731 LearningRate 0.0735 Epoch: 2 Global Step: 47720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:54,299-Speed 5160.17 
samples/sec Loss 4.9173 LearningRate 0.0734 Epoch: 2 Global Step: 47730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:56,283-Speed 5164.23 samples/sec Loss 4.8816 LearningRate 0.0734 Epoch: 2 Global Step: 47740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:25:58,257-Speed 5187.32 samples/sec Loss 4.8362 LearningRate 0.0734 Epoch: 2 Global Step: 47750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:26:00,242-Speed 5160.87 samples/sec Loss 4.9150 LearningRate 0.0734 Epoch: 2 Global Step: 47760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:26:02,224-Speed 5167.10 samples/sec Loss 4.9179 LearningRate 0.0734 Epoch: 2 Global Step: 47770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:26:04,198-Speed 5190.54 samples/sec Loss 4.9558 LearningRate 0.0734 Epoch: 2 Global Step: 47780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:26:06,191-Speed 5139.19 samples/sec Loss 4.8696 LearningRate 0.0734 Epoch: 2 Global Step: 47790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:26:08,175-Speed 5163.92 samples/sec Loss 4.9406 LearningRate 0.0734 Epoch: 2 Global Step: 47800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:26:10,164-Speed 5148.76 samples/sec Loss 4.8466 LearningRate 0.0734 Epoch: 2 Global Step: 47810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:26:12,141-Speed 5183.40 samples/sec Loss 4.8050 LearningRate 0.0734 Epoch: 2 Global Step: 47820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:26:14,116-Speed 5184.00 samples/sec Loss 4.8377 LearningRate 0.0734 Epoch: 2 Global Step: 47830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:26:16,090-Speed 5192.21 samples/sec Loss 4.9244 LearningRate 0.0734 Epoch: 2 Global Step: 47840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:26:18,058-Speed 5205.14 samples/sec Loss 4.9161 LearningRate 0.0734 
Epoch: 2 Global Step: 47850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:26:20,027-Speed 5201.21 samples/sec Loss 4.8081 LearningRate 0.0734 Epoch: 2 Global Step: 47860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:26:22,000-Speed 5190.31 samples/sec Loss 4.8283 LearningRate 0.0734 Epoch: 2 Global Step: 47870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:26:23,974-Speed 5191.42 samples/sec Loss 4.9669 LearningRate 0.0734 Epoch: 2 Global Step: 47880 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 02:26:25,958-Speed 5160.91 samples/sec Loss 4.8256 LearningRate 0.0734 Epoch: 2 Global Step: 47890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:26:27,948-Speed 5149.43 samples/sec Loss 4.8602 LearningRate 0.0734 Epoch: 2 Global Step: 47900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:26:29,921-Speed 5192.27 samples/sec Loss 4.8121 LearningRate 0.0734 Epoch: 2 Global Step: 47910 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:26:31,889-Speed 5204.25 samples/sec Loss 4.8596 LearningRate 0.0734 Epoch: 2 Global Step: 47920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:26:33,869-Speed 5173.51 samples/sec Loss 4.7922 LearningRate 0.0733 Epoch: 2 Global Step: 47930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:26:35,835-Speed 5210.48 samples/sec Loss 4.9454 LearningRate 0.0733 Epoch: 2 Global Step: 47940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:26:37,831-Speed 5131.65 samples/sec Loss 4.8902 LearningRate 0.0733 Epoch: 2 Global Step: 47950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:26:39,813-Speed 5169.50 samples/sec Loss 4.7559 LearningRate 0.0733 Epoch: 2 Global Step: 47960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:26:41,819-Speed 5104.37 samples/sec Loss 4.8120 LearningRate 0.0733 Epoch: 2 Global Step: 47970 Fp16 Grad Scale: 
65536 Required: 19 hours Training: 2022-04-11 02:26:43,788-Speed 5202.40 samples/sec Loss 4.7135 LearningRate 0.0733 Epoch: 2 Global Step: 47980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:26:45,774-Speed 5159.06 samples/sec Loss 4.8116 LearningRate 0.0733 Epoch: 2 Global Step: 47990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:26:47,754-Speed 5172.29 samples/sec Loss 4.8031 LearningRate 0.0733 Epoch: 2 Global Step: 48000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:27:14,259-[lfw][48000]XNorm: 22.398680 Training: 2022-04-11 02:27:14,260-[lfw][48000]Accuracy-Flip: 0.99700+-0.00267 Training: 2022-04-11 02:27:14,260-[lfw][48000]Accuracy-Highest: 0.99783 Training: 2022-04-11 02:27:44,907-[cfp_fp][48000]XNorm: 20.390997 Training: 2022-04-11 02:27:44,908-[cfp_fp][48000]Accuracy-Flip: 0.97443+-0.00778 Training: 2022-04-11 02:27:44,909-[cfp_fp][48000]Accuracy-Highest: 0.97871 Training: 2022-04-11 02:28:11,416-[agedb_30][48000]XNorm: 22.269919 Training: 2022-04-11 02:28:11,417-[agedb_30][48000]Accuracy-Flip: 0.97417+-0.00864 Training: 2022-04-11 02:28:11,417-[agedb_30][48000]Accuracy-Highest: 0.97583 Training: 2022-04-11 02:28:13,404-Speed 119.56 samples/sec Loss 4.7382 LearningRate 0.0733 Epoch: 2 Global Step: 48010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:28:15,373-Speed 5204.71 samples/sec Loss 4.8405 LearningRate 0.0733 Epoch: 2 Global Step: 48020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:28:17,334-Speed 5223.67 samples/sec Loss 4.7840 LearningRate 0.0733 Epoch: 2 Global Step: 48030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:28:19,297-Speed 5218.25 samples/sec Loss 4.8587 LearningRate 0.0733 Epoch: 2 Global Step: 48040 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:28:21,287-Speed 5146.86 samples/sec Loss 4.8335 LearningRate 0.0733 Epoch: 2 Global Step: 48050 Fp16 Grad Scale: 131072 Required: 19 hours Training: 
2022-04-11 02:28:23,256-Speed 5203.37 samples/sec Loss 4.8606 LearningRate 0.0733 Epoch: 2 Global Step: 48060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:28:25,229-Speed 5191.64 samples/sec Loss 4.8352 LearningRate 0.0733 Epoch: 2 Global Step: 48070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:28:27,195-Speed 5210.03 samples/sec Loss 4.8606 LearningRate 0.0733 Epoch: 2 Global Step: 48080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:28:29,164-Speed 5200.08 samples/sec Loss 4.7328 LearningRate 0.0733 Epoch: 2 Global Step: 48090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:28:31,136-Speed 5194.31 samples/sec Loss 4.8534 LearningRate 0.0733 Epoch: 2 Global Step: 48100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:28:33,109-Speed 5192.57 samples/sec Loss 4.7936 LearningRate 0.0733 Epoch: 2 Global Step: 48110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:28:35,107-Speed 5127.45 samples/sec Loss 4.8545 LearningRate 0.0732 Epoch: 2 Global Step: 48120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:28:37,080-Speed 5191.10 samples/sec Loss 4.8377 LearningRate 0.0732 Epoch: 2 Global Step: 48130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:28:39,052-Speed 5194.58 samples/sec Loss 4.7698 LearningRate 0.0732 Epoch: 2 Global Step: 48140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:28:41,034-Speed 5167.94 samples/sec Loss 4.8376 LearningRate 0.0732 Epoch: 2 Global Step: 48150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:28:43,028-Speed 5138.64 samples/sec Loss 4.8843 LearningRate 0.0732 Epoch: 2 Global Step: 48160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:28:45,010-Speed 5167.17 samples/sec Loss 4.8661 LearningRate 0.0732 Epoch: 2 Global Step: 48170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:28:46,975-Speed 5213.29 
samples/sec Loss 4.8194 LearningRate 0.0732 Epoch: 2 Global Step: 48180 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:28:48,957-Speed 5168.10 samples/sec Loss 4.8390 LearningRate 0.0732 Epoch: 2 Global Step: 48190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:28:50,947-Speed 5149.03 samples/sec Loss 4.8888 LearningRate 0.0732 Epoch: 2 Global Step: 48200 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:28:52,913-Speed 5210.08 samples/sec Loss 4.7549 LearningRate 0.0732 Epoch: 2 Global Step: 48210 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:28:54,888-Speed 5184.98 samples/sec Loss 4.8356 LearningRate 0.0732 Epoch: 2 Global Step: 48220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:28:56,859-Speed 5198.81 samples/sec Loss 4.8425 LearningRate 0.0732 Epoch: 2 Global Step: 48230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:28:58,845-Speed 5157.37 samples/sec Loss 4.8249 LearningRate 0.0732 Epoch: 2 Global Step: 48240 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:29:00,839-Speed 5137.28 samples/sec Loss 4.7458 LearningRate 0.0732 Epoch: 2 Global Step: 48250 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:29:02,829-Speed 5147.63 samples/sec Loss 4.8930 LearningRate 0.0732 Epoch: 2 Global Step: 48260 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:29:04,813-Speed 5163.51 samples/sec Loss 4.8469 LearningRate 0.0732 Epoch: 2 Global Step: 48270 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:29:06,793-Speed 5173.02 samples/sec Loss 4.8535 LearningRate 0.0732 Epoch: 2 Global Step: 48280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:29:08,758-Speed 5211.94 samples/sec Loss 4.8561 LearningRate 0.0732 Epoch: 2 Global Step: 48290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:29:10,737-Speed 5175.69 samples/sec Loss 4.7502 LearningRate 0.0732 Epoch: 2 
Global Step: 48300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:29:12,712-Speed 5186.66 samples/sec Loss 4.8297 LearningRate 0.0732 Epoch: 2 Global Step: 48310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:29:14,699-Speed 5155.68 samples/sec Loss 4.7882 LearningRate 0.0731 Epoch: 2 Global Step: 48320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:29:16,686-Speed 5155.71 samples/sec Loss 4.8875 LearningRate 0.0731 Epoch: 2 Global Step: 48330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:29:18,662-Speed 5183.62 samples/sec Loss 4.6927 LearningRate 0.0731 Epoch: 2 Global Step: 48340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:29:20,630-Speed 5203.37 samples/sec Loss 4.8127 LearningRate 0.0731 Epoch: 2 Global Step: 48350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:29:22,621-Speed 5144.71 samples/sec Loss 4.9061 LearningRate 0.0731 Epoch: 2 Global Step: 48360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:29:24,606-Speed 5162.97 samples/sec Loss 4.8172 LearningRate 0.0731 Epoch: 2 Global Step: 48370 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:29:26,578-Speed 5193.88 samples/sec Loss 4.6339 LearningRate 0.0731 Epoch: 2 Global Step: 48380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:29:28,575-Speed 5129.02 samples/sec Loss 4.7373 LearningRate 0.0731 Epoch: 2 Global Step: 48390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:29:30,558-Speed 5166.61 samples/sec Loss 4.8368 LearningRate 0.0731 Epoch: 2 Global Step: 48400 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:29:32,536-Speed 5179.46 samples/sec Loss 4.7872 LearningRate 0.0731 Epoch: 2 Global Step: 48410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:29:34,515-Speed 5173.76 samples/sec Loss 4.8773 LearningRate 0.0731 Epoch: 2 Global Step: 48420 Fp16 Grad Scale: 65536 
Required: 19 hours Training: 2022-04-11 02:29:36,488-Speed 5194.01 samples/sec Loss 4.7436 LearningRate 0.0731 Epoch: 2 Global Step: 48430 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:29:38,480-Speed 5141.35 samples/sec Loss 4.7454 LearningRate 0.0731 Epoch: 2 Global Step: 48440 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:29:40,460-Speed 5172.40 samples/sec Loss 4.8299 LearningRate 0.0731 Epoch: 2 Global Step: 48450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:29:42,462-Speed 5117.91 samples/sec Loss 4.8043 LearningRate 0.0731 Epoch: 2 Global Step: 48460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:29:44,461-Speed 5124.51 samples/sec Loss 4.7346 LearningRate 0.0731 Epoch: 2 Global Step: 48470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:29:46,435-Speed 5189.79 samples/sec Loss 4.8436 LearningRate 0.0731 Epoch: 2 Global Step: 48480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:29:48,434-Speed 5123.23 samples/sec Loss 4.8465 LearningRate 0.0731 Epoch: 2 Global Step: 48490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:29:50,438-Speed 5111.74 samples/sec Loss 4.8809 LearningRate 0.0731 Epoch: 2 Global Step: 48500 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:29:52,419-Speed 5172.05 samples/sec Loss 4.8084 LearningRate 0.0730 Epoch: 2 Global Step: 48510 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:29:54,385-Speed 5208.02 samples/sec Loss 4.8312 LearningRate 0.0730 Epoch: 2 Global Step: 48520 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:29:56,358-Speed 5193.12 samples/sec Loss 4.7058 LearningRate 0.0730 Epoch: 2 Global Step: 48530 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:29:58,342-Speed 5161.84 samples/sec Loss 4.8579 LearningRate 0.0730 Epoch: 2 Global Step: 48540 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 
02:30:00,336-Speed 5136.96 samples/sec Loss 4.8195 LearningRate 0.0730 Epoch: 2 Global Step: 48550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:30:02,321-Speed 5159.90 samples/sec Loss 4.8672 LearningRate 0.0730 Epoch: 2 Global Step: 48560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:30:04,317-Speed 5133.04 samples/sec Loss 4.8081 LearningRate 0.0730 Epoch: 2 Global Step: 48570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:30:06,293-Speed 5183.56 samples/sec Loss 4.7831 LearningRate 0.0730 Epoch: 2 Global Step: 48580 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:30:08,264-Speed 5197.13 samples/sec Loss 4.7473 LearningRate 0.0730 Epoch: 2 Global Step: 48590 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:30:10,229-Speed 5213.49 samples/sec Loss 4.7909 LearningRate 0.0730 Epoch: 2 Global Step: 48600 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:30:12,198-Speed 5201.70 samples/sec Loss 4.8276 LearningRate 0.0730 Epoch: 2 Global Step: 48610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:30:14,165-Speed 5207.57 samples/sec Loss 4.9265 LearningRate 0.0730 Epoch: 2 Global Step: 48620 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:30:16,144-Speed 5177.69 samples/sec Loss 4.7634 LearningRate 0.0730 Epoch: 2 Global Step: 48630 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:30:18,111-Speed 5207.03 samples/sec Loss 4.8161 LearningRate 0.0730 Epoch: 2 Global Step: 48640 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:30:20,107-Speed 5130.59 samples/sec Loss 4.8639 LearningRate 0.0730 Epoch: 2 Global Step: 48650 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:30:22,104-Speed 5130.94 samples/sec Loss 4.8164 LearningRate 0.0730 Epoch: 2 Global Step: 48660 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:30:24,093-Speed 5149.16 samples/sec Loss 4.7989 
LearningRate 0.0730 Epoch: 2 Global Step: 48670 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:30:26,076-Speed 5166.69 samples/sec Loss 4.8081 LearningRate 0.0730 Epoch: 2 Global Step: 48680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:30:28,052-Speed 5183.52 samples/sec Loss 4.6876 LearningRate 0.0730 Epoch: 2 Global Step: 48690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:30:30,021-Speed 5202.65 samples/sec Loss 4.8312 LearningRate 0.0730 Epoch: 2 Global Step: 48700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:30:31,991-Speed 5198.84 samples/sec Loss 4.8968 LearningRate 0.0729 Epoch: 2 Global Step: 48710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:30:33,961-Speed 5200.32 samples/sec Loss 4.8021 LearningRate 0.0729 Epoch: 2 Global Step: 48720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:30:35,931-Speed 5200.78 samples/sec Loss 4.7771 LearningRate 0.0729 Epoch: 2 Global Step: 48730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:30:37,897-Speed 5208.36 samples/sec Loss 4.7624 LearningRate 0.0729 Epoch: 2 Global Step: 48740 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:30:39,873-Speed 5186.09 samples/sec Loss 4.8610 LearningRate 0.0729 Epoch: 2 Global Step: 48750 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:30:41,852-Speed 5174.82 samples/sec Loss 4.7202 LearningRate 0.0729 Epoch: 2 Global Step: 48760 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:30:43,817-Speed 5213.48 samples/sec Loss 4.7458 LearningRate 0.0729 Epoch: 2 Global Step: 48770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:30:45,811-Speed 5137.08 samples/sec Loss 4.8257 LearningRate 0.0729 Epoch: 2 Global Step: 48780 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:30:47,814-Speed 5114.02 samples/sec Loss 4.6977 LearningRate 0.0729 Epoch: 2 Global Step: 
48790 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:30:49,791-Speed 5182.72 samples/sec Loss 4.8482 LearningRate 0.0729 Epoch: 2 Global Step: 48800 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:30:51,767-Speed 5183.46 samples/sec Loss 4.8477 LearningRate 0.0729 Epoch: 2 Global Step: 48810 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:30:53,734-Speed 5207.27 samples/sec Loss 4.8107 LearningRate 0.0729 Epoch: 2 Global Step: 48820 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:30:55,709-Speed 5186.57 samples/sec Loss 4.8604 LearningRate 0.0729 Epoch: 2 Global Step: 48830 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:30:57,698-Speed 5151.20 samples/sec Loss 4.7183 LearningRate 0.0729 Epoch: 2 Global Step: 48840 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:30:59,668-Speed 5198.34 samples/sec Loss 4.8037 LearningRate 0.0729 Epoch: 2 Global Step: 48850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:31:01,641-Speed 5191.87 samples/sec Loss 4.7299 LearningRate 0.0729 Epoch: 2 Global Step: 48860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:31:03,623-Speed 5168.46 samples/sec Loss 4.8202 LearningRate 0.0729 Epoch: 2 Global Step: 48870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:31:05,612-Speed 5149.31 samples/sec Loss 4.8432 LearningRate 0.0729 Epoch: 2 Global Step: 48880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:31:07,589-Speed 5182.23 samples/sec Loss 4.8349 LearningRate 0.0729 Epoch: 2 Global Step: 48890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:31:09,581-Speed 5143.39 samples/sec Loss 4.8048 LearningRate 0.0728 Epoch: 2 Global Step: 48900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:31:11,570-Speed 5148.61 samples/sec Loss 4.8382 LearningRate 0.0728 Epoch: 2 Global Step: 48910 Fp16 Grad Scale: 65536 Required: 19 hours 
Training: 2022-04-11 02:31:13,555-Speed 5160.34 samples/sec Loss 4.7562 LearningRate 0.0728 Epoch: 2 Global Step: 48920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:31:15,543-Speed 5151.93 samples/sec Loss 4.7261 LearningRate 0.0728 Epoch: 2 Global Step: 48930 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:31:17,524-Speed 5172.46 samples/sec Loss 4.8218 LearningRate 0.0728 Epoch: 2 Global Step: 48940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:31:19,506-Speed 5168.80 samples/sec Loss 4.7992 LearningRate 0.0728 Epoch: 2 Global Step: 48950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:31:21,490-Speed 5162.49 samples/sec Loss 4.7700 LearningRate 0.0728 Epoch: 2 Global Step: 48960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:31:23,474-Speed 5162.37 samples/sec Loss 4.7360 LearningRate 0.0728 Epoch: 2 Global Step: 48970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:31:25,452-Speed 5179.78 samples/sec Loss 4.8483 LearningRate 0.0728 Epoch: 2 Global Step: 48980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:31:27,430-Speed 5176.80 samples/sec Loss 4.7852 LearningRate 0.0728 Epoch: 2 Global Step: 48990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:31:29,426-Speed 5131.11 samples/sec Loss 4.8509 LearningRate 0.0728 Epoch: 2 Global Step: 49000 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:31:31,412-Speed 5160.21 samples/sec Loss 4.7552 LearningRate 0.0728 Epoch: 2 Global Step: 49010 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:31:33,395-Speed 5166.10 samples/sec Loss 4.7645 LearningRate 0.0728 Epoch: 2 Global Step: 49020 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:31:35,364-Speed 5200.45 samples/sec Loss 4.7420 LearningRate 0.0728 Epoch: 2 Global Step: 49030 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:31:37,342-Speed 5180.46 
samples/sec Loss 4.7479 LearningRate 0.0728 Epoch: 2 Global Step: 49040 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:31:39,317-Speed 5187.07 samples/sec Loss 4.7972 LearningRate 0.0728 Epoch: 2 Global Step: 49050 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:31:41,290-Speed 5189.78 samples/sec Loss 4.7729 LearningRate 0.0728 Epoch: 2 Global Step: 49060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:31:43,257-Speed 5207.34 samples/sec Loss 4.8020 LearningRate 0.0728 Epoch: 2 Global Step: 49070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:31:45,237-Speed 5173.79 samples/sec Loss 4.7538 LearningRate 0.0728 Epoch: 2 Global Step: 49080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:31:47,201-Speed 5215.38 samples/sec Loss 4.7465 LearningRate 0.0728 Epoch: 2 Global Step: 49090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:31:49,180-Speed 5177.95 samples/sec Loss 4.8441 LearningRate 0.0727 Epoch: 2 Global Step: 49100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:31:51,170-Speed 5147.95 samples/sec Loss 4.7721 LearningRate 0.0727 Epoch: 2 Global Step: 49110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:31:53,152-Speed 5166.36 samples/sec Loss 4.8224 LearningRate 0.0727 Epoch: 2 Global Step: 49120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:31:55,125-Speed 5192.02 samples/sec Loss 4.8698 LearningRate 0.0727 Epoch: 2 Global Step: 49130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:31:57,112-Speed 5156.99 samples/sec Loss 4.8627 LearningRate 0.0727 Epoch: 2 Global Step: 49140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:31:59,104-Speed 5141.21 samples/sec Loss 4.7890 LearningRate 0.0727 Epoch: 2 Global Step: 49150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:32:01,085-Speed 5170.51 samples/sec Loss 4.7835 LearningRate 0.0727 
Epoch: 2 Global Step: 49160 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:03,080-Speed 5134.99 samples/sec Loss 4.8035 LearningRate 0.0727 Epoch: 2 Global Step: 49170 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:05,055-Speed 5185.21 samples/sec Loss 4.7890 LearningRate 0.0727 Epoch: 2 Global Step: 49180 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:07,031-Speed 5185.48 samples/sec Loss 4.8540 LearningRate 0.0727 Epoch: 2 Global Step: 49190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:09,014-Speed 5165.79 samples/sec Loss 4.7446 LearningRate 0.0727 Epoch: 2 Global Step: 49200 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:11,005-Speed 5143.86 samples/sec Loss 4.8121 LearningRate 0.0727 Epoch: 2 Global Step: 49210 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:12,984-Speed 5177.46 samples/sec Loss 4.6915 LearningRate 0.0727 Epoch: 2 Global Step: 49220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:14,989-Speed 5108.04 samples/sec Loss 4.7639 LearningRate 0.0727 Epoch: 2 Global Step: 49230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:16,972-Speed 5167.79 samples/sec Loss 4.8283 LearningRate 0.0727 Epoch: 2 Global Step: 49240 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:18,948-Speed 5183.95 samples/sec Loss 4.7314 LearningRate 0.0727 Epoch: 2 Global Step: 49250 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:20,951-Speed 5112.97 samples/sec Loss 4.7735 LearningRate 0.0727 Epoch: 2 Global Step: 49260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:32:22,939-Speed 5152.43 samples/sec Loss 4.8773 LearningRate 0.0727 Epoch: 2 Global Step: 49270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:32:24,967-Speed 5052.36 samples/sec Loss 4.7799 LearningRate 0.0727 Epoch: 2 Global Step: 49280 Fp16 Grad Scale: 131072 
Required: 19 hours Training: 2022-04-11 02:32:26,950-Speed 5166.04 samples/sec Loss 4.8044 LearningRate 0.0726 Epoch: 2 Global Step: 49290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:28,942-Speed 5140.86 samples/sec Loss 4.7939 LearningRate 0.0726 Epoch: 2 Global Step: 49300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:30,920-Speed 5179.89 samples/sec Loss 4.7238 LearningRate 0.0726 Epoch: 2 Global Step: 49310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:32,912-Speed 5141.83 samples/sec Loss 4.7837 LearningRate 0.0726 Epoch: 2 Global Step: 49320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:34,883-Speed 5196.45 samples/sec Loss 4.7792 LearningRate 0.0726 Epoch: 2 Global Step: 49330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:36,873-Speed 5146.89 samples/sec Loss 4.7492 LearningRate 0.0726 Epoch: 2 Global Step: 49340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:38,854-Speed 5172.54 samples/sec Loss 4.8043 LearningRate 0.0726 Epoch: 2 Global Step: 49350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:40,831-Speed 5181.42 samples/sec Loss 4.8160 LearningRate 0.0726 Epoch: 2 Global Step: 49360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:42,816-Speed 5160.33 samples/sec Loss 4.7482 LearningRate 0.0726 Epoch: 2 Global Step: 49370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:44,789-Speed 5191.69 samples/sec Loss 4.8248 LearningRate 0.0726 Epoch: 2 Global Step: 49380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:32:46,767-Speed 5176.26 samples/sec Loss 4.8241 LearningRate 0.0726 Epoch: 2 Global Step: 49390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:32:48,756-Speed 5152.76 samples/sec Loss 4.8301 LearningRate 0.0726 Epoch: 2 Global Step: 49400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 
02:32:50,745-Speed 5149.04 samples/sec Loss 4.8275 LearningRate 0.0726 Epoch: 2 Global Step: 49410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:32:52,725-Speed 5173.92 samples/sec Loss 4.8558 LearningRate 0.0726 Epoch: 2 Global Step: 49420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:32:54,711-Speed 5158.19 samples/sec Loss 4.7447 LearningRate 0.0726 Epoch: 2 Global Step: 49430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:32:56,683-Speed 5193.33 samples/sec Loss 4.7974 LearningRate 0.0726 Epoch: 2 Global Step: 49440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:32:58,674-Speed 5146.46 samples/sec Loss 4.7731 LearningRate 0.0726 Epoch: 2 Global Step: 49450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:33:00,651-Speed 5181.52 samples/sec Loss 4.7899 LearningRate 0.0726 Epoch: 2 Global Step: 49460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:33:02,631-Speed 5171.37 samples/sec Loss 4.6490 LearningRate 0.0726 Epoch: 2 Global Step: 49470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:33:04,610-Speed 5177.44 samples/sec Loss 4.7644 LearningRate 0.0726 Epoch: 2 Global Step: 49480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:33:06,573-Speed 5217.58 samples/sec Loss 4.7550 LearningRate 0.0725 Epoch: 2 Global Step: 49490 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:33:08,559-Speed 5156.72 samples/sec Loss 4.7836 LearningRate 0.0725 Epoch: 2 Global Step: 49500 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:33:10,540-Speed 5171.88 samples/sec Loss 4.7119 LearningRate 0.0725 Epoch: 2 Global Step: 49510 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:33:12,517-Speed 5180.81 samples/sec Loss 4.7774 LearningRate 0.0725 Epoch: 2 Global Step: 49520 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:33:14,516-Speed 5125.96 samples/sec Loss 
4.8587 LearningRate 0.0725 Epoch: 2 Global Step: 49530 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:33:16,505-Speed 5150.02 samples/sec Loss 4.7581 LearningRate 0.0725 Epoch: 2 Global Step: 49540 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:33:18,491-Speed 5157.83 samples/sec Loss 4.8300 LearningRate 0.0725 Epoch: 2 Global Step: 49550 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:33:20,460-Speed 5201.59 samples/sec Loss 4.7984 LearningRate 0.0725 Epoch: 2 Global Step: 49560 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:33:22,452-Speed 5143.67 samples/sec Loss 4.8002 LearningRate 0.0725 Epoch: 2 Global Step: 49570 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:33:24,441-Speed 5149.42 samples/sec Loss 4.7314 LearningRate 0.0725 Epoch: 2 Global Step: 49580 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:33:26,432-Speed 5143.24 samples/sec Loss 4.7704 LearningRate 0.0725 Epoch: 2 Global Step: 49590 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:33:28,420-Speed 5154.79 samples/sec Loss 4.8159 LearningRate 0.0725 Epoch: 2 Global Step: 49600 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:33:30,408-Speed 5151.59 samples/sec Loss 4.8427 LearningRate 0.0725 Epoch: 2 Global Step: 49610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:33:32,393-Speed 5160.67 samples/sec Loss 4.7573 LearningRate 0.0725 Epoch: 2 Global Step: 49620 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:33:34,387-Speed 5136.36 samples/sec Loss 4.7692 LearningRate 0.0725 Epoch: 2 Global Step: 49630 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:33:36,368-Speed 5171.47 samples/sec Loss 4.7481 LearningRate 0.0725 Epoch: 2 Global Step: 49640 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:33:38,363-Speed 5134.50 samples/sec Loss 4.8148 LearningRate 0.0725 Epoch: 2 Global Step: 49650 
Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:33:40,344-Speed 5171.57 samples/sec Loss 4.7617 LearningRate 0.0725 Epoch: 2 Global Step: 49660 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:33:42,319-Speed 5187.30 samples/sec Loss 4.7836 LearningRate 0.0725 Epoch: 2 Global Step: 49670 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:33:44,302-Speed 5164.51 samples/sec Loss 4.7578 LearningRate 0.0725 Epoch: 2 Global Step: 49680 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:33:46,280-Speed 5177.08 samples/sec Loss 4.7934 LearningRate 0.0724 Epoch: 2 Global Step: 49690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:33:48,261-Speed 5172.53 samples/sec Loss 4.7609 LearningRate 0.0724 Epoch: 2 Global Step: 49700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:33:50,250-Speed 5150.12 samples/sec Loss 4.8122 LearningRate 0.0724 Epoch: 2 Global Step: 49710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:33:52,229-Speed 5176.25 samples/sec Loss 4.8207 LearningRate 0.0724 Epoch: 2 Global Step: 49720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:33:54,219-Speed 5146.05 samples/sec Loss 4.7996 LearningRate 0.0724 Epoch: 2 Global Step: 49730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:33:56,202-Speed 5167.20 samples/sec Loss 4.8461 LearningRate 0.0724 Epoch: 2 Global Step: 49740 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:33:58,192-Speed 5145.72 samples/sec Loss 4.7067 LearningRate 0.0724 Epoch: 2 Global Step: 49750 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:34:00,169-Speed 5182.30 samples/sec Loss 4.7668 LearningRate 0.0724 Epoch: 2 Global Step: 49760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:02,146-Speed 5180.63 samples/sec Loss 4.7283 LearningRate 0.0724 Epoch: 2 Global Step: 49770 Fp16 Grad Scale: 65536 Required: 19 hours 
Training: 2022-04-11 02:34:04,118-Speed 5195.51 samples/sec Loss 4.7868 LearningRate 0.0724 Epoch: 2 Global Step: 49780 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:06,090-Speed 5193.88 samples/sec Loss 4.8107 LearningRate 0.0724 Epoch: 2 Global Step: 49790 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:08,059-Speed 5200.89 samples/sec Loss 4.7707 LearningRate 0.0724 Epoch: 2 Global Step: 49800 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:10,035-Speed 5185.85 samples/sec Loss 4.7667 LearningRate 0.0724 Epoch: 2 Global Step: 49810 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:12,009-Speed 5190.18 samples/sec Loss 4.7891 LearningRate 0.0724 Epoch: 2 Global Step: 49820 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:13,979-Speed 5199.51 samples/sec Loss 4.7817 LearningRate 0.0724 Epoch: 2 Global Step: 49830 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:15,982-Speed 5112.61 samples/sec Loss 4.7956 LearningRate 0.0724 Epoch: 2 Global Step: 49840 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:17,951-Speed 5204.17 samples/sec Loss 4.6604 LearningRate 0.0724 Epoch: 2 Global Step: 49850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:19,923-Speed 5194.41 samples/sec Loss 4.7578 LearningRate 0.0724 Epoch: 2 Global Step: 49860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:34:21,912-Speed 5151.04 samples/sec Loss 4.6544 LearningRate 0.0724 Epoch: 2 Global Step: 49870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:34:23,910-Speed 5125.74 samples/sec Loss 4.7386 LearningRate 0.0723 Epoch: 2 Global Step: 49880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:34:25,882-Speed 5195.46 samples/sec Loss 4.7554 LearningRate 0.0723 Epoch: 2 Global Step: 49890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:27,859-Speed 5181.55 
samples/sec Loss 4.6916 LearningRate 0.0723 Epoch: 2 Global Step: 49900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:29,835-Speed 5181.89 samples/sec Loss 4.7501 LearningRate 0.0723 Epoch: 2 Global Step: 49910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:31,813-Speed 5178.33 samples/sec Loss 4.7306 LearningRate 0.0723 Epoch: 2 Global Step: 49920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:33,790-Speed 5183.13 samples/sec Loss 4.7687 LearningRate 0.0723 Epoch: 2 Global Step: 49930 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:35,779-Speed 5147.99 samples/sec Loss 4.7412 LearningRate 0.0723 Epoch: 2 Global Step: 49940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:37,768-Speed 5150.45 samples/sec Loss 4.7787 LearningRate 0.0723 Epoch: 2 Global Step: 49950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:39,773-Speed 5111.47 samples/sec Loss 4.7692 LearningRate 0.0723 Epoch: 2 Global Step: 49960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:41,743-Speed 5198.20 samples/sec Loss 4.7533 LearningRate 0.0723 Epoch: 2 Global Step: 49970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:43,714-Speed 5196.72 samples/sec Loss 4.7578 LearningRate 0.0723 Epoch: 2 Global Step: 49980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:34:45,702-Speed 5153.60 samples/sec Loss 4.7936 LearningRate 0.0723 Epoch: 2 Global Step: 49990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:34:47,681-Speed 5176.50 samples/sec Loss 4.7890 LearningRate 0.0723 Epoch: 2 Global Step: 50000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:35:14,320-[lfw][50000]XNorm: 23.456953 Training: 2022-04-11 02:35:14,320-[lfw][50000]Accuracy-Flip: 0.99783+-0.00269 Training: 2022-04-11 02:35:14,320-[lfw][50000]Accuracy-Highest: 0.99783 Training: 2022-04-11 
02:35:45,320-[cfp_fp][50000]XNorm: 21.425725 Training: 2022-04-11 02:35:45,321-[cfp_fp][50000]Accuracy-Flip: 0.97714+-0.00639 Training: 2022-04-11 02:35:45,321-[cfp_fp][50000]Accuracy-Highest: 0.97871 Training: 2022-04-11 02:36:11,805-[agedb_30][50000]XNorm: 23.448377 Training: 2022-04-11 02:36:11,806-[agedb_30][50000]Accuracy-Flip: 0.97550+-0.00895 Training: 2022-04-11 02:36:11,806-[agedb_30][50000]Accuracy-Highest: 0.97583 Training: 2022-04-11 02:36:13,784-Speed 118.93 samples/sec Loss 4.8420 LearningRate 0.0723 Epoch: 2 Global Step: 50010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:36:15,757-Speed 5192.06 samples/sec Loss 4.7402 LearningRate 0.0723 Epoch: 2 Global Step: 50020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:36:17,732-Speed 5186.56 samples/sec Loss 4.8558 LearningRate 0.0723 Epoch: 2 Global Step: 50030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:36:19,697-Speed 5212.90 samples/sec Loss 4.7309 LearningRate 0.0723 Epoch: 2 Global Step: 50040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:36:21,678-Speed 5171.29 samples/sec Loss 4.7794 LearningRate 0.0723 Epoch: 2 Global Step: 50050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:36:23,652-Speed 5187.97 samples/sec Loss 4.6634 LearningRate 0.0723 Epoch: 2 Global Step: 50060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:36:25,839-Speed 4683.60 samples/sec Loss 4.6715 LearningRate 0.0723 Epoch: 2 Global Step: 50070 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:36:55,311-Speed 347.46 samples/sec Loss 4.2366 LearningRate 0.0722 Epoch: 3 Global Step: 50080 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:36:57,297-Speed 5159.97 samples/sec Loss 4.0908 LearningRate 0.0722 Epoch: 3 Global Step: 50090 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:36:59,278-Speed 5170.16 samples/sec Loss 4.1118 LearningRate 0.0722 Epoch: 3 Global Step: 
50100 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:37:01,282-Speed 5111.69 samples/sec Loss 4.0236 LearningRate 0.0722 Epoch: 3 Global Step: 50110 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:37:03,249-Speed 5207.74 samples/sec Loss 4.0680 LearningRate 0.0722 Epoch: 3 Global Step: 50120 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:37:05,261-Speed 5091.00 samples/sec Loss 4.0982 LearningRate 0.0722 Epoch: 3 Global Step: 50130 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:37:07,234-Speed 5191.17 samples/sec Loss 4.0287 LearningRate 0.0722 Epoch: 3 Global Step: 50140 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:37:09,222-Speed 5153.49 samples/sec Loss 4.0998 LearningRate 0.0722 Epoch: 3 Global Step: 50150 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:37:11,219-Speed 5129.59 samples/sec Loss 4.0771 LearningRate 0.0722 Epoch: 3 Global Step: 50160 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:37:13,204-Speed 5158.75 samples/sec Loss 4.1343 LearningRate 0.0722 Epoch: 3 Global Step: 50170 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:37:15,192-Speed 5154.13 samples/sec Loss 4.1037 LearningRate 0.0722 Epoch: 3 Global Step: 50180 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:37:17,201-Speed 5098.45 samples/sec Loss 4.0432 LearningRate 0.0722 Epoch: 3 Global Step: 50190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:37:19,190-Speed 5149.81 samples/sec Loss 4.0906 LearningRate 0.0722 Epoch: 3 Global Step: 50200 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:37:21,166-Speed 5184.89 samples/sec Loss 4.0144 LearningRate 0.0722 Epoch: 3 Global Step: 50210 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:37:23,140-Speed 5188.78 samples/sec Loss 4.0895 LearningRate 0.0722 Epoch: 3 Global Step: 50220 Fp16 Grad Scale: 65536 Required: 19 hours 
Training: 2022-04-11 02:37:25,134-Speed 5138.11 samples/sec Loss 4.0508 LearningRate 0.0722 Epoch: 3 Global Step: 50230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:37:27,311-Speed 4705.14 samples/sec Loss 4.1310 LearningRate 0.0722 Epoch: 3 Global Step: 50240 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:37:29,287-Speed 5182.05 samples/sec Loss 4.1143 LearningRate 0.0722 Epoch: 3 Global Step: 50250 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:37:31,259-Speed 5194.61 samples/sec Loss 4.1506 LearningRate 0.0722 Epoch: 3 Global Step: 50260 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:37:33,233-Speed 5189.90 samples/sec Loss 4.0351 LearningRate 0.0721 Epoch: 3 Global Step: 50270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:37:35,224-Speed 5143.55 samples/sec Loss 4.0871 LearningRate 0.0721 Epoch: 3 Global Step: 50280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:37:37,209-Speed 5161.27 samples/sec Loss 4.1019 LearningRate 0.0721 Epoch: 3 Global Step: 50290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:37:39,204-Speed 5133.87 samples/sec Loss 4.0829 LearningRate 0.0721 Epoch: 3 Global Step: 50300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:37:41,206-Speed 5117.16 samples/sec Loss 4.0898 LearningRate 0.0721 Epoch: 3 Global Step: 50310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:37:43,181-Speed 5187.56 samples/sec Loss 4.0511 LearningRate 0.0721 Epoch: 3 Global Step: 50320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:37:45,172-Speed 5143.86 samples/sec Loss 4.1681 LearningRate 0.0721 Epoch: 3 Global Step: 50330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:37:47,188-Speed 5082.07 samples/sec Loss 4.1649 LearningRate 0.0721 Epoch: 3 Global Step: 50340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:37:49,175-Speed 5155.92 
samples/sec Loss 4.0605 LearningRate 0.0721 Epoch: 3 Global Step: 50350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:37:51,184-Speed 5098.68 samples/sec Loss 4.1837 LearningRate 0.0721 Epoch: 3 Global Step: 50360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:37:53,180-Speed 5130.73 samples/sec Loss 4.1953 LearningRate 0.0721 Epoch: 3 Global Step: 50370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:37:55,156-Speed 5184.15 samples/sec Loss 4.1398 LearningRate 0.0721 Epoch: 3 Global Step: 50380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:37:57,143-Speed 5155.41 samples/sec Loss 4.1800 LearningRate 0.0721 Epoch: 3 Global Step: 50390 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:37:59,140-Speed 5129.23 samples/sec Loss 4.0776 LearningRate 0.0721 Epoch: 3 Global Step: 50400 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:38:01,129-Speed 5149.84 samples/sec Loss 4.1767 LearningRate 0.0721 Epoch: 3 Global Step: 50410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:38:03,124-Speed 5134.43 samples/sec Loss 4.1645 LearningRate 0.0721 Epoch: 3 Global Step: 50420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:05,119-Speed 5136.28 samples/sec Loss 4.0860 LearningRate 0.0721 Epoch: 3 Global Step: 50430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:07,109-Speed 5145.72 samples/sec Loss 4.1659 LearningRate 0.0721 Epoch: 3 Global Step: 50440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:09,104-Speed 5133.90 samples/sec Loss 4.0731 LearningRate 0.0721 Epoch: 3 Global Step: 50450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:11,090-Speed 5160.46 samples/sec Loss 4.0592 LearningRate 0.0721 Epoch: 3 Global Step: 50460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:13,084-Speed 5137.08 samples/sec Loss 4.1288 LearningRate 0.0720 Epoch: 
3 Global Step: 50470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:15,070-Speed 5155.12 samples/sec Loss 4.0535 LearningRate 0.0720 Epoch: 3 Global Step: 50480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:17,069-Speed 5124.56 samples/sec Loss 4.0986 LearningRate 0.0720 Epoch: 3 Global Step: 50490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:19,055-Speed 5158.07 samples/sec Loss 4.2026 LearningRate 0.0720 Epoch: 3 Global Step: 50500 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:21,036-Speed 5171.40 samples/sec Loss 4.0821 LearningRate 0.0720 Epoch: 3 Global Step: 50510 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:23,014-Speed 5180.21 samples/sec Loss 4.2414 LearningRate 0.0720 Epoch: 3 Global Step: 50520 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:25,021-Speed 5103.22 samples/sec Loss 4.2037 LearningRate 0.0720 Epoch: 3 Global Step: 50530 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:27,435-Speed 4244.18 samples/sec Loss 4.1309 LearningRate 0.0720 Epoch: 3 Global Step: 50540 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:29,409-Speed 5187.45 samples/sec Loss 4.1621 LearningRate 0.0720 Epoch: 3 Global Step: 50550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:31,386-Speed 5181.92 samples/sec Loss 4.1498 LearningRate 0.0720 Epoch: 3 Global Step: 50560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:33,363-Speed 5179.82 samples/sec Loss 4.1917 LearningRate 0.0720 Epoch: 3 Global Step: 50570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:35,334-Speed 5198.96 samples/sec Loss 4.1509 LearningRate 0.0720 Epoch: 3 Global Step: 50580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:37,326-Speed 5140.92 samples/sec Loss 4.1849 LearningRate 0.0720 Epoch: 3 Global Step: 50590 Fp16 Grad Scale: 
131072 Required: 19 hours Training: 2022-04-11 02:38:39,297-Speed 5196.82 samples/sec Loss 4.1810 LearningRate 0.0720 Epoch: 3 Global Step: 50600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:41,274-Speed 5183.87 samples/sec Loss 4.1758 LearningRate 0.0720 Epoch: 3 Global Step: 50610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:43,270-Speed 5130.20 samples/sec Loss 4.1971 LearningRate 0.0720 Epoch: 3 Global Step: 50620 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 02:38:45,271-Speed 5118.90 samples/sec Loss 4.2760 LearningRate 0.0720 Epoch: 3 Global Step: 50630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:47,257-Speed 5157.27 samples/sec Loss 4.1212 LearningRate 0.0720 Epoch: 3 Global Step: 50640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:49,236-Speed 5178.03 samples/sec Loss 4.1782 LearningRate 0.0720 Epoch: 3 Global Step: 50650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:51,228-Speed 5141.68 samples/sec Loss 4.1193 LearningRate 0.0720 Epoch: 3 Global Step: 50660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:53,230-Speed 5116.07 samples/sec Loss 4.0957 LearningRate 0.0719 Epoch: 3 Global Step: 50670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:55,220-Speed 5147.12 samples/sec Loss 4.1144 LearningRate 0.0719 Epoch: 3 Global Step: 50680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:57,198-Speed 5180.28 samples/sec Loss 4.0954 LearningRate 0.0719 Epoch: 3 Global Step: 50690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:38:59,180-Speed 5168.41 samples/sec Loss 4.1515 LearningRate 0.0719 Epoch: 3 Global Step: 50700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:01,158-Speed 5177.33 samples/sec Loss 4.0974 LearningRate 0.0719 Epoch: 3 Global Step: 50710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 
2022-04-11 02:39:03,124-Speed 5210.00 samples/sec Loss 4.1531 LearningRate 0.0719 Epoch: 3 Global Step: 50720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:05,099-Speed 5188.29 samples/sec Loss 4.2631 LearningRate 0.0719 Epoch: 3 Global Step: 50730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:07,084-Speed 5158.13 samples/sec Loss 4.1862 LearningRate 0.0719 Epoch: 3 Global Step: 50740 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:09,090-Speed 5106.21 samples/sec Loss 4.1654 LearningRate 0.0719 Epoch: 3 Global Step: 50750 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:11,074-Speed 5163.53 samples/sec Loss 4.1668 LearningRate 0.0719 Epoch: 3 Global Step: 50760 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:13,041-Speed 5207.91 samples/sec Loss 4.1550 LearningRate 0.0719 Epoch: 3 Global Step: 50770 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:15,012-Speed 5198.69 samples/sec Loss 4.0911 LearningRate 0.0719 Epoch: 3 Global Step: 50780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:16,990-Speed 5179.45 samples/sec Loss 4.1481 LearningRate 0.0719 Epoch: 3 Global Step: 50790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:18,956-Speed 5210.42 samples/sec Loss 4.1920 LearningRate 0.0719 Epoch: 3 Global Step: 50800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:20,925-Speed 5200.37 samples/sec Loss 4.2552 LearningRate 0.0719 Epoch: 3 Global Step: 50810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:22,919-Speed 5137.40 samples/sec Loss 4.2364 LearningRate 0.0719 Epoch: 3 Global Step: 50820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:24,922-Speed 5114.13 samples/sec Loss 4.1880 LearningRate 0.0719 Epoch: 3 Global Step: 50830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:26,900-Speed 5180.26 
samples/sec Loss 4.2358 LearningRate 0.0719 Epoch: 3 Global Step: 50840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:28,908-Speed 5099.74 samples/sec Loss 4.1899 LearningRate 0.0719 Epoch: 3 Global Step: 50850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:30,882-Speed 5190.22 samples/sec Loss 4.1218 LearningRate 0.0718 Epoch: 3 Global Step: 50860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:32,866-Speed 5162.04 samples/sec Loss 4.2290 LearningRate 0.0718 Epoch: 3 Global Step: 50870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:34,848-Speed 5168.95 samples/sec Loss 4.1999 LearningRate 0.0718 Epoch: 3 Global Step: 50880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:36,845-Speed 5129.62 samples/sec Loss 4.2463 LearningRate 0.0718 Epoch: 3 Global Step: 50890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:38,818-Speed 5192.04 samples/sec Loss 4.2786 LearningRate 0.0718 Epoch: 3 Global Step: 50900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:40,804-Speed 5158.40 samples/sec Loss 4.2154 LearningRate 0.0718 Epoch: 3 Global Step: 50910 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:42,774-Speed 5199.03 samples/sec Loss 4.1841 LearningRate 0.0718 Epoch: 3 Global Step: 50920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:44,765-Speed 5144.86 samples/sec Loss 4.1920 LearningRate 0.0718 Epoch: 3 Global Step: 50930 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 02:39:46,757-Speed 5141.70 samples/sec Loss 4.2303 LearningRate 0.0718 Epoch: 3 Global Step: 50940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:48,733-Speed 5185.60 samples/sec Loss 4.1689 LearningRate 0.0718 Epoch: 3 Global Step: 50950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:50,704-Speed 5197.41 samples/sec Loss 4.3097 LearningRate 0.0718 
Epoch: 3 Global Step: 50960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:52,674-Speed 5199.71 samples/sec Loss 4.2842 LearningRate 0.0718 Epoch: 3 Global Step: 50970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:54,659-Speed 5160.32 samples/sec Loss 4.2214 LearningRate 0.0718 Epoch: 3 Global Step: 50980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:56,632-Speed 5192.40 samples/sec Loss 4.2211 LearningRate 0.0718 Epoch: 3 Global Step: 50990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:39:58,623-Speed 5146.50 samples/sec Loss 4.2141 LearningRate 0.0718 Epoch: 3 Global Step: 51000 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:40:00,594-Speed 5196.68 samples/sec Loss 4.2741 LearningRate 0.0718 Epoch: 3 Global Step: 51010 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:40:02,567-Speed 5191.11 samples/sec Loss 4.2232 LearningRate 0.0718 Epoch: 3 Global Step: 51020 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:40:04,534-Speed 5205.58 samples/sec Loss 4.2372 LearningRate 0.0718 Epoch: 3 Global Step: 51030 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:40:06,497-Speed 5218.97 samples/sec Loss 4.1755 LearningRate 0.0718 Epoch: 3 Global Step: 51040 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:40:08,485-Speed 5153.06 samples/sec Loss 4.1868 LearningRate 0.0718 Epoch: 3 Global Step: 51050 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:40:10,465-Speed 5174.47 samples/sec Loss 4.1902 LearningRate 0.0717 Epoch: 3 Global Step: 51060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:40:12,439-Speed 5187.92 samples/sec Loss 4.1994 LearningRate 0.0717 Epoch: 3 Global Step: 51070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:40:14,417-Speed 5179.02 samples/sec Loss 4.2017 LearningRate 0.0717 Epoch: 3 Global Step: 51080 Fp16 Grad 
Scale: 131072 Required: 19 hours Training: 2022-04-11 02:40:16,395-Speed 5180.43 samples/sec Loss 4.2896 LearningRate 0.0717 Epoch: 3 Global Step: 51090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:40:18,372-Speed 5179.86 samples/sec Loss 4.2833 LearningRate 0.0717 Epoch: 3 Global Step: 51100 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:40:20,339-Speed 5208.81 samples/sec Loss 4.2936 LearningRate 0.0717 Epoch: 3 Global Step: 51110 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:40:22,322-Speed 5164.73 samples/sec Loss 4.2435 LearningRate 0.0717 Epoch: 3 Global Step: 51120 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:40:24,299-Speed 5180.69 samples/sec Loss 4.2293 LearningRate 0.0717 Epoch: 3 Global Step: 51130 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:40:26,299-Speed 5122.11 samples/sec Loss 4.3017 LearningRate 0.0717 Epoch: 3 Global Step: 51140 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:40:28,289-Speed 5147.34 samples/sec Loss 4.2262 LearningRate 0.0717 Epoch: 3 Global Step: 51150 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:40:30,257-Speed 5205.81 samples/sec Loss 4.2188 LearningRate 0.0717 Epoch: 3 Global Step: 51160 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:40:32,244-Speed 5154.65 samples/sec Loss 4.3156 LearningRate 0.0717 Epoch: 3 Global Step: 51170 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:40:34,216-Speed 5195.64 samples/sec Loss 4.2988 LearningRate 0.0717 Epoch: 3 Global Step: 51180 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:40:36,200-Speed 5162.68 samples/sec Loss 4.2252 LearningRate 0.0717 Epoch: 3 Global Step: 51190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:40:38,188-Speed 5151.61 samples/sec Loss 4.2212 LearningRate 0.0717 Epoch: 3 Global Step: 51200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 
02:40:40,168-Speed 5175.01 samples/sec Loss 4.2840 LearningRate 0.0717 Epoch: 3 Global Step: 51210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:40:42,153-Speed 5160.57 samples/sec Loss 4.2857 LearningRate 0.0717 Epoch: 3 Global Step: 51220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:40:44,137-Speed 5161.50 samples/sec Loss 4.2983 LearningRate 0.0717 Epoch: 3 Global Step: 51230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:40:46,117-Speed 5173.13 samples/sec Loss 4.1853 LearningRate 0.0717 Epoch: 3 Global Step: 51240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:40:48,101-Speed 5165.11 samples/sec Loss 4.2797 LearningRate 0.0717 Epoch: 3 Global Step: 51250 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:40:50,070-Speed 5203.07 samples/sec Loss 4.3322 LearningRate 0.0716 Epoch: 3 Global Step: 51260 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:40:52,065-Speed 5134.08 samples/sec Loss 4.2891 LearningRate 0.0716 Epoch: 3 Global Step: 51270 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:40:54,061-Speed 5132.94 samples/sec Loss 4.2019 LearningRate 0.0716 Epoch: 3 Global Step: 51280 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:40:56,028-Speed 5206.98 samples/sec Loss 4.2735 LearningRate 0.0716 Epoch: 3 Global Step: 51290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:40:58,010-Speed 5169.77 samples/sec Loss 4.1940 LearningRate 0.0716 Epoch: 3 Global Step: 51300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:41:00,008-Speed 5125.19 samples/sec Loss 4.2953 LearningRate 0.0716 Epoch: 3 Global Step: 51310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:41:01,983-Speed 5187.33 samples/sec Loss 4.2625 LearningRate 0.0716 Epoch: 3 Global Step: 51320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:41:03,959-Speed 5184.41 samples/sec Loss 4.1455 
LearningRate 0.0716 Epoch: 3 Global Step: 51330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:41:05,926-Speed 5205.47 samples/sec Loss 4.3750 LearningRate 0.0716 Epoch: 3 Global Step: 51340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:41:07,879-Speed 5245.20 samples/sec Loss 4.2364 LearningRate 0.0716 Epoch: 3 Global Step: 51350 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:41:09,866-Speed 5157.20 samples/sec Loss 4.3474 LearningRate 0.0716 Epoch: 3 Global Step: 51360 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:41:11,852-Speed 5158.87 samples/sec Loss 4.3600 LearningRate 0.0716 Epoch: 3 Global Step: 51370 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:41:13,838-Speed 5155.35 samples/sec Loss 4.2051 LearningRate 0.0716 Epoch: 3 Global Step: 51380 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:41:15,811-Speed 5192.93 samples/sec Loss 4.2730 LearningRate 0.0716 Epoch: 3 Global Step: 51390 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:41:17,784-Speed 5191.76 samples/sec Loss 4.2773 LearningRate 0.0716 Epoch: 3 Global Step: 51400 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:41:19,750-Speed 5209.09 samples/sec Loss 4.3347 LearningRate 0.0716 Epoch: 3 Global Step: 51410 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:41:21,740-Speed 5148.74 samples/sec Loss 4.2663 LearningRate 0.0716 Epoch: 3 Global Step: 51420 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:41:23,727-Speed 5154.92 samples/sec Loss 4.2429 LearningRate 0.0716 Epoch: 3 Global Step: 51430 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:41:25,702-Speed 5187.19 samples/sec Loss 4.3077 LearningRate 0.0716 Epoch: 3 Global Step: 51440 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:41:27,673-Speed 5196.84 samples/sec Loss 4.3575 LearningRate 0.0716 Epoch: 3 Global Step: 51450 Fp16 
Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:41:29,662-Speed 5151.13 samples/sec Loss 4.4033 LearningRate 0.0715 Epoch: 3 Global Step: 51460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:41:31,632-Speed 5199.44 samples/sec Loss 4.2636 LearningRate 0.0715 Epoch: 3 Global Step: 51470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:41:33,650-Speed 5076.01 samples/sec Loss 4.2780 LearningRate 0.0715 Epoch: 3 Global Step: 51480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:41:35,627-Speed 5181.21 samples/sec Loss 4.3894 LearningRate 0.0715 Epoch: 3 Global Step: 51490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:41:37,625-Speed 5126.22 samples/sec Loss 4.3509 LearningRate 0.0715 Epoch: 3 Global Step: 51500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:41:39,612-Speed 5154.46 samples/sec Loss 4.3198 LearningRate 0.0715 Epoch: 3 Global Step: 51510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:41:41,578-Speed 5211.46 samples/sec Loss 4.2877 LearningRate 0.0715 Epoch: 3 Global Step: 51520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:41:43,555-Speed 5180.18 samples/sec Loss 4.3880 LearningRate 0.0715 Epoch: 3 Global Step: 51530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:41:45,539-Speed 5162.00 samples/sec Loss 4.2946 LearningRate 0.0715 Epoch: 3 Global Step: 51540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:41:47,536-Speed 5129.76 samples/sec Loss 4.2694 LearningRate 0.0715 Epoch: 3 Global Step: 51550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:41:49,511-Speed 5189.06 samples/sec Loss 4.3072 LearningRate 0.0715 Epoch: 3 Global Step: 51560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:41:51,496-Speed 5159.99 samples/sec Loss 4.3669 LearningRate 0.0715 Epoch: 3 Global Step: 51570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 
2022-04-11 02:41:53,469-Speed 5190.58 samples/sec Loss 4.3356 LearningRate 0.0715 Epoch: 3 Global Step: 51580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:41:55,440-Speed 5198.70 samples/sec Loss 4.3543 LearningRate 0.0715 Epoch: 3 Global Step: 51590 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:41:57,409-Speed 5202.31 samples/sec Loss 4.3285 LearningRate 0.0715 Epoch: 3 Global Step: 51600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:41:59,382-Speed 5190.95 samples/sec Loss 4.3092 LearningRate 0.0715 Epoch: 3 Global Step: 51610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:42:01,354-Speed 5194.79 samples/sec Loss 4.3093 LearningRate 0.0715 Epoch: 3 Global Step: 51620 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:42:03,325-Speed 5196.42 samples/sec Loss 4.2837 LearningRate 0.0715 Epoch: 3 Global Step: 51630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:42:05,315-Speed 5146.43 samples/sec Loss 4.4034 LearningRate 0.0715 Epoch: 3 Global Step: 51640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:42:07,285-Speed 5200.00 samples/sec Loss 4.2815 LearningRate 0.0714 Epoch: 3 Global Step: 51650 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 02:42:09,253-Speed 5206.27 samples/sec Loss 4.4109 LearningRate 0.0714 Epoch: 3 Global Step: 51660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:42:11,239-Speed 5159.30 samples/sec Loss 4.2331 LearningRate 0.0714 Epoch: 3 Global Step: 51670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:42:13,213-Speed 5186.92 samples/sec Loss 4.2629 LearningRate 0.0714 Epoch: 3 Global Step: 51680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:42:15,201-Speed 5152.55 samples/sec Loss 4.3151 LearningRate 0.0714 Epoch: 3 Global Step: 51690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:42:17,164-Speed 5219.08 
samples/sec Loss 4.2846 LearningRate 0.0714 Epoch: 3 Global Step: 51700 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:19,138-Speed 5188.51 samples/sec Loss 4.2988 LearningRate 0.0714 Epoch: 3 Global Step: 51710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:21,121-Speed 5167.62 samples/sec Loss 4.2799 LearningRate 0.0714 Epoch: 3 Global Step: 51720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:23,099-Speed 5178.53 samples/sec Loss 4.3305 LearningRate 0.0714 Epoch: 3 Global Step: 51730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:25,074-Speed 5184.79 samples/sec Loss 4.3207 LearningRate 0.0714 Epoch: 3 Global Step: 51740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:27,066-Speed 5142.84 samples/sec Loss 4.2920 LearningRate 0.0714 Epoch: 3 Global Step: 51750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:29,054-Speed 5154.11 samples/sec Loss 4.3338 LearningRate 0.0714 Epoch: 3 Global Step: 51760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:31,032-Speed 5178.93 samples/sec Loss 4.2901 LearningRate 0.0714 Epoch: 3 Global Step: 51770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:33,005-Speed 5192.26 samples/sec Loss 4.2643 LearningRate 0.0714 Epoch: 3 Global Step: 51780 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:34,975-Speed 5197.38 samples/sec Loss 4.3312 LearningRate 0.0714 Epoch: 3 Global Step: 51790 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:36,952-Speed 5181.69 samples/sec Loss 4.2622 LearningRate 0.0714 Epoch: 3 Global Step: 51800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:42:38,929-Speed 5181.01 samples/sec Loss 4.3661 LearningRate 0.0714 Epoch: 3 Global Step: 51810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:42:40,913-Speed 5163.91 samples/sec Loss 4.3121 LearningRate 0.0714 Epoch: 3 
Global Step: 51820 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:42,903-Speed 5147.30 samples/sec Loss 4.3086 LearningRate 0.0714 Epoch: 3 Global Step: 51830 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:44,891-Speed 5151.10 samples/sec Loss 4.3945 LearningRate 0.0714 Epoch: 3 Global Step: 51840 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:46,889-Speed 5127.48 samples/sec Loss 4.3038 LearningRate 0.0713 Epoch: 3 Global Step: 51850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:48,880-Speed 5145.59 samples/sec Loss 4.3687 LearningRate 0.0713 Epoch: 3 Global Step: 51860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:50,863-Speed 5165.20 samples/sec Loss 4.2770 LearningRate 0.0713 Epoch: 3 Global Step: 51870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:52,833-Speed 5199.81 samples/sec Loss 4.3296 LearningRate 0.0713 Epoch: 3 Global Step: 51880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:54,809-Speed 5185.26 samples/sec Loss 4.3937 LearningRate 0.0713 Epoch: 3 Global Step: 51890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:56,786-Speed 5180.25 samples/sec Loss 4.3147 LearningRate 0.0713 Epoch: 3 Global Step: 51900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:42:58,776-Speed 5148.54 samples/sec Loss 4.3222 LearningRate 0.0713 Epoch: 3 Global Step: 51910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:43:00,752-Speed 5183.50 samples/sec Loss 4.3074 LearningRate 0.0713 Epoch: 3 Global Step: 51920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:43:02,723-Speed 5196.80 samples/sec Loss 4.3850 LearningRate 0.0713 Epoch: 3 Global Step: 51930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:43:04,697-Speed 5190.02 samples/sec Loss 4.3522 LearningRate 0.0713 Epoch: 3 Global Step: 51940 Fp16 Grad Scale: 131072 Required: 
19 hours
Training: 2022-04-11 02:43:06,686-Speed 5150.20 samples/sec Loss 4.3665 LearningRate 0.0713 Epoch: 3 Global Step: 51950 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-11 02:43:08,677-Speed 5143.11 samples/sec Loss 4.2756 LearningRate 0.0713 Epoch: 3 Global Step: 51960 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-11 02:43:10,650-Speed 5192.97 samples/sec Loss 4.2755 LearningRate 0.0713 Epoch: 3 Global Step: 51970 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-11 02:43:12,635-Speed 5160.36 samples/sec Loss 4.3468 LearningRate 0.0713 Epoch: 3 Global Step: 51980 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-11 02:43:14,619-Speed 5163.85 samples/sec Loss 4.2892 LearningRate 0.0713 Epoch: 3 Global Step: 51990 Fp16 Grad Scale: 131072 Required: 19 hours
Training: 2022-04-11 02:43:16,611-Speed 5140.10 samples/sec Loss 4.3258 LearningRate 0.0713 Epoch: 3 Global Step: 52000 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-04-11 02:43:43,398-[lfw][52000]XNorm: 22.565055
Training: 2022-04-11 02:43:43,398-[lfw][52000]Accuracy-Flip: 0.99700+-0.00340
Training: 2022-04-11 02:43:43,399-[lfw][52000]Accuracy-Highest: 0.99783
Training: 2022-04-11 02:44:14,134-[cfp_fp][52000]XNorm: 20.350126
Training: 2022-04-11 02:44:14,134-[cfp_fp][52000]Accuracy-Flip: 0.97771+-0.00554
Training: 2022-04-11 02:44:14,135-[cfp_fp][52000]Accuracy-Highest: 0.97871
Training: 2022-04-11 02:44:40,649-[agedb_30][52000]XNorm: 22.576471
Training: 2022-04-11 02:44:40,649-[agedb_30][52000]Accuracy-Flip: 0.97717+-0.00760
Training: 2022-04-11 02:44:40,650-[agedb_30][52000]Accuracy-Highest: 0.97717
Training: 2022-04-11 02:44:42,635-Speed 119.04 samples/sec Loss 4.3788 LearningRate 0.0713 Epoch: 3 Global Step: 52010 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-04-11 02:44:44,610-Speed 5187.02 samples/sec Loss 4.3390 LearningRate 0.0713 Epoch: 3 Global Step: 52020 Fp16 Grad Scale: 65536 Required: 19 hours
Training: 2022-04-11
02:44:46,584-Speed 5189.33 samples/sec Loss 4.3107 LearningRate 0.0713 Epoch: 3 Global Step: 52030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:44:48,549-Speed 5212.92 samples/sec Loss 4.4005 LearningRate 0.0713 Epoch: 3 Global Step: 52040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:44:50,527-Speed 5180.10 samples/sec Loss 4.3327 LearningRate 0.0712 Epoch: 3 Global Step: 52050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:44:52,505-Speed 5177.41 samples/sec Loss 4.3023 LearningRate 0.0712 Epoch: 3 Global Step: 52060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:44:54,474-Speed 5203.17 samples/sec Loss 4.4097 LearningRate 0.0712 Epoch: 3 Global Step: 52070 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:44:56,457-Speed 5167.33 samples/sec Loss 4.4385 LearningRate 0.0712 Epoch: 3 Global Step: 52080 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:44:58,427-Speed 5198.93 samples/sec Loss 4.2703 LearningRate 0.0712 Epoch: 3 Global Step: 52090 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:45:00,398-Speed 5197.10 samples/sec Loss 4.3787 LearningRate 0.0712 Epoch: 3 Global Step: 52100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:45:02,377-Speed 5175.45 samples/sec Loss 4.2664 LearningRate 0.0712 Epoch: 3 Global Step: 52110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:45:04,348-Speed 5195.72 samples/sec Loss 4.3315 LearningRate 0.0712 Epoch: 3 Global Step: 52120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:45:06,328-Speed 5174.44 samples/sec Loss 4.3305 LearningRate 0.0712 Epoch: 3 Global Step: 52130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:45:08,304-Speed 5182.18 samples/sec Loss 4.3683 LearningRate 0.0712 Epoch: 3 Global Step: 52140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:45:10,290-Speed 5160.33 samples/sec Loss 4.2980 
LearningRate 0.0712 Epoch: 3 Global Step: 52150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:45:12,280-Speed 5147.24 samples/sec Loss 4.3557 LearningRate 0.0712 Epoch: 3 Global Step: 52160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:45:14,272-Speed 5143.05 samples/sec Loss 4.4445 LearningRate 0.0712 Epoch: 3 Global Step: 52170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:45:16,244-Speed 5193.37 samples/sec Loss 4.3395 LearningRate 0.0712 Epoch: 3 Global Step: 52180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:45:18,230-Speed 5157.22 samples/sec Loss 4.3261 LearningRate 0.0712 Epoch: 3 Global Step: 52190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:45:20,204-Speed 5190.05 samples/sec Loss 4.3792 LearningRate 0.0712 Epoch: 3 Global Step: 52200 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:45:22,187-Speed 5165.84 samples/sec Loss 4.3096 LearningRate 0.0712 Epoch: 3 Global Step: 52210 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:45:24,163-Speed 5183.28 samples/sec Loss 4.3637 LearningRate 0.0712 Epoch: 3 Global Step: 52220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:45:26,153-Speed 5147.14 samples/sec Loss 4.4199 LearningRate 0.0712 Epoch: 3 Global Step: 52230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:45:28,130-Speed 5181.96 samples/sec Loss 4.2940 LearningRate 0.0712 Epoch: 3 Global Step: 52240 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:45:30,106-Speed 5182.06 samples/sec Loss 4.3059 LearningRate 0.0711 Epoch: 3 Global Step: 52250 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:45:32,085-Speed 5177.31 samples/sec Loss 4.3341 LearningRate 0.0711 Epoch: 3 Global Step: 52260 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:45:34,088-Speed 5113.91 samples/sec Loss 4.3638 LearningRate 0.0711 Epoch: 3 Global Step: 52270 
Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:45:36,112-Speed 5061.04 samples/sec Loss 4.4413 LearningRate 0.0711 Epoch: 3 Global Step: 52280 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:45:38,107-Speed 5134.61 samples/sec Loss 4.2974 LearningRate 0.0711 Epoch: 3 Global Step: 52290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:45:40,095-Speed 5152.89 samples/sec Loss 4.4168 LearningRate 0.0711 Epoch: 3 Global Step: 52300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:45:42,104-Speed 5099.62 samples/sec Loss 4.3441 LearningRate 0.0711 Epoch: 3 Global Step: 52310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:45:44,122-Speed 5075.58 samples/sec Loss 4.3544 LearningRate 0.0711 Epoch: 3 Global Step: 52320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:45:46,126-Speed 5111.12 samples/sec Loss 4.3866 LearningRate 0.0711 Epoch: 3 Global Step: 52330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:45:48,117-Speed 5145.86 samples/sec Loss 4.3493 LearningRate 0.0711 Epoch: 3 Global Step: 52340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:45:50,141-Speed 5061.40 samples/sec Loss 4.4173 LearningRate 0.0711 Epoch: 3 Global Step: 52350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:45:52,130-Speed 5149.98 samples/sec Loss 4.3197 LearningRate 0.0711 Epoch: 3 Global Step: 52360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:45:54,133-Speed 5114.92 samples/sec Loss 4.3285 LearningRate 0.0711 Epoch: 3 Global Step: 52370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:45:56,117-Speed 5161.80 samples/sec Loss 4.3748 LearningRate 0.0711 Epoch: 3 Global Step: 52380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:45:58,103-Speed 5159.44 samples/sec Loss 4.3816 LearningRate 0.0711 Epoch: 3 Global Step: 52390 Fp16 Grad Scale: 65536 Required: 19 hours 
Training: 2022-04-11 02:46:00,116-Speed 5088.51 samples/sec Loss 4.3676 LearningRate 0.0711 Epoch: 3 Global Step: 52400 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:46:02,128-Speed 5090.43 samples/sec Loss 4.2728 LearningRate 0.0711 Epoch: 3 Global Step: 52410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:46:04,140-Speed 5092.47 samples/sec Loss 4.3837 LearningRate 0.0711 Epoch: 3 Global Step: 52420 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:46:06,118-Speed 5177.44 samples/sec Loss 4.2887 LearningRate 0.0711 Epoch: 3 Global Step: 52430 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:46:08,101-Speed 5165.63 samples/sec Loss 4.3775 LearningRate 0.0710 Epoch: 3 Global Step: 52440 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:46:10,078-Speed 5180.77 samples/sec Loss 4.4289 LearningRate 0.0710 Epoch: 3 Global Step: 52450 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:46:12,060-Speed 5168.61 samples/sec Loss 4.2940 LearningRate 0.0710 Epoch: 3 Global Step: 52460 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:46:14,052-Speed 5142.30 samples/sec Loss 4.3927 LearningRate 0.0710 Epoch: 3 Global Step: 52470 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:46:16,033-Speed 5171.24 samples/sec Loss 4.3890 LearningRate 0.0710 Epoch: 3 Global Step: 52480 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:46:18,018-Speed 5160.63 samples/sec Loss 4.3921 LearningRate 0.0710 Epoch: 3 Global Step: 52490 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:46:19,998-Speed 5174.40 samples/sec Loss 4.3140 LearningRate 0.0710 Epoch: 3 Global Step: 52500 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:46:21,976-Speed 5177.27 samples/sec Loss 4.4713 LearningRate 0.0710 Epoch: 3 Global Step: 52510 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:46:23,981-Speed 5110.25 
samples/sec Loss 4.3812 LearningRate 0.0710 Epoch: 3 Global Step: 52520 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:46:25,976-Speed 5133.04 samples/sec Loss 4.3030 LearningRate 0.0710 Epoch: 3 Global Step: 52530 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:46:27,949-Speed 5192.01 samples/sec Loss 4.3215 LearningRate 0.0710 Epoch: 3 Global Step: 52540 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 02:46:29,927-Speed 5178.66 samples/sec Loss 4.3289 LearningRate 0.0710 Epoch: 3 Global Step: 52550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:46:31,909-Speed 5168.94 samples/sec Loss 4.3808 LearningRate 0.0710 Epoch: 3 Global Step: 52560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:46:33,896-Speed 5154.14 samples/sec Loss 4.3525 LearningRate 0.0710 Epoch: 3 Global Step: 52570 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:46:35,887-Speed 5144.17 samples/sec Loss 4.3799 LearningRate 0.0710 Epoch: 3 Global Step: 52580 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:46:37,896-Speed 5100.76 samples/sec Loss 4.2753 LearningRate 0.0710 Epoch: 3 Global Step: 52590 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:46:39,893-Speed 5128.91 samples/sec Loss 4.3464 LearningRate 0.0710 Epoch: 3 Global Step: 52600 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:46:41,871-Speed 5178.51 samples/sec Loss 4.4278 LearningRate 0.0710 Epoch: 3 Global Step: 52610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:46:43,864-Speed 5141.65 samples/sec Loss 4.3707 LearningRate 0.0710 Epoch: 3 Global Step: 52620 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:46:45,850-Speed 5155.16 samples/sec Loss 4.3460 LearningRate 0.0710 Epoch: 3 Global Step: 52630 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:46:47,875-Speed 5059.25 samples/sec Loss 4.3994 LearningRate 0.0709 Epoch: 3 
Global Step: 52640 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:46:49,856-Speed 5170.20 samples/sec Loss 4.4933 LearningRate 0.0709 Epoch: 3 Global Step: 52650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:46:51,836-Speed 5174.01 samples/sec Loss 4.4281 LearningRate 0.0709 Epoch: 3 Global Step: 52660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:46:53,832-Speed 5130.93 samples/sec Loss 4.3950 LearningRate 0.0709 Epoch: 3 Global Step: 52670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:46:55,812-Speed 5175.82 samples/sec Loss 4.4513 LearningRate 0.0709 Epoch: 3 Global Step: 52680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:46:57,788-Speed 5183.61 samples/sec Loss 4.3884 LearningRate 0.0709 Epoch: 3 Global Step: 52690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:46:59,761-Speed 5192.73 samples/sec Loss 4.4406 LearningRate 0.0709 Epoch: 3 Global Step: 52700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:01,732-Speed 5195.96 samples/sec Loss 4.3600 LearningRate 0.0709 Epoch: 3 Global Step: 52710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:03,731-Speed 5124.90 samples/sec Loss 4.3885 LearningRate 0.0709 Epoch: 3 Global Step: 52720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:05,708-Speed 5179.60 samples/sec Loss 4.3695 LearningRate 0.0709 Epoch: 3 Global Step: 52730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:07,693-Speed 5161.42 samples/sec Loss 4.4173 LearningRate 0.0709 Epoch: 3 Global Step: 52740 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:09,664-Speed 5196.65 samples/sec Loss 4.3462 LearningRate 0.0709 Epoch: 3 Global Step: 52750 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:11,641-Speed 5182.11 samples/sec Loss 4.3681 LearningRate 0.0709 Epoch: 3 Global Step: 52760 Fp16 Grad Scale: 131072 
Required: 19 hours Training: 2022-04-11 02:47:13,627-Speed 5158.21 samples/sec Loss 4.3830 LearningRate 0.0709 Epoch: 3 Global Step: 52770 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:15,599-Speed 5193.35 samples/sec Loss 4.3830 LearningRate 0.0709 Epoch: 3 Global Step: 52780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:17,579-Speed 5173.41 samples/sec Loss 4.3915 LearningRate 0.0709 Epoch: 3 Global Step: 52790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:19,564-Speed 5162.28 samples/sec Loss 4.3925 LearningRate 0.0709 Epoch: 3 Global Step: 52800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:21,550-Speed 5156.78 samples/sec Loss 4.4345 LearningRate 0.0709 Epoch: 3 Global Step: 52810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:23,523-Speed 5192.94 samples/sec Loss 4.3542 LearningRate 0.0709 Epoch: 3 Global Step: 52820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:25,508-Speed 5159.56 samples/sec Loss 4.4294 LearningRate 0.0709 Epoch: 3 Global Step: 52830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:27,488-Speed 5172.58 samples/sec Loss 4.4860 LearningRate 0.0708 Epoch: 3 Global Step: 52840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:29,473-Speed 5160.89 samples/sec Loss 4.3936 LearningRate 0.0708 Epoch: 3 Global Step: 52850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:31,448-Speed 5186.60 samples/sec Loss 4.3768 LearningRate 0.0708 Epoch: 3 Global Step: 52860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:33,434-Speed 5157.25 samples/sec Loss 4.4450 LearningRate 0.0708 Epoch: 3 Global Step: 52870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:35,417-Speed 5165.64 samples/sec Loss 4.3160 LearningRate 0.0708 Epoch: 3 Global Step: 52880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 
02:47:37,398-Speed 5171.53 samples/sec Loss 4.4374 LearningRate 0.0708 Epoch: 3 Global Step: 52890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:39,377-Speed 5176.80 samples/sec Loss 4.3825 LearningRate 0.0708 Epoch: 3 Global Step: 52900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:41,357-Speed 5173.42 samples/sec Loss 4.4338 LearningRate 0.0708 Epoch: 3 Global Step: 52910 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:43,351-Speed 5137.82 samples/sec Loss 4.3941 LearningRate 0.0708 Epoch: 3 Global Step: 52920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:45,333-Speed 5167.13 samples/sec Loss 4.4016 LearningRate 0.0708 Epoch: 3 Global Step: 52930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:47,320-Speed 5154.28 samples/sec Loss 4.4296 LearningRate 0.0708 Epoch: 3 Global Step: 52940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:49,294-Speed 5190.58 samples/sec Loss 4.4198 LearningRate 0.0708 Epoch: 3 Global Step: 52950 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 02:47:51,268-Speed 5189.32 samples/sec Loss 4.3648 LearningRate 0.0708 Epoch: 3 Global Step: 52960 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 02:47:53,243-Speed 5185.02 samples/sec Loss 4.3458 LearningRate 0.0708 Epoch: 3 Global Step: 52970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:55,252-Speed 5099.18 samples/sec Loss 4.4252 LearningRate 0.0708 Epoch: 3 Global Step: 52980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:57,231-Speed 5177.57 samples/sec Loss 4.3528 LearningRate 0.0708 Epoch: 3 Global Step: 52990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:47:59,223-Speed 5140.81 samples/sec Loss 4.3952 LearningRate 0.0708 Epoch: 3 Global Step: 53000 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:01,214-Speed 5145.17 samples/sec Loss 
4.4086 LearningRate 0.0708 Epoch: 3 Global Step: 53010 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:03,200-Speed 5159.77 samples/sec Loss 4.3875 LearningRate 0.0708 Epoch: 3 Global Step: 53020 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:05,179-Speed 5176.34 samples/sec Loss 4.4368 LearningRate 0.0708 Epoch: 3 Global Step: 53030 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:07,163-Speed 5162.40 samples/sec Loss 4.4168 LearningRate 0.0707 Epoch: 3 Global Step: 53040 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:09,156-Speed 5140.36 samples/sec Loss 4.3403 LearningRate 0.0707 Epoch: 3 Global Step: 53050 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:11,140-Speed 5163.11 samples/sec Loss 4.3472 LearningRate 0.0707 Epoch: 3 Global Step: 53060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:13,125-Speed 5159.76 samples/sec Loss 4.3488 LearningRate 0.0707 Epoch: 3 Global Step: 53070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:15,113-Speed 5151.83 samples/sec Loss 4.4271 LearningRate 0.0707 Epoch: 3 Global Step: 53080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:17,115-Speed 5116.65 samples/sec Loss 4.3784 LearningRate 0.0707 Epoch: 3 Global Step: 53090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:19,105-Speed 5146.82 samples/sec Loss 4.4153 LearningRate 0.0707 Epoch: 3 Global Step: 53100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:21,095-Speed 5147.08 samples/sec Loss 4.4119 LearningRate 0.0707 Epoch: 3 Global Step: 53110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:23,082-Speed 5156.93 samples/sec Loss 4.3878 LearningRate 0.0707 Epoch: 3 Global Step: 53120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:25,072-Speed 5147.32 samples/sec Loss 4.3864 LearningRate 0.0707 Epoch: 3 Global 
Step: 53130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:27,062-Speed 5147.39 samples/sec Loss 4.3901 LearningRate 0.0707 Epoch: 3 Global Step: 53140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:29,047-Speed 5159.58 samples/sec Loss 4.4456 LearningRate 0.0707 Epoch: 3 Global Step: 53150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:31,035-Speed 5154.76 samples/sec Loss 4.4252 LearningRate 0.0707 Epoch: 3 Global Step: 53160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:33,004-Speed 5201.46 samples/sec Loss 4.3824 LearningRate 0.0707 Epoch: 3 Global Step: 53170 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:48:35,002-Speed 5126.84 samples/sec Loss 4.3684 LearningRate 0.0707 Epoch: 3 Global Step: 53180 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:48:36,990-Speed 5153.03 samples/sec Loss 4.4552 LearningRate 0.0707 Epoch: 3 Global Step: 53190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:48:38,968-Speed 5177.95 samples/sec Loss 4.3209 LearningRate 0.0707 Epoch: 3 Global Step: 53200 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:48:40,967-Speed 5124.42 samples/sec Loss 4.4844 LearningRate 0.0707 Epoch: 3 Global Step: 53210 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:48:42,947-Speed 5171.52 samples/sec Loss 4.3021 LearningRate 0.0707 Epoch: 3 Global Step: 53220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:48:44,927-Speed 5174.90 samples/sec Loss 4.4156 LearningRate 0.0707 Epoch: 3 Global Step: 53230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:48:46,908-Speed 5171.56 samples/sec Loss 4.4551 LearningRate 0.0706 Epoch: 3 Global Step: 53240 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:48:48,884-Speed 5184.47 samples/sec Loss 4.3953 LearningRate 0.0706 Epoch: 3 Global Step: 53250 Fp16 Grad Scale: 65536 Required: 19 
hours Training: 2022-04-11 02:48:50,873-Speed 5152.24 samples/sec Loss 4.4343 LearningRate 0.0706 Epoch: 3 Global Step: 53260 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:48:52,856-Speed 5164.06 samples/sec Loss 4.3822 LearningRate 0.0706 Epoch: 3 Global Step: 53270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:54,839-Speed 5166.59 samples/sec Loss 4.4128 LearningRate 0.0706 Epoch: 3 Global Step: 53280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:48:56,823-Speed 5164.33 samples/sec Loss 4.4777 LearningRate 0.0706 Epoch: 3 Global Step: 53290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:48:58,793-Speed 5199.43 samples/sec Loss 4.3900 LearningRate 0.0706 Epoch: 3 Global Step: 53300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:49:00,775-Speed 5166.07 samples/sec Loss 4.4967 LearningRate 0.0706 Epoch: 3 Global Step: 53310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:49:02,763-Speed 5153.92 samples/sec Loss 4.4298 LearningRate 0.0706 Epoch: 3 Global Step: 53320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:49:04,737-Speed 5190.82 samples/sec Loss 4.4802 LearningRate 0.0706 Epoch: 3 Global Step: 53330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:49:06,722-Speed 5159.54 samples/sec Loss 4.4379 LearningRate 0.0706 Epoch: 3 Global Step: 53340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:49:08,697-Speed 5186.36 samples/sec Loss 4.4210 LearningRate 0.0706 Epoch: 3 Global Step: 53350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:49:10,677-Speed 5173.89 samples/sec Loss 4.4974 LearningRate 0.0706 Epoch: 3 Global Step: 53360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:49:12,668-Speed 5144.19 samples/sec Loss 4.4215 LearningRate 0.0706 Epoch: 3 Global Step: 53370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:49:14,673-Speed 5109.15 
samples/sec Loss 4.4945 LearningRate 0.0706 Epoch: 3 Global Step: 53380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:49:16,647-Speed 5190.24 samples/sec Loss 4.4525 LearningRate 0.0706 Epoch: 3 Global Step: 53390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:18,624-Speed 5179.50 samples/sec Loss 4.4544 LearningRate 0.0706 Epoch: 3 Global Step: 53400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:20,611-Speed 5154.84 samples/sec Loss 4.5077 LearningRate 0.0706 Epoch: 3 Global Step: 53410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:22,588-Speed 5180.77 samples/sec Loss 4.3779 LearningRate 0.0706 Epoch: 3 Global Step: 53420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:24,583-Speed 5136.02 samples/sec Loss 4.4997 LearningRate 0.0706 Epoch: 3 Global Step: 53430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:26,574-Speed 5145.10 samples/sec Loss 4.4594 LearningRate 0.0705 Epoch: 3 Global Step: 53440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:28,577-Speed 5114.54 samples/sec Loss 4.4122 LearningRate 0.0705 Epoch: 3 Global Step: 53450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:30,563-Speed 5158.27 samples/sec Loss 4.4428 LearningRate 0.0705 Epoch: 3 Global Step: 53460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:32,545-Speed 5168.66 samples/sec Loss 4.4158 LearningRate 0.0705 Epoch: 3 Global Step: 53470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:34,549-Speed 5112.28 samples/sec Loss 4.3762 LearningRate 0.0705 Epoch: 3 Global Step: 53480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:36,527-Speed 5176.07 samples/sec Loss 4.4396 LearningRate 0.0705 Epoch: 3 Global Step: 53490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:38,504-Speed 5182.01 samples/sec Loss 4.5099 LearningRate 0.0705 
Epoch: 3 Global Step: 53500 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:40,482-Speed 5179.32 samples/sec Loss 4.4535 LearningRate 0.0705 Epoch: 3 Global Step: 53510 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:42,465-Speed 5165.11 samples/sec Loss 4.4560 LearningRate 0.0705 Epoch: 3 Global Step: 53520 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:44,455-Speed 5148.39 samples/sec Loss 4.4438 LearningRate 0.0705 Epoch: 3 Global Step: 53530 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:46,451-Speed 5131.83 samples/sec Loss 4.3721 LearningRate 0.0705 Epoch: 3 Global Step: 53540 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:48,466-Speed 5083.72 samples/sec Loss 4.4061 LearningRate 0.0705 Epoch: 3 Global Step: 53550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:50,449-Speed 5167.22 samples/sec Loss 4.3201 LearningRate 0.0705 Epoch: 3 Global Step: 53560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:52,427-Speed 5177.17 samples/sec Loss 4.3939 LearningRate 0.0705 Epoch: 3 Global Step: 53570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:54,416-Speed 5150.22 samples/sec Loss 4.3873 LearningRate 0.0705 Epoch: 3 Global Step: 53580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:49:56,389-Speed 5193.11 samples/sec Loss 4.4341 LearningRate 0.0705 Epoch: 3 Global Step: 53590 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 02:49:58,373-Speed 5161.33 samples/sec Loss 4.4435 LearningRate 0.0705 Epoch: 3 Global Step: 53600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:00,349-Speed 5183.64 samples/sec Loss 4.5383 LearningRate 0.0705 Epoch: 3 Global Step: 53610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:02,340-Speed 5146.09 samples/sec Loss 4.4010 LearningRate 0.0705 Epoch: 3 Global Step: 53620 Fp16 Grad 
Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:04,344-Speed 5111.20 samples/sec Loss 4.4918 LearningRate 0.0704 Epoch: 3 Global Step: 53630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:06,321-Speed 5181.92 samples/sec Loss 4.4187 LearningRate 0.0704 Epoch: 3 Global Step: 53640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:08,293-Speed 5194.44 samples/sec Loss 4.5013 LearningRate 0.0704 Epoch: 3 Global Step: 53650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:10,271-Speed 5178.39 samples/sec Loss 4.4169 LearningRate 0.0704 Epoch: 3 Global Step: 53660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:12,251-Speed 5173.55 samples/sec Loss 4.3980 LearningRate 0.0704 Epoch: 3 Global Step: 53670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:14,242-Speed 5144.44 samples/sec Loss 4.5096 LearningRate 0.0704 Epoch: 3 Global Step: 53680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:16,227-Speed 5161.72 samples/sec Loss 4.4101 LearningRate 0.0704 Epoch: 3 Global Step: 53690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:18,198-Speed 5196.72 samples/sec Loss 4.4575 LearningRate 0.0704 Epoch: 3 Global Step: 53700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:20,179-Speed 5170.64 samples/sec Loss 4.5287 LearningRate 0.0704 Epoch: 3 Global Step: 53710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:22,169-Speed 5146.32 samples/sec Loss 4.4355 LearningRate 0.0704 Epoch: 3 Global Step: 53720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:24,155-Speed 5159.17 samples/sec Loss 4.4169 LearningRate 0.0704 Epoch: 3 Global Step: 53730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:26,136-Speed 5169.50 samples/sec Loss 4.3457 LearningRate 0.0704 Epoch: 3 Global Step: 53740 Fp16 Grad Scale: 131072 Required: 19 hours Training: 
2022-04-11 02:50:28,127-Speed 5144.80 samples/sec Loss 4.3918 LearningRate 0.0704 Epoch: 3 Global Step: 53750 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:30,110-Speed 5166.19 samples/sec Loss 4.5326 LearningRate 0.0704 Epoch: 3 Global Step: 53760 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:32,090-Speed 5174.16 samples/sec Loss 4.3432 LearningRate 0.0704 Epoch: 3 Global Step: 53770 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:34,087-Speed 5130.38 samples/sec Loss 4.3459 LearningRate 0.0704 Epoch: 3 Global Step: 53780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:36,064-Speed 5179.37 samples/sec Loss 4.3333 LearningRate 0.0704 Epoch: 3 Global Step: 53790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:38,035-Speed 5197.77 samples/sec Loss 4.4073 LearningRate 0.0704 Epoch: 3 Global Step: 53800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:40,026-Speed 5144.76 samples/sec Loss 4.4003 LearningRate 0.0704 Epoch: 3 Global Step: 53810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:50:42,001-Speed 5187.67 samples/sec Loss 4.4030 LearningRate 0.0704 Epoch: 3 Global Step: 53820 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:50:43,992-Speed 5142.54 samples/sec Loss 4.3906 LearningRate 0.0703 Epoch: 3 Global Step: 53830 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:50:45,982-Speed 5149.56 samples/sec Loss 4.3985 LearningRate 0.0703 Epoch: 3 Global Step: 53840 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:50:47,982-Speed 5120.74 samples/sec Loss 4.4968 LearningRate 0.0703 Epoch: 3 Global Step: 53850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:50:49,968-Speed 5157.30 samples/sec Loss 4.4364 LearningRate 0.0703 Epoch: 3 Global Step: 53860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:50:51,943-Speed 5186.72 
samples/sec Loss 4.3653 LearningRate 0.0703 Epoch: 3 Global Step: 53870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:50:53,919-Speed 5185.53 samples/sec Loss 4.4539 LearningRate 0.0703 Epoch: 3 Global Step: 53880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:50:55,892-Speed 5192.27 samples/sec Loss 4.4104 LearningRate 0.0703 Epoch: 3 Global Step: 53890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:50:57,887-Speed 5133.83 samples/sec Loss 4.4242 LearningRate 0.0703 Epoch: 3 Global Step: 53900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:50:59,867-Speed 5172.40 samples/sec Loss 4.4259 LearningRate 0.0703 Epoch: 3 Global Step: 53910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:51:01,861-Speed 5138.27 samples/sec Loss 4.4922 LearningRate 0.0703 Epoch: 3 Global Step: 53920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:51:03,832-Speed 5198.19 samples/sec Loss 4.4709 LearningRate 0.0703 Epoch: 3 Global Step: 53930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:51:05,806-Speed 5187.74 samples/sec Loss 4.4144 LearningRate 0.0703 Epoch: 3 Global Step: 53940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:51:07,777-Speed 5197.51 samples/sec Loss 4.4020 LearningRate 0.0703 Epoch: 3 Global Step: 53950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:51:09,751-Speed 5188.42 samples/sec Loss 4.3520 LearningRate 0.0703 Epoch: 3 Global Step: 53960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:51:11,736-Speed 5162.38 samples/sec Loss 4.4416 LearningRate 0.0703 Epoch: 3 Global Step: 53970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:51:13,749-Speed 5088.92 samples/sec Loss 4.4146 LearningRate 0.0703 Epoch: 3 Global Step: 53980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:51:15,729-Speed 5172.62 samples/sec Loss 4.4281 LearningRate 0.0703 
Epoch: 3 Global Step: 53990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:51:17,703-Speed 5189.44 samples/sec Loss 4.4132 LearningRate 0.0703 Epoch: 3 Global Step: 54000 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:51:44,215-[lfw][54000]XNorm: 24.258608 Training: 2022-04-11 02:51:44,216-[lfw][54000]Accuracy-Flip: 0.99767+-0.00271 Training: 2022-04-11 02:51:44,216-[lfw][54000]Accuracy-Highest: 0.99783 Training: 2022-04-11 02:52:14,929-[cfp_fp][54000]XNorm: 22.146166 Training: 2022-04-11 02:52:14,930-[cfp_fp][54000]Accuracy-Flip: 0.97557+-0.00704 Training: 2022-04-11 02:52:14,930-[cfp_fp][54000]Accuracy-Highest: 0.97871 Training: 2022-04-11 02:52:42,095-[agedb_30][54000]XNorm: 23.940729 Training: 2022-04-11 02:52:42,096-[agedb_30][54000]Accuracy-Flip: 0.97617+-0.00764 Training: 2022-04-11 02:52:42,096-[agedb_30][54000]Accuracy-Highest: 0.97717 Training: 2022-04-11 02:52:44,086-Speed 118.54 samples/sec Loss 4.4225 LearningRate 0.0703 Epoch: 3 Global Step: 54010 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:52:46,065-Speed 5177.29 samples/sec Loss 4.4641 LearningRate 0.0703 Epoch: 3 Global Step: 54020 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 02:52:48,052-Speed 5153.48 samples/sec Loss 4.4136 LearningRate 0.0702 Epoch: 3 Global Step: 54030 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:52:50,021-Speed 5203.35 samples/sec Loss 4.4474 LearningRate 0.0702 Epoch: 3 Global Step: 54040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:52:52,003-Speed 5167.11 samples/sec Loss 4.4231 LearningRate 0.0702 Epoch: 3 Global Step: 54050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:52:53,984-Speed 5171.44 samples/sec Loss 4.4831 LearningRate 0.0702 Epoch: 3 Global Step: 54060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:52:55,951-Speed 5208.37 samples/sec Loss 4.2974 LearningRate 0.0702 Epoch: 3 Global Step: 54070 Fp16 
Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:52:57,933-Speed 5167.21 samples/sec Loss 4.4341 LearningRate 0.0702 Epoch: 3 Global Step: 54080 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:52:59,902-Speed 5202.06 samples/sec Loss 4.4895 LearningRate 0.0702 Epoch: 3 Global Step: 54090 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:53:01,877-Speed 5186.60 samples/sec Loss 4.4606 LearningRate 0.0702 Epoch: 3 Global Step: 54100 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:53:03,872-Speed 5135.73 samples/sec Loss 4.5108 LearningRate 0.0702 Epoch: 3 Global Step: 54110 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:53:05,842-Speed 5199.65 samples/sec Loss 4.4206 LearningRate 0.0702 Epoch: 3 Global Step: 54120 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:53:07,816-Speed 5190.64 samples/sec Loss 4.5138 LearningRate 0.0702 Epoch: 3 Global Step: 54130 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:53:09,795-Speed 5174.01 samples/sec Loss 4.5258 LearningRate 0.0702 Epoch: 3 Global Step: 54140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:11,769-Speed 5190.01 samples/sec Loss 4.4955 LearningRate 0.0702 Epoch: 3 Global Step: 54150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:13,742-Speed 5190.34 samples/sec Loss 4.4531 LearningRate 0.0702 Epoch: 3 Global Step: 54160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:15,720-Speed 5180.19 samples/sec Loss 4.4471 LearningRate 0.0702 Epoch: 3 Global Step: 54170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:17,699-Speed 5175.03 samples/sec Loss 4.4777 LearningRate 0.0702 Epoch: 3 Global Step: 54180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:19,687-Speed 5154.68 samples/sec Loss 4.4031 LearningRate 0.0702 Epoch: 3 Global Step: 54190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 
2022-04-11 02:53:21,682-Speed 5132.11 samples/sec Loss 4.4212 LearningRate 0.0702 Epoch: 3 Global Step: 54200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:23,659-Speed 5181.05 samples/sec Loss 4.4667 LearningRate 0.0702 Epoch: 3 Global Step: 54210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:25,631-Speed 5195.61 samples/sec Loss 4.5250 LearningRate 0.0702 Epoch: 3 Global Step: 54220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:27,610-Speed 5178.10 samples/sec Loss 4.5392 LearningRate 0.0701 Epoch: 3 Global Step: 54230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:29,585-Speed 5185.71 samples/sec Loss 4.4032 LearningRate 0.0701 Epoch: 3 Global Step: 54240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:31,561-Speed 5184.41 samples/sec Loss 4.4453 LearningRate 0.0701 Epoch: 3 Global Step: 54250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:33,534-Speed 5190.73 samples/sec Loss 4.4623 LearningRate 0.0701 Epoch: 3 Global Step: 54260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:35,505-Speed 5197.68 samples/sec Loss 4.4391 LearningRate 0.0701 Epoch: 3 Global Step: 54270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:37,480-Speed 5185.81 samples/sec Loss 4.4368 LearningRate 0.0701 Epoch: 3 Global Step: 54280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:39,457-Speed 5183.19 samples/sec Loss 4.3104 LearningRate 0.0701 Epoch: 3 Global Step: 54290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:41,453-Speed 5131.53 samples/sec Loss 4.4787 LearningRate 0.0701 Epoch: 3 Global Step: 54300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:43,434-Speed 5169.79 samples/sec Loss 4.4630 LearningRate 0.0701 Epoch: 3 Global Step: 54310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:45,425-Speed 5144.20 
samples/sec Loss 4.4772 LearningRate 0.0701 Epoch: 3 Global Step: 54320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:47,401-Speed 5185.56 samples/sec Loss 4.4226 LearningRate 0.0701 Epoch: 3 Global Step: 54330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:49,385-Speed 5163.55 samples/sec Loss 4.4616 LearningRate 0.0701 Epoch: 3 Global Step: 54340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:51,357-Speed 5194.63 samples/sec Loss 4.4752 LearningRate 0.0701 Epoch: 3 Global Step: 54350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:53,330-Speed 5190.55 samples/sec Loss 4.3827 LearningRate 0.0701 Epoch: 3 Global Step: 54360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:55,312-Speed 5169.23 samples/sec Loss 4.4946 LearningRate 0.0701 Epoch: 3 Global Step: 54370 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:57,292-Speed 5172.68 samples/sec Loss 4.4653 LearningRate 0.0701 Epoch: 3 Global Step: 54380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:53:59,289-Speed 5130.19 samples/sec Loss 4.4604 LearningRate 0.0701 Epoch: 3 Global Step: 54390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:01,271-Speed 5168.48 samples/sec Loss 4.4473 LearningRate 0.0701 Epoch: 3 Global Step: 54400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:03,243-Speed 5193.22 samples/sec Loss 4.4260 LearningRate 0.0701 Epoch: 3 Global Step: 54410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:05,216-Speed 5193.56 samples/sec Loss 4.5301 LearningRate 0.0701 Epoch: 3 Global Step: 54420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:07,190-Speed 5189.41 samples/sec Loss 4.3756 LearningRate 0.0700 Epoch: 3 Global Step: 54430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:09,179-Speed 5149.09 samples/sec Loss 4.3572 LearningRate 0.0700 
Epoch: 3 Global Step: 54440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:11,156-Speed 5181.70 samples/sec Loss 4.4471 LearningRate 0.0700 Epoch: 3 Global Step: 54450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:13,160-Speed 5110.54 samples/sec Loss 4.3987 LearningRate 0.0700 Epoch: 3 Global Step: 54460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:15,134-Speed 5190.56 samples/sec Loss 4.4793 LearningRate 0.0700 Epoch: 3 Global Step: 54470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:17,123-Speed 5149.31 samples/sec Loss 4.4131 LearningRate 0.0700 Epoch: 3 Global Step: 54480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:19,098-Speed 5188.62 samples/sec Loss 4.3982 LearningRate 0.0700 Epoch: 3 Global Step: 54490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:21,079-Speed 5168.73 samples/sec Loss 4.3974 LearningRate 0.0700 Epoch: 3 Global Step: 54500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:54:23,079-Speed 5120.85 samples/sec Loss 4.5103 LearningRate 0.0700 Epoch: 3 Global Step: 54510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:54:25,057-Speed 5180.64 samples/sec Loss 4.4706 LearningRate 0.0700 Epoch: 3 Global Step: 54520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:54:27,035-Speed 5179.77 samples/sec Loss 4.4975 LearningRate 0.0700 Epoch: 3 Global Step: 54530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:54:29,028-Speed 5139.74 samples/sec Loss 4.3989 LearningRate 0.0700 Epoch: 3 Global Step: 54540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:54:31,004-Speed 5183.32 samples/sec Loss 4.4214 LearningRate 0.0700 Epoch: 3 Global Step: 54550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:54:32,980-Speed 5183.92 samples/sec Loss 4.5196 LearningRate 0.0700 Epoch: 3 Global Step: 54560 Fp16 Grad Scale: 
65536 Required: 19 hours Training: 2022-04-11 02:54:34,956-Speed 5183.19 samples/sec Loss 4.4569 LearningRate 0.0700 Epoch: 3 Global Step: 54570 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:54:36,947-Speed 5145.08 samples/sec Loss 4.4557 LearningRate 0.0700 Epoch: 3 Global Step: 54580 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:54:38,935-Speed 5153.88 samples/sec Loss 4.3956 LearningRate 0.0700 Epoch: 3 Global Step: 54590 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:54:40,969-Speed 5034.80 samples/sec Loss 4.5120 LearningRate 0.0700 Epoch: 3 Global Step: 54600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:42,944-Speed 5186.31 samples/sec Loss 4.5187 LearningRate 0.0700 Epoch: 3 Global Step: 54610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:44,935-Speed 5144.31 samples/sec Loss 4.4469 LearningRate 0.0700 Epoch: 3 Global Step: 54620 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:46,933-Speed 5127.25 samples/sec Loss 4.4743 LearningRate 0.0699 Epoch: 3 Global Step: 54630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:48,913-Speed 5174.82 samples/sec Loss 4.4850 LearningRate 0.0699 Epoch: 3 Global Step: 54640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:50,910-Speed 5128.87 samples/sec Loss 4.3589 LearningRate 0.0699 Epoch: 3 Global Step: 54650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:52,887-Speed 5181.51 samples/sec Loss 4.3483 LearningRate 0.0699 Epoch: 3 Global Step: 54660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:54,880-Speed 5139.13 samples/sec Loss 4.4050 LearningRate 0.0699 Epoch: 3 Global Step: 54670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:54:56,856-Speed 5184.19 samples/sec Loss 4.4419 LearningRate 0.0699 Epoch: 3 Global Step: 54680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 
02:54:58,855-Speed 5124.29 samples/sec Loss 4.5334 LearningRate 0.0699 Epoch: 3 Global Step: 54690 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:00,836-Speed 5172.37 samples/sec Loss 4.4455 LearningRate 0.0699 Epoch: 3 Global Step: 54700 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:02,831-Speed 5134.75 samples/sec Loss 4.3864 LearningRate 0.0699 Epoch: 3 Global Step: 54710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:04,818-Speed 5152.88 samples/sec Loss 4.3832 LearningRate 0.0699 Epoch: 3 Global Step: 54720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:06,789-Speed 5198.06 samples/sec Loss 4.5127 LearningRate 0.0699 Epoch: 3 Global Step: 54730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:08,774-Speed 5159.84 samples/sec Loss 4.5125 LearningRate 0.0699 Epoch: 3 Global Step: 54740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:10,768-Speed 5138.19 samples/sec Loss 4.3805 LearningRate 0.0699 Epoch: 3 Global Step: 54750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:12,741-Speed 5191.19 samples/sec Loss 4.4721 LearningRate 0.0699 Epoch: 3 Global Step: 54760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:14,725-Speed 5163.63 samples/sec Loss 4.4834 LearningRate 0.0699 Epoch: 3 Global Step: 54770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:16,694-Speed 5203.17 samples/sec Loss 4.4441 LearningRate 0.0699 Epoch: 3 Global Step: 54780 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:18,668-Speed 5188.69 samples/sec Loss 4.3968 LearningRate 0.0699 Epoch: 3 Global Step: 54790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:55:20,632-Speed 5214.38 samples/sec Loss 4.4821 LearningRate 0.0699 Epoch: 3 Global Step: 54800 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:22,608-Speed 5184.80 samples/sec Loss 4.3979 
LearningRate 0.0699 Epoch: 3 Global Step: 54810 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:24,596-Speed 5154.28 samples/sec Loss 4.5311 LearningRate 0.0699 Epoch: 3 Global Step: 54820 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:26,581-Speed 5157.98 samples/sec Loss 4.4054 LearningRate 0.0698 Epoch: 3 Global Step: 54830 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:28,569-Speed 5155.35 samples/sec Loss 4.4129 LearningRate 0.0698 Epoch: 3 Global Step: 54840 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:30,548-Speed 5175.46 samples/sec Loss 4.4622 LearningRate 0.0698 Epoch: 3 Global Step: 54850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:32,532-Speed 5162.91 samples/sec Loss 4.4605 LearningRate 0.0698 Epoch: 3 Global Step: 54860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:34,544-Speed 5090.55 samples/sec Loss 4.4537 LearningRate 0.0698 Epoch: 3 Global Step: 54870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:36,536-Speed 5142.43 samples/sec Loss 4.5299 LearningRate 0.0698 Epoch: 3 Global Step: 54880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:38,540-Speed 5110.44 samples/sec Loss 4.3773 LearningRate 0.0698 Epoch: 3 Global Step: 54890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:55:40,558-Speed 5077.35 samples/sec Loss 4.3860 LearningRate 0.0698 Epoch: 3 Global Step: 54900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:55:42,537-Speed 5176.33 samples/sec Loss 4.4036 LearningRate 0.0698 Epoch: 3 Global Step: 54910 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:55:44,513-Speed 5181.74 samples/sec Loss 4.3822 LearningRate 0.0698 Epoch: 3 Global Step: 54920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:55:46,505-Speed 5143.76 samples/sec Loss 4.4340 LearningRate 0.0698 Epoch: 3 Global Step: 54930 Fp16 
Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:55:48,482-Speed 5182.49 samples/sec Loss 4.4294 LearningRate 0.0698 Epoch: 3 Global Step: 54940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:55:50,464-Speed 5168.03 samples/sec Loss 4.4455 LearningRate 0.0698 Epoch: 3 Global Step: 54950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:55:52,435-Speed 5196.54 samples/sec Loss 4.4561 LearningRate 0.0698 Epoch: 3 Global Step: 54960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:55:54,409-Speed 5188.92 samples/sec Loss 4.5067 LearningRate 0.0698 Epoch: 3 Global Step: 54970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:55:56,390-Speed 5171.79 samples/sec Loss 4.5140 LearningRate 0.0698 Epoch: 3 Global Step: 54980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:55:58,366-Speed 5182.48 samples/sec Loss 4.5205 LearningRate 0.0698 Epoch: 3 Global Step: 54990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:00,340-Speed 5190.88 samples/sec Loss 4.4086 LearningRate 0.0698 Epoch: 3 Global Step: 55000 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:02,331-Speed 5143.69 samples/sec Loss 4.5032 LearningRate 0.0698 Epoch: 3 Global Step: 55010 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:04,304-Speed 5192.02 samples/sec Loss 4.3791 LearningRate 0.0698 Epoch: 3 Global Step: 55020 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:06,288-Speed 5162.53 samples/sec Loss 4.4506 LearningRate 0.0697 Epoch: 3 Global Step: 55030 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:08,262-Speed 5191.64 samples/sec Loss 4.3973 LearningRate 0.0697 Epoch: 3 Global Step: 55040 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:10,252-Speed 5146.41 samples/sec Loss 4.3898 LearningRate 0.0697 Epoch: 3 Global Step: 55050 Fp16 Grad Scale: 131072 Required: 19 hours 
Training: 2022-04-11 02:56:12,247-Speed 5134.54 samples/sec Loss 4.4588 LearningRate 0.0697 Epoch: 3 Global Step: 55060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:14,243-Speed 5131.28 samples/sec Loss 4.3876 LearningRate 0.0697 Epoch: 3 Global Step: 55070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:16,219-Speed 5184.48 samples/sec Loss 4.5316 LearningRate 0.0697 Epoch: 3 Global Step: 55080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:18,191-Speed 5195.43 samples/sec Loss 4.5104 LearningRate 0.0697 Epoch: 3 Global Step: 55090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:20,160-Speed 5201.70 samples/sec Loss 4.5545 LearningRate 0.0697 Epoch: 3 Global Step: 55100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:22,136-Speed 5182.17 samples/sec Loss 4.4854 LearningRate 0.0697 Epoch: 3 Global Step: 55110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:24,125-Speed 5152.03 samples/sec Loss 4.4336 LearningRate 0.0697 Epoch: 3 Global Step: 55120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:26,113-Speed 5153.07 samples/sec Loss 4.5652 LearningRate 0.0697 Epoch: 3 Global Step: 55130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:28,080-Speed 5207.66 samples/sec Loss 4.4264 LearningRate 0.0697 Epoch: 3 Global Step: 55140 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:56:30,064-Speed 5163.19 samples/sec Loss 4.4261 LearningRate 0.0697 Epoch: 3 Global Step: 55150 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:56:32,035-Speed 5196.29 samples/sec Loss 4.4924 LearningRate 0.0697 Epoch: 3 Global Step: 55160 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:56:34,014-Speed 5177.03 samples/sec Loss 4.4938 LearningRate 0.0697 Epoch: 3 Global Step: 55170 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:56:36,011-Speed 5129.31 
samples/sec Loss 4.4093 LearningRate 0.0697 Epoch: 3 Global Step: 55180 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:56:37,993-Speed 5167.38 samples/sec Loss 4.4385 LearningRate 0.0697 Epoch: 3 Global Step: 55190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:56:39,972-Speed 5175.37 samples/sec Loss 4.4571 LearningRate 0.0697 Epoch: 3 Global Step: 55200 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:56:41,943-Speed 5198.16 samples/sec Loss 4.4780 LearningRate 0.0697 Epoch: 3 Global Step: 55210 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:56:43,921-Speed 5178.73 samples/sec Loss 4.4564 LearningRate 0.0697 Epoch: 3 Global Step: 55220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:56:45,913-Speed 5142.41 samples/sec Loss 4.4184 LearningRate 0.0696 Epoch: 3 Global Step: 55230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:56:47,915-Speed 5117.25 samples/sec Loss 4.4578 LearningRate 0.0696 Epoch: 3 Global Step: 55240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:49,894-Speed 5176.57 samples/sec Loss 4.4758 LearningRate 0.0696 Epoch: 3 Global Step: 55250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:51,875-Speed 5170.36 samples/sec Loss 4.4516 LearningRate 0.0696 Epoch: 3 Global Step: 55260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:53,850-Speed 5184.53 samples/sec Loss 4.4455 LearningRate 0.0696 Epoch: 3 Global Step: 55270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:55,835-Speed 5161.18 samples/sec Loss 4.4314 LearningRate 0.0696 Epoch: 3 Global Step: 55280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:57,811-Speed 5183.37 samples/sec Loss 4.5285 LearningRate 0.0696 Epoch: 3 Global Step: 55290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:56:59,785-Speed 5188.97 samples/sec Loss 4.5307 LearningRate 0.0696 
Epoch: 3 Global Step: 55300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:01,764-Speed 5178.18 samples/sec Loss 4.5027 LearningRate 0.0696 Epoch: 3 Global Step: 55310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:03,762-Speed 5125.63 samples/sec Loss 4.4700 LearningRate 0.0696 Epoch: 3 Global Step: 55320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:05,746-Speed 5164.49 samples/sec Loss 4.5431 LearningRate 0.0696 Epoch: 3 Global Step: 55330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:07,737-Speed 5143.33 samples/sec Loss 4.5547 LearningRate 0.0696 Epoch: 3 Global Step: 55340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:09,713-Speed 5184.30 samples/sec Loss 4.5070 LearningRate 0.0696 Epoch: 3 Global Step: 55350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:11,696-Speed 5166.72 samples/sec Loss 4.4952 LearningRate 0.0696 Epoch: 3 Global Step: 55360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:13,689-Speed 5139.95 samples/sec Loss 4.4579 LearningRate 0.0696 Epoch: 3 Global Step: 55370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:15,692-Speed 5112.61 samples/sec Loss 4.4967 LearningRate 0.0696 Epoch: 3 Global Step: 55380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:17,682-Speed 5148.23 samples/sec Loss 4.4615 LearningRate 0.0696 Epoch: 3 Global Step: 55390 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:19,671-Speed 5149.68 samples/sec Loss 4.3332 LearningRate 0.0696 Epoch: 3 Global Step: 55400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:57:21,658-Speed 5154.39 samples/sec Loss 4.4649 LearningRate 0.0696 Epoch: 3 Global Step: 55410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:57:23,640-Speed 5168.79 samples/sec Loss 4.3887 LearningRate 0.0696 Epoch: 3 Global Step: 55420 Fp16 Grad Scale: 131072 
Required: 19 hours Training: 2022-04-11 02:57:25,615-Speed 5186.77 samples/sec Loss 4.5043 LearningRate 0.0695 Epoch: 3 Global Step: 55430 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:27,608-Speed 5140.71 samples/sec Loss 4.5356 LearningRate 0.0695 Epoch: 3 Global Step: 55440 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:29,616-Speed 5102.05 samples/sec Loss 4.4943 LearningRate 0.0695 Epoch: 3 Global Step: 55450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:31,610-Speed 5134.73 samples/sec Loss 4.5022 LearningRate 0.0695 Epoch: 3 Global Step: 55460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:33,598-Speed 5153.75 samples/sec Loss 4.5224 LearningRate 0.0695 Epoch: 3 Global Step: 55470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:35,588-Speed 5146.80 samples/sec Loss 4.3790 LearningRate 0.0695 Epoch: 3 Global Step: 55480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:37,583-Speed 5136.10 samples/sec Loss 4.5262 LearningRate 0.0695 Epoch: 3 Global Step: 55490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:39,571-Speed 5152.09 samples/sec Loss 4.4442 LearningRate 0.0695 Epoch: 3 Global Step: 55500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:41,555-Speed 5161.53 samples/sec Loss 4.4873 LearningRate 0.0695 Epoch: 3 Global Step: 55510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:43,534-Speed 5177.06 samples/sec Loss 4.3661 LearningRate 0.0695 Epoch: 3 Global Step: 55520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:45,520-Speed 5157.05 samples/sec Loss 4.4251 LearningRate 0.0695 Epoch: 3 Global Step: 55530 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:57:47,495-Speed 5187.62 samples/sec Loss 4.4280 LearningRate 0.0695 Epoch: 3 Global Step: 55540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 
02:57:49,471-Speed 5185.55 samples/sec Loss 4.5436 LearningRate 0.0695 Epoch: 3 Global Step: 55550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:51,465-Speed 5136.02 samples/sec Loss 4.4788 LearningRate 0.0695 Epoch: 3 Global Step: 55560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:53,440-Speed 5186.23 samples/sec Loss 4.5121 LearningRate 0.0695 Epoch: 3 Global Step: 55570 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:55,430-Speed 5149.27 samples/sec Loss 4.4060 LearningRate 0.0695 Epoch: 3 Global Step: 55580 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:57,403-Speed 5189.56 samples/sec Loss 4.5126 LearningRate 0.0695 Epoch: 3 Global Step: 55590 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:57:59,401-Speed 5127.41 samples/sec Loss 4.4976 LearningRate 0.0695 Epoch: 3 Global Step: 55600 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:01,401-Speed 5122.55 samples/sec Loss 4.4896 LearningRate 0.0695 Epoch: 3 Global Step: 55610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:03,387-Speed 5157.88 samples/sec Loss 4.3817 LearningRate 0.0695 Epoch: 3 Global Step: 55620 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:05,370-Speed 5165.67 samples/sec Loss 4.4489 LearningRate 0.0694 Epoch: 3 Global Step: 55630 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:07,356-Speed 5158.35 samples/sec Loss 4.5780 LearningRate 0.0694 Epoch: 3 Global Step: 55640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:58:09,317-Speed 5222.53 samples/sec Loss 4.4906 LearningRate 0.0694 Epoch: 3 Global Step: 55650 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:11,306-Speed 5150.02 samples/sec Loss 4.3882 LearningRate 0.0694 Epoch: 3 Global Step: 55660 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:13,285-Speed 5176.19 samples/sec Loss 4.4366 
LearningRate 0.0694 Epoch: 3 Global Step: 55670 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:15,267-Speed 5168.12 samples/sec Loss 4.4273 LearningRate 0.0694 Epoch: 3 Global Step: 55680 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:17,246-Speed 5178.35 samples/sec Loss 4.4362 LearningRate 0.0694 Epoch: 3 Global Step: 55690 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:19,220-Speed 5188.28 samples/sec Loss 4.4412 LearningRate 0.0694 Epoch: 3 Global Step: 55700 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:21,207-Speed 5153.95 samples/sec Loss 4.4760 LearningRate 0.0694 Epoch: 3 Global Step: 55710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:23,203-Speed 5133.88 samples/sec Loss 4.4738 LearningRate 0.0694 Epoch: 3 Global Step: 55720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:25,196-Speed 5140.42 samples/sec Loss 4.4418 LearningRate 0.0694 Epoch: 3 Global Step: 55730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:27,182-Speed 5156.53 samples/sec Loss 4.4427 LearningRate 0.0694 Epoch: 3 Global Step: 55740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:29,186-Speed 5112.40 samples/sec Loss 4.4421 LearningRate 0.0694 Epoch: 3 Global Step: 55750 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:58:31,158-Speed 5193.34 samples/sec Loss 4.4309 LearningRate 0.0694 Epoch: 3 Global Step: 55760 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:58:33,130-Speed 5193.93 samples/sec Loss 4.4859 LearningRate 0.0694 Epoch: 3 Global Step: 55770 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:58:35,127-Speed 5131.02 samples/sec Loss 4.4200 LearningRate 0.0694 Epoch: 3 Global Step: 55780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:58:37,114-Speed 5155.01 samples/sec Loss 4.4865 LearningRate 0.0694 Epoch: 3 Global Step: 55790 
Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:58:39,125-Speed 5093.71 samples/sec Loss 4.4691 LearningRate 0.0694 Epoch: 3 Global Step: 55800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:58:41,103-Speed 5178.23 samples/sec Loss 4.4314 LearningRate 0.0694 Epoch: 3 Global Step: 55810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:58:43,086-Speed 5165.27 samples/sec Loss 4.4335 LearningRate 0.0694 Epoch: 3 Global Step: 55820 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:45,067-Speed 5171.00 samples/sec Loss 4.4463 LearningRate 0.0693 Epoch: 3 Global Step: 55830 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:47,054-Speed 5156.32 samples/sec Loss 4.4722 LearningRate 0.0693 Epoch: 3 Global Step: 55840 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:49,064-Speed 5096.85 samples/sec Loss 4.4240 LearningRate 0.0693 Epoch: 3 Global Step: 55850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:51,055-Speed 5144.02 samples/sec Loss 4.4655 LearningRate 0.0693 Epoch: 3 Global Step: 55860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:53,034-Speed 5175.80 samples/sec Loss 4.5559 LearningRate 0.0693 Epoch: 3 Global Step: 55870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:55,019-Speed 5159.81 samples/sec Loss 4.4732 LearningRate 0.0693 Epoch: 3 Global Step: 55880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:56,993-Speed 5190.51 samples/sec Loss 4.4160 LearningRate 0.0693 Epoch: 3 Global Step: 55890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:58:58,965-Speed 5192.02 samples/sec Loss 4.4354 LearningRate 0.0693 Epoch: 3 Global Step: 55900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:59:00,946-Speed 5172.58 samples/sec Loss 4.4281 LearningRate 0.0693 Epoch: 3 Global Step: 55910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 
2022-04-11 02:59:02,929-Speed 5165.65 samples/sec Loss 4.4316 LearningRate 0.0693 Epoch: 3 Global Step: 55920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:59:04,927-Speed 5125.52 samples/sec Loss 4.4255 LearningRate 0.0693 Epoch: 3 Global Step: 55930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:59:06,900-Speed 5194.24 samples/sec Loss 4.4344 LearningRate 0.0693 Epoch: 3 Global Step: 55940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:59:08,878-Speed 5178.75 samples/sec Loss 4.4917 LearningRate 0.0693 Epoch: 3 Global Step: 55950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:59:10,886-Speed 5100.67 samples/sec Loss 4.5131 LearningRate 0.0693 Epoch: 3 Global Step: 55960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:59:12,873-Speed 5154.95 samples/sec Loss 4.4337 LearningRate 0.0693 Epoch: 3 Global Step: 55970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:59:14,859-Speed 5156.04 samples/sec Loss 4.5226 LearningRate 0.0693 Epoch: 3 Global Step: 55980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:59:16,839-Speed 5174.93 samples/sec Loss 4.4760 LearningRate 0.0693 Epoch: 3 Global Step: 55990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 02:59:18,806-Speed 5207.90 samples/sec Loss 4.4620 LearningRate 0.0693 Epoch: 3 Global Step: 56000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 02:59:45,547-[lfw][56000]XNorm: 23.227999 Training: 2022-04-11 02:59:45,547-[lfw][56000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-04-11 02:59:45,548-[lfw][56000]Accuracy-Highest: 0.99800 Training: 2022-04-11 03:00:16,482-[cfp_fp][56000]XNorm: 21.187529 Training: 2022-04-11 03:00:16,482-[cfp_fp][56000]Accuracy-Flip: 0.97100+-0.00676 Training: 2022-04-11 03:00:16,483-[cfp_fp][56000]Accuracy-Highest: 0.97871 Training: 2022-04-11 03:00:43,137-[agedb_30][56000]XNorm: 22.797209 Training: 2022-04-11 
03:00:43,138-[agedb_30][56000]Accuracy-Flip: 0.97567+-0.00904 Training: 2022-04-11 03:00:43,138-[agedb_30][56000]Accuracy-Highest: 0.97717 Training: 2022-04-11 03:00:45,131-Speed 118.62 samples/sec Loss 4.3951 LearningRate 0.0693 Epoch: 3 Global Step: 56010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:00:47,111-Speed 5172.50 samples/sec Loss 4.5110 LearningRate 0.0693 Epoch: 3 Global Step: 56020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:00:49,080-Speed 5202.22 samples/sec Loss 4.4041 LearningRate 0.0692 Epoch: 3 Global Step: 56030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:00:51,087-Speed 5104.22 samples/sec Loss 4.4737 LearningRate 0.0692 Epoch: 3 Global Step: 56040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:00:53,050-Speed 5219.09 samples/sec Loss 4.5383 LearningRate 0.0692 Epoch: 3 Global Step: 56050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:00:55,029-Speed 5175.79 samples/sec Loss 4.3820 LearningRate 0.0692 Epoch: 3 Global Step: 56060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:00:57,029-Speed 5121.88 samples/sec Loss 4.4320 LearningRate 0.0692 Epoch: 3 Global Step: 56070 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:00:59,002-Speed 5190.73 samples/sec Loss 4.4291 LearningRate 0.0692 Epoch: 3 Global Step: 56080 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:01:00,987-Speed 5161.41 samples/sec Loss 4.5520 LearningRate 0.0692 Epoch: 3 Global Step: 56090 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:01:02,963-Speed 5183.77 samples/sec Loss 4.5517 LearningRate 0.0692 Epoch: 3 Global Step: 56100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:04,961-Speed 5125.11 samples/sec Loss 4.4738 LearningRate 0.0692 Epoch: 3 Global Step: 56110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:06,946-Speed 5162.04 samples/sec Loss 4.4732 
LearningRate 0.0692 Epoch: 3 Global Step: 56120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:08,926-Speed 5171.74 samples/sec Loss 4.5059 LearningRate 0.0692 Epoch: 3 Global Step: 56130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:10,896-Speed 5201.57 samples/sec Loss 4.4906 LearningRate 0.0692 Epoch: 3 Global Step: 56140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:12,888-Speed 5142.57 samples/sec Loss 4.4865 LearningRate 0.0692 Epoch: 3 Global Step: 56150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:14,869-Speed 5170.40 samples/sec Loss 4.4543 LearningRate 0.0692 Epoch: 3 Global Step: 56160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:16,850-Speed 5169.93 samples/sec Loss 4.3506 LearningRate 0.0692 Epoch: 3 Global Step: 56170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:18,824-Speed 5189.33 samples/sec Loss 4.4183 LearningRate 0.0692 Epoch: 3 Global Step: 56180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:20,809-Speed 5160.01 samples/sec Loss 4.4726 LearningRate 0.0692 Epoch: 3 Global Step: 56190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:22,785-Speed 5186.38 samples/sec Loss 4.4407 LearningRate 0.0692 Epoch: 3 Global Step: 56200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:24,772-Speed 5154.72 samples/sec Loss 4.4896 LearningRate 0.0692 Epoch: 3 Global Step: 56210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:26,755-Speed 5165.53 samples/sec Loss 4.5124 LearningRate 0.0692 Epoch: 3 Global Step: 56220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:28,728-Speed 5192.64 samples/sec Loss 4.4375 LearningRate 0.0691 Epoch: 3 Global Step: 56230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:30,713-Speed 5161.01 samples/sec Loss 4.5033 LearningRate 0.0691 Epoch: 3 Global Step: 
56240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:32,702-Speed 5150.00 samples/sec Loss 4.3918 LearningRate 0.0691 Epoch: 3 Global Step: 56250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:34,698-Speed 5129.53 samples/sec Loss 4.4270 LearningRate 0.0691 Epoch: 3 Global Step: 56260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:36,690-Speed 5142.02 samples/sec Loss 4.3629 LearningRate 0.0691 Epoch: 3 Global Step: 56270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:38,687-Speed 5129.20 samples/sec Loss 4.4734 LearningRate 0.0691 Epoch: 3 Global Step: 56280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:40,670-Speed 5166.69 samples/sec Loss 4.4255 LearningRate 0.0691 Epoch: 3 Global Step: 56290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:01:42,642-Speed 5194.60 samples/sec Loss 4.4386 LearningRate 0.0691 Epoch: 3 Global Step: 56300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:01:44,616-Speed 5190.43 samples/sec Loss 4.3914 LearningRate 0.0691 Epoch: 3 Global Step: 56310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:01:46,589-Speed 5191.50 samples/sec Loss 4.5132 LearningRate 0.0691 Epoch: 3 Global Step: 56320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:01:48,558-Speed 5202.06 samples/sec Loss 4.4653 LearningRate 0.0691 Epoch: 3 Global Step: 56330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:01:50,532-Speed 5189.55 samples/sec Loss 4.4722 LearningRate 0.0691 Epoch: 3 Global Step: 56340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:01:52,536-Speed 5109.85 samples/sec Loss 4.4734 LearningRate 0.0691 Epoch: 3 Global Step: 56350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:01:54,522-Speed 5157.95 samples/sec Loss 4.3829 LearningRate 0.0691 Epoch: 3 Global Step: 56360 Fp16 Grad Scale: 65536 Required: 19 hours 
Training: 2022-04-11 03:01:56,539-Speed 5079.62 samples/sec Loss 4.4590 LearningRate 0.0691 Epoch: 3 Global Step: 56370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:01:58,541-Speed 5117.80 samples/sec Loss 4.3720 LearningRate 0.0691 Epoch: 3 Global Step: 56380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:00,534-Speed 5138.38 samples/sec Loss 4.5457 LearningRate 0.0691 Epoch: 3 Global Step: 56390 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:02,535-Speed 5119.58 samples/sec Loss 4.3678 LearningRate 0.0691 Epoch: 3 Global Step: 56400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:02:04,525-Speed 5147.55 samples/sec Loss 4.5195 LearningRate 0.0691 Epoch: 3 Global Step: 56410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:02:06,505-Speed 5173.19 samples/sec Loss 4.4397 LearningRate 0.0691 Epoch: 3 Global Step: 56420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:02:08,478-Speed 5191.23 samples/sec Loss 4.5740 LearningRate 0.0690 Epoch: 3 Global Step: 56430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:02:10,460-Speed 5168.65 samples/sec Loss 4.4017 LearningRate 0.0690 Epoch: 3 Global Step: 56440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:02:12,433-Speed 5192.40 samples/sec Loss 4.4287 LearningRate 0.0690 Epoch: 3 Global Step: 56450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:14,430-Speed 5129.74 samples/sec Loss 4.5404 LearningRate 0.0690 Epoch: 3 Global Step: 56460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:16,420-Speed 5146.86 samples/sec Loss 4.4232 LearningRate 0.0690 Epoch: 3 Global Step: 56470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:18,413-Speed 5138.17 samples/sec Loss 4.4051 LearningRate 0.0690 Epoch: 3 Global Step: 56480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:20,389-Speed 5185.39 
samples/sec Loss 4.5455 LearningRate 0.0690 Epoch: 3 Global Step: 56490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:22,386-Speed 5130.05 samples/sec Loss 4.4129 LearningRate 0.0690 Epoch: 3 Global Step: 56500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:24,402-Speed 5081.62 samples/sec Loss 4.3788 LearningRate 0.0690 Epoch: 3 Global Step: 56510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:26,405-Speed 5114.10 samples/sec Loss 4.4598 LearningRate 0.0690 Epoch: 3 Global Step: 56520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:28,383-Speed 5179.67 samples/sec Loss 4.4151 LearningRate 0.0690 Epoch: 3 Global Step: 56530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:30,381-Speed 5125.86 samples/sec Loss 4.4598 LearningRate 0.0690 Epoch: 3 Global Step: 56540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:32,358-Speed 5179.91 samples/sec Loss 4.5438 LearningRate 0.0690 Epoch: 3 Global Step: 56550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:02:34,333-Speed 5186.32 samples/sec Loss 4.4995 LearningRate 0.0690 Epoch: 3 Global Step: 56560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:36,309-Speed 5183.99 samples/sec Loss 4.4987 LearningRate 0.0690 Epoch: 3 Global Step: 56570 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:38,317-Speed 5100.98 samples/sec Loss 4.4403 LearningRate 0.0690 Epoch: 3 Global Step: 56580 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:40,312-Speed 5135.20 samples/sec Loss 4.4227 LearningRate 0.0690 Epoch: 3 Global Step: 56590 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:42,310-Speed 5127.69 samples/sec Loss 4.4523 LearningRate 0.0690 Epoch: 3 Global Step: 56600 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:44,289-Speed 5176.57 samples/sec Loss 4.3533 LearningRate 0.0690 Epoch: 3 
Global Step: 56610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:46,284-Speed 5132.98 samples/sec Loss 4.4108 LearningRate 0.0690 Epoch: 3 Global Step: 56620 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:48,263-Speed 5178.04 samples/sec Loss 4.4386 LearningRate 0.0689 Epoch: 3 Global Step: 56630 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:50,248-Speed 5159.49 samples/sec Loss 4.4594 LearningRate 0.0689 Epoch: 3 Global Step: 56640 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:52,232-Speed 5163.20 samples/sec Loss 4.5992 LearningRate 0.0689 Epoch: 3 Global Step: 56650 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:54,214-Speed 5167.48 samples/sec Loss 4.4690 LearningRate 0.0689 Epoch: 3 Global Step: 56660 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:56,204-Speed 5148.67 samples/sec Loss 4.3626 LearningRate 0.0689 Epoch: 3 Global Step: 56670 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:02:58,173-Speed 5201.34 samples/sec Loss 4.4144 LearningRate 0.0689 Epoch: 3 Global Step: 56680 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:00,160-Speed 5155.19 samples/sec Loss 4.5098 LearningRate 0.0689 Epoch: 3 Global Step: 56690 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:02,141-Speed 5172.30 samples/sec Loss 4.5036 LearningRate 0.0689 Epoch: 3 Global Step: 56700 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:04,124-Speed 5163.34 samples/sec Loss 4.5022 LearningRate 0.0689 Epoch: 3 Global Step: 56710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:06,131-Speed 5105.16 samples/sec Loss 4.5316 LearningRate 0.0689 Epoch: 3 Global Step: 56720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:08,102-Speed 5198.17 samples/sec Loss 4.4034 LearningRate 0.0689 Epoch: 3 Global Step: 56730 Fp16 Grad Scale: 65536 Required: 19 
hours Training: 2022-04-11 03:03:10,078-Speed 5182.25 samples/sec Loss 4.4193 LearningRate 0.0689 Epoch: 3 Global Step: 56740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:12,086-Speed 5102.44 samples/sec Loss 4.4610 LearningRate 0.0689 Epoch: 3 Global Step: 56750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:14,094-Speed 5100.22 samples/sec Loss 4.5136 LearningRate 0.0689 Epoch: 3 Global Step: 56760 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:03:16,079-Speed 5160.62 samples/sec Loss 4.4556 LearningRate 0.0689 Epoch: 3 Global Step: 56770 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:03:18,052-Speed 5191.22 samples/sec Loss 4.3844 LearningRate 0.0689 Epoch: 3 Global Step: 56780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:03:20,033-Speed 5172.47 samples/sec Loss 4.4334 LearningRate 0.0689 Epoch: 3 Global Step: 56790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:03:22,003-Speed 5199.25 samples/sec Loss 4.5373 LearningRate 0.0689 Epoch: 3 Global Step: 56800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:03:23,971-Speed 5205.63 samples/sec Loss 4.4229 LearningRate 0.0689 Epoch: 3 Global Step: 56810 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:25,960-Speed 5148.03 samples/sec Loss 4.4565 LearningRate 0.0689 Epoch: 3 Global Step: 56820 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:27,941-Speed 5172.94 samples/sec Loss 4.5544 LearningRate 0.0688 Epoch: 3 Global Step: 56830 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:29,929-Speed 5151.80 samples/sec Loss 4.4348 LearningRate 0.0688 Epoch: 3 Global Step: 56840 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:31,898-Speed 5201.73 samples/sec Loss 4.4270 LearningRate 0.0688 Epoch: 3 Global Step: 56850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:33,883-Speed 
5162.78 samples/sec Loss 4.5891 LearningRate 0.0688 Epoch: 3 Global Step: 56860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:35,868-Speed 5159.01 samples/sec Loss 4.5086 LearningRate 0.0688 Epoch: 3 Global Step: 56870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:37,871-Speed 5115.16 samples/sec Loss 4.4552 LearningRate 0.0688 Epoch: 3 Global Step: 56880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:39,869-Speed 5124.95 samples/sec Loss 4.4344 LearningRate 0.0688 Epoch: 3 Global Step: 56890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:41,848-Speed 5177.70 samples/sec Loss 4.5079 LearningRate 0.0688 Epoch: 3 Global Step: 56900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:43,810-Speed 5221.29 samples/sec Loss 4.4464 LearningRate 0.0688 Epoch: 3 Global Step: 56910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:45,788-Speed 5176.50 samples/sec Loss 4.5399 LearningRate 0.0688 Epoch: 3 Global Step: 56920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:47,777-Speed 5152.01 samples/sec Loss 4.4515 LearningRate 0.0688 Epoch: 3 Global Step: 56930 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:49,752-Speed 5185.33 samples/sec Loss 4.4445 LearningRate 0.0688 Epoch: 3 Global Step: 56940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:51,727-Speed 5187.61 samples/sec Loss 4.4730 LearningRate 0.0688 Epoch: 3 Global Step: 56950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:53,698-Speed 5197.79 samples/sec Loss 4.5223 LearningRate 0.0688 Epoch: 3 Global Step: 56960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:55,679-Speed 5169.66 samples/sec Loss 4.4144 LearningRate 0.0688 Epoch: 3 Global Step: 56970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:57,654-Speed 5186.67 samples/sec Loss 4.3950 LearningRate 0.0688 
Epoch: 3 Global Step: 56980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:03:59,636-Speed 5167.48 samples/sec Loss 4.4940 LearningRate 0.0688 Epoch: 3 Global Step: 56990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:04:01,622-Speed 5158.77 samples/sec Loss 4.4335 LearningRate 0.0688 Epoch: 3 Global Step: 57000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-11 03:04:03,596-Speed 5188.10 samples/sec Loss 4.4530 LearningRate 0.0688 Epoch: 3 Global Step: 57010 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:05,573-Speed 5181.89 samples/sec Loss 4.4019 LearningRate 0.0688 Epoch: 3 Global Step: 57020 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:07,572-Speed 5122.85 samples/sec Loss 4.5165 LearningRate 0.0688 Epoch: 3 Global Step: 57030 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:09,545-Speed 5193.49 samples/sec Loss 4.4756 LearningRate 0.0687 Epoch: 3 Global Step: 57040 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:11,521-Speed 5183.26 samples/sec Loss 4.4651 LearningRate 0.0687 Epoch: 3 Global Step: 57050 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:13,520-Speed 5126.17 samples/sec Loss 4.4081 LearningRate 0.0687 Epoch: 3 Global Step: 57060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:15,492-Speed 5194.69 samples/sec Loss 4.5056 LearningRate 0.0687 Epoch: 3 Global Step: 57070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:17,476-Speed 5163.02 samples/sec Loss 4.4704 LearningRate 0.0687 Epoch: 3 Global Step: 57080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:19,459-Speed 5164.90 samples/sec Loss 4.4471 LearningRate 0.0687 Epoch: 3 Global Step: 57090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:21,429-Speed 5200.85 samples/sec Loss 4.5021 LearningRate 0.0687 Epoch: 3 Global Step: 57100 Fp16 Grad Scale: 
131072 Required: 19 hours Training: 2022-04-11 03:04:23,397-Speed 5205.56 samples/sec Loss 4.4608 LearningRate 0.0687 Epoch: 3 Global Step: 57110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:25,371-Speed 5189.45 samples/sec Loss 4.4931 LearningRate 0.0687 Epoch: 3 Global Step: 57120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:27,367-Speed 5130.45 samples/sec Loss 4.4846 LearningRate 0.0687 Epoch: 3 Global Step: 57130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:29,341-Speed 5190.44 samples/sec Loss 4.4486 LearningRate 0.0687 Epoch: 3 Global Step: 57140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:31,311-Speed 5200.48 samples/sec Loss 4.4804 LearningRate 0.0687 Epoch: 3 Global Step: 57150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:33,285-Speed 5188.48 samples/sec Loss 4.4069 LearningRate 0.0687 Epoch: 3 Global Step: 57160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:35,285-Speed 5121.65 samples/sec Loss 4.5039 LearningRate 0.0687 Epoch: 3 Global Step: 57170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:37,272-Speed 5156.12 samples/sec Loss 4.4484 LearningRate 0.0687 Epoch: 3 Global Step: 57180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:39,296-Speed 5061.10 samples/sec Loss 4.4993 LearningRate 0.0687 Epoch: 3 Global Step: 57190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:41,289-Speed 5138.50 samples/sec Loss 4.3623 LearningRate 0.0687 Epoch: 3 Global Step: 57200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:43,266-Speed 5180.35 samples/sec Loss 4.4677 LearningRate 0.0687 Epoch: 3 Global Step: 57210 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-11 03:04:45,246-Speed 5174.46 samples/sec Loss 4.5454 LearningRate 0.0687 Epoch: 3 Global Step: 57220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 
2022-04-11 03:04:47,228-Speed 5167.31 samples/sec Loss 4.4418 LearningRate 0.0687 Epoch: 3 Global Step: 57230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:04:49,241-Speed 5089.85 samples/sec Loss 4.4324 LearningRate 0.0686 Epoch: 3 Global Step: 57240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:04:51,254-Speed 5087.49 samples/sec Loss 4.3670 LearningRate 0.0686 Epoch: 3 Global Step: 57250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:04:53,229-Speed 5188.07 samples/sec Loss 4.3891 LearningRate 0.0686 Epoch: 3 Global Step: 57260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:04:55,198-Speed 5201.43 samples/sec Loss 4.4112 LearningRate 0.0686 Epoch: 3 Global Step: 57270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:04:57,191-Speed 5141.67 samples/sec Loss 4.5054 LearningRate 0.0686 Epoch: 3 Global Step: 57280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:04:59,168-Speed 5179.15 samples/sec Loss 4.4669 LearningRate 0.0686 Epoch: 3 Global Step: 57290 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:05:01,144-Speed 5185.01 samples/sec Loss 4.3802 LearningRate 0.0686 Epoch: 3 Global Step: 57300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:05:03,125-Speed 5170.90 samples/sec Loss 4.4763 LearningRate 0.0686 Epoch: 3 Global Step: 57310 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:05:05,094-Speed 5203.30 samples/sec Loss 4.5179 LearningRate 0.0686 Epoch: 3 Global Step: 57320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:05:07,072-Speed 5177.63 samples/sec Loss 4.4488 LearningRate 0.0686 Epoch: 3 Global Step: 57330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:09,081-Speed 5097.84 samples/sec Loss 4.4571 LearningRate 0.0686 Epoch: 3 Global Step: 57340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:11,076-Speed 5135.19 
samples/sec Loss 4.4665 LearningRate 0.0686 Epoch: 3 Global Step: 57350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:13,064-Speed 5151.98 samples/sec Loss 4.3976 LearningRate 0.0686 Epoch: 3 Global Step: 57360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:15,073-Speed 5099.69 samples/sec Loss 4.4862 LearningRate 0.0686 Epoch: 3 Global Step: 57370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:17,055-Speed 5169.88 samples/sec Loss 4.4422 LearningRate 0.0686 Epoch: 3 Global Step: 57380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:19,027-Speed 5193.71 samples/sec Loss 4.4449 LearningRate 0.0686 Epoch: 3 Global Step: 57390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:21,008-Speed 5169.22 samples/sec Loss 4.4297 LearningRate 0.0686 Epoch: 3 Global Step: 57400 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:23,000-Speed 5142.64 samples/sec Loss 4.5190 LearningRate 0.0686 Epoch: 3 Global Step: 57410 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:24,989-Speed 5149.66 samples/sec Loss 4.3932 LearningRate 0.0686 Epoch: 3 Global Step: 57420 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:26,984-Speed 5136.50 samples/sec Loss 4.4307 LearningRate 0.0686 Epoch: 3 Global Step: 57430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:05:28,973-Speed 5148.47 samples/sec Loss 4.5411 LearningRate 0.0685 Epoch: 3 Global Step: 57440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:05:30,942-Speed 5201.93 samples/sec Loss 4.3614 LearningRate 0.0685 Epoch: 3 Global Step: 57450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:32,946-Speed 5113.56 samples/sec Loss 4.5289 LearningRate 0.0685 Epoch: 3 Global Step: 57460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:34,927-Speed 5171.61 samples/sec Loss 4.4400 LearningRate 0.0685 Epoch: 3 
Global Step: 57470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:36,943-Speed 5081.30 samples/sec Loss 4.4488 LearningRate 0.0685 Epoch: 3 Global Step: 57480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:38,941-Speed 5126.53 samples/sec Loss 4.4780 LearningRate 0.0685 Epoch: 3 Global Step: 57490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:40,915-Speed 5189.54 samples/sec Loss 4.3684 LearningRate 0.0685 Epoch: 3 Global Step: 57500 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:42,887-Speed 5193.17 samples/sec Loss 4.4161 LearningRate 0.0685 Epoch: 3 Global Step: 57510 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:44,875-Speed 5152.10 samples/sec Loss 4.4390 LearningRate 0.0685 Epoch: 3 Global Step: 57520 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:46,845-Speed 5200.73 samples/sec Loss 4.3952 LearningRate 0.0685 Epoch: 3 Global Step: 57530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:48,828-Speed 5166.44 samples/sec Loss 4.4491 LearningRate 0.0685 Epoch: 3 Global Step: 57540 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:05:50,804-Speed 5182.41 samples/sec Loss 4.4277 LearningRate 0.0685 Epoch: 3 Global Step: 57550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:05:52,787-Speed 5165.59 samples/sec Loss 4.4886 LearningRate 0.0685 Epoch: 3 Global Step: 57560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:05:54,795-Speed 5102.32 samples/sec Loss 4.5111 LearningRate 0.0685 Epoch: 3 Global Step: 57570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:05:56,768-Speed 5193.24 samples/sec Loss 4.4356 LearningRate 0.0685 Epoch: 3 Global Step: 57580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:05:58,752-Speed 5162.95 samples/sec Loss 4.5172 LearningRate 0.0685 Epoch: 3 Global Step: 57590 Fp16 Grad Scale: 65536 Required: 
18 hours Training: 2022-04-11 03:06:00,733-Speed 5170.06 samples/sec Loss 4.4794 LearningRate 0.0685 Epoch: 3 Global Step: 57600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:06:02,724-Speed 5145.02 samples/sec Loss 4.3635 LearningRate 0.0685 Epoch: 3 Global Step: 57610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:06:04,712-Speed 5150.90 samples/sec Loss 4.4138 LearningRate 0.0685 Epoch: 3 Global Step: 57620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:06:06,692-Speed 5174.76 samples/sec Loss 4.5123 LearningRate 0.0685 Epoch: 3 Global Step: 57630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:06:08,672-Speed 5173.82 samples/sec Loss 4.4922 LearningRate 0.0684 Epoch: 3 Global Step: 57640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:06:10,655-Speed 5163.08 samples/sec Loss 4.4386 LearningRate 0.0684 Epoch: 3 Global Step: 57650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:06:12,635-Speed 5174.81 samples/sec Loss 4.4180 LearningRate 0.0684 Epoch: 3 Global Step: 57660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:06:14,617-Speed 5169.55 samples/sec Loss 4.4749 LearningRate 0.0684 Epoch: 3 Global Step: 57670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:06:16,594-Speed 5180.19 samples/sec Loss 4.4416 LearningRate 0.0684 Epoch: 3 Global Step: 57680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:06:18,568-Speed 5188.87 samples/sec Loss 4.4494 LearningRate 0.0684 Epoch: 3 Global Step: 57690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:20,552-Speed 5164.27 samples/sec Loss 4.4710 LearningRate 0.0684 Epoch: 3 Global Step: 57700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:22,545-Speed 5139.97 samples/sec Loss 4.4097 LearningRate 0.0684 Epoch: 3 Global Step: 57710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:24,534-Speed 
5149.50 samples/sec Loss 4.4066 LearningRate 0.0684 Epoch: 3 Global Step: 57720 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:26,507-Speed 5192.55 samples/sec Loss 4.5408 LearningRate 0.0684 Epoch: 3 Global Step: 57730 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:28,481-Speed 5189.42 samples/sec Loss 4.5231 LearningRate 0.0684 Epoch: 3 Global Step: 57740 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:30,466-Speed 5158.82 samples/sec Loss 4.4432 LearningRate 0.0684 Epoch: 3 Global Step: 57750 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:32,454-Speed 5153.59 samples/sec Loss 4.3991 LearningRate 0.0684 Epoch: 3 Global Step: 57760 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:34,438-Speed 5162.74 samples/sec Loss 4.4761 LearningRate 0.0684 Epoch: 3 Global Step: 57770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:36,410-Speed 5193.83 samples/sec Loss 4.4792 LearningRate 0.0684 Epoch: 3 Global Step: 57780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:38,387-Speed 5183.39 samples/sec Loss 4.5228 LearningRate 0.0684 Epoch: 3 Global Step: 57790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:40,369-Speed 5168.12 samples/sec Loss 4.3142 LearningRate 0.0684 Epoch: 3 Global Step: 57800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:42,380-Speed 5093.87 samples/sec Loss 4.3776 LearningRate 0.0684 Epoch: 3 Global Step: 57810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:44,365-Speed 5159.34 samples/sec Loss 4.4187 LearningRate 0.0684 Epoch: 3 Global Step: 57820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:46,349-Speed 5162.27 samples/sec Loss 4.4195 LearningRate 0.0684 Epoch: 3 Global Step: 57830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:48,336-Speed 5156.38 samples/sec Loss 4.5089 
LearningRate 0.0683 Epoch: 3 Global Step: 57840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:50,336-Speed 5120.69 samples/sec Loss 4.4394 LearningRate 0.0683 Epoch: 3 Global Step: 57850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:52,321-Speed 5162.21 samples/sec Loss 4.4261 LearningRate 0.0683 Epoch: 3 Global Step: 57860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:54,295-Speed 5188.11 samples/sec Loss 4.4229 LearningRate 0.0683 Epoch: 3 Global Step: 57870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:56,281-Speed 5157.36 samples/sec Loss 4.4022 LearningRate 0.0683 Epoch: 3 Global Step: 57880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:06:58,273-Speed 5142.71 samples/sec Loss 4.4458 LearningRate 0.0683 Epoch: 3 Global Step: 57890 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-11 03:07:00,251-Speed 5178.62 samples/sec Loss 4.4315 LearningRate 0.0683 Epoch: 3 Global Step: 57900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:07:02,239-Speed 5154.42 samples/sec Loss 4.4141 LearningRate 0.0683 Epoch: 3 Global Step: 57910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:07:04,215-Speed 5181.82 samples/sec Loss 4.4152 LearningRate 0.0683 Epoch: 3 Global Step: 57920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:07:06,189-Speed 5190.06 samples/sec Loss 4.4195 LearningRate 0.0683 Epoch: 3 Global Step: 57930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:07:08,162-Speed 5190.48 samples/sec Loss 4.4040 LearningRate 0.0683 Epoch: 3 Global Step: 57940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:07:10,176-Speed 5086.01 samples/sec Loss 4.3706 LearningRate 0.0683 Epoch: 3 Global Step: 57950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:07:12,159-Speed 5165.67 samples/sec Loss 4.4068 LearningRate 0.0683 Epoch: 3 Global Step: 
57960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:07:14,138-Speed 5178.44 samples/sec Loss 4.3548 LearningRate 0.0683 Epoch: 3 Global Step: 57970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:07:16,121-Speed 5163.35 samples/sec Loss 4.4396 LearningRate 0.0683 Epoch: 3 Global Step: 57980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:07:18,102-Speed 5172.78 samples/sec Loss 4.4084 LearningRate 0.0683 Epoch: 3 Global Step: 57990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:07:20,069-Speed 5207.78 samples/sec Loss 4.3439 LearningRate 0.0683 Epoch: 3 Global Step: 58000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:07:46,531-[lfw][58000]XNorm: 21.684254 Training: 2022-04-11 03:07:46,532-[lfw][58000]Accuracy-Flip: 0.99633+-0.00364 Training: 2022-04-11 03:07:46,532-[lfw][58000]Accuracy-Highest: 0.99800 Training: 2022-04-11 03:08:17,134-[cfp_fp][58000]XNorm: 19.165293 Training: 2022-04-11 03:08:17,135-[cfp_fp][58000]Accuracy-Flip: 0.97743+-0.00549 Training: 2022-04-11 03:08:17,135-[cfp_fp][58000]Accuracy-Highest: 0.97871 Training: 2022-04-11 03:08:43,497-[agedb_30][58000]XNorm: 21.580797 Training: 2022-04-11 03:08:43,497-[agedb_30][58000]Accuracy-Flip: 0.97717+-0.00792 Training: 2022-04-11 03:08:43,498-[agedb_30][58000]Accuracy-Highest: 0.97717 Training: 2022-04-11 03:08:45,483-Speed 119.89 samples/sec Loss 4.4003 LearningRate 0.0683 Epoch: 3 Global Step: 58010 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:08:47,474-Speed 5143.90 samples/sec Loss 4.4352 LearningRate 0.0683 Epoch: 3 Global Step: 58020 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-11 03:08:49,429-Speed 5241.16 samples/sec Loss 4.4088 LearningRate 0.0683 Epoch: 3 Global Step: 58030 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 03:08:51,392-Speed 5218.38 samples/sec Loss 4.4109 LearningRate 0.0682 Epoch: 3 Global Step: 58040 Fp16 Grad Scale: 32768 
Required: 19 hours Training: 2022-04-11 03:08:53,366-Speed 5188.63 samples/sec Loss 4.4305 LearningRate 0.0682 Epoch: 3 Global Step: 58050 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 03:08:55,342-Speed 5184.76 samples/sec Loss 4.4703 LearningRate 0.0682 Epoch: 3 Global Step: 58060 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 03:08:57,313-Speed 5196.86 samples/sec Loss 4.5283 LearningRate 0.0682 Epoch: 3 Global Step: 58070 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 03:08:59,299-Speed 5157.36 samples/sec Loss 4.5484 LearningRate 0.0682 Epoch: 3 Global Step: 58080 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 03:09:01,276-Speed 5181.35 samples/sec Loss 4.4893 LearningRate 0.0682 Epoch: 3 Global Step: 58090 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 03:09:03,250-Speed 5187.81 samples/sec Loss 4.5671 LearningRate 0.0682 Epoch: 3 Global Step: 58100 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 03:09:05,235-Speed 5161.55 samples/sec Loss 4.5196 LearningRate 0.0682 Epoch: 3 Global Step: 58110 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 03:09:07,215-Speed 5173.94 samples/sec Loss 4.4402 LearningRate 0.0682 Epoch: 3 Global Step: 58120 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-11 03:09:09,196-Speed 5170.87 samples/sec Loss 4.4124 LearningRate 0.0682 Epoch: 3 Global Step: 58130 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:09:11,203-Speed 5103.47 samples/sec Loss 4.4474 LearningRate 0.0682 Epoch: 3 Global Step: 58140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:09:13,178-Speed 5185.53 samples/sec Loss 4.3714 LearningRate 0.0682 Epoch: 3 Global Step: 58150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:09:15,162-Speed 5163.38 samples/sec Loss 4.4911 LearningRate 0.0682 Epoch: 3 Global Step: 58160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 
03:09:17,149-Speed 5155.44 samples/sec Loss 4.3765 LearningRate 0.0682 Epoch: 3 Global Step: 58170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:09:19,128-Speed 5175.42 samples/sec Loss 4.4135 LearningRate 0.0682 Epoch: 3 Global Step: 58180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:09:21,134-Speed 5106.55 samples/sec Loss 4.4592 LearningRate 0.0682 Epoch: 3 Global Step: 58190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:09:23,124-Speed 5151.87 samples/sec Loss 4.5604 LearningRate 0.0682 Epoch: 3 Global Step: 58200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:09:25,092-Speed 5202.40 samples/sec Loss 4.3897 LearningRate 0.0682 Epoch: 3 Global Step: 58210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:09:27,096-Speed 5112.09 samples/sec Loss 4.4580 LearningRate 0.0682 Epoch: 3 Global Step: 58220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:09:29,072-Speed 5184.74 samples/sec Loss 4.4817 LearningRate 0.0682 Epoch: 3 Global Step: 58230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:09:31,048-Speed 5183.92 samples/sec Loss 4.4267 LearningRate 0.0682 Epoch: 3 Global Step: 58240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:09:33,057-Speed 5099.14 samples/sec Loss 4.3593 LearningRate 0.0681 Epoch: 3 Global Step: 58250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:09:35,036-Speed 5177.09 samples/sec Loss 4.4139 LearningRate 0.0681 Epoch: 3 Global Step: 58260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:09:37,008-Speed 5193.35 samples/sec Loss 4.3787 LearningRate 0.0681 Epoch: 3 Global Step: 58270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:09:38,985-Speed 5180.33 samples/sec Loss 4.4225 LearningRate 0.0681 Epoch: 3 Global Step: 58280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:09:40,950-Speed 5213.67 samples/sec Loss 
4.4453 LearningRate 0.0681 Epoch: 3 Global Step: 58290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:09:42,921-Speed 5196.78 samples/sec Loss 4.4627 LearningRate 0.0681 Epoch: 3 Global Step: 58300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:09:44,891-Speed 5200.06 samples/sec Loss 4.5652 LearningRate 0.0681 Epoch: 3 Global Step: 58310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:09:46,862-Speed 5196.62 samples/sec Loss 4.4491 LearningRate 0.0681 Epoch: 3 Global Step: 58320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:09:48,832-Speed 5201.98 samples/sec Loss 4.4368 LearningRate 0.0681 Epoch: 3 Global Step: 58330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:09:50,819-Speed 5153.99 samples/sec Loss 4.4962 LearningRate 0.0681 Epoch: 3 Global Step: 58340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:09:52,808-Speed 5150.95 samples/sec Loss 4.4634 LearningRate 0.0681 Epoch: 3 Global Step: 58350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:09:54,777-Speed 5202.53 samples/sec Loss 4.4182 LearningRate 0.0681 Epoch: 3 Global Step: 58360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:09:56,755-Speed 5181.69 samples/sec Loss 4.3682 LearningRate 0.0681 Epoch: 3 Global Step: 58370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:09:58,729-Speed 5187.66 samples/sec Loss 4.4929 LearningRate 0.0681 Epoch: 3 Global Step: 58380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:00,724-Speed 5134.35 samples/sec Loss 4.4686 LearningRate 0.0681 Epoch: 3 Global Step: 58390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:10:02,729-Speed 5108.80 samples/sec Loss 4.3506 LearningRate 0.0681 Epoch: 3 Global Step: 58400 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:04,716-Speed 5155.52 samples/sec Loss 4.4028 LearningRate 0.0681 Epoch: 3 Global Step: 58410 
Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:06,699-Speed 5165.48 samples/sec Loss 4.4604 LearningRate 0.0681 Epoch: 3 Global Step: 58420 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:08,686-Speed 5155.31 samples/sec Loss 4.5266 LearningRate 0.0681 Epoch: 3 Global Step: 58430 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:10,684-Speed 5126.77 samples/sec Loss 4.4382 LearningRate 0.0681 Epoch: 3 Global Step: 58440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:12,678-Speed 5136.86 samples/sec Loss 4.4662 LearningRate 0.0680 Epoch: 3 Global Step: 58450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:14,651-Speed 5193.88 samples/sec Loss 4.3477 LearningRate 0.0680 Epoch: 3 Global Step: 58460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:16,629-Speed 5177.42 samples/sec Loss 4.4153 LearningRate 0.0680 Epoch: 3 Global Step: 58470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:18,617-Speed 5153.91 samples/sec Loss 4.4206 LearningRate 0.0680 Epoch: 3 Global Step: 58480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:20,611-Speed 5135.28 samples/sec Loss 4.3460 LearningRate 0.0680 Epoch: 3 Global Step: 58490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:22,590-Speed 5177.37 samples/sec Loss 4.4515 LearningRate 0.0680 Epoch: 3 Global Step: 58500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:10:24,576-Speed 5156.26 samples/sec Loss 4.4085 LearningRate 0.0680 Epoch: 3 Global Step: 58510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:10:26,559-Speed 5165.65 samples/sec Loss 4.3872 LearningRate 0.0680 Epoch: 3 Global Step: 58520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:10:28,542-Speed 5167.42 samples/sec Loss 4.4898 LearningRate 0.0680 Epoch: 3 Global Step: 58530 Fp16 Grad Scale: 131072 Required: 18 hours Training: 
2022-04-11 03:10:30,513-Speed 5196.88 samples/sec Loss 4.4613 LearningRate 0.0680 Epoch: 3 Global Step: 58540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:10:32,499-Speed 5157.37 samples/sec Loss 4.3823 LearningRate 0.0680 Epoch: 3 Global Step: 58550 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:34,471-Speed 5195.28 samples/sec Loss 4.4530 LearningRate 0.0680 Epoch: 3 Global Step: 58560 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:36,447-Speed 5183.93 samples/sec Loss 4.3973 LearningRate 0.0680 Epoch: 3 Global Step: 58570 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:38,424-Speed 5182.09 samples/sec Loss 4.3919 LearningRate 0.0680 Epoch: 3 Global Step: 58580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:40,401-Speed 5180.91 samples/sec Loss 4.4755 LearningRate 0.0680 Epoch: 3 Global Step: 58590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:42,370-Speed 5200.85 samples/sec Loss 4.3996 LearningRate 0.0680 Epoch: 3 Global Step: 58600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:44,349-Speed 5175.61 samples/sec Loss 4.4416 LearningRate 0.0680 Epoch: 3 Global Step: 58610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:46,336-Speed 5156.09 samples/sec Loss 4.4046 LearningRate 0.0680 Epoch: 3 Global Step: 58620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:48,342-Speed 5106.52 samples/sec Loss 4.3550 LearningRate 0.0680 Epoch: 3 Global Step: 58630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:50,338-Speed 5133.32 samples/sec Loss 4.4251 LearningRate 0.0680 Epoch: 3 Global Step: 58640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:52,309-Speed 5195.37 samples/sec Loss 4.5224 LearningRate 0.0679 Epoch: 3 Global Step: 58650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:54,291-Speed 5168.80 samples/sec Loss 
4.4256 LearningRate 0.0679 Epoch: 3 Global Step: 58660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:56,264-Speed 5191.55 samples/sec Loss 4.4027 LearningRate 0.0679 Epoch: 3 Global Step: 58670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:10:58,235-Speed 5198.13 samples/sec Loss 4.5420 LearningRate 0.0679 Epoch: 3 Global Step: 58680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:00,205-Speed 5199.09 samples/sec Loss 4.4549 LearningRate 0.0679 Epoch: 3 Global Step: 58690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:02,177-Speed 5195.22 samples/sec Loss 4.4120 LearningRate 0.0679 Epoch: 3 Global Step: 58700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:04,163-Speed 5158.24 samples/sec Loss 4.4915 LearningRate 0.0679 Epoch: 3 Global Step: 58710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:06,140-Speed 5180.23 samples/sec Loss 4.4191 LearningRate 0.0679 Epoch: 3 Global Step: 58720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:08,108-Speed 5204.46 samples/sec Loss 4.3712 LearningRate 0.0679 Epoch: 3 Global Step: 58730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:10,088-Speed 5172.41 samples/sec Loss 4.3528 LearningRate 0.0679 Epoch: 3 Global Step: 58740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:12,076-Speed 5155.23 samples/sec Loss 4.4522 LearningRate 0.0679 Epoch: 3 Global Step: 58750 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:11:14,068-Speed 5141.07 samples/sec Loss 4.4309 LearningRate 0.0679 Epoch: 3 Global Step: 58760 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:11:16,061-Speed 5141.07 samples/sec Loss 4.5482 LearningRate 0.0679 Epoch: 3 Global Step: 58770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:11:18,044-Speed 5164.77 samples/sec Loss 4.4156 LearningRate 0.0679 Epoch: 3 Global Step: 
58780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:11:20,020-Speed 5183.31 samples/sec Loss 4.3520 LearningRate 0.0679 Epoch: 3 Global Step: 58790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:11:21,991-Speed 5196.86 samples/sec Loss 4.4916 LearningRate 0.0679 Epoch: 3 Global Step: 58800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:11:23,972-Speed 5172.32 samples/sec Loss 4.4723 LearningRate 0.0679 Epoch: 3 Global Step: 58810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:11:25,951-Speed 5175.58 samples/sec Loss 4.4274 LearningRate 0.0679 Epoch: 3 Global Step: 58820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:11:27,919-Speed 5203.44 samples/sec Loss 4.3735 LearningRate 0.0679 Epoch: 3 Global Step: 58830 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:29,928-Speed 5097.86 samples/sec Loss 4.2982 LearningRate 0.0679 Epoch: 3 Global Step: 58840 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:31,919-Speed 5145.21 samples/sec Loss 4.4294 LearningRate 0.0678 Epoch: 3 Global Step: 58850 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:33,929-Speed 5097.20 samples/sec Loss 4.4636 LearningRate 0.0678 Epoch: 3 Global Step: 58860 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:35,964-Speed 5035.55 samples/sec Loss 4.4305 LearningRate 0.0678 Epoch: 3 Global Step: 58870 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:37,956-Speed 5141.13 samples/sec Loss 4.4135 LearningRate 0.0678 Epoch: 3 Global Step: 58880 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:39,947-Speed 5146.25 samples/sec Loss 4.4353 LearningRate 0.0678 Epoch: 3 Global Step: 58890 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:41,919-Speed 5193.85 samples/sec Loss 4.3231 LearningRate 0.0678 Epoch: 3 Global Step: 58900 Fp16 Grad Scale: 65536 Required: 18 hours 
Training: 2022-04-11 03:11:43,918-Speed 5124.20 samples/sec Loss 4.3060 LearningRate 0.0678 Epoch: 3 Global Step: 58910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:45,906-Speed 5152.10 samples/sec Loss 4.5070 LearningRate 0.0678 Epoch: 3 Global Step: 58920 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:47,879-Speed 5192.72 samples/sec Loss 4.4354 LearningRate 0.0678 Epoch: 3 Global Step: 58930 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:49,860-Speed 5169.71 samples/sec Loss 4.4914 LearningRate 0.0678 Epoch: 3 Global Step: 58940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:51,835-Speed 5187.67 samples/sec Loss 4.4499 LearningRate 0.0678 Epoch: 3 Global Step: 58950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:53,810-Speed 5187.29 samples/sec Loss 4.3778 LearningRate 0.0678 Epoch: 3 Global Step: 58960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:55,782-Speed 5195.56 samples/sec Loss 4.3648 LearningRate 0.0678 Epoch: 3 Global Step: 58970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:57,767-Speed 5159.26 samples/sec Loss 4.3793 LearningRate 0.0678 Epoch: 3 Global Step: 58980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:11:59,746-Speed 5175.10 samples/sec Loss 4.4662 LearningRate 0.0678 Epoch: 3 Global Step: 58990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:12:01,734-Speed 5152.78 samples/sec Loss 4.4707 LearningRate 0.0678 Epoch: 3 Global Step: 59000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:12:03,708-Speed 5189.79 samples/sec Loss 4.4542 LearningRate 0.0678 Epoch: 3 Global Step: 59010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:12:05,698-Speed 5146.26 samples/sec Loss 4.4134 LearningRate 0.0678 Epoch: 3 Global Step: 59020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:12:07,669-Speed 5198.76 
samples/sec Loss 4.4063 LearningRate 0.0678 Epoch: 3 Global Step: 59030 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:09,645-Speed 5182.77 samples/sec Loss 4.5530 LearningRate 0.0678 Epoch: 3 Global Step: 59040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:11,640-Speed 5133.17 samples/sec Loss 4.3485 LearningRate 0.0678 Epoch: 3 Global Step: 59050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:13,619-Speed 5175.94 samples/sec Loss 4.4912 LearningRate 0.0677 Epoch: 3 Global Step: 59060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:15,592-Speed 5194.01 samples/sec Loss 4.4257 LearningRate 0.0677 Epoch: 3 Global Step: 59070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:17,603-Speed 5094.95 samples/sec Loss 4.4405 LearningRate 0.0677 Epoch: 3 Global Step: 59080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:19,602-Speed 5124.10 samples/sec Loss 4.5510 LearningRate 0.0677 Epoch: 3 Global Step: 59090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:21,600-Speed 5125.83 samples/sec Loss 4.3835 LearningRate 0.0677 Epoch: 3 Global Step: 59100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:23,601-Speed 5119.52 samples/sec Loss 4.4181 LearningRate 0.0677 Epoch: 3 Global Step: 59110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:25,578-Speed 5181.80 samples/sec Loss 4.4391 LearningRate 0.0677 Epoch: 3 Global Step: 59120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:27,557-Speed 5176.49 samples/sec Loss 4.5081 LearningRate 0.0677 Epoch: 3 Global Step: 59130 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-11 03:12:29,534-Speed 5181.23 samples/sec Loss 4.3829 LearningRate 0.0677 Epoch: 3 Global Step: 59140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:31,510-Speed 5183.44 samples/sec Loss 4.4371 LearningRate 0.0677 
Epoch: 3 Global Step: 59150 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:33,498-Speed 5154.01 samples/sec Loss 4.3948 LearningRate 0.0677 Epoch: 3 Global Step: 59160 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:35,492-Speed 5136.55 samples/sec Loss 4.4382 LearningRate 0.0677 Epoch: 3 Global Step: 59170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:37,465-Speed 5191.11 samples/sec Loss 4.4255 LearningRate 0.0677 Epoch: 3 Global Step: 59180 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:39,467-Speed 5115.58 samples/sec Loss 4.4981 LearningRate 0.0677 Epoch: 3 Global Step: 59190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:41,454-Speed 5157.50 samples/sec Loss 4.4299 LearningRate 0.0677 Epoch: 3 Global Step: 59200 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:43,429-Speed 5185.17 samples/sec Loss 4.5189 LearningRate 0.0677 Epoch: 3 Global Step: 59210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:45,420-Speed 5146.39 samples/sec Loss 4.4277 LearningRate 0.0677 Epoch: 3 Global Step: 59220 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:47,404-Speed 5160.74 samples/sec Loss 4.4597 LearningRate 0.0677 Epoch: 3 Global Step: 59230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:49,380-Speed 5185.07 samples/sec Loss 4.4468 LearningRate 0.0677 Epoch: 3 Global Step: 59240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:51,374-Speed 5136.85 samples/sec Loss 4.4483 LearningRate 0.0677 Epoch: 3 Global Step: 59250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:53,351-Speed 5183.04 samples/sec Loss 4.4101 LearningRate 0.0676 Epoch: 3 Global Step: 59260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:55,336-Speed 5159.88 samples/sec Loss 4.4651 LearningRate 0.0676 Epoch: 3 Global Step: 59270 Fp16 Grad 
Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:57,312-Speed 5184.03 samples/sec Loss 4.4946 LearningRate 0.0676 Epoch: 3 Global Step: 59280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:12:59,292-Speed 5173.22 samples/sec Loss 4.3998 LearningRate 0.0676 Epoch: 3 Global Step: 59290 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:13:01,269-Speed 5181.22 samples/sec Loss 4.3396 LearningRate 0.0676 Epoch: 3 Global Step: 59300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:13:03,245-Speed 5184.48 samples/sec Loss 4.4490 LearningRate 0.0676 Epoch: 3 Global Step: 59310 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:13:05,233-Speed 5151.13 samples/sec Loss 4.4070 LearningRate 0.0676 Epoch: 3 Global Step: 59320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:13:07,214-Speed 5170.22 samples/sec Loss 4.4767 LearningRate 0.0676 Epoch: 3 Global Step: 59330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:13:09,206-Speed 5143.11 samples/sec Loss 4.4250 LearningRate 0.0676 Epoch: 3 Global Step: 59340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:11,192-Speed 5159.75 samples/sec Loss 4.4253 LearningRate 0.0676 Epoch: 3 Global Step: 59350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:13,163-Speed 5194.81 samples/sec Loss 4.3858 LearningRate 0.0676 Epoch: 3 Global Step: 59360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:15,138-Speed 5188.73 samples/sec Loss 4.3706 LearningRate 0.0676 Epoch: 3 Global Step: 59370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:17,132-Speed 5135.90 samples/sec Loss 4.3949 LearningRate 0.0676 Epoch: 3 Global Step: 59380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:19,122-Speed 5146.90 samples/sec Loss 4.4308 LearningRate 0.0676 Epoch: 3 Global Step: 59390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 
2022-04-11 03:13:21,096-Speed 5188.55 samples/sec Loss 4.4264 LearningRate 0.0676 Epoch: 3 Global Step: 59400 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:23,073-Speed 5183.90 samples/sec Loss 4.4027 LearningRate 0.0676 Epoch: 3 Global Step: 59410 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:25,050-Speed 5180.55 samples/sec Loss 4.3837 LearningRate 0.0676 Epoch: 3 Global Step: 59420 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:27,039-Speed 5150.11 samples/sec Loss 4.4896 LearningRate 0.0676 Epoch: 3 Global Step: 59430 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:29,022-Speed 5165.15 samples/sec Loss 4.5020 LearningRate 0.0676 Epoch: 3 Global Step: 59440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:13:31,001-Speed 5178.12 samples/sec Loss 4.4839 LearningRate 0.0676 Epoch: 3 Global Step: 59450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:32,972-Speed 5196.73 samples/sec Loss 4.3599 LearningRate 0.0675 Epoch: 3 Global Step: 59460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:34,946-Speed 5188.81 samples/sec Loss 4.4447 LearningRate 0.0675 Epoch: 3 Global Step: 59470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:36,937-Speed 5145.50 samples/sec Loss 4.3580 LearningRate 0.0675 Epoch: 3 Global Step: 59480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:38,920-Speed 5164.61 samples/sec Loss 4.3664 LearningRate 0.0675 Epoch: 3 Global Step: 59490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:40,896-Speed 5183.94 samples/sec Loss 4.5074 LearningRate 0.0675 Epoch: 3 Global Step: 59500 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:42,871-Speed 5184.71 samples/sec Loss 4.5820 LearningRate 0.0675 Epoch: 3 Global Step: 59510 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:44,866-Speed 5134.76 samples/sec Loss 
4.3812 LearningRate 0.0675 Epoch: 3 Global Step: 59520 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:46,846-Speed 5173.21 samples/sec Loss 4.4438 LearningRate 0.0675 Epoch: 3 Global Step: 59530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:48,833-Speed 5156.76 samples/sec Loss 4.3478 LearningRate 0.0675 Epoch: 3 Global Step: 59540 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:13:50,824-Speed 5146.10 samples/sec Loss 4.4728 LearningRate 0.0675 Epoch: 3 Global Step: 59550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:13:52,799-Speed 5186.72 samples/sec Loss 4.4056 LearningRate 0.0675 Epoch: 3 Global Step: 59560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:13:54,773-Speed 5189.58 samples/sec Loss 4.4220 LearningRate 0.0675 Epoch: 3 Global Step: 59570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:13:56,763-Speed 5145.21 samples/sec Loss 4.4994 LearningRate 0.0675 Epoch: 3 Global Step: 59580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:13:58,781-Speed 5077.66 samples/sec Loss 4.4069 LearningRate 0.0675 Epoch: 3 Global Step: 59590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:14:00,811-Speed 5044.46 samples/sec Loss 4.4776 LearningRate 0.0675 Epoch: 3 Global Step: 59600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:14:02,802-Speed 5145.19 samples/sec Loss 4.4718 LearningRate 0.0675 Epoch: 3 Global Step: 59610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:14:04,776-Speed 5188.87 samples/sec Loss 4.3068 LearningRate 0.0675 Epoch: 3 Global Step: 59620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:14:06,758-Speed 5170.14 samples/sec Loss 4.4424 LearningRate 0.0675 Epoch: 3 Global Step: 59630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:14:08,741-Speed 5164.00 samples/sec Loss 4.3598 LearningRate 0.0675 Epoch: 3 Global 
Step: 59640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:14:10,714-Speed 5193.65 samples/sec Loss 4.4560 LearningRate 0.0675 Epoch: 3 Global Step: 59650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:14:12,688-Speed 5189.20 samples/sec Loss 4.3863 LearningRate 0.0675 Epoch: 3 Global Step: 59660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:14:14,661-Speed 5191.73 samples/sec Loss 4.3519 LearningRate 0.0674 Epoch: 3 Global Step: 59670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:14:16,641-Speed 5171.71 samples/sec Loss 4.4769 LearningRate 0.0674 Epoch: 3 Global Step: 59680 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:14:18,625-Speed 5164.49 samples/sec Loss 4.4941 LearningRate 0.0674 Epoch: 3 Global Step: 59690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:14:20,598-Speed 5190.41 samples/sec Loss 4.3756 LearningRate 0.0674 Epoch: 3 Global Step: 59700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:14:22,578-Speed 5173.59 samples/sec Loss 4.4102 LearningRate 0.0674 Epoch: 3 Global Step: 59710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:14:24,552-Speed 5188.61 samples/sec Loss 4.3846 LearningRate 0.0674 Epoch: 3 Global Step: 59720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:14:26,560-Speed 5102.50 samples/sec Loss 4.3192 LearningRate 0.0674 Epoch: 3 Global Step: 59730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:14:28,543-Speed 5165.32 samples/sec Loss 4.3142 LearningRate 0.0674 Epoch: 3 Global Step: 59740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:14:30,516-Speed 5191.33 samples/sec Loss 4.4694 LearningRate 0.0674 Epoch: 3 Global Step: 59750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:14:32,490-Speed 5190.40 samples/sec Loss 4.4143 LearningRate 0.0674 Epoch: 3 Global Step: 59760 Fp16 Grad Scale: 65536 Required: 18 
hours Training: 2022-04-11 03:14:34,463-Speed 5191.02 samples/sec Loss 4.4426 LearningRate 0.0674 Epoch: 3 Global Step: 59770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:14:36,522-Speed 4974.43 samples/sec Loss 4.3995 LearningRate 0.0674 Epoch: 3 Global Step: 59780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:14:38,503-Speed 5172.61 samples/sec Loss 4.4903 LearningRate 0.0674 Epoch: 3 Global Step: 59790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:14:40,478-Speed 5186.40 samples/sec Loss 4.3987 LearningRate 0.0674 Epoch: 3 Global Step: 59800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:14:42,449-Speed 5195.14 samples/sec Loss 4.3955 LearningRate 0.0674 Epoch: 3 Global Step: 59810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:14:44,443-Speed 5137.44 samples/sec Loss 4.4249 LearningRate 0.0674 Epoch: 3 Global Step: 59820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:14:46,432-Speed 5151.40 samples/sec Loss 4.4336 LearningRate 0.0674 Epoch: 3 Global Step: 59830 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:14:48,431-Speed 5122.67 samples/sec Loss 4.3882 LearningRate 0.0674 Epoch: 3 Global Step: 59840 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:14:50,421-Speed 5147.67 samples/sec Loss 4.4838 LearningRate 0.0674 Epoch: 3 Global Step: 59850 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:14:52,404-Speed 5166.13 samples/sec Loss 4.4392 LearningRate 0.0674 Epoch: 3 Global Step: 59860 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:14:54,389-Speed 5162.24 samples/sec Loss 4.4069 LearningRate 0.0673 Epoch: 3 Global Step: 59870 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:14:56,363-Speed 5189.03 samples/sec Loss 4.3783 LearningRate 0.0673 Epoch: 3 Global Step: 59880 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:14:58,347-Speed 5163.02 
samples/sec Loss 4.3770 LearningRate 0.0673 Epoch: 3 Global Step: 59890 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:15:00,322-Speed 5186.94 samples/sec Loss 4.3388 LearningRate 0.0673 Epoch: 3 Global Step: 59900 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:15:02,298-Speed 5183.23 samples/sec Loss 4.4406 LearningRate 0.0673 Epoch: 3 Global Step: 59910 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:15:04,305-Speed 5104.19 samples/sec Loss 4.4262 LearningRate 0.0673 Epoch: 3 Global Step: 59920 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:15:06,282-Speed 5180.69 samples/sec Loss 4.4053 LearningRate 0.0673 Epoch: 3 Global Step: 59930 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:15:08,258-Speed 5182.54 samples/sec Loss 4.4687 LearningRate 0.0673 Epoch: 3 Global Step: 59940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:15:10,237-Speed 5175.77 samples/sec Loss 4.4511 LearningRate 0.0673 Epoch: 3 Global Step: 59950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:15:12,216-Speed 5177.02 samples/sec Loss 4.4800 LearningRate 0.0673 Epoch: 3 Global Step: 59960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:15:14,201-Speed 5160.55 samples/sec Loss 4.5008 LearningRate 0.0673 Epoch: 3 Global Step: 59970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:15:16,187-Speed 5158.72 samples/sec Loss 4.4470 LearningRate 0.0673 Epoch: 3 Global Step: 59980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:15:18,162-Speed 5187.65 samples/sec Loss 4.4574 LearningRate 0.0673 Epoch: 3 Global Step: 59990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:15:20,133-Speed 5196.32 samples/sec Loss 4.3768 LearningRate 0.0673 Epoch: 3 Global Step: 60000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:15:46,719-[lfw][60000]XNorm: 21.458544 Training: 2022-04-11 
03:15:46,719-[lfw][60000]Accuracy-Flip: 0.99717+-0.00269 Training: 2022-04-11 03:15:46,720-[lfw][60000]Accuracy-Highest: 0.99800 Training: 2022-04-11 03:16:17,499-[cfp_fp][60000]XNorm: 19.130385 Training: 2022-04-11 03:16:17,500-[cfp_fp][60000]Accuracy-Flip: 0.97500+-0.00698 Training: 2022-04-11 03:16:17,501-[cfp_fp][60000]Accuracy-Highest: 0.97871 Training: 2022-04-11 03:16:44,054-[agedb_30][60000]XNorm: 21.027660 Training: 2022-04-11 03:16:44,055-[agedb_30][60000]Accuracy-Flip: 0.97700+-0.00816 Training: 2022-04-11 03:16:44,055-[agedb_30][60000]Accuracy-Highest: 0.97717 Training: 2022-04-11 03:16:46,038-Speed 119.20 samples/sec Loss 4.3789 LearningRate 0.0673 Epoch: 3 Global Step: 60010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:16:48,006-Speed 5205.70 samples/sec Loss 4.3440 LearningRate 0.0673 Epoch: 3 Global Step: 60020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:16:49,991-Speed 5159.96 samples/sec Loss 4.3987 LearningRate 0.0673 Epoch: 3 Global Step: 60030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:16:51,959-Speed 5204.49 samples/sec Loss 4.4378 LearningRate 0.0673 Epoch: 3 Global Step: 60040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:16:53,930-Speed 5198.57 samples/sec Loss 4.3679 LearningRate 0.0673 Epoch: 3 Global Step: 60050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:16:55,900-Speed 5200.52 samples/sec Loss 4.4246 LearningRate 0.0673 Epoch: 3 Global Step: 60060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:16:57,869-Speed 5200.17 samples/sec Loss 4.5012 LearningRate 0.0672 Epoch: 3 Global Step: 60070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:16:59,844-Speed 5186.61 samples/sec Loss 4.2864 LearningRate 0.0672 Epoch: 3 Global Step: 60080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:01,811-Speed 5209.43 samples/sec Loss 4.4871 LearningRate 0.0672 Epoch: 3 Global Step: 60090 
Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:03,780-Speed 5200.70 samples/sec Loss 4.3241 LearningRate 0.0672 Epoch: 3 Global Step: 60100 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:05,770-Speed 5148.03 samples/sec Loss 4.4518 LearningRate 0.0672 Epoch: 3 Global Step: 60110 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:07,746-Speed 5185.55 samples/sec Loss 4.2629 LearningRate 0.0672 Epoch: 3 Global Step: 60120 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:09,728-Speed 5167.74 samples/sec Loss 4.3653 LearningRate 0.0672 Epoch: 3 Global Step: 60130 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:11,741-Speed 5088.84 samples/sec Loss 4.4081 LearningRate 0.0672 Epoch: 3 Global Step: 60140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:13,720-Speed 5175.87 samples/sec Loss 4.3011 LearningRate 0.0672 Epoch: 3 Global Step: 60150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:15,701-Speed 5171.49 samples/sec Loss 4.3918 LearningRate 0.0672 Epoch: 3 Global Step: 60160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:17,686-Speed 5160.51 samples/sec Loss 4.3693 LearningRate 0.0672 Epoch: 3 Global Step: 60170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:19,659-Speed 5191.59 samples/sec Loss 4.4794 LearningRate 0.0672 Epoch: 3 Global Step: 60180 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:17:21,661-Speed 5116.53 samples/sec Loss 4.4127 LearningRate 0.0672 Epoch: 3 Global Step: 60190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:17:23,650-Speed 5150.36 samples/sec Loss 4.3588 LearningRate 0.0672 Epoch: 3 Global Step: 60200 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:17:25,644-Speed 5135.11 samples/sec Loss 4.4138 LearningRate 0.0672 Epoch: 3 Global Step: 60210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 
2022-04-11 03:17:27,622-Speed 5180.41 samples/sec Loss 4.3500 LearningRate 0.0672 Epoch: 3 Global Step: 60220 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:17:29,619-Speed 5128.62 samples/sec Loss 4.4054 LearningRate 0.0672 Epoch: 3 Global Step: 60230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:17:31,592-Speed 5192.28 samples/sec Loss 4.4304 LearningRate 0.0672 Epoch: 3 Global Step: 60240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:17:33,571-Speed 5177.10 samples/sec Loss 4.4106 LearningRate 0.0672 Epoch: 3 Global Step: 60250 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:35,565-Speed 5135.65 samples/sec Loss 4.3809 LearningRate 0.0672 Epoch: 3 Global Step: 60260 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:37,542-Speed 5181.70 samples/sec Loss 4.3430 LearningRate 0.0672 Epoch: 3 Global Step: 60270 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:39,521-Speed 5174.59 samples/sec Loss 4.3788 LearningRate 0.0671 Epoch: 3 Global Step: 60280 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:41,490-Speed 5203.35 samples/sec Loss 4.3619 LearningRate 0.0671 Epoch: 3 Global Step: 60290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:43,457-Speed 5206.59 samples/sec Loss 4.3085 LearningRate 0.0671 Epoch: 3 Global Step: 60300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:45,436-Speed 5175.92 samples/sec Loss 4.3277 LearningRate 0.0671 Epoch: 3 Global Step: 60310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:47,416-Speed 5174.55 samples/sec Loss 4.3259 LearningRate 0.0671 Epoch: 3 Global Step: 60320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:49,408-Speed 5142.11 samples/sec Loss 4.3786 LearningRate 0.0671 Epoch: 3 Global Step: 60330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:51,378-Speed 5201.43 samples/sec 
Loss 4.3453 LearningRate 0.0671 Epoch: 3 Global Step: 60340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:17:53,348-Speed 5198.82 samples/sec Loss 4.5173 LearningRate 0.0671 Epoch: 3 Global Step: 60350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:17:55,314-Speed 5211.31 samples/sec Loss 4.3230 LearningRate 0.0671 Epoch: 3 Global Step: 60360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:17:57,305-Speed 5144.22 samples/sec Loss 4.4472 LearningRate 0.0671 Epoch: 3 Global Step: 60370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:17:59,310-Speed 5107.02 samples/sec Loss 4.2814 LearningRate 0.0671 Epoch: 3 Global Step: 60380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:18:01,296-Speed 5158.16 samples/sec Loss 4.3849 LearningRate 0.0671 Epoch: 3 Global Step: 60390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:18:03,286-Speed 5149.62 samples/sec Loss 4.3901 LearningRate 0.0671 Epoch: 3 Global Step: 60400 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:18:05,274-Speed 5152.52 samples/sec Loss 4.4154 LearningRate 0.0671 Epoch: 3 Global Step: 60410 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:18:07,259-Speed 5159.22 samples/sec Loss 4.4035 LearningRate 0.0671 Epoch: 3 Global Step: 60420 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:18:09,244-Speed 5161.58 samples/sec Loss 4.3874 LearningRate 0.0671 Epoch: 3 Global Step: 60430 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:18:11,224-Speed 5173.76 samples/sec Loss 4.3560 LearningRate 0.0671 Epoch: 3 Global Step: 60440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:18:13,193-Speed 5201.23 samples/sec Loss 4.3203 LearningRate 0.0671 Epoch: 3 Global Step: 60450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:18:15,173-Speed 5174.73 samples/sec Loss 4.4095 LearningRate 0.0671 Epoch: 3 Global 
Step: 60460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:18:17,151-Speed 5178.89 samples/sec Loss 4.4033 LearningRate 0.0671 Epoch: 3 Global Step: 60470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:18:19,120-Speed 5201.54 samples/sec Loss 4.3253 LearningRate 0.0670 Epoch: 3 Global Step: 60480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:18:21,104-Speed 5162.40 samples/sec Loss 4.4297 LearningRate 0.0670 Epoch: 3 Global Step: 60490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:18:23,083-Speed 5176.31 samples/sec Loss 4.3524 LearningRate 0.0670 Epoch: 3 Global Step: 60500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:18:25,054-Speed 5196.33 samples/sec Loss 4.3909 LearningRate 0.0670 Epoch: 3 Global Step: 60510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:18:27,028-Speed 5189.54 samples/sec Loss 4.3496 LearningRate 0.0670 Epoch: 3 Global Step: 60520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:18:29,000-Speed 5192.89 samples/sec Loss 4.3740 LearningRate 0.0670 Epoch: 3 Global Step: 60530 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:18:30,974-Speed 5189.74 samples/sec Loss 4.3630 LearningRate 0.0670 Epoch: 3 Global Step: 60540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:18:32,959-Speed 5163.70 samples/sec Loss 4.3804 LearningRate 0.0670 Epoch: 3 Global Step: 60550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:18:34,928-Speed 5202.16 samples/sec Loss 4.4237 LearningRate 0.0670 Epoch: 3 Global Step: 60560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:18:36,924-Speed 5131.69 samples/sec Loss 4.3155 LearningRate 0.0670 Epoch: 3 Global Step: 60570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:18:38,900-Speed 5183.22 samples/sec Loss 4.4190 LearningRate 0.0670 Epoch: 3 Global Step: 60580 Fp16 Grad Scale: 131072 Required: 
18 hours Training: 2022-04-11 03:18:40,865-Speed 5212.84 samples/sec Loss 4.3915 LearningRate 0.0670 Epoch: 3 Global Step: 60590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:18:42,832-Speed 5205.85 samples/sec Loss 4.4516 LearningRate 0.0670 Epoch: 3 Global Step: 60600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:18:44,802-Speed 5200.63 samples/sec Loss 4.3668 LearningRate 0.0670 Epoch: 3 Global Step: 60610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:18:46,770-Speed 5204.32 samples/sec Loss 4.2843 LearningRate 0.0670 Epoch: 3 Global Step: 60620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:18:48,754-Speed 5164.81 samples/sec Loss 4.3578 LearningRate 0.0670 Epoch: 3 Global Step: 60630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:18:50,737-Speed 5164.63 samples/sec Loss 4.4729 LearningRate 0.0670 Epoch: 3 Global Step: 60640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:18:52,711-Speed 5190.69 samples/sec Loss 4.3242 LearningRate 0.0670 Epoch: 3 Global Step: 60650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:18:54,688-Speed 5179.97 samples/sec Loss 4.3691 LearningRate 0.0670 Epoch: 3 Global Step: 60660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:18:56,664-Speed 5184.31 samples/sec Loss 4.3657 LearningRate 0.0670 Epoch: 3 Global Step: 60670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:18:58,638-Speed 5190.79 samples/sec Loss 4.3908 LearningRate 0.0669 Epoch: 3 Global Step: 60680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:19:00,607-Speed 5201.54 samples/sec Loss 4.3460 LearningRate 0.0669 Epoch: 3 Global Step: 60690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:19:02,582-Speed 5185.86 samples/sec Loss 4.3580 LearningRate 0.0669 Epoch: 3 Global Step: 60700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:19:04,548-Speed 
5210.65 samples/sec Loss 4.4933 LearningRate 0.0669 Epoch: 3 Global Step: 60710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:19:06,520-Speed 5192.90 samples/sec Loss 4.3867 LearningRate 0.0669 Epoch: 3 Global Step: 60720 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:08,495-Speed 5186.23 samples/sec Loss 4.3127 LearningRate 0.0669 Epoch: 3 Global Step: 60730 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:10,491-Speed 5132.03 samples/sec Loss 4.4111 LearningRate 0.0669 Epoch: 3 Global Step: 60740 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:12,477-Speed 5159.95 samples/sec Loss 4.2961 LearningRate 0.0669 Epoch: 3 Global Step: 60750 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:14,460-Speed 5164.36 samples/sec Loss 4.4002 LearningRate 0.0669 Epoch: 3 Global Step: 60760 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:16,450-Speed 5147.44 samples/sec Loss 4.3702 LearningRate 0.0669 Epoch: 3 Global Step: 60770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:18,428-Speed 5178.48 samples/sec Loss 4.2916 LearningRate 0.0669 Epoch: 3 Global Step: 60780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:20,410-Speed 5169.60 samples/sec Loss 4.3387 LearningRate 0.0669 Epoch: 3 Global Step: 60790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:22,396-Speed 5157.85 samples/sec Loss 4.3964 LearningRate 0.0669 Epoch: 3 Global Step: 60800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:24,368-Speed 5195.77 samples/sec Loss 4.4360 LearningRate 0.0669 Epoch: 3 Global Step: 60810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:26,333-Speed 5212.17 samples/sec Loss 4.2872 LearningRate 0.0669 Epoch: 3 Global Step: 60820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:28,328-Speed 5134.39 samples/sec Loss 4.3244 LearningRate 
0.0669 Epoch: 3 Global Step: 60830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:30,312-Speed 5163.23 samples/sec Loss 4.4315 LearningRate 0.0669 Epoch: 3 Global Step: 60840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:32,288-Speed 5183.57 samples/sec Loss 4.4146 LearningRate 0.0669 Epoch: 3 Global Step: 60850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:34,283-Speed 5135.32 samples/sec Loss 4.3237 LearningRate 0.0669 Epoch: 3 Global Step: 60860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:36,267-Speed 5161.74 samples/sec Loss 4.3860 LearningRate 0.0669 Epoch: 3 Global Step: 60870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:38,253-Speed 5157.88 samples/sec Loss 4.3815 LearningRate 0.0669 Epoch: 3 Global Step: 60880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:40,231-Speed 5179.65 samples/sec Loss 4.3383 LearningRate 0.0668 Epoch: 3 Global Step: 60890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:42,201-Speed 5197.96 samples/sec Loss 4.2501 LearningRate 0.0668 Epoch: 3 Global Step: 60900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:44,191-Speed 5148.86 samples/sec Loss 4.3625 LearningRate 0.0668 Epoch: 3 Global Step: 60910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:46,154-Speed 5218.18 samples/sec Loss 4.3865 LearningRate 0.0668 Epoch: 3 Global Step: 60920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:48,125-Speed 5196.30 samples/sec Loss 4.3973 LearningRate 0.0668 Epoch: 3 Global Step: 60930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:50,093-Speed 5205.41 samples/sec Loss 4.3053 LearningRate 0.0668 Epoch: 3 Global Step: 60940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:52,079-Speed 5156.46 samples/sec Loss 4.4160 LearningRate 0.0668 Epoch: 3 Global Step: 60950 Fp16 
Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:54,065-Speed 5158.99 samples/sec Loss 4.3798 LearningRate 0.0668 Epoch: 3 Global Step: 60960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:56,035-Speed 5200.94 samples/sec Loss 4.3487 LearningRate 0.0668 Epoch: 3 Global Step: 60970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:58,012-Speed 5181.53 samples/sec Loss 4.5534 LearningRate 0.0668 Epoch: 3 Global Step: 60980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:19:59,985-Speed 5192.18 samples/sec Loss 4.3613 LearningRate 0.0668 Epoch: 3 Global Step: 60990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:20:01,948-Speed 5218.10 samples/sec Loss 4.3557 LearningRate 0.0668 Epoch: 3 Global Step: 61000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:03,926-Speed 5180.86 samples/sec Loss 4.3659 LearningRate 0.0668 Epoch: 3 Global Step: 61010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:05,896-Speed 5200.75 samples/sec Loss 4.4041 LearningRate 0.0668 Epoch: 3 Global Step: 61020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:07,877-Speed 5169.25 samples/sec Loss 4.3485 LearningRate 0.0668 Epoch: 3 Global Step: 61030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:09,868-Speed 5144.00 samples/sec Loss 4.3789 LearningRate 0.0668 Epoch: 3 Global Step: 61040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:11,879-Speed 5094.41 samples/sec Loss 4.4713 LearningRate 0.0668 Epoch: 3 Global Step: 61050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:13,873-Speed 5135.87 samples/sec Loss 4.3863 LearningRate 0.0668 Epoch: 3 Global Step: 61060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:15,842-Speed 5203.99 samples/sec Loss 4.3619 LearningRate 0.0668 Epoch: 3 Global Step: 61070 Fp16 Grad Scale: 65536 Required: 18 hours Training: 
2022-04-11 03:20:17,821-Speed 5175.92 samples/sec Loss 4.4093 LearningRate 0.0668 Epoch: 3 Global Step: 61080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:19,809-Speed 5154.35 samples/sec Loss 4.4068 LearningRate 0.0667 Epoch: 3 Global Step: 61090 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:21,781-Speed 5192.95 samples/sec Loss 4.4440 LearningRate 0.0667 Epoch: 3 Global Step: 61100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:20:23,750-Speed 5202.91 samples/sec Loss 4.3718 LearningRate 0.0667 Epoch: 3 Global Step: 61110 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:25,764-Speed 5084.88 samples/sec Loss 4.2829 LearningRate 0.0667 Epoch: 3 Global Step: 61120 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:27,735-Speed 5197.91 samples/sec Loss 4.3711 LearningRate 0.0667 Epoch: 3 Global Step: 61130 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:29,714-Speed 5176.94 samples/sec Loss 4.3551 LearningRate 0.0667 Epoch: 3 Global Step: 61140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:31,691-Speed 5180.68 samples/sec Loss 4.4112 LearningRate 0.0667 Epoch: 3 Global Step: 61150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:33,666-Speed 5186.16 samples/sec Loss 4.3833 LearningRate 0.0667 Epoch: 3 Global Step: 61160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:35,652-Speed 5158.71 samples/sec Loss 4.4012 LearningRate 0.0667 Epoch: 3 Global Step: 61170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:37,646-Speed 5136.64 samples/sec Loss 4.2939 LearningRate 0.0667 Epoch: 3 Global Step: 61180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:39,640-Speed 5138.45 samples/sec Loss 4.4637 LearningRate 0.0667 Epoch: 3 Global Step: 61190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:41,637-Speed 5129.79 samples/sec Loss 
4.3118 LearningRate 0.0667 Epoch: 3 Global Step: 61200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:20:43,630-Speed 5138.06 samples/sec Loss 4.2986 LearningRate 0.0667 Epoch: 3 Global Step: 61210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:20:45,608-Speed 5178.04 samples/sec Loss 4.2962 LearningRate 0.0667 Epoch: 3 Global Step: 61220 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:20:47,615-Speed 5105.41 samples/sec Loss 4.3828 LearningRate 0.0667 Epoch: 3 Global Step: 61230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:20:49,590-Speed 5184.70 samples/sec Loss 4.4502 LearningRate 0.0667 Epoch: 3 Global Step: 61240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:20:51,571-Speed 5172.04 samples/sec Loss 4.2973 LearningRate 0.0667 Epoch: 3 Global Step: 61250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:20:53,572-Speed 5120.39 samples/sec Loss 4.3412 LearningRate 0.0667 Epoch: 3 Global Step: 61260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:20:55,541-Speed 5201.99 samples/sec Loss 4.3954 LearningRate 0.0667 Epoch: 3 Global Step: 61270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:20:57,520-Speed 5174.65 samples/sec Loss 4.3554 LearningRate 0.0667 Epoch: 3 Global Step: 61280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:20:59,499-Speed 5176.92 samples/sec Loss 4.2632 LearningRate 0.0667 Epoch: 3 Global Step: 61290 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:21:01,485-Speed 5159.61 samples/sec Loss 4.3068 LearningRate 0.0666 Epoch: 3 Global Step: 61300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:21:03,484-Speed 5123.55 samples/sec Loss 4.3971 LearningRate 0.0666 Epoch: 3 Global Step: 61310 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-11 03:21:05,449-Speed 5213.53 samples/sec Loss 4.4550 LearningRate 0.0666 Epoch: 3 Global 
Step: 61320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:21:07,417-Speed 5204.57 samples/sec Loss 4.4492 LearningRate 0.0666 Epoch: 3 Global Step: 61330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:21:09,397-Speed 5175.35 samples/sec Loss 4.3580 LearningRate 0.0666 Epoch: 3 Global Step: 61340 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:21:11,384-Speed 5154.49 samples/sec Loss 4.3061 LearningRate 0.0666 Epoch: 3 Global Step: 61350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:21:13,359-Speed 5185.80 samples/sec Loss 4.3903 LearningRate 0.0666 Epoch: 3 Global Step: 61360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:21:15,377-Speed 5074.88 samples/sec Loss 4.4366 LearningRate 0.0666 Epoch: 3 Global Step: 61370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:21:17,361-Speed 5165.26 samples/sec Loss 4.3009 LearningRate 0.0666 Epoch: 3 Global Step: 61380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:21:19,333-Speed 5194.47 samples/sec Loss 4.3046 LearningRate 0.0666 Epoch: 3 Global Step: 61390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:21:21,324-Speed 5143.70 samples/sec Loss 4.3432 LearningRate 0.0666 Epoch: 3 Global Step: 61400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:21:23,330-Speed 5107.73 samples/sec Loss 4.3598 LearningRate 0.0666 Epoch: 3 Global Step: 61410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:21:25,324-Speed 5136.69 samples/sec Loss 4.4032 LearningRate 0.0666 Epoch: 3 Global Step: 61420 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:21:27,298-Speed 5189.75 samples/sec Loss 4.3379 LearningRate 0.0666 Epoch: 3 Global Step: 61430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:21:29,273-Speed 5186.14 samples/sec Loss 4.4507 LearningRate 0.0666 Epoch: 3 Global Step: 61440 Fp16 Grad Scale: 131072 
Required: 18 hours Training: 2022-04-11 03:21:31,247-Speed 5189.37 samples/sec Loss 4.3402 LearningRate 0.0666 Epoch: 3 Global Step: 61450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:21:33,227-Speed 5171.93 samples/sec Loss 4.2661 LearningRate 0.0666 Epoch: 3 Global Step: 61460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:21:35,202-Speed 5186.07 samples/sec Loss 4.3693 LearningRate 0.0666 Epoch: 3 Global Step: 61470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:21:37,188-Speed 5159.71 samples/sec Loss 4.3044 LearningRate 0.0666 Epoch: 3 Global Step: 61480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:21:39,170-Speed 5167.66 samples/sec Loss 4.3462 LearningRate 0.0666 Epoch: 3 Global Step: 61490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:21:41,158-Speed 5152.35 samples/sec Loss 4.2979 LearningRate 0.0665 Epoch: 3 Global Step: 61500 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:21:43,132-Speed 5188.80 samples/sec Loss 4.3605 LearningRate 0.0665 Epoch: 3 Global Step: 61510 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:21:45,110-Speed 5178.13 samples/sec Loss 4.3616 LearningRate 0.0665 Epoch: 3 Global Step: 61520 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:21:47,096-Speed 5159.76 samples/sec Loss 4.3056 LearningRate 0.0665 Epoch: 3 Global Step: 61530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:21:49,069-Speed 5189.67 samples/sec Loss 4.3499 LearningRate 0.0665 Epoch: 3 Global Step: 61540 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:21:51,050-Speed 5171.85 samples/sec Loss 4.4342 LearningRate 0.0665 Epoch: 3 Global Step: 61550 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:21:53,028-Speed 5179.55 samples/sec Loss 4.3252 LearningRate 0.0665 Epoch: 3 Global Step: 61560 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 
03:21:55,012-Speed 5162.59 samples/sec Loss 4.3054 LearningRate 0.0665 Epoch: 3 Global Step: 61570 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:21:56,988-Speed 5183.49 samples/sec Loss 4.4732 LearningRate 0.0665 Epoch: 3 Global Step: 61580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:21:58,967-Speed 5176.31 samples/sec Loss 4.2679 LearningRate 0.0665 Epoch: 3 Global Step: 61590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:00,941-Speed 5190.60 samples/sec Loss 4.3748 LearningRate 0.0665 Epoch: 3 Global Step: 61600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:02,927-Speed 5155.81 samples/sec Loss 4.2654 LearningRate 0.0665 Epoch: 3 Global Step: 61610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:04,905-Speed 5180.49 samples/sec Loss 4.3444 LearningRate 0.0665 Epoch: 3 Global Step: 61620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:06,884-Speed 5175.77 samples/sec Loss 4.3914 LearningRate 0.0665 Epoch: 3 Global Step: 61630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:08,867-Speed 5165.22 samples/sec Loss 4.3941 LearningRate 0.0665 Epoch: 3 Global Step: 61640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:10,865-Speed 5127.95 samples/sec Loss 4.3941 LearningRate 0.0665 Epoch: 3 Global Step: 61650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:12,837-Speed 5194.09 samples/sec Loss 4.3774 LearningRate 0.0665 Epoch: 3 Global Step: 61660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:14,811-Speed 5187.63 samples/sec Loss 4.4771 LearningRate 0.0665 Epoch: 3 Global Step: 61670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:16,798-Speed 5156.86 samples/sec Loss 4.3852 LearningRate 0.0665 Epoch: 3 Global Step: 61680 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-11 03:22:18,773-Speed 5184.85 samples/sec Loss 
4.3486 LearningRate 0.0665 Epoch: 3 Global Step: 61690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:20,752-Speed 5178.18 samples/sec Loss 4.3117 LearningRate 0.0665 Epoch: 3 Global Step: 61700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:22,724-Speed 5193.45 samples/sec Loss 4.3206 LearningRate 0.0664 Epoch: 3 Global Step: 61710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:24,703-Speed 5175.31 samples/sec Loss 4.4213 LearningRate 0.0664 Epoch: 3 Global Step: 61720 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:26,689-Speed 5156.93 samples/sec Loss 4.3539 LearningRate 0.0664 Epoch: 3 Global Step: 61730 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:28,691-Speed 5117.79 samples/sec Loss 4.2402 LearningRate 0.0664 Epoch: 3 Global Step: 61740 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:30,669-Speed 5179.59 samples/sec Loss 4.4484 LearningRate 0.0664 Epoch: 3 Global Step: 61750 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:32,650-Speed 5168.54 samples/sec Loss 4.3836 LearningRate 0.0664 Epoch: 3 Global Step: 61760 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:34,643-Speed 5141.37 samples/sec Loss 4.4084 LearningRate 0.0664 Epoch: 3 Global Step: 61770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:36,631-Speed 5152.62 samples/sec Loss 4.4172 LearningRate 0.0664 Epoch: 3 Global Step: 61780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:38,601-Speed 5199.97 samples/sec Loss 4.3877 LearningRate 0.0664 Epoch: 3 Global Step: 61790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:40,612-Speed 5094.26 samples/sec Loss 4.3212 LearningRate 0.0664 Epoch: 3 Global Step: 61800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:42,606-Speed 5137.29 samples/sec Loss 4.3735 LearningRate 0.0664 Epoch: 3 Global 
Step: 61810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:44,579-Speed 5190.17 samples/sec Loss 4.3295 LearningRate 0.0664 Epoch: 3 Global Step: 61820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:46,582-Speed 5115.06 samples/sec Loss 4.4403 LearningRate 0.0664 Epoch: 3 Global Step: 61830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:48,568-Speed 5155.59 samples/sec Loss 4.3309 LearningRate 0.0664 Epoch: 3 Global Step: 61840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:50,595-Speed 5056.10 samples/sec Loss 4.2935 LearningRate 0.0664 Epoch: 3 Global Step: 61850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:52,585-Speed 5146.24 samples/sec Loss 4.4047 LearningRate 0.0664 Epoch: 3 Global Step: 61860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:54,569-Speed 5161.95 samples/sec Loss 4.3511 LearningRate 0.0664 Epoch: 3 Global Step: 61870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:56,558-Speed 5150.36 samples/sec Loss 4.3459 LearningRate 0.0664 Epoch: 3 Global Step: 61880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:22:58,540-Speed 5168.93 samples/sec Loss 4.4630 LearningRate 0.0664 Epoch: 3 Global Step: 61890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:23:00,515-Speed 5187.98 samples/sec Loss 4.3728 LearningRate 0.0664 Epoch: 3 Global Step: 61900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:23:02,515-Speed 5122.17 samples/sec Loss 4.3879 LearningRate 0.0663 Epoch: 3 Global Step: 61910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:23:04,487-Speed 5193.51 samples/sec Loss 4.3102 LearningRate 0.0663 Epoch: 3 Global Step: 61920 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:23:06,473-Speed 5157.45 samples/sec Loss 4.3355 LearningRate 0.0663 Epoch: 3 Global Step: 61930 Fp16 Grad Scale: 65536 Required: 
18 hours Training: 2022-04-11 03:23:08,457-Speed 5163.11 samples/sec Loss 4.3845 LearningRate 0.0663 Epoch: 3 Global Step: 61940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:23:10,446-Speed 5149.08 samples/sec Loss 4.3170 LearningRate 0.0663 Epoch: 3 Global Step: 61950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:23:12,433-Speed 5155.11 samples/sec Loss 4.2089 LearningRate 0.0663 Epoch: 3 Global Step: 61960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:23:14,428-Speed 5134.39 samples/sec Loss 4.2733 LearningRate 0.0663 Epoch: 3 Global Step: 61970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:23:16,431-Speed 5114.08 samples/sec Loss 4.3220 LearningRate 0.0663 Epoch: 3 Global Step: 61980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:23:18,408-Speed 5181.50 samples/sec Loss 4.3053 LearningRate 0.0663 Epoch: 3 Global Step: 61990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:23:20,399-Speed 5145.09 samples/sec Loss 4.4737 LearningRate 0.0663 Epoch: 3 Global Step: 62000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:23:46,993-[lfw][62000]XNorm: 22.179909 Training: 2022-04-11 03:23:46,994-[lfw][62000]Accuracy-Flip: 0.99717+-0.00350 Training: 2022-04-11 03:23:46,994-[lfw][62000]Accuracy-Highest: 0.99800 Training: 2022-04-11 03:24:17,782-[cfp_fp][62000]XNorm: 19.814532 Training: 2022-04-11 03:24:17,783-[cfp_fp][62000]Accuracy-Flip: 0.97600+-0.00711 Training: 2022-04-11 03:24:17,783-[cfp_fp][62000]Accuracy-Highest: 0.97871 Training: 2022-04-11 03:24:44,264-[agedb_30][62000]XNorm: 22.011563 Training: 2022-04-11 03:24:44,265-[agedb_30][62000]Accuracy-Flip: 0.97633+-0.00849 Training: 2022-04-11 03:24:44,265-[agedb_30][62000]Accuracy-Highest: 0.97717 Training: 2022-04-11 03:24:46,264-Speed 119.26 samples/sec Loss 4.3044 LearningRate 0.0663 Epoch: 3 Global Step: 62010 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 
03:24:48,237-Speed 5193.30 samples/sec Loss 4.3072 LearningRate 0.0663 Epoch: 3 Global Step: 62020 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:24:50,207-Speed 5197.61 samples/sec Loss 4.2605 LearningRate 0.0663 Epoch: 3 Global Step: 62030 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:24:52,177-Speed 5200.60 samples/sec Loss 4.4124 LearningRate 0.0663 Epoch: 3 Global Step: 62040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:24:54,147-Speed 5199.95 samples/sec Loss 4.2544 LearningRate 0.0663 Epoch: 3 Global Step: 62050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:24:56,112-Speed 5213.07 samples/sec Loss 4.3890 LearningRate 0.0663 Epoch: 3 Global Step: 62060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:24:58,083-Speed 5196.17 samples/sec Loss 4.3464 LearningRate 0.0663 Epoch: 3 Global Step: 62070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:25:00,053-Speed 5201.83 samples/sec Loss 4.3416 LearningRate 0.0663 Epoch: 3 Global Step: 62080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:25:02,043-Speed 5146.57 samples/sec Loss 4.2583 LearningRate 0.0663 Epoch: 3 Global Step: 62090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:25:04,021-Speed 5178.77 samples/sec Loss 4.4187 LearningRate 0.0663 Epoch: 3 Global Step: 62100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:25:05,986-Speed 5212.27 samples/sec Loss 4.3074 LearningRate 0.0663 Epoch: 3 Global Step: 62110 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-11 03:25:07,948-Speed 5220.74 samples/sec Loss 4.3700 LearningRate 0.0662 Epoch: 3 Global Step: 62120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:25:09,945-Speed 5130.12 samples/sec Loss 4.3495 LearningRate 0.0662 Epoch: 3 Global Step: 62130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:25:11,912-Speed 5206.74 samples/sec Loss 
4.2955 LearningRate 0.0662 Epoch: 3 Global Step: 62140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:13,886-Speed 5190.08 samples/sec Loss 4.4567 LearningRate 0.0662 Epoch: 3 Global Step: 62150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:15,861-Speed 5185.71 samples/sec Loss 4.3856 LearningRate 0.0662 Epoch: 3 Global Step: 62160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:17,846-Speed 5161.24 samples/sec Loss 4.2794 LearningRate 0.0662 Epoch: 3 Global Step: 62170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:19,810-Speed 5214.63 samples/sec Loss 4.3909 LearningRate 0.0662 Epoch: 3 Global Step: 62180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:21,778-Speed 5204.83 samples/sec Loss 4.2805 LearningRate 0.0662 Epoch: 3 Global Step: 62190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:23,769-Speed 5145.02 samples/sec Loss 4.3207 LearningRate 0.0662 Epoch: 3 Global Step: 62200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:25,770-Speed 5118.27 samples/sec Loss 4.3742 LearningRate 0.0662 Epoch: 3 Global Step: 62210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:27,740-Speed 5202.27 samples/sec Loss 4.3967 LearningRate 0.0662 Epoch: 3 Global Step: 62220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:29,709-Speed 5202.13 samples/sec Loss 4.3020 LearningRate 0.0662 Epoch: 3 Global Step: 62230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:31,677-Speed 5204.78 samples/sec Loss 4.3994 LearningRate 0.0662 Epoch: 3 Global Step: 62240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:25:33,666-Speed 5150.43 samples/sec Loss 4.4275 LearningRate 0.0662 Epoch: 3 Global Step: 62250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:25:35,641-Speed 5186.57 samples/sec Loss 4.4347 LearningRate 0.0662 Epoch: 3 Global Step: 
62260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:25:37,624-Speed 5163.87 samples/sec Loss 4.4190 LearningRate 0.0662 Epoch: 3 Global Step: 62270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:25:39,608-Speed 5162.77 samples/sec Loss 4.3584 LearningRate 0.0662 Epoch: 3 Global Step: 62280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:25:41,578-Speed 5203.01 samples/sec Loss 4.3027 LearningRate 0.0662 Epoch: 3 Global Step: 62290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:43,550-Speed 5193.36 samples/sec Loss 4.3714 LearningRate 0.0662 Epoch: 3 Global Step: 62300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:45,586-Speed 5029.76 samples/sec Loss 4.4233 LearningRate 0.0662 Epoch: 3 Global Step: 62310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:47,577-Speed 5145.60 samples/sec Loss 4.3097 LearningRate 0.0661 Epoch: 3 Global Step: 62320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:49,561-Speed 5164.73 samples/sec Loss 4.3124 LearningRate 0.0661 Epoch: 3 Global Step: 62330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:51,542-Speed 5169.76 samples/sec Loss 4.4337 LearningRate 0.0661 Epoch: 3 Global Step: 62340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:53,524-Speed 5169.41 samples/sec Loss 4.3462 LearningRate 0.0661 Epoch: 3 Global Step: 62350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:55,491-Speed 5208.54 samples/sec Loss 4.3964 LearningRate 0.0661 Epoch: 3 Global Step: 62360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:57,471-Speed 5173.38 samples/sec Loss 4.3518 LearningRate 0.0661 Epoch: 3 Global Step: 62370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:25:59,464-Speed 5139.09 samples/sec Loss 4.3209 LearningRate 0.0661 Epoch: 3 Global Step: 62380 Fp16 Grad Scale: 65536 Required: 18 hours 
Training: 2022-04-11 03:26:01,440-Speed 5184.00 samples/sec Loss 4.4211 LearningRate 0.0661 Epoch: 3 Global Step: 62390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:26:03,411-Speed 5197.28 samples/sec Loss 4.3656 LearningRate 0.0661 Epoch: 3 Global Step: 62400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:26:05,388-Speed 5180.96 samples/sec Loss 4.3603 LearningRate 0.0661 Epoch: 3 Global Step: 62410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:26:07,365-Speed 5181.77 samples/sec Loss 4.4020 LearningRate 0.0661 Epoch: 3 Global Step: 62420 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:26:09,362-Speed 5129.53 samples/sec Loss 4.2766 LearningRate 0.0661 Epoch: 3 Global Step: 62430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:26:11,359-Speed 5132.37 samples/sec Loss 4.2923 LearningRate 0.0661 Epoch: 3 Global Step: 62440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:26:13,331-Speed 5194.11 samples/sec Loss 4.3066 LearningRate 0.0661 Epoch: 3 Global Step: 62450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:26:15,321-Speed 5147.17 samples/sec Loss 4.3233 LearningRate 0.0661 Epoch: 3 Global Step: 62460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:26:17,310-Speed 5148.76 samples/sec Loss 4.2962 LearningRate 0.0661 Epoch: 3 Global Step: 62470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:26:19,283-Speed 5192.03 samples/sec Loss 4.3574 LearningRate 0.0661 Epoch: 3 Global Step: 62480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:26:21,255-Speed 5195.77 samples/sec Loss 4.3890 LearningRate 0.0661 Epoch: 3 Global Step: 62490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:26:23,234-Speed 5175.40 samples/sec Loss 4.3434 LearningRate 0.0661 Epoch: 3 Global Step: 62500 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:26:25,223-Speed 
5150.49 samples/sec Loss 4.3384 LearningRate 0.0661 Epoch: 3 Global Step: 62510 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:26:27,198-Speed 5185.91 samples/sec Loss 4.3427 LearningRate 0.0661 Epoch: 3 Global Step: 62520 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:26:29,161-Speed 5218.72 samples/sec Loss 4.3145 LearningRate 0.0660 Epoch: 3 Global Step: 62530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:26:31,128-Speed 5209.05 samples/sec Loss 4.3318 LearningRate 0.0660 Epoch: 3 Global Step: 62540 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:26:33,093-Speed 5210.91 samples/sec Loss 4.3140 LearningRate 0.0660 Epoch: 3 Global Step: 62550 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:26:35,082-Speed 5149.33 samples/sec Loss 4.3517 LearningRate 0.0660 Epoch: 3 Global Step: 62560 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:26:37,057-Speed 5188.12 samples/sec Loss 4.3529 LearningRate 0.0660 Epoch: 3 Global Step: 62570 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:26:39,047-Speed 5146.54 samples/sec Loss 4.2355 LearningRate 0.0660 Epoch: 3 Global Step: 62580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:26:41,028-Speed 5170.80 samples/sec Loss 4.2544 LearningRate 0.0660 Epoch: 3 Global Step: 62590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:26:43,003-Speed 5187.10 samples/sec Loss 4.2589 LearningRate 0.0660 Epoch: 3 Global Step: 62600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:26:44,994-Speed 5146.26 samples/sec Loss 4.3606 LearningRate 0.0660 Epoch: 3 Global Step: 62610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:26:46,977-Speed 5164.51 samples/sec Loss 4.4051 LearningRate 0.0660 Epoch: 3 Global Step: 62620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:26:48,958-Speed 5171.66 samples/sec Loss 4.3430 LearningRate 0.0660 
Epoch: 3 Global Step: 62630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:26:50,967-Speed 5097.67 samples/sec Loss 4.2760 LearningRate 0.0660 Epoch: 3 Global Step: 62640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:26:52,939-Speed 5195.01 samples/sec Loss 4.3427 LearningRate 0.0660 Epoch: 3 Global Step: 62650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:26:54,911-Speed 5193.03 samples/sec Loss 4.3681 LearningRate 0.0660 Epoch: 3 Global Step: 62660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:26:56,891-Speed 5175.18 samples/sec Loss 4.3677 LearningRate 0.0660 Epoch: 3 Global Step: 62670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:26:58,861-Speed 5199.47 samples/sec Loss 4.3279 LearningRate 0.0660 Epoch: 3 Global Step: 62680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:27:00,832-Speed 5196.97 samples/sec Loss 4.3321 LearningRate 0.0660 Epoch: 3 Global Step: 62690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:27:02,823-Speed 5144.57 samples/sec Loss 4.3909 LearningRate 0.0660 Epoch: 3 Global Step: 62700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:27:04,802-Speed 5175.95 samples/sec Loss 4.3274 LearningRate 0.0660 Epoch: 3 Global Step: 62710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:27:06,774-Speed 5195.95 samples/sec Loss 4.3098 LearningRate 0.0660 Epoch: 3 Global Step: 62720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:27:08,744-Speed 5199.22 samples/sec Loss 4.4305 LearningRate 0.0659 Epoch: 3 Global Step: 62730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:27:10,729-Speed 5160.58 samples/sec Loss 4.2890 LearningRate 0.0659 Epoch: 3 Global Step: 62740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:27:12,695-Speed 5209.21 samples/sec Loss 4.2679 LearningRate 0.0659 Epoch: 3 Global Step: 62750 Fp16 Grad Scale: 65536 
Required: 18 hours Training: 2022-04-11 03:27:14,674-Speed 5177.79 samples/sec Loss 4.3766 LearningRate 0.0659 Epoch: 3 Global Step: 62760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:27:16,657-Speed 5165.03 samples/sec Loss 4.3624 LearningRate 0.0659 Epoch: 3 Global Step: 62770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:27:18,638-Speed 5169.52 samples/sec Loss 4.3488 LearningRate 0.0659 Epoch: 3 Global Step: 62780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:27:20,615-Speed 5181.91 samples/sec Loss 4.4397 LearningRate 0.0659 Epoch: 3 Global Step: 62790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:27:22,591-Speed 5184.94 samples/sec Loss 4.3822 LearningRate 0.0659 Epoch: 3 Global Step: 62800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:27:24,580-Speed 5149.50 samples/sec Loss 4.3872 LearningRate 0.0659 Epoch: 3 Global Step: 62810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:27:26,552-Speed 5194.38 samples/sec Loss 4.3755 LearningRate 0.0659 Epoch: 3 Global Step: 62820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:27:28,525-Speed 5191.40 samples/sec Loss 4.3507 LearningRate 0.0659 Epoch: 3 Global Step: 62830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:27:30,526-Speed 5119.18 samples/sec Loss 4.2922 LearningRate 0.0659 Epoch: 3 Global Step: 62840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:27:32,505-Speed 5177.13 samples/sec Loss 4.3728 LearningRate 0.0659 Epoch: 3 Global Step: 62850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:27:34,496-Speed 5145.71 samples/sec Loss 4.3293 LearningRate 0.0659 Epoch: 3 Global Step: 62860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:27:36,485-Speed 5148.62 samples/sec Loss 4.3419 LearningRate 0.0659 Epoch: 3 Global Step: 62870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 
03:27:38,465-Speed 5173.17 samples/sec Loss 4.3418 LearningRate 0.0659 Epoch: 3 Global Step: 62880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:27:40,463-Speed 5126.66 samples/sec Loss 4.3552 LearningRate 0.0659 Epoch: 3 Global Step: 62890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:27:42,442-Speed 5175.18 samples/sec Loss 4.2620 LearningRate 0.0659 Epoch: 3 Global Step: 62900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:27:44,429-Speed 5157.11 samples/sec Loss 4.2957 LearningRate 0.0659 Epoch: 3 Global Step: 62910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:27:46,415-Speed 5156.27 samples/sec Loss 4.2813 LearningRate 0.0659 Epoch: 3 Global Step: 62920 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-11 03:27:48,375-Speed 5227.42 samples/sec Loss 4.1861 LearningRate 0.0659 Epoch: 3 Global Step: 62930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:27:50,347-Speed 5194.61 samples/sec Loss 4.3042 LearningRate 0.0658 Epoch: 3 Global Step: 62940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:27:52,314-Speed 5208.39 samples/sec Loss 4.2549 LearningRate 0.0658 Epoch: 3 Global Step: 62950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:27:54,283-Speed 5201.19 samples/sec Loss 4.2420 LearningRate 0.0658 Epoch: 3 Global Step: 62960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:27:56,262-Speed 5175.89 samples/sec Loss 4.3693 LearningRate 0.0658 Epoch: 3 Global Step: 62970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:27:58,251-Speed 5151.45 samples/sec Loss 4.3362 LearningRate 0.0658 Epoch: 3 Global Step: 62980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:28:00,244-Speed 5139.58 samples/sec Loss 4.2673 LearningRate 0.0658 Epoch: 3 Global Step: 62990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:28:02,236-Speed 5140.73 samples/sec Loss 
4.3212 LearningRate 0.0658 Epoch: 3 Global Step: 63000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:28:04,217-Speed 5170.96 samples/sec Loss 4.3385 LearningRate 0.0658 Epoch: 3 Global Step: 63010 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:28:06,203-Speed 5159.01 samples/sec Loss 4.2478 LearningRate 0.0658 Epoch: 3 Global Step: 63020 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:28:08,179-Speed 5183.76 samples/sec Loss 4.3242 LearningRate 0.0658 Epoch: 3 Global Step: 63030 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:28:10,158-Speed 5178.12 samples/sec Loss 4.2913 LearningRate 0.0658 Epoch: 3 Global Step: 63040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:28:12,129-Speed 5195.08 samples/sec Loss 4.3030 LearningRate 0.0658 Epoch: 3 Global Step: 63050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:28:14,091-Speed 5221.30 samples/sec Loss 4.2820 LearningRate 0.0658 Epoch: 3 Global Step: 63060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:16,064-Speed 5192.98 samples/sec Loss 4.2440 LearningRate 0.0658 Epoch: 3 Global Step: 63070 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:18,041-Speed 5179.51 samples/sec Loss 4.2399 LearningRate 0.0658 Epoch: 3 Global Step: 63080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:20,023-Speed 5168.32 samples/sec Loss 4.3538 LearningRate 0.0658 Epoch: 3 Global Step: 63090 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:22,006-Speed 5167.24 samples/sec Loss 4.3634 LearningRate 0.0658 Epoch: 3 Global Step: 63100 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:23,990-Speed 5162.61 samples/sec Loss 4.2942 LearningRate 0.0658 Epoch: 3 Global Step: 63110 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:25,975-Speed 5160.74 samples/sec Loss 4.3192 LearningRate 0.0658 Epoch: 3 Global Step: 
63120 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:27,953-Speed 5178.46 samples/sec Loss 4.3934 LearningRate 0.0658 Epoch: 3 Global Step: 63130 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:29,934-Speed 5171.10 samples/sec Loss 4.2801 LearningRate 0.0657 Epoch: 3 Global Step: 63140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:31,916-Speed 5167.28 samples/sec Loss 4.4037 LearningRate 0.0657 Epoch: 3 Global Step: 63150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:33,891-Speed 5187.50 samples/sec Loss 4.2583 LearningRate 0.0657 Epoch: 3 Global Step: 63160 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:28:35,882-Speed 5144.70 samples/sec Loss 4.3568 LearningRate 0.0657 Epoch: 3 Global Step: 63170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:28:37,868-Speed 5157.94 samples/sec Loss 4.3318 LearningRate 0.0657 Epoch: 3 Global Step: 63180 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:28:39,849-Speed 5169.85 samples/sec Loss 4.2878 LearningRate 0.0657 Epoch: 3 Global Step: 63190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:41,819-Speed 5200.41 samples/sec Loss 4.3141 LearningRate 0.0657 Epoch: 3 Global Step: 63200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:43,787-Speed 5204.43 samples/sec Loss 4.2168 LearningRate 0.0657 Epoch: 3 Global Step: 63210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:45,762-Speed 5186.56 samples/sec Loss 4.3159 LearningRate 0.0657 Epoch: 3 Global Step: 63220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:47,748-Speed 5157.91 samples/sec Loss 4.3178 LearningRate 0.0657 Epoch: 3 Global Step: 63230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:49,749-Speed 5118.61 samples/sec Loss 4.2578 LearningRate 0.0657 Epoch: 3 Global Step: 63240 Fp16 Grad Scale: 65536 Required: 18 hours 
Training: 2022-04-11 03:28:51,728-Speed 5176.90 samples/sec Loss 4.3455 LearningRate 0.0657 Epoch: 3 Global Step: 63250 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:53,708-Speed 5174.18 samples/sec Loss 4.2818 LearningRate 0.0657 Epoch: 3 Global Step: 63260 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:55,677-Speed 5200.75 samples/sec Loss 4.1701 LearningRate 0.0657 Epoch: 3 Global Step: 63270 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:57,661-Speed 5163.36 samples/sec Loss 4.3170 LearningRate 0.0657 Epoch: 3 Global Step: 63280 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:28:59,633-Speed 5194.74 samples/sec Loss 4.2534 LearningRate 0.0657 Epoch: 3 Global Step: 63290 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:01,610-Speed 5180.88 samples/sec Loss 4.3141 LearningRate 0.0657 Epoch: 3 Global Step: 63300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:03,583-Speed 5192.44 samples/sec Loss 4.3298 LearningRate 0.0657 Epoch: 3 Global Step: 63310 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:05,551-Speed 5203.93 samples/sec Loss 4.3214 LearningRate 0.0657 Epoch: 3 Global Step: 63320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:07,538-Speed 5155.53 samples/sec Loss 4.3244 LearningRate 0.0657 Epoch: 3 Global Step: 63330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:09,508-Speed 5200.02 samples/sec Loss 4.3097 LearningRate 0.0657 Epoch: 3 Global Step: 63340 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:11,494-Speed 5157.96 samples/sec Loss 4.3596 LearningRate 0.0656 Epoch: 3 Global Step: 63350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:13,478-Speed 5163.50 samples/sec Loss 4.2843 LearningRate 0.0656 Epoch: 3 Global Step: 63360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:15,457-Speed 5176.82 
samples/sec Loss 4.2586 LearningRate 0.0656 Epoch: 3 Global Step: 63370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:17,433-Speed 5182.33 samples/sec Loss 4.4535 LearningRate 0.0656 Epoch: 3 Global Step: 63380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:19,399-Speed 5209.91 samples/sec Loss 4.3057 LearningRate 0.0656 Epoch: 3 Global Step: 63390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:21,373-Speed 5189.83 samples/sec Loss 4.2995 LearningRate 0.0656 Epoch: 3 Global Step: 63400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:23,349-Speed 5183.52 samples/sec Loss 4.3107 LearningRate 0.0656 Epoch: 3 Global Step: 63410 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:29:25,366-Speed 5080.22 samples/sec Loss 4.3940 LearningRate 0.0656 Epoch: 3 Global Step: 63420 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:29:27,382-Speed 5080.11 samples/sec Loss 4.3248 LearningRate 0.0656 Epoch: 3 Global Step: 63430 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:29:29,365-Speed 5165.45 samples/sec Loss 4.3499 LearningRate 0.0656 Epoch: 3 Global Step: 63440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:29:31,339-Speed 5190.45 samples/sec Loss 4.2843 LearningRate 0.0656 Epoch: 3 Global Step: 63450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:29:33,337-Speed 5125.10 samples/sec Loss 4.2104 LearningRate 0.0656 Epoch: 3 Global Step: 63460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:29:35,312-Speed 5188.51 samples/sec Loss 4.2887 LearningRate 0.0656 Epoch: 3 Global Step: 63470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:29:37,284-Speed 5193.61 samples/sec Loss 4.3254 LearningRate 0.0656 Epoch: 3 Global Step: 63480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:29:39,258-Speed 5188.22 samples/sec Loss 4.2756 LearningRate 0.0656 Epoch: 
3 Global Step: 63490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:29:41,229-Speed 5197.06 samples/sec Loss 4.3173 LearningRate 0.0656 Epoch: 3 Global Step: 63500 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:29:43,209-Speed 5174.61 samples/sec Loss 4.3172 LearningRate 0.0656 Epoch: 3 Global Step: 63510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:45,180-Speed 5195.92 samples/sec Loss 4.2895 LearningRate 0.0656 Epoch: 3 Global Step: 63520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:47,163-Speed 5165.99 samples/sec Loss 4.3582 LearningRate 0.0656 Epoch: 3 Global Step: 63530 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:49,144-Speed 5171.79 samples/sec Loss 4.3462 LearningRate 0.0656 Epoch: 3 Global Step: 63540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:51,135-Speed 5144.64 samples/sec Loss 4.2952 LearningRate 0.0655 Epoch: 3 Global Step: 63550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:53,121-Speed 5157.72 samples/sec Loss 4.3466 LearningRate 0.0655 Epoch: 3 Global Step: 63560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:55,090-Speed 5203.17 samples/sec Loss 4.4138 LearningRate 0.0655 Epoch: 3 Global Step: 63570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:57,058-Speed 5204.61 samples/sec Loss 4.3000 LearningRate 0.0655 Epoch: 3 Global Step: 63580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:29:59,038-Speed 5174.08 samples/sec Loss 4.3940 LearningRate 0.0655 Epoch: 3 Global Step: 63590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:30:01,028-Speed 5146.07 samples/sec Loss 4.3459 LearningRate 0.0655 Epoch: 3 Global Step: 63600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:30:03,011-Speed 5165.72 samples/sec Loss 4.4278 LearningRate 0.0655 Epoch: 3 Global Step: 63610 Fp16 Grad Scale: 131072 
Required: 18 hours Training: 2022-04-11 03:30:04,975-Speed 5215.78 samples/sec Loss 4.3469 LearningRate 0.0655 Epoch: 3 Global Step: 63620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:30:06,945-Speed 5200.22 samples/sec Loss 4.3295 LearningRate 0.0655 Epoch: 3 Global Step: 63630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:30:08,921-Speed 5183.07 samples/sec Loss 4.3084 LearningRate 0.0655 Epoch: 3 Global Step: 63640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:30:10,905-Speed 5162.57 samples/sec Loss 4.3038 LearningRate 0.0655 Epoch: 3 Global Step: 63650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:30:12,882-Speed 5186.74 samples/sec Loss 4.3379 LearningRate 0.0655 Epoch: 3 Global Step: 63660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:30:14,853-Speed 5197.89 samples/sec Loss 4.2644 LearningRate 0.0655 Epoch: 3 Global Step: 63670 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:30:16,849-Speed 5131.21 samples/sec Loss 4.3568 LearningRate 0.0655 Epoch: 3 Global Step: 63680 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:30:18,826-Speed 5182.10 samples/sec Loss 4.2638 LearningRate 0.0655 Epoch: 3 Global Step: 63690 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:30:20,825-Speed 5123.17 samples/sec Loss 4.3137 LearningRate 0.0655 Epoch: 3 Global Step: 63700 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:30:22,813-Speed 5153.57 samples/sec Loss 4.3092 LearningRate 0.0655 Epoch: 3 Global Step: 63710 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:30:24,794-Speed 5171.20 samples/sec Loss 4.3678 LearningRate 0.0655 Epoch: 3 Global Step: 63720 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:30:26,776-Speed 5166.52 samples/sec Loss 4.2753 LearningRate 0.0655 Epoch: 3 Global Step: 63730 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 
03:30:28,770-Speed 5136.94 samples/sec Loss 4.2781 LearningRate 0.0655 Epoch: 3 Global Step: 63740 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:30:30,755-Speed 5160.25 samples/sec Loss 4.3016 LearningRate 0.0655 Epoch: 3 Global Step: 63750 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:30:32,738-Speed 5166.01 samples/sec Loss 4.3655 LearningRate 0.0654 Epoch: 3 Global Step: 63760 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:30:34,726-Speed 5153.64 samples/sec Loss 4.2510 LearningRate 0.0654 Epoch: 3 Global Step: 63770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:30:36,702-Speed 5182.66 samples/sec Loss 4.3858 LearningRate 0.0654 Epoch: 3 Global Step: 63780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:30:38,680-Speed 5178.82 samples/sec Loss 4.3263 LearningRate 0.0654 Epoch: 3 Global Step: 63790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:30:40,661-Speed 5171.57 samples/sec Loss 4.2545 LearningRate 0.0654 Epoch: 3 Global Step: 63800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:30:42,639-Speed 5179.83 samples/sec Loss 4.3060 LearningRate 0.0654 Epoch: 3 Global Step: 63810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:30:44,625-Speed 5157.09 samples/sec Loss 4.3660 LearningRate 0.0654 Epoch: 3 Global Step: 63820 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:30:46,601-Speed 5182.19 samples/sec Loss 4.2686 LearningRate 0.0654 Epoch: 3 Global Step: 63830 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:30:48,588-Speed 5154.85 samples/sec Loss 4.3028 LearningRate 0.0654 Epoch: 3 Global Step: 63840 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:30:50,575-Speed 5157.62 samples/sec Loss 4.3548 LearningRate 0.0654 Epoch: 3 Global Step: 63850 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:30:52,584-Speed 5097.79 samples/sec Loss 4.2725 
LearningRate 0.0654 Epoch: 3 Global Step: 63860 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:30:54,556-Speed 5195.17 samples/sec Loss 4.2501 LearningRate 0.0654 Epoch: 3 Global Step: 63870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:30:56,537-Speed 5170.37 samples/sec Loss 4.3407 LearningRate 0.0654 Epoch: 3 Global Step: 63880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:30:58,510-Speed 5193.80 samples/sec Loss 4.3026 LearningRate 0.0654 Epoch: 3 Global Step: 63890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:31:00,492-Speed 5166.63 samples/sec Loss 4.2735 LearningRate 0.0654 Epoch: 3 Global Step: 63900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:31:02,477-Speed 5160.63 samples/sec Loss 4.2605 LearningRate 0.0654 Epoch: 3 Global Step: 63910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:31:04,446-Speed 5201.74 samples/sec Loss 4.1877 LearningRate 0.0654 Epoch: 3 Global Step: 63920 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:31:06,428-Speed 5168.45 samples/sec Loss 4.3294 LearningRate 0.0654 Epoch: 3 Global Step: 63930 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:31:08,422-Speed 5136.34 samples/sec Loss 4.1815 LearningRate 0.0654 Epoch: 3 Global Step: 63940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:31:10,433-Speed 5094.28 samples/sec Loss 4.2712 LearningRate 0.0654 Epoch: 3 Global Step: 63950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:31:12,407-Speed 5190.37 samples/sec Loss 4.2427 LearningRate 0.0654 Epoch: 3 Global Step: 63960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:31:14,399-Speed 5142.10 samples/sec Loss 4.2058 LearningRate 0.0653 Epoch: 3 Global Step: 63970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:31:16,389-Speed 5148.21 samples/sec Loss 4.2823 LearningRate 0.0653 Epoch: 3 Global Step: 63980 
Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:31:18,398-Speed 5099.06 samples/sec Loss 4.2394 LearningRate 0.0653 Epoch: 3 Global Step: 63990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:31:20,382-Speed 5162.57 samples/sec Loss 4.2595 LearningRate 0.0653 Epoch: 3 Global Step: 64000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:31:46,958-[lfw][64000]XNorm: 21.218271 Training: 2022-04-11 03:31:46,959-[lfw][64000]Accuracy-Flip: 0.99717+-0.00350 Training: 2022-04-11 03:31:46,959-[lfw][64000]Accuracy-Highest: 0.99800 Training: 2022-04-11 03:32:17,713-[cfp_fp][64000]XNorm: 18.928848 Training: 2022-04-11 03:32:17,713-[cfp_fp][64000]Accuracy-Flip: 0.97357+-0.00604 Training: 2022-04-11 03:32:17,714-[cfp_fp][64000]Accuracy-Highest: 0.97871 Training: 2022-04-11 03:32:44,169-[agedb_30][64000]XNorm: 20.761993 Training: 2022-04-11 03:32:44,169-[agedb_30][64000]Accuracy-Flip: 0.97650+-0.00883 Training: 2022-04-11 03:32:44,170-[agedb_30][64000]Accuracy-Highest: 0.97717 Training: 2022-04-11 03:32:46,177-Speed 119.35 samples/sec Loss 4.2910 LearningRate 0.0653 Epoch: 3 Global Step: 64010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:32:48,173-Speed 5131.86 samples/sec Loss 4.2691 LearningRate 0.0653 Epoch: 3 Global Step: 64020 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:32:50,146-Speed 5194.05 samples/sec Loss 4.2837 LearningRate 0.0653 Epoch: 3 Global Step: 64030 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:32:52,121-Speed 5185.76 samples/sec Loss 4.2741 LearningRate 0.0653 Epoch: 3 Global Step: 64040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:32:54,097-Speed 5183.06 samples/sec Loss 4.2980 LearningRate 0.0653 Epoch: 3 Global Step: 64050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:32:56,076-Speed 5175.56 samples/sec Loss 4.2277 LearningRate 0.0653 Epoch: 3 Global Step: 64060 Fp16 Grad Scale: 131072 Required: 18 
hours Training: 2022-04-11 03:32:58,076-Speed 5123.34 samples/sec Loss 4.2702 LearningRate 0.0653 Epoch: 3 Global Step: 64070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:33:00,060-Speed 5162.59 samples/sec Loss 4.3371 LearningRate 0.0653 Epoch: 3 Global Step: 64080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:33:02,046-Speed 5156.47 samples/sec Loss 4.1961 LearningRate 0.0653 Epoch: 3 Global Step: 64090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:33:04,034-Speed 5154.09 samples/sec Loss 4.3257 LearningRate 0.0653 Epoch: 3 Global Step: 64100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:33:06,037-Speed 5113.19 samples/sec Loss 4.2867 LearningRate 0.0653 Epoch: 3 Global Step: 64110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:33:08,000-Speed 5218.14 samples/sec Loss 4.2288 LearningRate 0.0653 Epoch: 3 Global Step: 64120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:33:09,980-Speed 5172.72 samples/sec Loss 4.2641 LearningRate 0.0653 Epoch: 3 Global Step: 64130 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:11,963-Speed 5168.06 samples/sec Loss 4.2926 LearningRate 0.0653 Epoch: 3 Global Step: 64140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:13,934-Speed 5195.49 samples/sec Loss 4.3922 LearningRate 0.0653 Epoch: 3 Global Step: 64150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:15,912-Speed 5179.07 samples/sec Loss 4.2604 LearningRate 0.0653 Epoch: 3 Global Step: 64160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:17,885-Speed 5190.35 samples/sec Loss 4.2248 LearningRate 0.0652 Epoch: 3 Global Step: 64170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:19,860-Speed 5186.22 samples/sec Loss 4.2472 LearningRate 0.0652 Epoch: 3 Global Step: 64180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:21,842-Speed 
5170.24 samples/sec Loss 4.2653 LearningRate 0.0652 Epoch: 3 Global Step: 64190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:23,853-Speed 5094.23 samples/sec Loss 4.2231 LearningRate 0.0652 Epoch: 3 Global Step: 64200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:25,843-Speed 5146.73 samples/sec Loss 4.1587 LearningRate 0.0652 Epoch: 3 Global Step: 64210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:27,839-Speed 5133.24 samples/sec Loss 4.3232 LearningRate 0.0652 Epoch: 3 Global Step: 64220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:29,824-Speed 5160.50 samples/sec Loss 4.1361 LearningRate 0.0652 Epoch: 3 Global Step: 64230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:33:31,796-Speed 5193.05 samples/sec Loss 4.2616 LearningRate 0.0652 Epoch: 3 Global Step: 64240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:33:33,792-Speed 5132.30 samples/sec Loss 4.2234 LearningRate 0.0652 Epoch: 3 Global Step: 64250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:33:35,780-Speed 5153.27 samples/sec Loss 4.3643 LearningRate 0.0652 Epoch: 3 Global Step: 64260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:33:37,801-Speed 5067.20 samples/sec Loss 4.2496 LearningRate 0.0652 Epoch: 3 Global Step: 64270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:33:39,793-Speed 5144.22 samples/sec Loss 4.2286 LearningRate 0.0652 Epoch: 3 Global Step: 64280 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:41,782-Speed 5149.80 samples/sec Loss 4.2983 LearningRate 0.0652 Epoch: 3 Global Step: 64290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:43,766-Speed 5161.61 samples/sec Loss 4.2663 LearningRate 0.0652 Epoch: 3 Global Step: 64300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:45,755-Speed 5148.95 samples/sec Loss 4.3141 LearningRate 
0.0652 Epoch: 3 Global Step: 64310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:47,747-Speed 5144.08 samples/sec Loss 4.1559 LearningRate 0.0652 Epoch: 3 Global Step: 64320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:49,728-Speed 5171.57 samples/sec Loss 4.3330 LearningRate 0.0652 Epoch: 3 Global Step: 64330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:51,712-Speed 5162.50 samples/sec Loss 4.2898 LearningRate 0.0652 Epoch: 3 Global Step: 64340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:53,698-Speed 5157.71 samples/sec Loss 4.1827 LearningRate 0.0652 Epoch: 3 Global Step: 64350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:55,671-Speed 5191.40 samples/sec Loss 4.2633 LearningRate 0.0652 Epoch: 3 Global Step: 64360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:57,642-Speed 5198.94 samples/sec Loss 4.2677 LearningRate 0.0652 Epoch: 3 Global Step: 64370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:33:59,638-Speed 5129.99 samples/sec Loss 4.3132 LearningRate 0.0651 Epoch: 3 Global Step: 64380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:01,651-Speed 5089.41 samples/sec Loss 4.2767 LearningRate 0.0651 Epoch: 3 Global Step: 64390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:03,631-Speed 5174.20 samples/sec Loss 4.3591 LearningRate 0.0651 Epoch: 3 Global Step: 64400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:05,626-Speed 5134.18 samples/sec Loss 4.2534 LearningRate 0.0651 Epoch: 3 Global Step: 64410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:07,619-Speed 5138.71 samples/sec Loss 4.3748 LearningRate 0.0651 Epoch: 3 Global Step: 64420 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:09,601-Speed 5168.70 samples/sec Loss 4.3020 LearningRate 0.0651 Epoch: 3 Global Step: 64430 Fp16 Grad 
Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:11,582-Speed 5173.37 samples/sec Loss 4.2797 LearningRate 0.0651 Epoch: 3 Global Step: 64440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:34:13,555-Speed 5191.68 samples/sec Loss 4.3091 LearningRate 0.0651 Epoch: 3 Global Step: 64450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:34:15,535-Speed 5173.25 samples/sec Loss 4.3039 LearningRate 0.0651 Epoch: 3 Global Step: 64460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:34:17,523-Speed 5151.15 samples/sec Loss 4.3347 LearningRate 0.0651 Epoch: 3 Global Step: 64470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:34:19,518-Speed 5133.02 samples/sec Loss 4.3710 LearningRate 0.0651 Epoch: 3 Global Step: 64480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:34:21,495-Speed 5181.25 samples/sec Loss 4.3395 LearningRate 0.0651 Epoch: 3 Global Step: 64490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:34:23,482-Speed 5155.67 samples/sec Loss 4.2409 LearningRate 0.0651 Epoch: 3 Global Step: 64500 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:34:25,461-Speed 5177.72 samples/sec Loss 4.2874 LearningRate 0.0651 Epoch: 3 Global Step: 64510 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:34:27,449-Speed 5150.57 samples/sec Loss 4.3133 LearningRate 0.0651 Epoch: 3 Global Step: 64520 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:34:29,434-Speed 5161.17 samples/sec Loss 4.3279 LearningRate 0.0651 Epoch: 3 Global Step: 64530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:34:31,410-Speed 5185.72 samples/sec Loss 4.3304 LearningRate 0.0651 Epoch: 3 Global Step: 64540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:33,380-Speed 5200.26 samples/sec Loss 4.1914 LearningRate 0.0651 Epoch: 3 Global Step: 64550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 
03:34:35,349-Speed 5202.65 samples/sec Loss 4.2543 LearningRate 0.0651 Epoch: 3 Global Step: 64560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:37,316-Speed 5207.40 samples/sec Loss 4.2574 LearningRate 0.0651 Epoch: 3 Global Step: 64570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:39,288-Speed 5193.45 samples/sec Loss 4.2779 LearningRate 0.0651 Epoch: 3 Global Step: 64580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:41,257-Speed 5202.17 samples/sec Loss 4.2148 LearningRate 0.0650 Epoch: 3 Global Step: 64590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:43,248-Speed 5145.28 samples/sec Loss 4.3807 LearningRate 0.0650 Epoch: 3 Global Step: 64600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:45,258-Speed 5097.13 samples/sec Loss 4.3713 LearningRate 0.0650 Epoch: 3 Global Step: 64610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:47,245-Speed 5154.72 samples/sec Loss 4.2484 LearningRate 0.0650 Epoch: 3 Global Step: 64620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:49,228-Speed 5163.55 samples/sec Loss 4.2487 LearningRate 0.0650 Epoch: 3 Global Step: 64630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:51,218-Speed 5149.08 samples/sec Loss 4.3159 LearningRate 0.0650 Epoch: 3 Global Step: 64640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:53,207-Speed 5149.88 samples/sec Loss 4.2579 LearningRate 0.0650 Epoch: 3 Global Step: 64650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:55,175-Speed 5204.63 samples/sec Loss 4.2369 LearningRate 0.0650 Epoch: 3 Global Step: 64660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:57,146-Speed 5197.93 samples/sec Loss 4.2344 LearningRate 0.0650 Epoch: 3 Global Step: 64670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:34:59,131-Speed 5159.56 samples/sec Loss 
4.2619 LearningRate 0.0650 Epoch: 3 Global Step: 64680 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:35:01,107-Speed 5185.53 samples/sec Loss 4.3614 LearningRate 0.0650 Epoch: 3 Global Step: 64690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:35:03,081-Speed 5187.84 samples/sec Loss 4.2803 LearningRate 0.0650 Epoch: 3 Global Step: 64700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:35:05,059-Speed 5178.51 samples/sec Loss 4.3496 LearningRate 0.0650 Epoch: 3 Global Step: 64710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:35:07,025-Speed 5209.39 samples/sec Loss 4.2454 LearningRate 0.0650 Epoch: 3 Global Step: 64720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:35:09,013-Speed 5153.67 samples/sec Loss 4.2120 LearningRate 0.0650 Epoch: 3 Global Step: 64730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:35:11,019-Speed 5105.78 samples/sec Loss 4.3945 LearningRate 0.0650 Epoch: 3 Global Step: 64740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:35:13,008-Speed 5152.06 samples/sec Loss 4.3232 LearningRate 0.0650 Epoch: 3 Global Step: 64750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:35:14,980-Speed 5195.12 samples/sec Loss 4.2548 LearningRate 0.0650 Epoch: 3 Global Step: 64760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:35:16,951-Speed 5196.73 samples/sec Loss 4.3030 LearningRate 0.0650 Epoch: 3 Global Step: 64770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:35:18,924-Speed 5191.16 samples/sec Loss 4.2832 LearningRate 0.0650 Epoch: 3 Global Step: 64780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:35:20,894-Speed 5198.33 samples/sec Loss 4.3134 LearningRate 0.0649 Epoch: 3 Global Step: 64790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:35:22,874-Speed 5174.15 samples/sec Loss 4.3303 LearningRate 0.0649 Epoch: 3 Global Step: 
64800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:35:24,865-Speed 5144.05 samples/sec Loss 4.2553 LearningRate 0.0649 Epoch: 3 Global Step: 64810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:35:26,842-Speed 5181.77 samples/sec Loss 4.3091 LearningRate 0.0649 Epoch: 3 Global Step: 64820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:35:28,823-Speed 5170.80 samples/sec Loss 4.2745 LearningRate 0.0649 Epoch: 3 Global Step: 64830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:35:30,790-Speed 5208.05 samples/sec Loss 4.2700 LearningRate 0.0649 Epoch: 3 Global Step: 64840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:35:32,771-Speed 5173.20 samples/sec Loss 4.3574 LearningRate 0.0649 Epoch: 3 Global Step: 64850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:35:34,786-Speed 5082.58 samples/sec Loss 4.2655 LearningRate 0.0649 Epoch: 3 Global Step: 64860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:35:36,770-Speed 5161.52 samples/sec Loss 4.2327 LearningRate 0.0649 Epoch: 3 Global Step: 64870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:35:38,738-Speed 5206.00 samples/sec Loss 4.2554 LearningRate 0.0649 Epoch: 3 Global Step: 64880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:35:40,717-Speed 5177.68 samples/sec Loss 4.2433 LearningRate 0.0649 Epoch: 3 Global Step: 64890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:35:42,692-Speed 5184.28 samples/sec Loss 4.2402 LearningRate 0.0649 Epoch: 3 Global Step: 64900 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:35:44,664-Speed 5195.72 samples/sec Loss 4.2948 LearningRate 0.0649 Epoch: 3 Global Step: 64910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:35:46,671-Speed 5102.92 samples/sec Loss 4.3053 LearningRate 0.0649 Epoch: 3 Global Step: 64920 Fp16 Grad Scale: 65536 Required: 18 
hours Training: 2022-04-11 03:35:48,648-Speed 5182.17 samples/sec Loss 4.1785 LearningRate 0.0649 Epoch: 3 Global Step: 64930 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:35:50,625-Speed 5179.83 samples/sec Loss 4.2979 LearningRate 0.0649 Epoch: 3 Global Step: 64940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:35:52,606-Speed 5171.48 samples/sec Loss 4.2620 LearningRate 0.0649 Epoch: 3 Global Step: 64950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:35:54,584-Speed 5179.61 samples/sec Loss 4.3544 LearningRate 0.0649 Epoch: 3 Global Step: 64960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:35:56,564-Speed 5172.71 samples/sec Loss 4.3356 LearningRate 0.0649 Epoch: 3 Global Step: 64970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:35:58,545-Speed 5170.60 samples/sec Loss 4.2779 LearningRate 0.0649 Epoch: 3 Global Step: 64980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:36:00,552-Speed 5106.05 samples/sec Loss 4.2586 LearningRate 0.0649 Epoch: 3 Global Step: 64990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:36:02,551-Speed 5122.55 samples/sec Loss 4.2446 LearningRate 0.0648 Epoch: 3 Global Step: 65000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:04,538-Speed 5155.79 samples/sec Loss 4.2023 LearningRate 0.0648 Epoch: 3 Global Step: 65010 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:06,514-Speed 5183.06 samples/sec Loss 4.3406 LearningRate 0.0648 Epoch: 3 Global Step: 65020 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:08,501-Speed 5155.63 samples/sec Loss 4.2810 LearningRate 0.0648 Epoch: 3 Global Step: 65030 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:10,472-Speed 5196.92 samples/sec Loss 4.2561 LearningRate 0.0648 Epoch: 3 Global Step: 65040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:12,452-Speed 
5173.28 samples/sec Loss 4.3312 LearningRate 0.0648 Epoch: 3 Global Step: 65050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:14,426-Speed 5188.82 samples/sec Loss 4.2174 LearningRate 0.0648 Epoch: 3 Global Step: 65060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:16,405-Speed 5176.94 samples/sec Loss 4.1931 LearningRate 0.0648 Epoch: 3 Global Step: 65070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:18,382-Speed 5181.80 samples/sec Loss 4.1309 LearningRate 0.0648 Epoch: 3 Global Step: 65080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:20,352-Speed 5199.24 samples/sec Loss 4.2978 LearningRate 0.0648 Epoch: 3 Global Step: 65090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:22,325-Speed 5197.41 samples/sec Loss 4.2067 LearningRate 0.0648 Epoch: 3 Global Step: 65100 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-11 03:36:24,291-Speed 5207.91 samples/sec Loss 4.2543 LearningRate 0.0648 Epoch: 3 Global Step: 65110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:26,290-Speed 5125.68 samples/sec Loss 4.2146 LearningRate 0.0648 Epoch: 3 Global Step: 65120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:28,263-Speed 5191.05 samples/sec Loss 4.2224 LearningRate 0.0648 Epoch: 3 Global Step: 65130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:30,245-Speed 5167.21 samples/sec Loss 4.2502 LearningRate 0.0648 Epoch: 3 Global Step: 65140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:32,242-Speed 5129.80 samples/sec Loss 4.3047 LearningRate 0.0648 Epoch: 3 Global Step: 65150 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:34,221-Speed 5178.60 samples/sec Loss 4.2345 LearningRate 0.0648 Epoch: 3 Global Step: 65160 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:36,193-Speed 5194.14 samples/sec Loss 4.3049 
LearningRate 0.0648 Epoch: 3 Global Step: 65170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:38,170-Speed 5181.35 samples/sec Loss 4.2050 LearningRate 0.0648 Epoch: 3 Global Step: 65180 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:40,167-Speed 5129.54 samples/sec Loss 4.2323 LearningRate 0.0648 Epoch: 3 Global Step: 65190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:42,157-Speed 5146.67 samples/sec Loss 4.2594 LearningRate 0.0648 Epoch: 3 Global Step: 65200 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:44,119-Speed 5219.87 samples/sec Loss 4.3183 LearningRate 0.0647 Epoch: 3 Global Step: 65210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:46,102-Speed 5166.40 samples/sec Loss 4.1598 LearningRate 0.0647 Epoch: 3 Global Step: 65220 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:48,103-Speed 5119.69 samples/sec Loss 4.2160 LearningRate 0.0647 Epoch: 3 Global Step: 65230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:50,107-Speed 5111.82 samples/sec Loss 4.3244 LearningRate 0.0647 Epoch: 3 Global Step: 65240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:52,101-Speed 5136.80 samples/sec Loss 4.2716 LearningRate 0.0647 Epoch: 3 Global Step: 65250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:54,093-Speed 5142.51 samples/sec Loss 4.2832 LearningRate 0.0647 Epoch: 3 Global Step: 65260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:36:56,058-Speed 5213.29 samples/sec Loss 4.2567 LearningRate 0.0647 Epoch: 3 Global Step: 65270 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:36:58,029-Speed 5197.54 samples/sec Loss 4.2267 LearningRate 0.0647 Epoch: 3 Global Step: 65280 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:37:00,010-Speed 5171.10 samples/sec Loss 4.2822 LearningRate 0.0647 Epoch: 3 Global Step: 
65290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:37:01,982-Speed 5193.05 samples/sec Loss 4.2622 LearningRate 0.0647 Epoch: 3 Global Step: 65300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:37:03,952-Speed 5200.00 samples/sec Loss 4.2261 LearningRate 0.0647 Epoch: 3 Global Step: 65310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:37:05,920-Speed 5203.81 samples/sec Loss 4.2323 LearningRate 0.0647 Epoch: 3 Global Step: 65320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:37:07,911-Speed 5145.96 samples/sec Loss 4.2463 LearningRate 0.0647 Epoch: 3 Global Step: 65330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:37:09,878-Speed 5206.72 samples/sec Loss 4.3031 LearningRate 0.0647 Epoch: 3 Global Step: 65340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:37:11,849-Speed 5197.21 samples/sec Loss 4.3181 LearningRate 0.0647 Epoch: 3 Global Step: 65350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:37:13,823-Speed 5190.63 samples/sec Loss 4.1623 LearningRate 0.0647 Epoch: 3 Global Step: 65360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:37:15,806-Speed 5165.52 samples/sec Loss 4.2064 LearningRate 0.0647 Epoch: 3 Global Step: 65370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:37:17,797-Speed 5143.76 samples/sec Loss 4.2098 LearningRate 0.0647 Epoch: 3 Global Step: 65380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:37:19,768-Speed 5198.63 samples/sec Loss 4.3034 LearningRate 0.0647 Epoch: 3 Global Step: 65390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:37:21,757-Speed 5148.97 samples/sec Loss 4.2242 LearningRate 0.0647 Epoch: 3 Global Step: 65400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:37:23,737-Speed 5175.81 samples/sec Loss 4.2149 LearningRate 0.0647 Epoch: 3 Global Step: 65410 Fp16 Grad Scale: 131072 Required: 18 hours 
Training: 2022-04-11 03:37:25,711-Speed 5188.20 samples/sec Loss 4.2954 LearningRate 0.0646 Epoch: 3 Global Step: 65420 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:37:27,687-Speed 5183.57 samples/sec Loss 4.3066 LearningRate 0.0646 Epoch: 3 Global Step: 65430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:37:29,683-Speed 5130.98 samples/sec Loss 4.2683 LearningRate 0.0646 Epoch: 3 Global Step: 65440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:37:31,659-Speed 5184.04 samples/sec Loss 4.2135 LearningRate 0.0646 Epoch: 3 Global Step: 65450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:37:33,639-Speed 5173.92 samples/sec Loss 4.2545 LearningRate 0.0646 Epoch: 3 Global Step: 65460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:37:35,624-Speed 5161.34 samples/sec Loss 4.3303 LearningRate 0.0646 Epoch: 3 Global Step: 65470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:37:37,600-Speed 5183.09 samples/sec Loss 4.3243 LearningRate 0.0646 Epoch: 3 Global Step: 65480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:37:39,580-Speed 5173.38 samples/sec Loss 4.2767 LearningRate 0.0646 Epoch: 3 Global Step: 65490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:37:41,554-Speed 5189.16 samples/sec Loss 4.1648 LearningRate 0.0646 Epoch: 3 Global Step: 65500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:37:43,524-Speed 5199.29 samples/sec Loss 4.1968 LearningRate 0.0646 Epoch: 3 Global Step: 65510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:37:45,531-Speed 5104.12 samples/sec Loss 4.1588 LearningRate 0.0646 Epoch: 3 Global Step: 65520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:37:47,509-Speed 5180.40 samples/sec Loss 4.2707 LearningRate 0.0646 Epoch: 3 Global Step: 65530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:37:49,494-Speed 
5159.01 samples/sec Loss 4.2593 LearningRate 0.0646 Epoch: 3 Global Step: 65540 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:37:51,472-Speed 5179.82 samples/sec Loss 4.2475 LearningRate 0.0646 Epoch: 3 Global Step: 65550 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:37:53,451-Speed 5174.58 samples/sec Loss 4.1844 LearningRate 0.0646 Epoch: 3 Global Step: 65560 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:37:55,450-Speed 5124.30 samples/sec Loss 4.1841 LearningRate 0.0646 Epoch: 3 Global Step: 65570 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:37:57,454-Speed 5111.46 samples/sec Loss 4.2414 LearningRate 0.0646 Epoch: 3 Global Step: 65580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:37:59,431-Speed 5183.39 samples/sec Loss 4.2608 LearningRate 0.0646 Epoch: 3 Global Step: 65590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:38:01,420-Speed 5149.19 samples/sec Loss 4.2471 LearningRate 0.0646 Epoch: 3 Global Step: 65600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:38:03,395-Speed 5186.13 samples/sec Loss 4.2535 LearningRate 0.0646 Epoch: 3 Global Step: 65610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:38:05,366-Speed 5197.56 samples/sec Loss 4.2351 LearningRate 0.0645 Epoch: 3 Global Step: 65620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:38:07,337-Speed 5195.94 samples/sec Loss 4.2842 LearningRate 0.0645 Epoch: 3 Global Step: 65630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:38:09,316-Speed 5177.83 samples/sec Loss 4.2555 LearningRate 0.0645 Epoch: 3 Global Step: 65640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:38:11,298-Speed 5167.29 samples/sec Loss 4.2909 LearningRate 0.0645 Epoch: 3 Global Step: 65650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:38:13,269-Speed 5197.11 samples/sec Loss 4.2942 LearningRate 0.0645 
Epoch: 3 Global Step: 65660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:38:15,247-Speed 5180.09 samples/sec Loss 4.2885 LearningRate 0.0645 Epoch: 3 Global Step: 65670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:38:17,225-Speed 5176.18 samples/sec Loss 4.2223 LearningRate 0.0645 Epoch: 3 Global Step: 65680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:38:19,214-Speed 5150.97 samples/sec Loss 4.3482 LearningRate 0.0645 Epoch: 3 Global Step: 65690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:38:21,196-Speed 5170.18 samples/sec Loss 4.3202 LearningRate 0.0645 Epoch: 3 Global Step: 65700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:38:23,205-Speed 5097.10 samples/sec Loss 4.3138 LearningRate 0.0645 Epoch: 3 Global Step: 65710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:38:25,190-Speed 5162.21 samples/sec Loss 4.3511 LearningRate 0.0645 Epoch: 3 Global Step: 65720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:38:27,185-Speed 5134.62 samples/sec Loss 4.3279 LearningRate 0.0645 Epoch: 3 Global Step: 65730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:38:29,156-Speed 5196.56 samples/sec Loss 4.3394 LearningRate 0.0645 Epoch: 3 Global Step: 65740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:38:31,138-Speed 5168.26 samples/sec Loss 4.2388 LearningRate 0.0645 Epoch: 3 Global Step: 65750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:38:33,112-Speed 5190.00 samples/sec Loss 4.2896 LearningRate 0.0645 Epoch: 3 Global Step: 65760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:38:35,092-Speed 5172.98 samples/sec Loss 4.2143 LearningRate 0.0645 Epoch: 3 Global Step: 65770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:38:37,099-Speed 5102.13 samples/sec Loss 4.2203 LearningRate 0.0645 Epoch: 3 Global Step: 65780 Fp16 Grad Scale: 131072 
Required: 18 hours Training: 2022-04-11 03:38:39,084-Speed 5162.26 samples/sec Loss 4.2057 LearningRate 0.0645 Epoch: 3 Global Step: 65790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:38:41,083-Speed 5123.84 samples/sec Loss 4.1949 LearningRate 0.0645 Epoch: 3 Global Step: 65800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:38:43,062-Speed 5177.08 samples/sec Loss 4.2435 LearningRate 0.0645 Epoch: 3 Global Step: 65810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:38:45,030-Speed 5204.56 samples/sec Loss 4.2461 LearningRate 0.0645 Epoch: 3 Global Step: 65820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:38:47,015-Speed 5160.64 samples/sec Loss 4.1965 LearningRate 0.0644 Epoch: 3 Global Step: 65830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:38:48,999-Speed 5162.56 samples/sec Loss 4.2897 LearningRate 0.0644 Epoch: 3 Global Step: 65840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:38:51,006-Speed 5102.24 samples/sec Loss 4.3222 LearningRate 0.0644 Epoch: 3 Global Step: 65850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:38:52,984-Speed 5178.65 samples/sec Loss 4.2842 LearningRate 0.0644 Epoch: 3 Global Step: 65860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:38:54,958-Speed 5207.52 samples/sec Loss 4.2643 LearningRate 0.0644 Epoch: 3 Global Step: 65870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:38:56,956-Speed 5128.58 samples/sec Loss 4.2168 LearningRate 0.0644 Epoch: 3 Global Step: 65880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:38:58,942-Speed 5156.00 samples/sec Loss 4.3022 LearningRate 0.0644 Epoch: 3 Global Step: 65890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:39:00,932-Speed 5147.98 samples/sec Loss 4.2440 LearningRate 0.0644 Epoch: 3 Global Step: 65900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 
03:39:02,933-Speed 5120.06 samples/sec Loss 4.2431 LearningRate 0.0644 Epoch: 3 Global Step: 65910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:39:04,909-Speed 5185.63 samples/sec Loss 4.1766 LearningRate 0.0644 Epoch: 3 Global Step: 65920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:39:06,891-Speed 5167.45 samples/sec Loss 4.2075 LearningRate 0.0644 Epoch: 3 Global Step: 65930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:39:08,867-Speed 5183.84 samples/sec Loss 4.1832 LearningRate 0.0644 Epoch: 3 Global Step: 65940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:39:10,867-Speed 5122.15 samples/sec Loss 4.2629 LearningRate 0.0644 Epoch: 3 Global Step: 65950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:39:12,866-Speed 5123.53 samples/sec Loss 4.1834 LearningRate 0.0644 Epoch: 3 Global Step: 65960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:39:14,874-Speed 5101.88 samples/sec Loss 4.2955 LearningRate 0.0644 Epoch: 3 Global Step: 65970 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-11 03:39:16,846-Speed 5194.21 samples/sec Loss 4.2693 LearningRate 0.0644 Epoch: 3 Global Step: 65980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:39:18,819-Speed 5190.90 samples/sec Loss 4.2544 LearningRate 0.0644 Epoch: 3 Global Step: 65990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:39:20,791-Speed 5193.40 samples/sec Loss 4.1956 LearningRate 0.0644 Epoch: 3 Global Step: 66000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:39:47,438-[lfw][66000]XNorm: 19.839199 Training: 2022-04-11 03:39:47,438-[lfw][66000]Accuracy-Flip: 0.99750+-0.00250 Training: 2022-04-11 03:39:47,439-[lfw][66000]Accuracy-Highest: 0.99800 Training: 2022-04-11 03:40:18,258-[cfp_fp][66000]XNorm: 17.984252 Training: 2022-04-11 03:40:18,259-[cfp_fp][66000]Accuracy-Flip: 0.97486+-0.00652 Training: 2022-04-11 
03:40:18,259-[cfp_fp][66000]Accuracy-Highest: 0.97871 Training: 2022-04-11 03:40:44,876-[agedb_30][66000]XNorm: 20.000499 Training: 2022-04-11 03:40:44,876-[agedb_30][66000]Accuracy-Flip: 0.97700+-0.00686 Training: 2022-04-11 03:40:44,877-[agedb_30][66000]Accuracy-Highest: 0.97717 Training: 2022-04-11 03:40:46,872-Speed 118.96 samples/sec Loss 4.2681 LearningRate 0.0644 Epoch: 3 Global Step: 66010 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:40:48,848-Speed 5185.95 samples/sec Loss 4.2699 LearningRate 0.0644 Epoch: 3 Global Step: 66020 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:40:50,811-Speed 5216.30 samples/sec Loss 4.2099 LearningRate 0.0644 Epoch: 3 Global Step: 66030 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:40:52,806-Speed 5135.63 samples/sec Loss 4.2297 LearningRate 0.0643 Epoch: 3 Global Step: 66040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:40:54,772-Speed 5211.03 samples/sec Loss 4.1581 LearningRate 0.0643 Epoch: 3 Global Step: 66050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:40:56,746-Speed 5189.14 samples/sec Loss 4.2947 LearningRate 0.0643 Epoch: 3 Global Step: 66060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:40:58,733-Speed 5155.47 samples/sec Loss 4.2298 LearningRate 0.0643 Epoch: 3 Global Step: 66070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:41:00,691-Speed 5229.11 samples/sec Loss 4.3223 LearningRate 0.0643 Epoch: 3 Global Step: 66080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:41:02,664-Speed 5192.35 samples/sec Loss 4.2147 LearningRate 0.0643 Epoch: 3 Global Step: 66090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:41:04,639-Speed 5186.48 samples/sec Loss 4.3660 LearningRate 0.0643 Epoch: 3 Global Step: 66100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:41:06,613-Speed 5189.21 samples/sec Loss 4.2056 LearningRate 
0.0643 Epoch: 3 Global Step: 66110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:41:08,582-Speed 5204.00 samples/sec Loss 4.2242 LearningRate 0.0643 Epoch: 3 Global Step: 66120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:41:10,558-Speed 5183.55 samples/sec Loss 4.2596 LearningRate 0.0643 Epoch: 3 Global Step: 66130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:41:12,525-Speed 5208.45 samples/sec Loss 4.2356 LearningRate 0.0643 Epoch: 3 Global Step: 66140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:41:14,495-Speed 5199.19 samples/sec Loss 4.2728 LearningRate 0.0643 Epoch: 3 Global Step: 66150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:41:16,468-Speed 5190.30 samples/sec Loss 4.1474 LearningRate 0.0643 Epoch: 3 Global Step: 66160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:41:18,440-Speed 5194.69 samples/sec Loss 4.2094 LearningRate 0.0643 Epoch: 3 Global Step: 66170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:41:20,408-Speed 5204.89 samples/sec Loss 4.2977 LearningRate 0.0643 Epoch: 3 Global Step: 66180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:41:22,378-Speed 5201.25 samples/sec Loss 4.2285 LearningRate 0.0643 Epoch: 3 Global Step: 66190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:41:24,370-Speed 5142.65 samples/sec Loss 4.1334 LearningRate 0.0643 Epoch: 3 Global Step: 66200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:41:26,368-Speed 5124.66 samples/sec Loss 4.1788 LearningRate 0.0643 Epoch: 3 Global Step: 66210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:41:28,350-Speed 5169.04 samples/sec Loss 4.2493 LearningRate 0.0643 Epoch: 3 Global Step: 66220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:41:30,335-Speed 5159.22 samples/sec Loss 4.2539 LearningRate 0.0643 Epoch: 3 Global Step: 66230 Fp16 Grad Scale: 
65536 Required: 18 hours Training: 2022-04-11 03:41:32,331-Speed 5132.28 samples/sec Loss 4.2971 LearningRate 0.0643 Epoch: 3 Global Step: 66240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:41:34,318-Speed 5155.91 samples/sec Loss 4.1798 LearningRate 0.0642 Epoch: 3 Global Step: 66250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:41:36,297-Speed 5177.55 samples/sec Loss 4.1706 LearningRate 0.0642 Epoch: 3 Global Step: 66260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:41:38,280-Speed 5165.47 samples/sec Loss 4.1170 LearningRate 0.0642 Epoch: 3 Global Step: 66270 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:41:40,293-Speed 5089.05 samples/sec Loss 4.2102 LearningRate 0.0642 Epoch: 3 Global Step: 66280 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:41:42,268-Speed 5186.70 samples/sec Loss 4.2700 LearningRate 0.0642 Epoch: 3 Global Step: 66290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:41:44,241-Speed 5190.11 samples/sec Loss 4.1870 LearningRate 0.0642 Epoch: 3 Global Step: 66300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:41:46,225-Speed 5163.27 samples/sec Loss 4.2224 LearningRate 0.0642 Epoch: 3 Global Step: 66310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:41:48,216-Speed 5143.71 samples/sec Loss 4.2619 LearningRate 0.0642 Epoch: 3 Global Step: 66320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:41:50,202-Speed 5160.09 samples/sec Loss 4.2484 LearningRate 0.0642 Epoch: 3 Global Step: 66330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:41:52,173-Speed 5196.92 samples/sec Loss 4.1636 LearningRate 0.0642 Epoch: 3 Global Step: 66340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:41:54,145-Speed 5194.11 samples/sec Loss 4.2295 LearningRate 0.0642 Epoch: 3 Global Step: 66350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 
03:41:56,139-Speed 5136.53 samples/sec Loss 4.2449 LearningRate 0.0642 Epoch: 3 Global Step: 66360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:41:58,134-Speed 5136.07 samples/sec Loss 4.1842 LearningRate 0.0642 Epoch: 3 Global Step: 66370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:00,118-Speed 5162.68 samples/sec Loss 4.2547 LearningRate 0.0642 Epoch: 3 Global Step: 66380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:02,115-Speed 5128.05 samples/sec Loss 4.2749 LearningRate 0.0642 Epoch: 3 Global Step: 66390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:04,100-Speed 5161.21 samples/sec Loss 4.1374 LearningRate 0.0642 Epoch: 3 Global Step: 66400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:06,085-Speed 5160.82 samples/sec Loss 4.2488 LearningRate 0.0642 Epoch: 3 Global Step: 66410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:08,072-Speed 5155.44 samples/sec Loss 4.2049 LearningRate 0.0642 Epoch: 3 Global Step: 66420 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:10,050-Speed 5179.36 samples/sec Loss 4.2071 LearningRate 0.0642 Epoch: 3 Global Step: 66430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:12,033-Speed 5163.89 samples/sec Loss 4.2252 LearningRate 0.0642 Epoch: 3 Global Step: 66440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:14,007-Speed 5190.90 samples/sec Loss 4.2519 LearningRate 0.0642 Epoch: 3 Global Step: 66450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:42:15,976-Speed 5202.26 samples/sec Loss 4.2918 LearningRate 0.0641 Epoch: 3 Global Step: 66460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:42:17,944-Speed 5203.95 samples/sec Loss 4.2267 LearningRate 0.0641 Epoch: 3 Global Step: 66470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:42:19,920-Speed 5184.91 samples/sec Loss 
4.2641 LearningRate 0.0641 Epoch: 3 Global Step: 66480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:42:21,914-Speed 5136.57 samples/sec Loss 4.1860 LearningRate 0.0641 Epoch: 3 Global Step: 66490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:42:23,919-Speed 5108.24 samples/sec Loss 4.1994 LearningRate 0.0641 Epoch: 3 Global Step: 66500 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:42:25,919-Speed 5122.48 samples/sec Loss 4.2594 LearningRate 0.0641 Epoch: 3 Global Step: 66510 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:42:27,914-Speed 5133.74 samples/sec Loss 4.2244 LearningRate 0.0641 Epoch: 3 Global Step: 66520 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:42:29,890-Speed 5183.31 samples/sec Loss 4.2719 LearningRate 0.0641 Epoch: 3 Global Step: 66530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:42:31,864-Speed 5189.12 samples/sec Loss 4.2716 LearningRate 0.0641 Epoch: 3 Global Step: 66540 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:42:33,851-Speed 5156.06 samples/sec Loss 4.1712 LearningRate 0.0641 Epoch: 3 Global Step: 66550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:35,837-Speed 5157.08 samples/sec Loss 4.1648 LearningRate 0.0641 Epoch: 3 Global Step: 66560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:37,823-Speed 5158.41 samples/sec Loss 4.1650 LearningRate 0.0641 Epoch: 3 Global Step: 66570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:39,794-Speed 5198.65 samples/sec Loss 4.2667 LearningRate 0.0641 Epoch: 3 Global Step: 66580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:41,761-Speed 5207.28 samples/sec Loss 4.2094 LearningRate 0.0641 Epoch: 3 Global Step: 66590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:43,726-Speed 5212.45 samples/sec Loss 4.3178 LearningRate 0.0641 Epoch: 3 Global Step: 
66600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:45,712-Speed 5159.09 samples/sec Loss 4.2260 LearningRate 0.0641 Epoch: 3 Global Step: 66610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:47,676-Speed 5214.98 samples/sec Loss 4.2611 LearningRate 0.0641 Epoch: 3 Global Step: 66620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:49,672-Speed 5131.51 samples/sec Loss 4.2196 LearningRate 0.0641 Epoch: 3 Global Step: 66630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:51,648-Speed 5185.50 samples/sec Loss 4.2733 LearningRate 0.0641 Epoch: 3 Global Step: 66640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:53,624-Speed 5182.87 samples/sec Loss 4.1705 LearningRate 0.0641 Epoch: 3 Global Step: 66650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:55,588-Speed 5216.30 samples/sec Loss 4.2123 LearningRate 0.0640 Epoch: 3 Global Step: 66660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:57,557-Speed 5203.07 samples/sec Loss 4.0536 LearningRate 0.0640 Epoch: 3 Global Step: 66670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:42:59,531-Speed 5189.62 samples/sec Loss 4.2331 LearningRate 0.0640 Epoch: 3 Global Step: 66680 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:43:01,501-Speed 5198.67 samples/sec Loss 4.2296 LearningRate 0.0640 Epoch: 3 Global Step: 66690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:43:03,473-Speed 5193.28 samples/sec Loss 4.2137 LearningRate 0.0640 Epoch: 3 Global Step: 66700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:43:05,446-Speed 5193.78 samples/sec Loss 4.2712 LearningRate 0.0640 Epoch: 3 Global Step: 66710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:43:07,426-Speed 5171.97 samples/sec Loss 4.1991 LearningRate 0.0640 Epoch: 3 Global Step: 66720 Fp16 Grad Scale: 131072 Required: 18 
hours Training: 2022-04-11 03:43:09,420-Speed 5137.11 samples/sec Loss 4.2570 LearningRate 0.0640 Epoch: 3 Global Step: 66730 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:43:11,406-Speed 5158.54 samples/sec Loss 4.2923 LearningRate 0.0640 Epoch: 3 Global Step: 66740 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:43:13,410-Speed 5111.92 samples/sec Loss 4.2108 LearningRate 0.0640 Epoch: 3 Global Step: 66750 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:43:15,617-Speed 4641.46 samples/sec Loss 4.3072 LearningRate 0.0640 Epoch: 3 Global Step: 66760 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:43:44,685-Speed 352.30 samples/sec Loss 3.8135 LearningRate 0.0640 Epoch: 4 Global Step: 66770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:43:47,098-Speed 4245.28 samples/sec Loss 3.5080 LearningRate 0.0640 Epoch: 4 Global Step: 66780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:43:49,077-Speed 5178.04 samples/sec Loss 3.6368 LearningRate 0.0640 Epoch: 4 Global Step: 66790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:43:51,044-Speed 5206.08 samples/sec Loss 3.5745 LearningRate 0.0640 Epoch: 4 Global Step: 66800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:43:53,004-Speed 5229.01 samples/sec Loss 3.6490 LearningRate 0.0640 Epoch: 4 Global Step: 66810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:43:55,109-Speed 4870.93 samples/sec Loss 3.5354 LearningRate 0.0640 Epoch: 4 Global Step: 66820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:43:57,074-Speed 5212.29 samples/sec Loss 3.5671 LearningRate 0.0640 Epoch: 4 Global Step: 66830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:43:59,067-Speed 5140.07 samples/sec Loss 3.5934 LearningRate 0.0640 Epoch: 4 Global Step: 66840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 
03:44:01,048-Speed 5170.45 samples/sec Loss 3.5250 LearningRate 0.0640 Epoch: 4 Global Step: 66850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:44:03,034-Speed 5156.91 samples/sec Loss 3.6015 LearningRate 0.0640 Epoch: 4 Global Step: 66860 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:05,021-Speed 5156.75 samples/sec Loss 3.6463 LearningRate 0.0639 Epoch: 4 Global Step: 66870 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:06,991-Speed 5200.60 samples/sec Loss 3.5979 LearningRate 0.0639 Epoch: 4 Global Step: 66880 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:08,954-Speed 5217.04 samples/sec Loss 3.5729 LearningRate 0.0639 Epoch: 4 Global Step: 66890 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:10,957-Speed 5116.22 samples/sec Loss 3.5686 LearningRate 0.0639 Epoch: 4 Global Step: 66900 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:12,928-Speed 5195.37 samples/sec Loss 3.6401 LearningRate 0.0639 Epoch: 4 Global Step: 66910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:14,913-Speed 5160.86 samples/sec Loss 3.5229 LearningRate 0.0639 Epoch: 4 Global Step: 66920 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:16,924-Speed 5094.64 samples/sec Loss 3.4399 LearningRate 0.0639 Epoch: 4 Global Step: 66930 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:18,901-Speed 5180.44 samples/sec Loss 3.6089 LearningRate 0.0639 Epoch: 4 Global Step: 66940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:21,341-Speed 4198.76 samples/sec Loss 3.6518 LearningRate 0.0639 Epoch: 4 Global Step: 66950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:23,368-Speed 5053.06 samples/sec Loss 3.6363 LearningRate 0.0639 Epoch: 4 Global Step: 66960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:44:25,336-Speed 5203.11 samples/sec Loss 3.5919 
LearningRate 0.0639 Epoch: 4 Global Step: 66970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:44:27,320-Speed 5163.79 samples/sec Loss 3.6236 LearningRate 0.0639 Epoch: 4 Global Step: 66980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:44:29,290-Speed 5200.29 samples/sec Loss 3.5537 LearningRate 0.0639 Epoch: 4 Global Step: 66990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:44:31,281-Speed 5144.33 samples/sec Loss 3.6252 LearningRate 0.0639 Epoch: 4 Global Step: 67000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:44:33,256-Speed 5185.53 samples/sec Loss 3.5200 LearningRate 0.0639 Epoch: 4 Global Step: 67010 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:44:35,249-Speed 5139.98 samples/sec Loss 3.6188 LearningRate 0.0639 Epoch: 4 Global Step: 67020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:37,241-Speed 5146.03 samples/sec Loss 3.6065 LearningRate 0.0639 Epoch: 4 Global Step: 67030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:39,236-Speed 5135.04 samples/sec Loss 3.5473 LearningRate 0.0639 Epoch: 4 Global Step: 67040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:41,225-Speed 5148.86 samples/sec Loss 3.6234 LearningRate 0.0639 Epoch: 4 Global Step: 67050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:43,199-Speed 5190.66 samples/sec Loss 3.5614 LearningRate 0.0639 Epoch: 4 Global Step: 67060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:45,169-Speed 5199.48 samples/sec Loss 3.5954 LearningRate 0.0639 Epoch: 4 Global Step: 67070 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:47,145-Speed 5182.77 samples/sec Loss 3.6595 LearningRate 0.0638 Epoch: 4 Global Step: 67080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:49,130-Speed 5161.62 samples/sec Loss 3.6347 LearningRate 0.0638 Epoch: 4 Global Step: 67090 
Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:51,102-Speed 5192.40 samples/sec Loss 3.5608 LearningRate 0.0638 Epoch: 4 Global Step: 67100 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:53,080-Speed 5178.94 samples/sec Loss 3.6795 LearningRate 0.0638 Epoch: 4 Global Step: 67110 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:44:55,062-Speed 5169.18 samples/sec Loss 3.6185 LearningRate 0.0638 Epoch: 4 Global Step: 67120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:44:57,038-Speed 5184.61 samples/sec Loss 3.5715 LearningRate 0.0638 Epoch: 4 Global Step: 67130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:44:59,016-Speed 5177.78 samples/sec Loss 3.5791 LearningRate 0.0638 Epoch: 4 Global Step: 67140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:00,993-Speed 5181.10 samples/sec Loss 3.6224 LearningRate 0.0638 Epoch: 4 Global Step: 67150 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:02,979-Speed 5157.28 samples/sec Loss 3.6855 LearningRate 0.0638 Epoch: 4 Global Step: 67160 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:04,961-Speed 5168.92 samples/sec Loss 3.6799 LearningRate 0.0638 Epoch: 4 Global Step: 67170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:06,932-Speed 5197.54 samples/sec Loss 3.6534 LearningRate 0.0638 Epoch: 4 Global Step: 67180 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:08,903-Speed 5197.66 samples/sec Loss 3.5860 LearningRate 0.0638 Epoch: 4 Global Step: 67190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:45:10,898-Speed 5135.38 samples/sec Loss 3.5526 LearningRate 0.0638 Epoch: 4 Global Step: 67200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:45:12,884-Speed 5156.00 samples/sec Loss 3.5882 LearningRate 0.0638 Epoch: 4 Global Step: 67210 Fp16 Grad Scale: 65536 Required: 18 hours 
Training: 2022-04-11 03:45:15,045-Speed 4741.36 samples/sec Loss 3.6618 LearningRate 0.0638 Epoch: 4 Global Step: 67220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:45:17,055-Speed 5094.32 samples/sec Loss 3.6925 LearningRate 0.0638 Epoch: 4 Global Step: 67230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:45:19,036-Speed 5171.92 samples/sec Loss 3.6704 LearningRate 0.0638 Epoch: 4 Global Step: 67240 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:45:21,042-Speed 5106.89 samples/sec Loss 3.6541 LearningRate 0.0638 Epoch: 4 Global Step: 67250 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:45:23,019-Speed 5181.78 samples/sec Loss 3.5788 LearningRate 0.0638 Epoch: 4 Global Step: 67260 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:45:25,020-Speed 5118.23 samples/sec Loss 3.5831 LearningRate 0.0638 Epoch: 4 Global Step: 67270 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:45:26,993-Speed 5192.47 samples/sec Loss 3.6920 LearningRate 0.0638 Epoch: 4 Global Step: 67280 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:45:28,977-Speed 5161.49 samples/sec Loss 3.6313 LearningRate 0.0637 Epoch: 4 Global Step: 67290 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:30,962-Speed 5162.05 samples/sec Loss 3.5955 LearningRate 0.0637 Epoch: 4 Global Step: 67300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:32,941-Speed 5173.82 samples/sec Loss 3.6742 LearningRate 0.0637 Epoch: 4 Global Step: 67310 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:34,915-Speed 5189.59 samples/sec Loss 3.6761 LearningRate 0.0637 Epoch: 4 Global Step: 67320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:36,915-Speed 5123.17 samples/sec Loss 3.6002 LearningRate 0.0637 Epoch: 4 Global Step: 67330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:38,924-Speed 5099.05 
samples/sec Loss 3.7103 LearningRate 0.0637 Epoch: 4 Global Step: 67340 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:40,911-Speed 5154.75 samples/sec Loss 3.6034 LearningRate 0.0637 Epoch: 4 Global Step: 67350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:42,894-Speed 5165.38 samples/sec Loss 3.6889 LearningRate 0.0637 Epoch: 4 Global Step: 67360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:44,883-Speed 5150.87 samples/sec Loss 3.5433 LearningRate 0.0637 Epoch: 4 Global Step: 67370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:46,867-Speed 5162.43 samples/sec Loss 3.5736 LearningRate 0.0637 Epoch: 4 Global Step: 67380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:48,851-Speed 5163.08 samples/sec Loss 3.6392 LearningRate 0.0637 Epoch: 4 Global Step: 67390 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-11 03:45:50,832-Speed 5170.11 samples/sec Loss 3.5648 LearningRate 0.0637 Epoch: 4 Global Step: 67400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:52,807-Speed 5187.37 samples/sec Loss 3.6433 LearningRate 0.0637 Epoch: 4 Global Step: 67410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:54,791-Speed 5164.07 samples/sec Loss 3.6259 LearningRate 0.0637 Epoch: 4 Global Step: 67420 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:56,772-Speed 5168.38 samples/sec Loss 3.6464 LearningRate 0.0637 Epoch: 4 Global Step: 67430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:45:58,753-Speed 5172.91 samples/sec Loss 3.6649 LearningRate 0.0637 Epoch: 4 Global Step: 67440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:00,735-Speed 5168.62 samples/sec Loss 3.7251 LearningRate 0.0637 Epoch: 4 Global Step: 67450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:02,734-Speed 5122.94 samples/sec Loss 3.6630 LearningRate 0.0637 
Epoch: 4 Global Step: 67460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:04,751-Speed 5077.84 samples/sec Loss 3.5660 LearningRate 0.0637 Epoch: 4 Global Step: 67470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:06,726-Speed 5187.41 samples/sec Loss 3.6461 LearningRate 0.0637 Epoch: 4 Global Step: 67480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:08,700-Speed 5191.25 samples/sec Loss 3.6411 LearningRate 0.0637 Epoch: 4 Global Step: 67490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:10,661-Speed 5221.28 samples/sec Loss 3.6530 LearningRate 0.0636 Epoch: 4 Global Step: 67500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:12,646-Speed 5165.72 samples/sec Loss 3.7205 LearningRate 0.0636 Epoch: 4 Global Step: 67510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:14,616-Speed 5199.53 samples/sec Loss 3.6588 LearningRate 0.0636 Epoch: 4 Global Step: 67520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:16,587-Speed 5196.48 samples/sec Loss 3.7398 LearningRate 0.0636 Epoch: 4 Global Step: 67530 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:18,559-Speed 5194.77 samples/sec Loss 3.5488 LearningRate 0.0636 Epoch: 4 Global Step: 67540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:20,533-Speed 5189.05 samples/sec Loss 3.5987 LearningRate 0.0636 Epoch: 4 Global Step: 67550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:22,502-Speed 5203.29 samples/sec Loss 3.6682 LearningRate 0.0636 Epoch: 4 Global Step: 67560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:24,501-Speed 5124.32 samples/sec Loss 3.6400 LearningRate 0.0636 Epoch: 4 Global Step: 67570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:26,479-Speed 5178.31 samples/sec Loss 3.6446 LearningRate 0.0636 Epoch: 4 Global Step: 67580 Fp16 Grad 
Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:28,467-Speed 5153.85 samples/sec Loss 3.6621 LearningRate 0.0636 Epoch: 4 Global Step: 67590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:30,457-Speed 5147.96 samples/sec Loss 3.6967 LearningRate 0.0636 Epoch: 4 Global Step: 67600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:32,430-Speed 5189.52 samples/sec Loss 3.6339 LearningRate 0.0636 Epoch: 4 Global Step: 67610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:34,406-Speed 5184.64 samples/sec Loss 3.7125 LearningRate 0.0636 Epoch: 4 Global Step: 67620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:36,397-Speed 5144.51 samples/sec Loss 3.7098 LearningRate 0.0636 Epoch: 4 Global Step: 67630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:38,376-Speed 5176.99 samples/sec Loss 3.7596 LearningRate 0.0636 Epoch: 4 Global Step: 67640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:40,369-Speed 5140.05 samples/sec Loss 3.6556 LearningRate 0.0636 Epoch: 4 Global Step: 67650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:42,350-Speed 5171.29 samples/sec Loss 3.6782 LearningRate 0.0636 Epoch: 4 Global Step: 67660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:44,343-Speed 5138.60 samples/sec Loss 3.5997 LearningRate 0.0636 Epoch: 4 Global Step: 67670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:46,337-Speed 5138.94 samples/sec Loss 3.6139 LearningRate 0.0636 Epoch: 4 Global Step: 67680 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:48,330-Speed 5138.88 samples/sec Loss 3.7001 LearningRate 0.0636 Epoch: 4 Global Step: 67690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:50,304-Speed 5189.77 samples/sec Loss 3.6572 LearningRate 0.0636 Epoch: 4 Global Step: 67700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 
2022-04-11 03:46:52,284-Speed 5175.83 samples/sec Loss 3.6398 LearningRate 0.0635 Epoch: 4 Global Step: 67710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:54,262-Speed 5177.86 samples/sec Loss 3.6777 LearningRate 0.0635 Epoch: 4 Global Step: 67720 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:56,251-Speed 5150.17 samples/sec Loss 3.7016 LearningRate 0.0635 Epoch: 4 Global Step: 67730 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:46:58,260-Speed 5098.97 samples/sec Loss 3.7058 LearningRate 0.0635 Epoch: 4 Global Step: 67740 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:47:00,265-Speed 5108.24 samples/sec Loss 3.7479 LearningRate 0.0635 Epoch: 4 Global Step: 67750 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:47:02,236-Speed 5196.81 samples/sec Loss 3.7071 LearningRate 0.0635 Epoch: 4 Global Step: 67760 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:47:04,239-Speed 5114.55 samples/sec Loss 3.6661 LearningRate 0.0635 Epoch: 4 Global Step: 67770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:47:06,223-Speed 5163.07 samples/sec Loss 3.6163 LearningRate 0.0635 Epoch: 4 Global Step: 67780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:47:08,221-Speed 5129.13 samples/sec Loss 3.7354 LearningRate 0.0635 Epoch: 4 Global Step: 67790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:47:10,201-Speed 5172.98 samples/sec Loss 3.7154 LearningRate 0.0635 Epoch: 4 Global Step: 67800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:47:12,203-Speed 5117.00 samples/sec Loss 3.6863 LearningRate 0.0635 Epoch: 4 Global Step: 67810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:47:14,181-Speed 5178.53 samples/sec Loss 3.7033 LearningRate 0.0635 Epoch: 4 Global Step: 67820 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:47:16,156-Speed 5186.37 
samples/sec Loss 3.7206 LearningRate 0.0635 Epoch: 4 Global Step: 67830 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:47:18,143-Speed 5155.46 samples/sec Loss 3.6609 LearningRate 0.0635 Epoch: 4 Global Step: 67840 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:47:20,120-Speed 5182.33 samples/sec Loss 3.7205 LearningRate 0.0635 Epoch: 4 Global Step: 67850 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:47:22,094-Speed 5187.91 samples/sec Loss 3.7702 LearningRate 0.0635 Epoch: 4 Global Step: 67860 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:47:24,084-Speed 5148.20 samples/sec Loss 3.7758 LearningRate 0.0635 Epoch: 4 Global Step: 67870 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:47:26,074-Speed 5148.82 samples/sec Loss 3.7243 LearningRate 0.0635 Epoch: 4 Global Step: 67880 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:47:28,051-Speed 5180.43 samples/sec Loss 3.7438 LearningRate 0.0635 Epoch: 4 Global Step: 67890 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:47:30,033-Speed 5168.77 samples/sec Loss 3.6698 LearningRate 0.0635 Epoch: 4 Global Step: 67900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:47:32,005-Speed 5191.99 samples/sec Loss 3.7097 LearningRate 0.0635 Epoch: 4 Global Step: 67910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:47:33,986-Speed 5171.59 samples/sec Loss 3.7206 LearningRate 0.0634 Epoch: 4 Global Step: 67920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:47:35,963-Speed 5181.82 samples/sec Loss 3.7267 LearningRate 0.0634 Epoch: 4 Global Step: 67930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:47:37,958-Speed 5135.67 samples/sec Loss 3.6458 LearningRate 0.0634 Epoch: 4 Global Step: 67940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:47:39,945-Speed 5153.26 samples/sec Loss 3.6691 LearningRate 0.0634 Epoch: 
4 Global Step: 67950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:47:41,937-Speed 5144.52 samples/sec Loss 3.7507 LearningRate 0.0634 Epoch: 4 Global Step: 67960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:47:43,918-Speed 5170.63 samples/sec Loss 3.7087 LearningRate 0.0634 Epoch: 4 Global Step: 67970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:47:45,911-Speed 5139.67 samples/sec Loss 3.6814 LearningRate 0.0634 Epoch: 4 Global Step: 67980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:47:47,885-Speed 5188.91 samples/sec Loss 3.7589 LearningRate 0.0634 Epoch: 4 Global Step: 67990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:47:49,856-Speed 5198.33 samples/sec Loss 3.7736 LearningRate 0.0634 Epoch: 4 Global Step: 68000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:48:16,561-[lfw][68000]XNorm: 22.633383 Training: 2022-04-11 03:48:16,562-[lfw][68000]Accuracy-Flip: 0.99683+-0.00263 Training: 2022-04-11 03:48:16,562-[lfw][68000]Accuracy-Highest: 0.99800 Training: 2022-04-11 03:48:47,498-[cfp_fp][68000]XNorm: 20.566822 Training: 2022-04-11 03:48:47,498-[cfp_fp][68000]Accuracy-Flip: 0.97943+-0.00698 Training: 2022-04-11 03:48:47,499-[cfp_fp][68000]Accuracy-Highest: 0.97943 Training: 2022-04-11 03:49:14,089-[agedb_30][68000]XNorm: 22.628426 Training: 2022-04-11 03:49:14,090-[agedb_30][68000]Accuracy-Flip: 0.97667+-0.00719 Training: 2022-04-11 03:49:14,090-[agedb_30][68000]Accuracy-Highest: 0.97717 Training: 2022-04-11 03:49:16,093-Speed 118.74 samples/sec Loss 3.6813 LearningRate 0.0634 Epoch: 4 Global Step: 68010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:49:18,078-Speed 5159.58 samples/sec Loss 3.7691 LearningRate 0.0634 Epoch: 4 Global Step: 68020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:49:20,059-Speed 5170.69 samples/sec Loss 3.7862 LearningRate 0.0634 Epoch: 4 Global Step: 68030 Fp16 Grad Scale: 
65536 Required: 18 hours Training: 2022-04-11 03:49:22,042-Speed 5166.70 samples/sec Loss 3.7329 LearningRate 0.0634 Epoch: 4 Global Step: 68040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:49:24,017-Speed 5186.05 samples/sec Loss 3.8225 LearningRate 0.0634 Epoch: 4 Global Step: 68050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:49:25,989-Speed 5195.84 samples/sec Loss 3.8220 LearningRate 0.0634 Epoch: 4 Global Step: 68060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:49:27,981-Speed 5142.13 samples/sec Loss 3.7443 LearningRate 0.0634 Epoch: 4 Global Step: 68070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:49:29,953-Speed 5193.97 samples/sec Loss 3.7425 LearningRate 0.0634 Epoch: 4 Global Step: 68080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:49:31,952-Speed 5122.23 samples/sec Loss 3.7832 LearningRate 0.0634 Epoch: 4 Global Step: 68090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:49:33,920-Speed 5206.84 samples/sec Loss 3.8342 LearningRate 0.0634 Epoch: 4 Global Step: 68100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:49:35,890-Speed 5199.38 samples/sec Loss 3.7767 LearningRate 0.0634 Epoch: 4 Global Step: 68110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:49:37,874-Speed 5162.72 samples/sec Loss 3.7226 LearningRate 0.0634 Epoch: 4 Global Step: 68120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:49:39,869-Speed 5134.74 samples/sec Loss 3.7588 LearningRate 0.0633 Epoch: 4 Global Step: 68130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:49:41,850-Speed 5171.10 samples/sec Loss 3.8030 LearningRate 0.0633 Epoch: 4 Global Step: 68140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:49:43,844-Speed 5137.95 samples/sec Loss 3.8147 LearningRate 0.0633 Epoch: 4 Global Step: 68150 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 
03:49:45,816-Speed 5195.00 samples/sec Loss 3.7103 LearningRate 0.0633 Epoch: 4 Global Step: 68160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:49:47,787-Speed 5195.64 samples/sec Loss 3.7196 LearningRate 0.0633 Epoch: 4 Global Step: 68170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:49:49,769-Speed 5170.00 samples/sec Loss 3.7658 LearningRate 0.0633 Epoch: 4 Global Step: 68180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:49:51,743-Speed 5187.33 samples/sec Loss 3.7577 LearningRate 0.0633 Epoch: 4 Global Step: 68190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:49:53,713-Speed 5199.87 samples/sec Loss 3.7673 LearningRate 0.0633 Epoch: 4 Global Step: 68200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:49:55,712-Speed 5125.27 samples/sec Loss 3.7609 LearningRate 0.0633 Epoch: 4 Global Step: 68210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:49:57,687-Speed 5186.62 samples/sec Loss 3.8132 LearningRate 0.0633 Epoch: 4 Global Step: 68220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:49:59,655-Speed 5205.42 samples/sec Loss 3.7682 LearningRate 0.0633 Epoch: 4 Global Step: 68230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:50:01,630-Speed 5185.38 samples/sec Loss 3.8188 LearningRate 0.0633 Epoch: 4 Global Step: 68240 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:50:03,649-Speed 5074.50 samples/sec Loss 3.6977 LearningRate 0.0633 Epoch: 4 Global Step: 68250 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:50:05,628-Speed 5176.67 samples/sec Loss 3.7006 LearningRate 0.0633 Epoch: 4 Global Step: 68260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:50:07,612-Speed 5162.55 samples/sec Loss 3.7278 LearningRate 0.0633 Epoch: 4 Global Step: 68270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:50:09,587-Speed 5184.68 samples/sec Loss 3.8087 
LearningRate 0.0633 Epoch: 4 Global Step: 68280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:50:11,569-Speed 5170.04 samples/sec Loss 3.7433 LearningRate 0.0633 Epoch: 4 Global Step: 68290 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:50:13,550-Speed 5171.19 samples/sec Loss 3.7734 LearningRate 0.0633 Epoch: 4 Global Step: 68300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:50:15,539-Speed 5149.91 samples/sec Loss 3.6579 LearningRate 0.0633 Epoch: 4 Global Step: 68310 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:50:17,513-Speed 5188.12 samples/sec Loss 3.8201 LearningRate 0.0633 Epoch: 4 Global Step: 68320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:50:19,500-Speed 5155.61 samples/sec Loss 3.7128 LearningRate 0.0633 Epoch: 4 Global Step: 68330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:50:21,483-Speed 5166.17 samples/sec Loss 3.7935 LearningRate 0.0632 Epoch: 4 Global Step: 68340 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:50:23,484-Speed 5118.26 samples/sec Loss 3.7513 LearningRate 0.0632 Epoch: 4 Global Step: 68350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:50:25,458-Speed 5189.64 samples/sec Loss 3.7529 LearningRate 0.0632 Epoch: 4 Global Step: 68360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:50:27,443-Speed 5158.82 samples/sec Loss 3.8564 LearningRate 0.0632 Epoch: 4 Global Step: 68370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:50:29,425-Speed 5168.39 samples/sec Loss 3.7600 LearningRate 0.0632 Epoch: 4 Global Step: 68380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:50:31,394-Speed 5204.45 samples/sec Loss 3.8481 LearningRate 0.0632 Epoch: 4 Global Step: 68390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:50:33,373-Speed 5174.36 samples/sec Loss 3.8157 LearningRate 0.0632 Epoch: 4 Global Step: 68400 
Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:50:35,354-Speed 5173.22 samples/sec Loss 3.7487 LearningRate 0.0632 Epoch: 4 Global Step: 68410 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:50:37,334-Speed 5172.29 samples/sec Loss 3.7620 LearningRate 0.0632 Epoch: 4 Global Step: 68420 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:50:39,314-Speed 5173.37 samples/sec Loss 3.8015 LearningRate 0.0632 Epoch: 4 Global Step: 68430 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:50:41,289-Speed 5187.70 samples/sec Loss 3.7610 LearningRate 0.0632 Epoch: 4 Global Step: 68440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:50:43,258-Speed 5202.41 samples/sec Loss 3.7554 LearningRate 0.0632 Epoch: 4 Global Step: 68450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:50:45,243-Speed 5159.74 samples/sec Loss 3.7780 LearningRate 0.0632 Epoch: 4 Global Step: 68460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:50:47,232-Speed 5149.76 samples/sec Loss 3.8468 LearningRate 0.0632 Epoch: 4 Global Step: 68470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:50:49,199-Speed 5207.53 samples/sec Loss 3.8177 LearningRate 0.0632 Epoch: 4 Global Step: 68480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:50:51,183-Speed 5162.76 samples/sec Loss 3.8504 LearningRate 0.0632 Epoch: 4 Global Step: 68490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:50:53,177-Speed 5137.68 samples/sec Loss 3.8173 LearningRate 0.0632 Epoch: 4 Global Step: 68500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:50:55,149-Speed 5195.53 samples/sec Loss 3.7963 LearningRate 0.0632 Epoch: 4 Global Step: 68510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:50:57,120-Speed 5195.04 samples/sec Loss 3.8090 LearningRate 0.0632 Epoch: 4 Global Step: 68520 Fp16 Grad Scale: 131072 Required: 18 hours 
Training: 2022-04-11 03:50:59,110-Speed 5148.03 samples/sec Loss 3.8407 LearningRate 0.0632 Epoch: 4 Global Step: 68530 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:51:01,078-Speed 5206.29 samples/sec Loss 3.7728 LearningRate 0.0632 Epoch: 4 Global Step: 68540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:51:03,043-Speed 5211.61 samples/sec Loss 3.7635 LearningRate 0.0631 Epoch: 4 Global Step: 68550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:51:05,032-Speed 5149.35 samples/sec Loss 3.7945 LearningRate 0.0631 Epoch: 4 Global Step: 68560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:51:06,992-Speed 5225.67 samples/sec Loss 3.6939 LearningRate 0.0631 Epoch: 4 Global Step: 68570 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:08,981-Speed 5149.91 samples/sec Loss 3.7457 LearningRate 0.0631 Epoch: 4 Global Step: 68580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:10,972-Speed 5145.04 samples/sec Loss 3.8007 LearningRate 0.0631 Epoch: 4 Global Step: 68590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:12,985-Speed 5091.69 samples/sec Loss 3.8835 LearningRate 0.0631 Epoch: 4 Global Step: 68600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:14,968-Speed 5164.47 samples/sec Loss 3.8377 LearningRate 0.0631 Epoch: 4 Global Step: 68610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:16,937-Speed 5201.17 samples/sec Loss 3.7431 LearningRate 0.0631 Epoch: 4 Global Step: 68620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:18,916-Speed 5177.67 samples/sec Loss 3.8294 LearningRate 0.0631 Epoch: 4 Global Step: 68630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:20,904-Speed 5153.02 samples/sec Loss 3.8167 LearningRate 0.0631 Epoch: 4 Global Step: 68640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:22,873-Speed 5202.21 
samples/sec Loss 3.8164 LearningRate 0.0631 Epoch: 4 Global Step: 68650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:24,849-Speed 5184.35 samples/sec Loss 3.8586 LearningRate 0.0631 Epoch: 4 Global Step: 68660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:26,820-Speed 5196.71 samples/sec Loss 3.7864 LearningRate 0.0631 Epoch: 4 Global Step: 68670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:28,791-Speed 5195.97 samples/sec Loss 3.8664 LearningRate 0.0631 Epoch: 4 Global Step: 68680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:30,762-Speed 5196.84 samples/sec Loss 3.8607 LearningRate 0.0631 Epoch: 4 Global Step: 68690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:32,753-Speed 5146.23 samples/sec Loss 3.8342 LearningRate 0.0631 Epoch: 4 Global Step: 68700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:34,738-Speed 5162.31 samples/sec Loss 3.7842 LearningRate 0.0631 Epoch: 4 Global Step: 68710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:36,746-Speed 5100.44 samples/sec Loss 3.7337 LearningRate 0.0631 Epoch: 4 Global Step: 68720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:38,720-Speed 5187.08 samples/sec Loss 3.7458 LearningRate 0.0631 Epoch: 4 Global Step: 68730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:40,706-Speed 5157.66 samples/sec Loss 3.9079 LearningRate 0.0631 Epoch: 4 Global Step: 68740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:42,681-Speed 5188.99 samples/sec Loss 3.7713 LearningRate 0.0631 Epoch: 4 Global Step: 68750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:44,687-Speed 5105.98 samples/sec Loss 3.9076 LearningRate 0.0630 Epoch: 4 Global Step: 68760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:46,670-Speed 5165.12 samples/sec Loss 3.8495 LearningRate 0.0630 Epoch: 4 
Global Step: 68770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:51:48,660-Speed 5147.13 samples/sec Loss 3.7486 LearningRate 0.0630 Epoch: 4 Global Step: 68780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:51:50,641-Speed 5171.68 samples/sec Loss 3.8150 LearningRate 0.0630 Epoch: 4 Global Step: 68790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:51:52,616-Speed 5186.76 samples/sec Loss 3.7889 LearningRate 0.0630 Epoch: 4 Global Step: 68800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:54,590-Speed 5188.18 samples/sec Loss 3.7656 LearningRate 0.0630 Epoch: 4 Global Step: 68810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:56,576-Speed 5158.58 samples/sec Loss 3.7804 LearningRate 0.0630 Epoch: 4 Global Step: 68820 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:51:58,577-Speed 5118.86 samples/sec Loss 3.8751 LearningRate 0.0630 Epoch: 4 Global Step: 68830 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:52:00,568-Speed 5145.48 samples/sec Loss 3.8462 LearningRate 0.0630 Epoch: 4 Global Step: 68840 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:52:02,546-Speed 5178.06 samples/sec Loss 3.8569 LearningRate 0.0630 Epoch: 4 Global Step: 68850 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:52:04,552-Speed 5105.48 samples/sec Loss 3.7087 LearningRate 0.0630 Epoch: 4 Global Step: 68860 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:52:06,522-Speed 5200.09 samples/sec Loss 3.8100 LearningRate 0.0630 Epoch: 4 Global Step: 68870 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:52:08,511-Speed 5150.99 samples/sec Loss 3.8449 LearningRate 0.0630 Epoch: 4 Global Step: 68880 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:52:10,500-Speed 5147.36 samples/sec Loss 3.7846 LearningRate 0.0630 Epoch: 4 Global Step: 68890 Fp16 Grad Scale: 65536 Required: 
18 hours Training: 2022-04-11 03:52:12,482-Speed 5171.20 samples/sec Loss 3.7688 LearningRate 0.0630 Epoch: 4 Global Step: 68900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:52:14,463-Speed 5170.92 samples/sec Loss 3.8624 LearningRate 0.0630 Epoch: 4 Global Step: 68910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:52:16,439-Speed 5181.90 samples/sec Loss 3.9250 LearningRate 0.0630 Epoch: 4 Global Step: 68920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:52:18,439-Speed 5121.61 samples/sec Loss 3.8287 LearningRate 0.0630 Epoch: 4 Global Step: 68930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:52:20,410-Speed 5197.22 samples/sec Loss 3.8163 LearningRate 0.0630 Epoch: 4 Global Step: 68940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:52:22,413-Speed 5115.85 samples/sec Loss 3.8655 LearningRate 0.0630 Epoch: 4 Global Step: 68950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:52:24,397-Speed 5161.65 samples/sec Loss 3.7893 LearningRate 0.0630 Epoch: 4 Global Step: 68960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:52:26,377-Speed 5173.11 samples/sec Loss 3.8490 LearningRate 0.0629 Epoch: 4 Global Step: 68970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:52:28,353-Speed 5182.74 samples/sec Loss 3.8910 LearningRate 0.0629 Epoch: 4 Global Step: 68980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:52:30,340-Speed 5155.76 samples/sec Loss 3.9332 LearningRate 0.0629 Epoch: 4 Global Step: 68990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:52:32,312-Speed 5196.54 samples/sec Loss 3.8238 LearningRate 0.0629 Epoch: 4 Global Step: 69000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:52:34,288-Speed 5184.14 samples/sec Loss 3.9151 LearningRate 0.0629 Epoch: 4 Global Step: 69010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:52:36,267-Speed 
5175.81 samples/sec Loss 3.8017 LearningRate 0.0629 Epoch: 4 Global Step: 69020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:52:38,254-Speed 5153.29 samples/sec Loss 3.8028 LearningRate 0.0629 Epoch: 4 Global Step: 69030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:52:40,239-Speed 5161.00 samples/sec Loss 3.8807 LearningRate 0.0629 Epoch: 4 Global Step: 69040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:52:42,214-Speed 5187.40 samples/sec Loss 3.7969 LearningRate 0.0629 Epoch: 4 Global Step: 69050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:52:44,218-Speed 5112.04 samples/sec Loss 3.9259 LearningRate 0.0629 Epoch: 4 Global Step: 69060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:52:46,199-Speed 5169.19 samples/sec Loss 3.8735 LearningRate 0.0629 Epoch: 4 Global Step: 69070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:52:48,183-Speed 5162.61 samples/sec Loss 3.8088 LearningRate 0.0629 Epoch: 4 Global Step: 69080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:52:50,180-Speed 5130.44 samples/sec Loss 3.8228 LearningRate 0.0629 Epoch: 4 Global Step: 69090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:52:52,151-Speed 5197.74 samples/sec Loss 3.8166 LearningRate 0.0629 Epoch: 4 Global Step: 69100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:52:54,125-Speed 5189.30 samples/sec Loss 3.8387 LearningRate 0.0629 Epoch: 4 Global Step: 69110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:52:56,097-Speed 5194.83 samples/sec Loss 3.8326 LearningRate 0.0629 Epoch: 4 Global Step: 69120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:52:58,079-Speed 5167.02 samples/sec Loss 3.7505 LearningRate 0.0629 Epoch: 4 Global Step: 69130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:53:00,066-Speed 5156.38 samples/sec Loss 3.8178 LearningRate 
0.0629 Epoch: 4 Global Step: 69140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:53:02,045-Speed 5176.18 samples/sec Loss 3.9133 LearningRate 0.0629 Epoch: 4 Global Step: 69150 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:53:04,021-Speed 5182.42 samples/sec Loss 3.8197 LearningRate 0.0629 Epoch: 4 Global Step: 69160 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:53:06,019-Speed 5128.27 samples/sec Loss 3.8571 LearningRate 0.0629 Epoch: 4 Global Step: 69170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:53:08,000-Speed 5169.87 samples/sec Loss 3.8042 LearningRate 0.0628 Epoch: 4 Global Step: 69180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:53:09,973-Speed 5193.15 samples/sec Loss 3.8685 LearningRate 0.0628 Epoch: 4 Global Step: 69190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:53:11,945-Speed 5192.63 samples/sec Loss 3.7768 LearningRate 0.0628 Epoch: 4 Global Step: 69200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:53:13,949-Speed 5113.32 samples/sec Loss 3.7766 LearningRate 0.0628 Epoch: 4 Global Step: 69210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:53:15,934-Speed 5159.99 samples/sec Loss 3.8201 LearningRate 0.0628 Epoch: 4 Global Step: 69220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:53:17,907-Speed 5190.79 samples/sec Loss 3.8395 LearningRate 0.0628 Epoch: 4 Global Step: 69230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:53:19,878-Speed 5196.97 samples/sec Loss 3.8529 LearningRate 0.0628 Epoch: 4 Global Step: 69240 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:53:21,853-Speed 5187.76 samples/sec Loss 3.7957 LearningRate 0.0628 Epoch: 4 Global Step: 69250 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:53:23,828-Speed 5186.94 samples/sec Loss 3.8455 LearningRate 0.0628 Epoch: 4 Global Step: 69260 Fp16 Grad Scale: 
65536 Required: 18 hours Training: 2022-04-11 03:53:25,814-Speed 5156.64 samples/sec Loss 3.9133 LearningRate 0.0628 Epoch: 4 Global Step: 69270 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:53:27,799-Speed 5160.53 samples/sec Loss 3.8945 LearningRate 0.0628 Epoch: 4 Global Step: 69280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:53:29,787-Speed 5153.41 samples/sec Loss 3.8710 LearningRate 0.0628 Epoch: 4 Global Step: 69290 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:53:31,761-Speed 5188.86 samples/sec Loss 3.8230 LearningRate 0.0628 Epoch: 4 Global Step: 69300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:53:33,752-Speed 5146.09 samples/sec Loss 3.9016 LearningRate 0.0628 Epoch: 4 Global Step: 69310 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:53:35,729-Speed 5180.67 samples/sec Loss 3.9189 LearningRate 0.0628 Epoch: 4 Global Step: 69320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:53:38,477-Speed 3726.94 samples/sec Loss 3.8910 LearningRate 0.0628 Epoch: 4 Global Step: 69330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:53:40,458-Speed 5169.49 samples/sec Loss 3.8947 LearningRate 0.0628 Epoch: 4 Global Step: 69340 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:53:42,454-Speed 5134.01 samples/sec Loss 3.8766 LearningRate 0.0628 Epoch: 4 Global Step: 69350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:53:44,426-Speed 5195.23 samples/sec Loss 3.9956 LearningRate 0.0628 Epoch: 4 Global Step: 69360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:53:46,406-Speed 5172.45 samples/sec Loss 3.8502 LearningRate 0.0628 Epoch: 4 Global Step: 69370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:53:48,391-Speed 5160.74 samples/sec Loss 3.8796 LearningRate 0.0628 Epoch: 4 Global Step: 69380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 
03:53:50,376-Speed 5160.91 samples/sec Loss 3.7909 LearningRate 0.0627 Epoch: 4 Global Step: 69390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:53:52,358-Speed 5165.97 samples/sec Loss 3.8690 LearningRate 0.0627 Epoch: 4 Global Step: 69400 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:53:54,340-Speed 5169.20 samples/sec Loss 3.8080 LearningRate 0.0627 Epoch: 4 Global Step: 69410 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:53:56,316-Speed 5184.78 samples/sec Loss 3.8396 LearningRate 0.0627 Epoch: 4 Global Step: 69420 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:53:58,296-Speed 5173.96 samples/sec Loss 3.8635 LearningRate 0.0627 Epoch: 4 Global Step: 69430 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:00,280-Speed 5161.37 samples/sec Loss 3.8981 LearningRate 0.0627 Epoch: 4 Global Step: 69440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:02,247-Speed 5209.03 samples/sec Loss 3.9273 LearningRate 0.0627 Epoch: 4 Global Step: 69450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:04,245-Speed 5126.63 samples/sec Loss 3.8573 LearningRate 0.0627 Epoch: 4 Global Step: 69460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:06,220-Speed 5186.44 samples/sec Loss 3.8752 LearningRate 0.0627 Epoch: 4 Global Step: 69470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:54:08,194-Speed 5189.50 samples/sec Loss 3.8983 LearningRate 0.0627 Epoch: 4 Global Step: 69480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:54:10,176-Speed 5166.13 samples/sec Loss 3.8713 LearningRate 0.0627 Epoch: 4 Global Step: 69490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:54:12,183-Speed 5103.94 samples/sec Loss 3.8750 LearningRate 0.0627 Epoch: 4 Global Step: 69500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:54:14,157-Speed 5189.16 samples/sec Loss 3.8577 
LearningRate 0.0627 Epoch: 4 Global Step: 69510 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:16,139-Speed 5167.94 samples/sec Loss 3.9012 LearningRate 0.0627 Epoch: 4 Global Step: 69520 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:18,122-Speed 5165.65 samples/sec Loss 3.9383 LearningRate 0.0627 Epoch: 4 Global Step: 69530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:20,127-Speed 5111.36 samples/sec Loss 3.8323 LearningRate 0.0627 Epoch: 4 Global Step: 69540 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:22,105-Speed 5179.09 samples/sec Loss 3.9081 LearningRate 0.0627 Epoch: 4 Global Step: 69550 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:24,095-Speed 5146.79 samples/sec Loss 3.8959 LearningRate 0.0627 Epoch: 4 Global Step: 69560 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:26,089-Speed 5137.13 samples/sec Loss 3.8173 LearningRate 0.0627 Epoch: 4 Global Step: 69570 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:28,061-Speed 5193.96 samples/sec Loss 3.8610 LearningRate 0.0627 Epoch: 4 Global Step: 69580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:30,040-Speed 5174.88 samples/sec Loss 3.8555 LearningRate 0.0627 Epoch: 4 Global Step: 69590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:32,031-Speed 5145.29 samples/sec Loss 3.9080 LearningRate 0.0626 Epoch: 4 Global Step: 69600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:34,003-Speed 5195.85 samples/sec Loss 3.9149 LearningRate 0.0626 Epoch: 4 Global Step: 69610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:54:35,984-Speed 5170.14 samples/sec Loss 3.9477 LearningRate 0.0626 Epoch: 4 Global Step: 69620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:54:37,990-Speed 5105.14 samples/sec Loss 3.8516 LearningRate 0.0626 Epoch: 4 Global Step: 69630 Fp16 
Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:54:39,969-Speed 5177.39 samples/sec Loss 3.8841 LearningRate 0.0626 Epoch: 4 Global Step: 69640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:41,945-Speed 5183.93 samples/sec Loss 3.9155 LearningRate 0.0626 Epoch: 4 Global Step: 69650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:43,917-Speed 5194.60 samples/sec Loss 3.8922 LearningRate 0.0626 Epoch: 4 Global Step: 69660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:45,904-Speed 5156.17 samples/sec Loss 3.8576 LearningRate 0.0626 Epoch: 4 Global Step: 69670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:47,919-Speed 5081.95 samples/sec Loss 3.9478 LearningRate 0.0626 Epoch: 4 Global Step: 69680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:49,902-Speed 5167.19 samples/sec Loss 3.8442 LearningRate 0.0626 Epoch: 4 Global Step: 69690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:51,902-Speed 5122.13 samples/sec Loss 3.9172 LearningRate 0.0626 Epoch: 4 Global Step: 69700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:53,900-Speed 5125.60 samples/sec Loss 3.9026 LearningRate 0.0626 Epoch: 4 Global Step: 69710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:55,887-Speed 5154.91 samples/sec Loss 3.8781 LearningRate 0.0626 Epoch: 4 Global Step: 69720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:57,880-Speed 5138.54 samples/sec Loss 3.9643 LearningRate 0.0626 Epoch: 4 Global Step: 69730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:54:59,881-Speed 5121.35 samples/sec Loss 3.8917 LearningRate 0.0626 Epoch: 4 Global Step: 69740 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:55:01,854-Speed 5190.98 samples/sec Loss 3.9250 LearningRate 0.0626 Epoch: 4 Global Step: 69750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 
2022-04-11 03:55:03,827-Speed 5192.48 samples/sec Loss 3.8882 LearningRate 0.0626 Epoch: 4 Global Step: 69760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:05,803-Speed 5184.53 samples/sec Loss 3.9411 LearningRate 0.0626 Epoch: 4 Global Step: 69770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:07,776-Speed 5191.18 samples/sec Loss 3.9429 LearningRate 0.0626 Epoch: 4 Global Step: 69780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:09,760-Speed 5162.06 samples/sec Loss 4.0036 LearningRate 0.0626 Epoch: 4 Global Step: 69790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:11,727-Speed 5208.60 samples/sec Loss 3.9461 LearningRate 0.0626 Epoch: 4 Global Step: 69800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:13,703-Speed 5185.03 samples/sec Loss 3.9025 LearningRate 0.0625 Epoch: 4 Global Step: 69810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:15,692-Speed 5147.73 samples/sec Loss 3.9031 LearningRate 0.0625 Epoch: 4 Global Step: 69820 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:17,672-Speed 5173.88 samples/sec Loss 3.9098 LearningRate 0.0625 Epoch: 4 Global Step: 69830 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:19,654-Speed 5167.86 samples/sec Loss 3.8955 LearningRate 0.0625 Epoch: 4 Global Step: 69840 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:21,637-Speed 5168.00 samples/sec Loss 3.9222 LearningRate 0.0625 Epoch: 4 Global Step: 69850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:55:23,621-Speed 5161.75 samples/sec Loss 3.8921 LearningRate 0.0625 Epoch: 4 Global Step: 69860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:55:25,595-Speed 5188.59 samples/sec Loss 3.9013 LearningRate 0.0625 Epoch: 4 Global Step: 69870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:55:27,566-Speed 5196.03 samples/sec 
Loss 3.9369 LearningRate 0.0625 Epoch: 4 Global Step: 69880 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:29,663-Speed 5200.78 samples/sec Loss 3.9126 LearningRate 0.0625 Epoch: 4 Global Step: 69890 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:31,643-Speed 5174.59 samples/sec Loss 3.9284 LearningRate 0.0625 Epoch: 4 Global Step: 69900 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:33,645-Speed 5116.29 samples/sec Loss 3.9212 LearningRate 0.0625 Epoch: 4 Global Step: 69910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:35,621-Speed 5185.03 samples/sec Loss 3.9473 LearningRate 0.0625 Epoch: 4 Global Step: 69920 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:37,621-Speed 5121.55 samples/sec Loss 3.8978 LearningRate 0.0625 Epoch: 4 Global Step: 69930 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:39,621-Speed 5121.88 samples/sec Loss 3.8239 LearningRate 0.0625 Epoch: 4 Global Step: 69940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:41,593-Speed 5192.16 samples/sec Loss 3.9830 LearningRate 0.0625 Epoch: 4 Global Step: 69950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:43,588-Speed 5137.26 samples/sec Loss 3.9282 LearningRate 0.0625 Epoch: 4 Global Step: 69960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:45,562-Speed 5187.48 samples/sec Loss 3.8454 LearningRate 0.0625 Epoch: 4 Global Step: 69970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:55:47,559-Speed 5129.50 samples/sec Loss 3.8213 LearningRate 0.0625 Epoch: 4 Global Step: 69980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:55:49,544-Speed 5159.84 samples/sec Loss 3.9091 LearningRate 0.0625 Epoch: 4 Global Step: 69990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:55:51,528-Speed 5165.07 samples/sec Loss 3.9025 LearningRate 0.0625 Epoch: 4 Global Step: 
70000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:56:18,062-[lfw][70000]XNorm: 23.345378 Training: 2022-04-11 03:56:18,063-[lfw][70000]Accuracy-Flip: 0.99767+-0.00281 Training: 2022-04-11 03:56:18,063-[lfw][70000]Accuracy-Highest: 0.99800 Training: 2022-04-11 03:56:48,836-[cfp_fp][70000]XNorm: 21.326361 Training: 2022-04-11 03:56:48,837-[cfp_fp][70000]Accuracy-Flip: 0.98029+-0.00490 Training: 2022-04-11 03:56:48,837-[cfp_fp][70000]Accuracy-Highest: 0.98029 Training: 2022-04-11 03:57:15,469-[agedb_30][70000]XNorm: 23.462302 Training: 2022-04-11 03:57:15,470-[agedb_30][70000]Accuracy-Flip: 0.97567+-0.00886 Training: 2022-04-11 03:57:15,470-[agedb_30][70000]Accuracy-Highest: 0.97717 Training: 2022-04-11 03:57:17,475-Speed 119.14 samples/sec Loss 3.9291 LearningRate 0.0625 Epoch: 4 Global Step: 70010 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:57:19,454-Speed 5176.65 samples/sec Loss 3.9006 LearningRate 0.0624 Epoch: 4 Global Step: 70020 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:57:21,439-Speed 5159.88 samples/sec Loss 3.9092 LearningRate 0.0624 Epoch: 4 Global Step: 70030 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:57:23,414-Speed 5186.04 samples/sec Loss 3.8764 LearningRate 0.0624 Epoch: 4 Global Step: 70040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:57:25,402-Speed 5154.38 samples/sec Loss 3.9327 LearningRate 0.0624 Epoch: 4 Global Step: 70050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:57:27,372-Speed 5198.48 samples/sec Loss 3.8382 LearningRate 0.0624 Epoch: 4 Global Step: 70060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:57:29,348-Speed 5183.85 samples/sec Loss 3.9219 LearningRate 0.0624 Epoch: 4 Global Step: 70070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:57:31,311-Speed 5219.70 samples/sec Loss 3.8984 LearningRate 0.0624 Epoch: 4 Global Step: 70080 Fp16 Grad Scale: 131072 
Required: 18 hours Training: 2022-04-11 03:57:33,280-Speed 5202.74 samples/sec Loss 3.9395 LearningRate 0.0624 Epoch: 4 Global Step: 70090 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:57:35,277-Speed 5130.16 samples/sec Loss 3.9732 LearningRate 0.0624 Epoch: 4 Global Step: 70100 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:57:37,264-Speed 5153.65 samples/sec Loss 3.9046 LearningRate 0.0624 Epoch: 4 Global Step: 70110 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:57:39,254-Speed 5146.91 samples/sec Loss 3.8755 LearningRate 0.0624 Epoch: 4 Global Step: 70120 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:57:41,235-Speed 5171.36 samples/sec Loss 3.9250 LearningRate 0.0624 Epoch: 4 Global Step: 70130 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:57:43,209-Speed 5189.52 samples/sec Loss 3.9168 LearningRate 0.0624 Epoch: 4 Global Step: 70140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:57:45,192-Speed 5164.91 samples/sec Loss 3.8979 LearningRate 0.0624 Epoch: 4 Global Step: 70150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:57:47,177-Speed 5161.32 samples/sec Loss 3.8966 LearningRate 0.0624 Epoch: 4 Global Step: 70160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:57:49,164-Speed 5154.06 samples/sec Loss 4.0583 LearningRate 0.0624 Epoch: 4 Global Step: 70170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:57:51,152-Speed 5153.51 samples/sec Loss 3.8743 LearningRate 0.0624 Epoch: 4 Global Step: 70180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:57:53,139-Speed 5154.48 samples/sec Loss 3.8330 LearningRate 0.0624 Epoch: 4 Global Step: 70190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:57:55,126-Speed 5157.90 samples/sec Loss 3.9074 LearningRate 0.0624 Epoch: 4 Global Step: 70200 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 
03:57:57,115-Speed 5148.10 samples/sec Loss 3.9797 LearningRate 0.0624 Epoch: 4 Global Step: 70210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:57:59,097-Speed 5167.97 samples/sec Loss 3.9498 LearningRate 0.0624 Epoch: 4 Global Step: 70220 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:58:01,087-Speed 5147.04 samples/sec Loss 3.9716 LearningRate 0.0623 Epoch: 4 Global Step: 70230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:58:03,070-Speed 5166.07 samples/sec Loss 3.9415 LearningRate 0.0623 Epoch: 4 Global Step: 70240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:58:05,052-Speed 5168.17 samples/sec Loss 3.9401 LearningRate 0.0623 Epoch: 4 Global Step: 70250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:58:07,025-Speed 5191.69 samples/sec Loss 3.8289 LearningRate 0.0623 Epoch: 4 Global Step: 70260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:58:08,997-Speed 5194.82 samples/sec Loss 3.8255 LearningRate 0.0623 Epoch: 4 Global Step: 70270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:58:11,009-Speed 5090.71 samples/sec Loss 4.0153 LearningRate 0.0623 Epoch: 4 Global Step: 70280 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:12,999-Speed 5148.82 samples/sec Loss 3.8832 LearningRate 0.0623 Epoch: 4 Global Step: 70290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:14,975-Speed 5185.01 samples/sec Loss 3.8668 LearningRate 0.0623 Epoch: 4 Global Step: 70300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:16,957-Speed 5166.56 samples/sec Loss 3.8670 LearningRate 0.0623 Epoch: 4 Global Step: 70310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:18,933-Speed 5183.23 samples/sec Loss 3.9372 LearningRate 0.0623 Epoch: 4 Global Step: 70320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:20,934-Speed 5120.83 samples/sec Loss 
3.9003 LearningRate 0.0623 Epoch: 4 Global Step: 70330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:22,927-Speed 5139.83 samples/sec Loss 4.0046 LearningRate 0.0623 Epoch: 4 Global Step: 70340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:24,926-Speed 5124.56 samples/sec Loss 3.9569 LearningRate 0.0623 Epoch: 4 Global Step: 70350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:26,926-Speed 5119.75 samples/sec Loss 3.9839 LearningRate 0.0623 Epoch: 4 Global Step: 70360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:28,903-Speed 5181.13 samples/sec Loss 3.9770 LearningRate 0.0623 Epoch: 4 Global Step: 70370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:30,881-Speed 5180.32 samples/sec Loss 3.8129 LearningRate 0.0623 Epoch: 4 Global Step: 70380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:58:32,852-Speed 5196.52 samples/sec Loss 3.8872 LearningRate 0.0623 Epoch: 4 Global Step: 70390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:58:34,833-Speed 5172.71 samples/sec Loss 3.9988 LearningRate 0.0623 Epoch: 4 Global Step: 70400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:58:36,815-Speed 5165.56 samples/sec Loss 3.9066 LearningRate 0.0623 Epoch: 4 Global Step: 70410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:58:38,795-Speed 5175.79 samples/sec Loss 3.9004 LearningRate 0.0623 Epoch: 4 Global Step: 70420 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:40,781-Speed 5156.97 samples/sec Loss 3.8414 LearningRate 0.0623 Epoch: 4 Global Step: 70430 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:42,763-Speed 5168.46 samples/sec Loss 3.9014 LearningRate 0.0623 Epoch: 4 Global Step: 70440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:44,739-Speed 5181.73 samples/sec Loss 3.9235 LearningRate 0.0622 Epoch: 4 Global Step: 
70450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:46,751-Speed 5092.39 samples/sec Loss 3.9145 LearningRate 0.0622 Epoch: 4 Global Step: 70460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:48,729-Speed 5178.38 samples/sec Loss 3.8841 LearningRate 0.0622 Epoch: 4 Global Step: 70470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:50,704-Speed 5187.02 samples/sec Loss 3.9966 LearningRate 0.0622 Epoch: 4 Global Step: 70480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:52,697-Speed 5140.37 samples/sec Loss 3.8697 LearningRate 0.0622 Epoch: 4 Global Step: 70490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:54,674-Speed 5181.87 samples/sec Loss 3.9583 LearningRate 0.0622 Epoch: 4 Global Step: 70500 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:56,644-Speed 5197.98 samples/sec Loss 3.8333 LearningRate 0.0622 Epoch: 4 Global Step: 70510 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:58:58,616-Speed 5195.09 samples/sec Loss 3.9416 LearningRate 0.0622 Epoch: 4 Global Step: 70520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:59:00,594-Speed 5178.95 samples/sec Loss 3.8748 LearningRate 0.0622 Epoch: 4 Global Step: 70530 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:59:02,568-Speed 5188.37 samples/sec Loss 3.9267 LearningRate 0.0622 Epoch: 4 Global Step: 70540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:59:04,551-Speed 5165.85 samples/sec Loss 3.9068 LearningRate 0.0622 Epoch: 4 Global Step: 70550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:59:06,523-Speed 5193.48 samples/sec Loss 3.9442 LearningRate 0.0622 Epoch: 4 Global Step: 70560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 03:59:08,515-Speed 5143.36 samples/sec Loss 3.9029 LearningRate 0.0622 Epoch: 4 Global Step: 70570 Fp16 Grad Scale: 131072 Required: 18 hours 
Training: 2022-04-11 03:59:10,504-Speed 5150.25 samples/sec Loss 3.8836 LearningRate 0.0622 Epoch: 4 Global Step: 70580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:59:12,474-Speed 5198.91 samples/sec Loss 3.9345 LearningRate 0.0622 Epoch: 4 Global Step: 70590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:59:14,460-Speed 5159.93 samples/sec Loss 3.9156 LearningRate 0.0622 Epoch: 4 Global Step: 70600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:59:16,432-Speed 5193.33 samples/sec Loss 3.9309 LearningRate 0.0622 Epoch: 4 Global Step: 70610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:59:18,422-Speed 5148.55 samples/sec Loss 3.7993 LearningRate 0.0622 Epoch: 4 Global Step: 70620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:59:20,390-Speed 5203.46 samples/sec Loss 3.8841 LearningRate 0.0622 Epoch: 4 Global Step: 70630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:59:22,378-Speed 5152.19 samples/sec Loss 3.9061 LearningRate 0.0622 Epoch: 4 Global Step: 70640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:59:24,353-Speed 5188.67 samples/sec Loss 3.9866 LearningRate 0.0622 Epoch: 4 Global Step: 70650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:59:26,362-Speed 5097.68 samples/sec Loss 4.0094 LearningRate 0.0621 Epoch: 4 Global Step: 70660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:59:28,360-Speed 5125.29 samples/sec Loss 3.9911 LearningRate 0.0621 Epoch: 4 Global Step: 70670 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:59:30,342-Speed 5169.36 samples/sec Loss 3.8279 LearningRate 0.0621 Epoch: 4 Global Step: 70680 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:59:32,307-Speed 5212.40 samples/sec Loss 3.9851 LearningRate 0.0621 Epoch: 4 Global Step: 70690 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:59:34,286-Speed 5178.06 
samples/sec Loss 3.9176 LearningRate 0.0621 Epoch: 4 Global Step: 70700 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:59:36,287-Speed 5119.14 samples/sec Loss 3.9356 LearningRate 0.0621 Epoch: 4 Global Step: 70710 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:59:38,283-Speed 5132.12 samples/sec Loss 3.8864 LearningRate 0.0621 Epoch: 4 Global Step: 70720 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:59:40,264-Speed 5168.70 samples/sec Loss 3.9820 LearningRate 0.0621 Epoch: 4 Global Step: 70730 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:59:42,230-Speed 5209.95 samples/sec Loss 3.9610 LearningRate 0.0621 Epoch: 4 Global Step: 70740 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:59:44,206-Speed 5183.75 samples/sec Loss 3.9729 LearningRate 0.0621 Epoch: 4 Global Step: 70750 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:59:46,180-Speed 5189.99 samples/sec Loss 3.9196 LearningRate 0.0621 Epoch: 4 Global Step: 70760 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-11 03:59:48,163-Speed 5166.03 samples/sec Loss 3.9374 LearningRate 0.0621 Epoch: 4 Global Step: 70770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:59:50,157-Speed 5135.88 samples/sec Loss 3.9017 LearningRate 0.0621 Epoch: 4 Global Step: 70780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:59:52,138-Speed 5170.67 samples/sec Loss 3.9971 LearningRate 0.0621 Epoch: 4 Global Step: 70790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:59:54,118-Speed 5175.61 samples/sec Loss 3.9240 LearningRate 0.0621 Epoch: 4 Global Step: 70800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:59:56,101-Speed 5166.13 samples/sec Loss 3.8330 LearningRate 0.0621 Epoch: 4 Global Step: 70810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 03:59:58,092-Speed 5144.47 samples/sec Loss 3.9466 LearningRate 0.0621 Epoch: 4 
Global Step: 70820 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:00,073-Speed 5169.91 samples/sec Loss 3.9871 LearningRate 0.0621 Epoch: 4 Global Step: 70830 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:02,075-Speed 5117.74 samples/sec Loss 3.9659 LearningRate 0.0621 Epoch: 4 Global Step: 70840 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:04,056-Speed 5172.13 samples/sec Loss 4.0276 LearningRate 0.0621 Epoch: 4 Global Step: 70850 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:06,060-Speed 5110.16 samples/sec Loss 3.9508 LearningRate 0.0621 Epoch: 4 Global Step: 70860 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:08,025-Speed 5212.65 samples/sec Loss 3.9928 LearningRate 0.0620 Epoch: 4 Global Step: 70870 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:09,999-Speed 5189.37 samples/sec Loss 3.9486 LearningRate 0.0620 Epoch: 4 Global Step: 70880 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:11,980-Speed 5169.75 samples/sec Loss 3.9371 LearningRate 0.0620 Epoch: 4 Global Step: 70890 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:13,969-Speed 5149.04 samples/sec Loss 3.9093 LearningRate 0.0620 Epoch: 4 Global Step: 70900 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:15,941-Speed 5200.32 samples/sec Loss 3.9467 LearningRate 0.0620 Epoch: 4 Global Step: 70910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:17,921-Speed 5173.49 samples/sec Loss 3.9401 LearningRate 0.0620 Epoch: 4 Global Step: 70920 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:19,893-Speed 5193.56 samples/sec Loss 3.9297 LearningRate 0.0620 Epoch: 4 Global Step: 70930 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:21,878-Speed 5162.46 samples/sec Loss 3.9824 LearningRate 0.0620 Epoch: 4 Global Step: 70940 Fp16 Grad Scale: 65536 Required: 18 
hours Training: 2022-04-11 04:00:23,855-Speed 5180.92 samples/sec Loss 4.0222 LearningRate 0.0620 Epoch: 4 Global Step: 70950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:25,842-Speed 5155.41 samples/sec Loss 3.9390 LearningRate 0.0620 Epoch: 4 Global Step: 70960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:27,827-Speed 5159.47 samples/sec Loss 3.9917 LearningRate 0.0620 Epoch: 4 Global Step: 70970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:00:29,809-Speed 5167.51 samples/sec Loss 3.9044 LearningRate 0.0620 Epoch: 4 Global Step: 70980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:00:31,804-Speed 5133.71 samples/sec Loss 3.8974 LearningRate 0.0620 Epoch: 4 Global Step: 70990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:33,793-Speed 5151.29 samples/sec Loss 3.8826 LearningRate 0.0620 Epoch: 4 Global Step: 71000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:35,811-Speed 5075.26 samples/sec Loss 3.9569 LearningRate 0.0620 Epoch: 4 Global Step: 71010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:37,786-Speed 5187.67 samples/sec Loss 3.9741 LearningRate 0.0620 Epoch: 4 Global Step: 71020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:39,784-Speed 5126.54 samples/sec Loss 3.9791 LearningRate 0.0620 Epoch: 4 Global Step: 71030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:41,769-Speed 5161.09 samples/sec Loss 3.9219 LearningRate 0.0620 Epoch: 4 Global Step: 71040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:43,738-Speed 5203.21 samples/sec Loss 3.8722 LearningRate 0.0620 Epoch: 4 Global Step: 71050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:45,705-Speed 5206.56 samples/sec Loss 3.9014 LearningRate 0.0620 Epoch: 4 Global Step: 71060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:47,676-Speed 5197.22 
samples/sec Loss 3.8800 LearningRate 0.0620 Epoch: 4 Global Step: 71070 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:49,666-Speed 5146.53 samples/sec Loss 3.9250 LearningRate 0.0619 Epoch: 4 Global Step: 71080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:00:51,652-Speed 5157.13 samples/sec Loss 3.9474 LearningRate 0.0619 Epoch: 4 Global Step: 71090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:00:53,631-Speed 5178.43 samples/sec Loss 3.8812 LearningRate 0.0619 Epoch: 4 Global Step: 71100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:00:55,602-Speed 5195.40 samples/sec Loss 3.9071 LearningRate 0.0619 Epoch: 4 Global Step: 71110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:00:57,588-Speed 5157.59 samples/sec Loss 3.8842 LearningRate 0.0619 Epoch: 4 Global Step: 71120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:00:59,586-Speed 5128.69 samples/sec Loss 3.9219 LearningRate 0.0619 Epoch: 4 Global Step: 71130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:01:01,569-Speed 5166.53 samples/sec Loss 3.9801 LearningRate 0.0619 Epoch: 4 Global Step: 71140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:01:03,550-Speed 5169.41 samples/sec Loss 3.9473 LearningRate 0.0619 Epoch: 4 Global Step: 71150 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:01:05,522-Speed 5194.60 samples/sec Loss 3.9052 LearningRate 0.0619 Epoch: 4 Global Step: 71160 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:01:07,501-Speed 5175.35 samples/sec Loss 3.9187 LearningRate 0.0619 Epoch: 4 Global Step: 71170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:01:09,497-Speed 5133.69 samples/sec Loss 3.8578 LearningRate 0.0619 Epoch: 4 Global Step: 71180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:11,487-Speed 5147.14 samples/sec Loss 3.8884 LearningRate 0.0619 
Epoch: 4 Global Step: 71190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:13,482-Speed 5134.08 samples/sec Loss 3.9790 LearningRate 0.0619 Epoch: 4 Global Step: 71200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:15,487-Speed 5106.82 samples/sec Loss 3.9583 LearningRate 0.0619 Epoch: 4 Global Step: 71210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:17,469-Speed 5169.32 samples/sec Loss 3.8836 LearningRate 0.0619 Epoch: 4 Global Step: 71220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:19,450-Speed 5172.51 samples/sec Loss 3.8790 LearningRate 0.0619 Epoch: 4 Global Step: 71230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:21,426-Speed 5181.97 samples/sec Loss 3.9590 LearningRate 0.0619 Epoch: 4 Global Step: 71240 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:23,404-Speed 5181.21 samples/sec Loss 3.9328 LearningRate 0.0619 Epoch: 4 Global Step: 71250 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:25,385-Speed 5169.84 samples/sec Loss 3.9356 LearningRate 0.0619 Epoch: 4 Global Step: 71260 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:27,375-Speed 5148.89 samples/sec Loss 3.9464 LearningRate 0.0619 Epoch: 4 Global Step: 71270 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:29,362-Speed 5155.40 samples/sec Loss 3.9445 LearningRate 0.0619 Epoch: 4 Global Step: 71280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:01:31,346-Speed 5161.16 samples/sec Loss 3.9182 LearningRate 0.0618 Epoch: 4 Global Step: 71290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:33,332-Speed 5157.36 samples/sec Loss 3.9512 LearningRate 0.0618 Epoch: 4 Global Step: 71300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:35,313-Speed 5173.12 samples/sec Loss 3.9090 LearningRate 0.0618 Epoch: 4 Global Step: 71310 Fp16 Grad Scale: 65536 
Required: 18 hours Training: 2022-04-11 04:01:37,295-Speed 5166.26 samples/sec Loss 3.8565 LearningRate 0.0618 Epoch: 4 Global Step: 71320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:39,306-Speed 5094.04 samples/sec Loss 3.9314 LearningRate 0.0618 Epoch: 4 Global Step: 71330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:41,288-Speed 5170.00 samples/sec Loss 3.9065 LearningRate 0.0618 Epoch: 4 Global Step: 71340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:43,259-Speed 5195.26 samples/sec Loss 4.0294 LearningRate 0.0618 Epoch: 4 Global Step: 71350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:45,246-Speed 5155.36 samples/sec Loss 3.9334 LearningRate 0.0618 Epoch: 4 Global Step: 71360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:47,229-Speed 5167.16 samples/sec Loss 3.9224 LearningRate 0.0618 Epoch: 4 Global Step: 71370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:49,238-Speed 5098.62 samples/sec Loss 3.9901 LearningRate 0.0618 Epoch: 4 Global Step: 71380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:01:51,217-Speed 5176.08 samples/sec Loss 3.9320 LearningRate 0.0618 Epoch: 4 Global Step: 71390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:01:53,201-Speed 5161.27 samples/sec Loss 3.9649 LearningRate 0.0618 Epoch: 4 Global Step: 71400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:01:55,190-Speed 5152.73 samples/sec Loss 3.9357 LearningRate 0.0618 Epoch: 4 Global Step: 71410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:01:57,164-Speed 5189.60 samples/sec Loss 3.9403 LearningRate 0.0618 Epoch: 4 Global Step: 71420 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:01:59,146-Speed 5169.91 samples/sec Loss 3.9340 LearningRate 0.0618 Epoch: 4 Global Step: 71430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 
04:02:01,130-Speed 5162.95 samples/sec Loss 3.9632 LearningRate 0.0618 Epoch: 4 Global Step: 71440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:02:03,113-Speed 5166.26 samples/sec Loss 4.0077 LearningRate 0.0618 Epoch: 4 Global Step: 71450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:02:05,095-Speed 5168.58 samples/sec Loss 3.8956 LearningRate 0.0618 Epoch: 4 Global Step: 71460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:02:07,066-Speed 5195.96 samples/sec Loss 3.8582 LearningRate 0.0618 Epoch: 4 Global Step: 71470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:02:09,041-Speed 5185.55 samples/sec Loss 3.9325 LearningRate 0.0618 Epoch: 4 Global Step: 71480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:02:11,025-Speed 5162.70 samples/sec Loss 3.9602 LearningRate 0.0618 Epoch: 4 Global Step: 71490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:02:13,001-Speed 5184.00 samples/sec Loss 3.9658 LearningRate 0.0618 Epoch: 4 Global Step: 71500 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:02:14,977-Speed 5183.46 samples/sec Loss 3.9607 LearningRate 0.0617 Epoch: 4 Global Step: 71510 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:02:16,949-Speed 5195.40 samples/sec Loss 3.9146 LearningRate 0.0617 Epoch: 4 Global Step: 71520 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:02:18,930-Speed 5172.28 samples/sec Loss 3.9377 LearningRate 0.0617 Epoch: 4 Global Step: 71530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:02:20,906-Speed 5183.72 samples/sec Loss 4.0334 LearningRate 0.0617 Epoch: 4 Global Step: 71540 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:02:22,880-Speed 5188.81 samples/sec Loss 3.8992 LearningRate 0.0617 Epoch: 4 Global Step: 71550 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:02:24,855-Speed 5187.17 samples/sec Loss 3.9960 
LearningRate 0.0617 Epoch: 4 Global Step: 71560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:02:26,839-Speed 5162.74 samples/sec Loss 3.9158 LearningRate 0.0617 Epoch: 4 Global Step: 71570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:02:28,826-Speed 5154.84 samples/sec Loss 3.9468 LearningRate 0.0617 Epoch: 4 Global Step: 71580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:02:30,812-Speed 5156.30 samples/sec Loss 3.8716 LearningRate 0.0617 Epoch: 4 Global Step: 71590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:02:32,789-Speed 5183.18 samples/sec Loss 3.9546 LearningRate 0.0617 Epoch: 4 Global Step: 71600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:02:34,783-Speed 5136.83 samples/sec Loss 3.9313 LearningRate 0.0617 Epoch: 4 Global Step: 71610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:02:36,778-Speed 5133.49 samples/sec Loss 4.0442 LearningRate 0.0617 Epoch: 4 Global Step: 71620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:02:38,769-Speed 5146.96 samples/sec Loss 3.8923 LearningRate 0.0617 Epoch: 4 Global Step: 71630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:02:40,777-Speed 5101.29 samples/sec Loss 3.9815 LearningRate 0.0617 Epoch: 4 Global Step: 71640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:02:42,762-Speed 5160.06 samples/sec Loss 3.9833 LearningRate 0.0617 Epoch: 4 Global Step: 71650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:02:44,732-Speed 5199.81 samples/sec Loss 3.9701 LearningRate 0.0617 Epoch: 4 Global Step: 71660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:02:46,706-Speed 5189.66 samples/sec Loss 3.9931 LearningRate 0.0617 Epoch: 4 Global Step: 71670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:02:48,682-Speed 5184.00 samples/sec Loss 4.0066 LearningRate 0.0617 Epoch: 4 Global Step: 
71680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:02:50,672-Speed 5148.36 samples/sec Loss 4.0009 LearningRate 0.0617 Epoch: 4 Global Step: 71690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:02:52,660-Speed 5153.26 samples/sec Loss 3.9304 LearningRate 0.0617 Epoch: 4 Global Step: 71700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:02:54,659-Speed 5123.95 samples/sec Loss 3.9722 LearningRate 0.0617 Epoch: 4 Global Step: 71710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:02:56,637-Speed 5177.46 samples/sec Loss 3.9836 LearningRate 0.0616 Epoch: 4 Global Step: 71720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:02:58,620-Speed 5167.53 samples/sec Loss 3.8471 LearningRate 0.0616 Epoch: 4 Global Step: 71730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:00,632-Speed 5089.79 samples/sec Loss 3.9308 LearningRate 0.0616 Epoch: 4 Global Step: 71740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:02,614-Speed 5170.72 samples/sec Loss 4.0040 LearningRate 0.0616 Epoch: 4 Global Step: 71750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:04,620-Speed 5105.15 samples/sec Loss 4.0263 LearningRate 0.0616 Epoch: 4 Global Step: 71760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:06,593-Speed 5192.52 samples/sec Loss 3.8960 LearningRate 0.0616 Epoch: 4 Global Step: 71770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:03:08,590-Speed 5126.83 samples/sec Loss 3.8973 LearningRate 0.0616 Epoch: 4 Global Step: 71780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:03:10,574-Speed 5164.91 samples/sec Loss 3.9999 LearningRate 0.0616 Epoch: 4 Global Step: 71790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:03:12,557-Speed 5165.47 samples/sec Loss 3.8789 LearningRate 0.0616 Epoch: 4 Global Step: 71800 Fp16 Grad Scale: 131072 Required: 18 hours 
Training: 2022-04-11 04:03:14,539-Speed 5167.41 samples/sec Loss 3.9670 LearningRate 0.0616 Epoch: 4 Global Step: 71810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:03:16,525-Speed 5158.39 samples/sec Loss 3.9102 LearningRate 0.0616 Epoch: 4 Global Step: 71820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:03:18,500-Speed 5187.99 samples/sec Loss 3.9983 LearningRate 0.0616 Epoch: 4 Global Step: 71830 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:20,492-Speed 5140.94 samples/sec Loss 4.0362 LearningRate 0.0616 Epoch: 4 Global Step: 71840 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:22,490-Speed 5125.74 samples/sec Loss 3.8750 LearningRate 0.0616 Epoch: 4 Global Step: 71850 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:24,463-Speed 5193.72 samples/sec Loss 3.9685 LearningRate 0.0616 Epoch: 4 Global Step: 71860 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:26,452-Speed 5148.45 samples/sec Loss 3.9608 LearningRate 0.0616 Epoch: 4 Global Step: 71870 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:28,443-Speed 5145.71 samples/sec Loss 3.9550 LearningRate 0.0616 Epoch: 4 Global Step: 71880 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:30,445-Speed 5115.22 samples/sec Loss 3.9841 LearningRate 0.0616 Epoch: 4 Global Step: 71890 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:32,421-Speed 5185.85 samples/sec Loss 3.9463 LearningRate 0.0616 Epoch: 4 Global Step: 71900 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:34,414-Speed 5138.91 samples/sec Loss 4.0735 LearningRate 0.0616 Epoch: 4 Global Step: 71910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:36,419-Speed 5109.30 samples/sec Loss 4.0088 LearningRate 0.0616 Epoch: 4 Global Step: 71920 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:38,413-Speed 5136.76 
samples/sec Loss 3.9431 LearningRate 0.0615 Epoch: 4 Global Step: 71930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:03:40,392-Speed 5176.26 samples/sec Loss 3.9201 LearningRate 0.0615 Epoch: 4 Global Step: 71940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:42,364-Speed 5194.26 samples/sec Loss 3.9989 LearningRate 0.0615 Epoch: 4 Global Step: 71950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:44,339-Speed 5186.85 samples/sec Loss 3.9833 LearningRate 0.0615 Epoch: 4 Global Step: 71960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:46,314-Speed 5186.88 samples/sec Loss 3.9943 LearningRate 0.0615 Epoch: 4 Global Step: 71970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:48,308-Speed 5136.38 samples/sec Loss 3.9671 LearningRate 0.0615 Epoch: 4 Global Step: 71980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:50,301-Speed 5140.10 samples/sec Loss 3.9599 LearningRate 0.0615 Epoch: 4 Global Step: 71990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:03:52,289-Speed 5152.03 samples/sec Loss 4.0073 LearningRate 0.0615 Epoch: 4 Global Step: 72000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:04:19,163-[lfw][72000]XNorm: 23.946980 Training: 2022-04-11 04:04:19,164-[lfw][72000]Accuracy-Flip: 0.99750+-0.00291 Training: 2022-04-11 04:04:19,164-[lfw][72000]Accuracy-Highest: 0.99800 Training: 2022-04-11 04:04:49,938-[cfp_fp][72000]XNorm: 21.730928 Training: 2022-04-11 04:04:49,938-[cfp_fp][72000]Accuracy-Flip: 0.97643+-0.00554 Training: 2022-04-11 04:04:49,939-[cfp_fp][72000]Accuracy-Highest: 0.98029 Training: 2022-04-11 04:05:16,539-[agedb_30][72000]XNorm: 23.809445 Training: 2022-04-11 04:05:16,539-[agedb_30][72000]Accuracy-Flip: 0.97533+-0.00698 Training: 2022-04-11 04:05:16,540-[agedb_30][72000]Accuracy-Highest: 0.97717 Training: 2022-04-11 04:05:18,519-Speed 118.75 samples/sec Loss 3.9489 LearningRate 
0.0615 Epoch: 4 Global Step: 72010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:05:20,498-Speed 5176.88 samples/sec Loss 3.9978 LearningRate 0.0615 Epoch: 4 Global Step: 72020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:05:22,463-Speed 5211.22 samples/sec Loss 3.9511 LearningRate 0.0615 Epoch: 4 Global Step: 72030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:05:24,447-Speed 5163.50 samples/sec Loss 4.0188 LearningRate 0.0615 Epoch: 4 Global Step: 72040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:05:26,419-Speed 5194.27 samples/sec Loss 3.9985 LearningRate 0.0615 Epoch: 4 Global Step: 72050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:05:28,388-Speed 5202.62 samples/sec Loss 4.0924 LearningRate 0.0615 Epoch: 4 Global Step: 72060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:05:30,356-Speed 5204.35 samples/sec Loss 3.9352 LearningRate 0.0615 Epoch: 4 Global Step: 72070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:05:32,328-Speed 5194.18 samples/sec Loss 3.9670 LearningRate 0.0615 Epoch: 4 Global Step: 72080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:05:34,294-Speed 5210.91 samples/sec Loss 3.9886 LearningRate 0.0615 Epoch: 4 Global Step: 72090 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:05:36,280-Speed 5156.65 samples/sec Loss 3.9709 LearningRate 0.0615 Epoch: 4 Global Step: 72100 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:05:38,273-Speed 5140.46 samples/sec Loss 3.9298 LearningRate 0.0615 Epoch: 4 Global Step: 72110 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:05:40,248-Speed 5186.85 samples/sec Loss 3.9830 LearningRate 0.0615 Epoch: 4 Global Step: 72120 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:05:42,221-Speed 5192.47 samples/sec Loss 4.0073 LearningRate 0.0615 Epoch: 4 Global Step: 72130 Fp16 Grad 
Scale: 65536 Required: 18 hours Training: 2022-04-11 04:05:44,191-Speed 5198.52 samples/sec Loss 3.9653 LearningRate 0.0614 Epoch: 4 Global Step: 72140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:05:46,169-Speed 5178.84 samples/sec Loss 4.0182 LearningRate 0.0614 Epoch: 4 Global Step: 72150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:05:48,155-Speed 5159.67 samples/sec Loss 3.9587 LearningRate 0.0614 Epoch: 4 Global Step: 72160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:05:50,157-Speed 5115.61 samples/sec Loss 3.9964 LearningRate 0.0614 Epoch: 4 Global Step: 72170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:05:52,137-Speed 5172.74 samples/sec Loss 4.0555 LearningRate 0.0614 Epoch: 4 Global Step: 72180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:05:54,113-Speed 5183.73 samples/sec Loss 3.9649 LearningRate 0.0614 Epoch: 4 Global Step: 72190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:05:56,093-Speed 5175.14 samples/sec Loss 3.9655 LearningRate 0.0614 Epoch: 4 Global Step: 72200 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:05:58,073-Speed 5173.29 samples/sec Loss 3.9890 LearningRate 0.0614 Epoch: 4 Global Step: 72210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:00,092-Speed 5072.85 samples/sec Loss 3.9545 LearningRate 0.0614 Epoch: 4 Global Step: 72220 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:02,089-Speed 5129.58 samples/sec Loss 3.9767 LearningRate 0.0614 Epoch: 4 Global Step: 72230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:04,074-Speed 5160.18 samples/sec Loss 3.9724 LearningRate 0.0614 Epoch: 4 Global Step: 72240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:06,055-Speed 5172.32 samples/sec Loss 3.9529 LearningRate 0.0614 Epoch: 4 Global Step: 72250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 
2022-04-11 04:06:08,056-Speed 5119.55 samples/sec Loss 3.9408 LearningRate 0.0614 Epoch: 4 Global Step: 72260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:10,032-Speed 5182.79 samples/sec Loss 4.0213 LearningRate 0.0614 Epoch: 4 Global Step: 72270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:12,023-Speed 5143.28 samples/sec Loss 3.9800 LearningRate 0.0614 Epoch: 4 Global Step: 72280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:13,993-Speed 5201.90 samples/sec Loss 4.0617 LearningRate 0.0614 Epoch: 4 Global Step: 72290 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:15,997-Speed 5109.19 samples/sec Loss 3.8615 LearningRate 0.0614 Epoch: 4 Global Step: 72300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:17,985-Speed 5153.26 samples/sec Loss 3.8841 LearningRate 0.0614 Epoch: 4 Global Step: 72310 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:19,980-Speed 5133.68 samples/sec Loss 3.9744 LearningRate 0.0614 Epoch: 4 Global Step: 72320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:21,965-Speed 5159.97 samples/sec Loss 3.9213 LearningRate 0.0614 Epoch: 4 Global Step: 72330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:23,960-Speed 5135.82 samples/sec Loss 4.0215 LearningRate 0.0614 Epoch: 4 Global Step: 72340 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:25,954-Speed 5138.24 samples/sec Loss 3.9312 LearningRate 0.0614 Epoch: 4 Global Step: 72350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:27,923-Speed 5201.02 samples/sec Loss 3.8674 LearningRate 0.0613 Epoch: 4 Global Step: 72360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:29,897-Speed 5190.85 samples/sec Loss 4.0402 LearningRate 0.0613 Epoch: 4 Global Step: 72370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:31,896-Speed 5121.92 
samples/sec Loss 3.9605 LearningRate 0.0613 Epoch: 4 Global Step: 72380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:33,871-Speed 5188.96 samples/sec Loss 4.0080 LearningRate 0.0613 Epoch: 4 Global Step: 72390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:35,854-Speed 5163.58 samples/sec Loss 3.9764 LearningRate 0.0613 Epoch: 4 Global Step: 72400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:37,840-Speed 5158.98 samples/sec Loss 3.9298 LearningRate 0.0613 Epoch: 4 Global Step: 72410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:39,837-Speed 5127.99 samples/sec Loss 3.9310 LearningRate 0.0613 Epoch: 4 Global Step: 72420 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:41,809-Speed 5194.05 samples/sec Loss 4.0470 LearningRate 0.0613 Epoch: 4 Global Step: 72430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:06:43,773-Speed 5215.83 samples/sec Loss 3.9459 LearningRate 0.0613 Epoch: 4 Global Step: 72440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:06:45,752-Speed 5176.78 samples/sec Loss 3.9809 LearningRate 0.0613 Epoch: 4 Global Step: 72450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:06:47,736-Speed 5163.17 samples/sec Loss 3.8869 LearningRate 0.0613 Epoch: 4 Global Step: 72460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:06:49,742-Speed 5108.02 samples/sec Loss 3.9537 LearningRate 0.0613 Epoch: 4 Global Step: 72470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:06:55,861-Speed 1673.50 samples/sec Loss 4.0051 LearningRate 0.0613 Epoch: 4 Global Step: 72480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:06:57,856-Speed 5134.43 samples/sec Loss 3.9751 LearningRate 0.0613 Epoch: 4 Global Step: 72490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:06:59,831-Speed 5186.24 samples/sec Loss 3.9712 LearningRate 0.0613 
Epoch: 4 Global Step: 72500 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:01,854-Speed 5063.25 samples/sec Loss 3.9392 LearningRate 0.0613 Epoch: 4 Global Step: 72510 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:03,863-Speed 5101.72 samples/sec Loss 3.8961 LearningRate 0.0613 Epoch: 4 Global Step: 72520 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:05,866-Speed 5114.17 samples/sec Loss 3.9438 LearningRate 0.0613 Epoch: 4 Global Step: 72530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:07,846-Speed 5171.40 samples/sec Loss 3.9806 LearningRate 0.0613 Epoch: 4 Global Step: 72540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:07:09,851-Speed 5109.03 samples/sec Loss 3.8421 LearningRate 0.0613 Epoch: 4 Global Step: 72550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:07:11,848-Speed 5129.19 samples/sec Loss 3.9734 LearningRate 0.0613 Epoch: 4 Global Step: 72560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:07:13,852-Speed 5111.30 samples/sec Loss 3.9783 LearningRate 0.0612 Epoch: 4 Global Step: 72570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:07:15,836-Speed 5162.77 samples/sec Loss 3.9778 LearningRate 0.0612 Epoch: 4 Global Step: 72580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:17,812-Speed 5185.58 samples/sec Loss 3.9777 LearningRate 0.0612 Epoch: 4 Global Step: 72590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:19,797-Speed 5166.45 samples/sec Loss 4.0525 LearningRate 0.0612 Epoch: 4 Global Step: 72600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:21,785-Speed 5150.63 samples/sec Loss 3.9658 LearningRate 0.0612 Epoch: 4 Global Step: 72610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:23,775-Speed 5148.31 samples/sec Loss 3.9517 LearningRate 0.0612 Epoch: 4 Global Step: 72620 Fp16 Grad Scale: 65536 
Required: 18 hours Training: 2022-04-11 04:07:25,743-Speed 5205.75 samples/sec Loss 3.9842 LearningRate 0.0612 Epoch: 4 Global Step: 72630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:27,749-Speed 5106.55 samples/sec Loss 4.0222 LearningRate 0.0612 Epoch: 4 Global Step: 72640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:29,728-Speed 5175.65 samples/sec Loss 3.9894 LearningRate 0.0612 Epoch: 4 Global Step: 72650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:31,696-Speed 5206.09 samples/sec Loss 3.9601 LearningRate 0.0612 Epoch: 4 Global Step: 72660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:33,701-Speed 5112.40 samples/sec Loss 3.9683 LearningRate 0.0612 Epoch: 4 Global Step: 72670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:35,725-Speed 5059.91 samples/sec Loss 3.9269 LearningRate 0.0612 Epoch: 4 Global Step: 72680 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:07:37,735-Speed 5097.20 samples/sec Loss 3.9015 LearningRate 0.0612 Epoch: 4 Global Step: 72690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:07:39,764-Speed 5048.36 samples/sec Loss 3.9553 LearningRate 0.0612 Epoch: 4 Global Step: 72700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:07:41,760-Speed 5131.25 samples/sec Loss 4.0053 LearningRate 0.0612 Epoch: 4 Global Step: 72710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-11 04:07:43,728-Speed 5204.56 samples/sec Loss 3.9740 LearningRate 0.0612 Epoch: 4 Global Step: 72720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:45,725-Speed 5130.57 samples/sec Loss 3.9445 LearningRate 0.0612 Epoch: 4 Global Step: 72730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:47,740-Speed 5084.64 samples/sec Loss 4.0025 LearningRate 0.0612 Epoch: 4 Global Step: 72740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 
04:07:49,729-Speed 5149.16 samples/sec Loss 4.0082 LearningRate 0.0612 Epoch: 4 Global Step: 72750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:51,704-Speed 5185.98 samples/sec Loss 4.0030 LearningRate 0.0612 Epoch: 4 Global Step: 72760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:53,688-Speed 5162.04 samples/sec Loss 3.8954 LearningRate 0.0612 Epoch: 4 Global Step: 72770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:55,657-Speed 5202.48 samples/sec Loss 3.9251 LearningRate 0.0611 Epoch: 4 Global Step: 72780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:57,637-Speed 5172.88 samples/sec Loss 3.9256 LearningRate 0.0611 Epoch: 4 Global Step: 72790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:07:59,617-Speed 5175.59 samples/sec Loss 3.8506 LearningRate 0.0611 Epoch: 4 Global Step: 72800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:08:01,608-Speed 5142.46 samples/sec Loss 3.9973 LearningRate 0.0611 Epoch: 4 Global Step: 72810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-11 04:08:03,600-Speed 5144.64 samples/sec Loss 3.9979 LearningRate 0.0611 Epoch: 4 Global Step: 72820 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:08:05,582-Speed 5166.83 samples/sec Loss 3.9337 LearningRate 0.0611 Epoch: 4 Global Step: 72830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:08:07,568-Speed 5159.34 samples/sec Loss 3.9110 LearningRate 0.0611 Epoch: 4 Global Step: 72840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:08:09,541-Speed 5191.91 samples/sec Loss 4.0423 LearningRate 0.0611 Epoch: 4 Global Step: 72850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:08:11,514-Speed 5189.73 samples/sec Loss 3.9608 LearningRate 0.0611 Epoch: 4 Global Step: 72860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:08:13,482-Speed 5205.96 samples/sec Loss 3.9994 
LearningRate 0.0611 Epoch: 4 Global Step: 72870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:15,460-Speed 5178.04 samples/sec Loss 3.9267 LearningRate 0.0611 Epoch: 4 Global Step: 72880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:17,467-Speed 5104.53 samples/sec Loss 3.9693 LearningRate 0.0611 Epoch: 4 Global Step: 72890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:19,440-Speed 5190.62 samples/sec Loss 3.9921 LearningRate 0.0611 Epoch: 4 Global Step: 72900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:21,425-Speed 5160.74 samples/sec Loss 4.0375 LearningRate 0.0611 Epoch: 4 Global Step: 72910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:23,441-Speed 5080.06 samples/sec Loss 4.0022 LearningRate 0.0611 Epoch: 4 Global Step: 72920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:25,436-Speed 5135.06 samples/sec Loss 4.0370 LearningRate 0.0611 Epoch: 4 Global Step: 72930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:27,440-Speed 5112.25 samples/sec Loss 3.9274 LearningRate 0.0611 Epoch: 4 Global Step: 72940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:29,433-Speed 5140.08 samples/sec Loss 3.8840 LearningRate 0.0611 Epoch: 4 Global Step: 72950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:31,413-Speed 5173.93 samples/sec Loss 3.9908 LearningRate 0.0611 Epoch: 4 Global Step: 72960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:33,433-Speed 5071.46 samples/sec Loss 3.8547 LearningRate 0.0611 Epoch: 4 Global Step: 72970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:08:35,441-Speed 5100.67 samples/sec Loss 3.8710 LearningRate 0.0611 Epoch: 4 Global Step: 72980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:08:37,481-Speed 5020.20 samples/sec Loss 4.1259 LearningRate 0.0611 Epoch: 4 Global Step: 72990 Fp16 
Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:08:39,499-Speed 5077.04 samples/sec Loss 4.0133 LearningRate 0.0610 Epoch: 4 Global Step: 73000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:08:41,495-Speed 5131.17 samples/sec Loss 3.9948 LearningRate 0.0610 Epoch: 4 Global Step: 73010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:43,483-Speed 5153.58 samples/sec Loss 3.9424 LearningRate 0.0610 Epoch: 4 Global Step: 73020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:45,473-Speed 5147.76 samples/sec Loss 3.8253 LearningRate 0.0610 Epoch: 4 Global Step: 73030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:47,455-Speed 5167.67 samples/sec Loss 3.8846 LearningRate 0.0610 Epoch: 4 Global Step: 73040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:49,466-Speed 5094.82 samples/sec Loss 3.9685 LearningRate 0.0610 Epoch: 4 Global Step: 73050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:51,464-Speed 5125.88 samples/sec Loss 4.0174 LearningRate 0.0610 Epoch: 4 Global Step: 73060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:53,452-Speed 5151.22 samples/sec Loss 3.9896 LearningRate 0.0610 Epoch: 4 Global Step: 73070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:55,426-Speed 5189.84 samples/sec Loss 3.8499 LearningRate 0.0610 Epoch: 4 Global Step: 73080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:57,409-Speed 5165.74 samples/sec Loss 3.9179 LearningRate 0.0610 Epoch: 4 Global Step: 73090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:08:59,431-Speed 5065.42 samples/sec Loss 3.9094 LearningRate 0.0610 Epoch: 4 Global Step: 73100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:01,449-Speed 5076.51 samples/sec Loss 3.9469 LearningRate 0.0610 Epoch: 4 Global Step: 73110 Fp16 Grad Scale: 131072 Required: 17 hours Training: 
2022-04-11 04:09:03,428-Speed 5176.59 samples/sec Loss 4.0332 LearningRate 0.0610 Epoch: 4 Global Step: 73120 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:09:05,403-Speed 5185.58 samples/sec Loss 4.0136 LearningRate 0.0610 Epoch: 4 Global Step: 73130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:07,380-Speed 5181.88 samples/sec Loss 3.8850 LearningRate 0.0610 Epoch: 4 Global Step: 73140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:09,381-Speed 5119.36 samples/sec Loss 3.9601 LearningRate 0.0610 Epoch: 4 Global Step: 73150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:11,447-Speed 4957.52 samples/sec Loss 3.9188 LearningRate 0.0610 Epoch: 4 Global Step: 73160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:13,468-Speed 5069.62 samples/sec Loss 3.9997 LearningRate 0.0610 Epoch: 4 Global Step: 73170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:15,468-Speed 5121.05 samples/sec Loss 3.9864 LearningRate 0.0610 Epoch: 4 Global Step: 73180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:17,494-Speed 5057.65 samples/sec Loss 4.0149 LearningRate 0.0610 Epoch: 4 Global Step: 73190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:19,479-Speed 5159.68 samples/sec Loss 3.9951 LearningRate 0.0610 Epoch: 4 Global Step: 73200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:21,483-Speed 5111.35 samples/sec Loss 3.9245 LearningRate 0.0609 Epoch: 4 Global Step: 73210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:23,479-Speed 5131.27 samples/sec Loss 3.9664 LearningRate 0.0609 Epoch: 4 Global Step: 73220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:25,455-Speed 5183.30 samples/sec Loss 4.0128 LearningRate 0.0609 Epoch: 4 Global Step: 73230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:09:27,454-Speed 5123.88 samples/sec 
Loss 3.9465 LearningRate 0.0609 Epoch: 4 Global Step: 73240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:09:29,438-Speed 5162.74 samples/sec Loss 4.0544 LearningRate 0.0609 Epoch: 4 Global Step: 73250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:31,415-Speed 5182.01 samples/sec Loss 3.9210 LearningRate 0.0609 Epoch: 4 Global Step: 73260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:33,429-Speed 5088.15 samples/sec Loss 3.9678 LearningRate 0.0609 Epoch: 4 Global Step: 73270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:35,424-Speed 5133.65 samples/sec Loss 3.8712 LearningRate 0.0609 Epoch: 4 Global Step: 73280 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:37,438-Speed 5086.33 samples/sec Loss 3.8877 LearningRate 0.0609 Epoch: 4 Global Step: 73290 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:39,419-Speed 5170.11 samples/sec Loss 4.0920 LearningRate 0.0609 Epoch: 4 Global Step: 73300 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:41,397-Speed 5178.31 samples/sec Loss 3.9554 LearningRate 0.0609 Epoch: 4 Global Step: 73310 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:43,385-Speed 5152.47 samples/sec Loss 4.0430 LearningRate 0.0609 Epoch: 4 Global Step: 73320 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:45,387-Speed 5117.20 samples/sec Loss 3.9922 LearningRate 0.0609 Epoch: 4 Global Step: 73330 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:47,381-Speed 5137.17 samples/sec Loss 4.0664 LearningRate 0.0609 Epoch: 4 Global Step: 73340 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:49,377-Speed 5131.89 samples/sec Loss 3.9297 LearningRate 0.0609 Epoch: 4 Global Step: 73350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:09:51,359-Speed 5167.90 samples/sec Loss 4.0420 LearningRate 0.0609 Epoch: 4 Global Step: 
73360 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:53,342-Speed 5166.64 samples/sec Loss 3.9688 LearningRate 0.0609 Epoch: 4 Global Step: 73370 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:55,318-Speed 5182.65 samples/sec Loss 3.9747 LearningRate 0.0609 Epoch: 4 Global Step: 73380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:57,354-Speed 5033.32 samples/sec Loss 3.8775 LearningRate 0.0609 Epoch: 4 Global Step: 73390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:09:59,372-Speed 5073.46 samples/sec Loss 3.8831 LearningRate 0.0609 Epoch: 4 Global Step: 73400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:10:01,351-Speed 5177.34 samples/sec Loss 3.9447 LearningRate 0.0609 Epoch: 4 Global Step: 73410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:10:03,356-Speed 5110.12 samples/sec Loss 3.9576 LearningRate 0.0608 Epoch: 4 Global Step: 73420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:10:05,343-Speed 5154.70 samples/sec Loss 3.9673 LearningRate 0.0608 Epoch: 4 Global Step: 73430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:10:07,334-Speed 5143.88 samples/sec Loss 3.8805 LearningRate 0.0608 Epoch: 4 Global Step: 73440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:10:09,314-Speed 5172.64 samples/sec Loss 3.9285 LearningRate 0.0608 Epoch: 4 Global Step: 73450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:10:11,294-Speed 5173.33 samples/sec Loss 3.9358 LearningRate 0.0608 Epoch: 4 Global Step: 73460 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:10:13,282-Speed 5154.50 samples/sec Loss 3.9508 LearningRate 0.0608 Epoch: 4 Global Step: 73470 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:10:15,319-Speed 5028.28 samples/sec Loss 4.0252 LearningRate 0.0608 Epoch: 4 Global Step: 73480 Fp16 Grad Scale: 131072 Required: 17 hours 
Training: 2022-04-11 04:10:17,300-Speed 5169.89 samples/sec Loss 3.9515 LearningRate 0.0608 Epoch: 4 Global Step: 73490 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:10:19,275-Speed 5187.71 samples/sec Loss 4.0094 LearningRate 0.0608 Epoch: 4 Global Step: 73500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:10:21,259-Speed 5161.91 samples/sec Loss 4.0103 LearningRate 0.0608 Epoch: 4 Global Step: 73510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:10:23,272-Speed 5089.55 samples/sec Loss 3.9499 LearningRate 0.0608 Epoch: 4 Global Step: 73520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:10:25,297-Speed 5057.78 samples/sec Loss 4.0062 LearningRate 0.0608 Epoch: 4 Global Step: 73530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:10:27,294-Speed 5129.25 samples/sec Loss 3.9703 LearningRate 0.0608 Epoch: 4 Global Step: 73540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:10:29,302-Speed 5101.27 samples/sec Loss 3.8658 LearningRate 0.0608 Epoch: 4 Global Step: 73550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:10:31,280-Speed 5178.47 samples/sec Loss 4.0095 LearningRate 0.0608 Epoch: 4 Global Step: 73560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:10:33,258-Speed 5179.00 samples/sec Loss 3.9840 LearningRate 0.0608 Epoch: 4 Global Step: 73570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:10:35,235-Speed 5181.70 samples/sec Loss 3.9904 LearningRate 0.0608 Epoch: 4 Global Step: 73580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:10:37,300-Speed 4960.21 samples/sec Loss 3.9884 LearningRate 0.0608 Epoch: 4 Global Step: 73590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:10:39,292-Speed 5142.21 samples/sec Loss 3.9478 LearningRate 0.0608 Epoch: 4 Global Step: 73600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:10:41,265-Speed 5192.55 
samples/sec Loss 4.0277 LearningRate 0.0608 Epoch: 4 Global Step: 73610 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:10:43,239-Speed 5188.60 samples/sec Loss 4.0251 LearningRate 0.0608 Epoch: 4 Global Step: 73620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:10:45,244-Speed 5109.35 samples/sec Loss 3.9502 LearningRate 0.0608 Epoch: 4 Global Step: 73630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:10:47,224-Speed 5173.28 samples/sec Loss 4.0434 LearningRate 0.0607 Epoch: 4 Global Step: 73640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:10:49,221-Speed 5130.13 samples/sec Loss 3.9243 LearningRate 0.0607 Epoch: 4 Global Step: 73650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:10:51,227-Speed 5104.32 samples/sec Loss 4.0670 LearningRate 0.0607 Epoch: 4 Global Step: 73660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:10:53,222-Speed 5134.60 samples/sec Loss 3.9566 LearningRate 0.0607 Epoch: 4 Global Step: 73670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:10:55,204-Speed 5169.39 samples/sec Loss 3.9429 LearningRate 0.0607 Epoch: 4 Global Step: 73680 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:10:57,208-Speed 5112.41 samples/sec Loss 3.9609 LearningRate 0.0607 Epoch: 4 Global Step: 73690 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:10:59,217-Speed 5099.34 samples/sec Loss 3.8561 LearningRate 0.0607 Epoch: 4 Global Step: 73700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:01,205-Speed 5150.89 samples/sec Loss 3.9875 LearningRate 0.0607 Epoch: 4 Global Step: 73710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:03,188-Speed 5166.87 samples/sec Loss 3.9917 LearningRate 0.0607 Epoch: 4 Global Step: 73720 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:05,183-Speed 5133.80 samples/sec Loss 3.9357 LearningRate 0.0607 
Epoch: 4 Global Step: 73730 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:07,169-Speed 5158.56 samples/sec Loss 3.9759 LearningRate 0.0607 Epoch: 4 Global Step: 73740 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:09,163-Speed 5136.15 samples/sec Loss 3.9834 LearningRate 0.0607 Epoch: 4 Global Step: 73750 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:11,174-Speed 5092.70 samples/sec Loss 4.0043 LearningRate 0.0607 Epoch: 4 Global Step: 73760 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:13,189-Speed 5082.73 samples/sec Loss 3.9453 LearningRate 0.0607 Epoch: 4 Global Step: 73770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:15,215-Speed 5057.17 samples/sec Loss 4.0053 LearningRate 0.0607 Epoch: 4 Global Step: 73780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:17,202-Speed 5155.48 samples/sec Loss 3.9478 LearningRate 0.0607 Epoch: 4 Global Step: 73790 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:19,181-Speed 5176.22 samples/sec Loss 4.0097 LearningRate 0.0607 Epoch: 4 Global Step: 73800 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:21,181-Speed 5121.93 samples/sec Loss 3.9451 LearningRate 0.0607 Epoch: 4 Global Step: 73810 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:23,201-Speed 5070.40 samples/sec Loss 3.9606 LearningRate 0.0607 Epoch: 4 Global Step: 73820 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:25,180-Speed 5176.79 samples/sec Loss 3.9631 LearningRate 0.0607 Epoch: 4 Global Step: 73830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:27,160-Speed 5174.25 samples/sec Loss 3.9987 LearningRate 0.0607 Epoch: 4 Global Step: 73840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:29,148-Speed 5151.12 samples/sec Loss 3.8920 LearningRate 0.0606 Epoch: 4 Global Step: 73850 Fp16 Grad 
Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:31,132-Speed 5164.23 samples/sec Loss 3.9582 LearningRate 0.0606 Epoch: 4 Global Step: 73860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:33,147-Speed 5081.69 samples/sec Loss 4.0024 LearningRate 0.0606 Epoch: 4 Global Step: 73870 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:35,168-Speed 5068.78 samples/sec Loss 3.9613 LearningRate 0.0606 Epoch: 4 Global Step: 73880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:11:37,160-Speed 5144.19 samples/sec Loss 3.9954 LearningRate 0.0606 Epoch: 4 Global Step: 73890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:11:39,177-Speed 5078.47 samples/sec Loss 3.9534 LearningRate 0.0606 Epoch: 4 Global Step: 73900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:11:41,178-Speed 5117.66 samples/sec Loss 3.9333 LearningRate 0.0606 Epoch: 4 Global Step: 73910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:11:43,155-Speed 5180.82 samples/sec Loss 3.9416 LearningRate 0.0606 Epoch: 4 Global Step: 73920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:11:45,150-Speed 5136.10 samples/sec Loss 3.9434 LearningRate 0.0606 Epoch: 4 Global Step: 73930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:11:47,149-Speed 5122.60 samples/sec Loss 3.9676 LearningRate 0.0606 Epoch: 4 Global Step: 73940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:11:49,142-Speed 5140.45 samples/sec Loss 3.9963 LearningRate 0.0606 Epoch: 4 Global Step: 73950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:11:51,167-Speed 5057.38 samples/sec Loss 4.0367 LearningRate 0.0606 Epoch: 4 Global Step: 73960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:11:53,146-Speed 5177.84 samples/sec Loss 3.9030 LearningRate 0.0606 Epoch: 4 Global Step: 73970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 
04:11:55,122-Speed 5181.87 samples/sec Loss 4.0021 LearningRate 0.0606 Epoch: 4 Global Step: 73980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:57,112-Speed 5150.03 samples/sec Loss 3.9317 LearningRate 0.0606 Epoch: 4 Global Step: 73990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:11:59,136-Speed 5060.03 samples/sec Loss 3.9963 LearningRate 0.0606 Epoch: 4 Global Step: 74000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:12:25,845-[lfw][74000]XNorm: 21.764229 Training: 2022-04-11 04:12:25,845-[lfw][74000]Accuracy-Flip: 0.99733+-0.00300 Training: 2022-04-11 04:12:25,846-[lfw][74000]Accuracy-Highest: 0.99800 Training: 2022-04-11 04:12:56,775-[cfp_fp][74000]XNorm: 19.641552 Training: 2022-04-11 04:12:56,776-[cfp_fp][74000]Accuracy-Flip: 0.97414+-0.00724 Training: 2022-04-11 04:12:56,777-[cfp_fp][74000]Accuracy-Highest: 0.98029 Training: 2022-04-11 04:13:23,511-[agedb_30][74000]XNorm: 21.830857 Training: 2022-04-11 04:13:23,511-[agedb_30][74000]Accuracy-Flip: 0.97717+-0.00796 Training: 2022-04-11 04:13:23,511-[agedb_30][74000]Accuracy-Highest: 0.97717 Training: 2022-04-11 04:13:25,524-Speed 118.54 samples/sec Loss 3.9572 LearningRate 0.0606 Epoch: 4 Global Step: 74010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:13:27,518-Speed 5138.98 samples/sec Loss 3.9125 LearningRate 0.0606 Epoch: 4 Global Step: 74020 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:13:29,499-Speed 5170.47 samples/sec Loss 3.9402 LearningRate 0.0606 Epoch: 4 Global Step: 74030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:13:31,491-Speed 5142.33 samples/sec Loss 3.9504 LearningRate 0.0606 Epoch: 4 Global Step: 74040 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:13:33,479-Speed 5152.31 samples/sec Loss 4.0858 LearningRate 0.0606 Epoch: 4 Global Step: 74050 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:13:35,459-Speed 5172.62 
samples/sec Loss 3.8907 LearningRate 0.0606 Epoch: 4 Global Step: 74060 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:13:37,476-Speed 5078.19 samples/sec Loss 3.9447 LearningRate 0.0605 Epoch: 4 Global Step: 74070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:13:39,450-Speed 5189.96 samples/sec Loss 3.9509 LearningRate 0.0605 Epoch: 4 Global Step: 74080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:13:41,420-Speed 5199.54 samples/sec Loss 3.9584 LearningRate 0.0605 Epoch: 4 Global Step: 74090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:13:43,405-Speed 5161.10 samples/sec Loss 3.9631 LearningRate 0.0605 Epoch: 4 Global Step: 74100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:13:45,381-Speed 5183.25 samples/sec Loss 3.9666 LearningRate 0.0605 Epoch: 4 Global Step: 74110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:13:47,365-Speed 5162.17 samples/sec Loss 4.0480 LearningRate 0.0605 Epoch: 4 Global Step: 74120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:13:49,352-Speed 5157.18 samples/sec Loss 4.0100 LearningRate 0.0605 Epoch: 4 Global Step: 74130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:13:51,339-Speed 5154.69 samples/sec Loss 4.0391 LearningRate 0.0605 Epoch: 4 Global Step: 74140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:13:53,337-Speed 5126.41 samples/sec Loss 3.9809 LearningRate 0.0605 Epoch: 4 Global Step: 74150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:13:55,321-Speed 5162.29 samples/sec Loss 3.9390 LearningRate 0.0605 Epoch: 4 Global Step: 74160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:13:57,307-Speed 5158.90 samples/sec Loss 3.9220 LearningRate 0.0605 Epoch: 4 Global Step: 74170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:13:59,290-Speed 5166.13 samples/sec Loss 3.9599 LearningRate 0.0605 Epoch: 4 
Global Step: 74180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:01,292-Speed 5114.54 samples/sec Loss 3.9676 LearningRate 0.0605 Epoch: 4 Global Step: 74190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:03,293-Speed 5119.04 samples/sec Loss 4.0195 LearningRate 0.0605 Epoch: 4 Global Step: 74200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:05,293-Speed 5122.32 samples/sec Loss 4.0618 LearningRate 0.0605 Epoch: 4 Global Step: 74210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:07,269-Speed 5184.41 samples/sec Loss 3.9958 LearningRate 0.0605 Epoch: 4 Global Step: 74220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:09,246-Speed 5181.25 samples/sec Loss 4.0113 LearningRate 0.0605 Epoch: 4 Global Step: 74230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:11,249-Speed 5113.25 samples/sec Loss 4.1255 LearningRate 0.0605 Epoch: 4 Global Step: 74240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:13,257-Speed 5100.96 samples/sec Loss 3.9296 LearningRate 0.0605 Epoch: 4 Global Step: 74250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:15,244-Speed 5156.28 samples/sec Loss 3.9993 LearningRate 0.0605 Epoch: 4 Global Step: 74260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:17,217-Speed 5191.49 samples/sec Loss 3.8622 LearningRate 0.0605 Epoch: 4 Global Step: 74270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:19,189-Speed 5193.73 samples/sec Loss 4.0109 LearningRate 0.0604 Epoch: 4 Global Step: 74280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:21,188-Speed 5124.36 samples/sec Loss 3.9091 LearningRate 0.0604 Epoch: 4 Global Step: 74290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:23,187-Speed 5123.76 samples/sec Loss 3.9424 LearningRate 0.0604 Epoch: 4 Global Step: 74300 Fp16 Grad Scale: 131072 
Required: 17 hours Training: 2022-04-11 04:14:25,175-Speed 5153.00 samples/sec Loss 4.0014 LearningRate 0.0604 Epoch: 4 Global Step: 74310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:27,200-Speed 5059.26 samples/sec Loss 3.9801 LearningRate 0.0604 Epoch: 4 Global Step: 74320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:29,175-Speed 5187.27 samples/sec Loss 4.0537 LearningRate 0.0604 Epoch: 4 Global Step: 74330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:31,147-Speed 5194.65 samples/sec Loss 3.9158 LearningRate 0.0604 Epoch: 4 Global Step: 74340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:33,148-Speed 5119.64 samples/sec Loss 3.9791 LearningRate 0.0604 Epoch: 4 Global Step: 74350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:35,142-Speed 5136.45 samples/sec Loss 4.0214 LearningRate 0.0604 Epoch: 4 Global Step: 74360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:37,120-Speed 5177.47 samples/sec Loss 3.9936 LearningRate 0.0604 Epoch: 4 Global Step: 74370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:39,154-Speed 5037.29 samples/sec Loss 4.0559 LearningRate 0.0604 Epoch: 4 Global Step: 74380 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:41,167-Speed 5087.56 samples/sec Loss 3.9448 LearningRate 0.0604 Epoch: 4 Global Step: 74390 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:43,153-Speed 5158.12 samples/sec Loss 3.9842 LearningRate 0.0604 Epoch: 4 Global Step: 74400 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:14:45,122-Speed 5201.04 samples/sec Loss 4.0491 LearningRate 0.0604 Epoch: 4 Global Step: 74410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:14:47,089-Speed 5207.47 samples/sec Loss 3.9380 LearningRate 0.0604 Epoch: 4 Global Step: 74420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 
04:14:49,076-Speed 5155.32 samples/sec Loss 3.9867 LearningRate 0.0604 Epoch: 4 Global Step: 74430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:14:51,078-Speed 5117.13 samples/sec Loss 3.9302 LearningRate 0.0604 Epoch: 4 Global Step: 74440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:14:53,067-Speed 5152.27 samples/sec Loss 4.0527 LearningRate 0.0604 Epoch: 4 Global Step: 74450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:14:55,045-Speed 5178.39 samples/sec Loss 4.0387 LearningRate 0.0604 Epoch: 4 Global Step: 74460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:14:57,065-Speed 5069.91 samples/sec Loss 3.9559 LearningRate 0.0604 Epoch: 4 Global Step: 74470 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:14:59,045-Speed 5174.47 samples/sec Loss 3.9928 LearningRate 0.0604 Epoch: 4 Global Step: 74480 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:15:01,021-Speed 5182.06 samples/sec Loss 4.0011 LearningRate 0.0604 Epoch: 4 Global Step: 74490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:15:02,992-Speed 5197.92 samples/sec Loss 4.0411 LearningRate 0.0603 Epoch: 4 Global Step: 74500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:15:04,969-Speed 5181.55 samples/sec Loss 3.9338 LearningRate 0.0603 Epoch: 4 Global Step: 74510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:15:06,950-Speed 5169.44 samples/sec Loss 4.0289 LearningRate 0.0603 Epoch: 4 Global Step: 74520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:15:08,950-Speed 5121.32 samples/sec Loss 3.9522 LearningRate 0.0603 Epoch: 4 Global Step: 74530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:15:10,952-Speed 5118.28 samples/sec Loss 4.0129 LearningRate 0.0603 Epoch: 4 Global Step: 74540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:15:12,980-Speed 5049.28 samples/sec Loss 3.8843 
LearningRate 0.0603 Epoch: 4 Global Step: 74550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:15:14,964-Speed 5163.41 samples/sec Loss 3.8774 LearningRate 0.0603 Epoch: 4 Global Step: 74560 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:15:16,978-Speed 5086.55 samples/sec Loss 4.0755 LearningRate 0.0603 Epoch: 4 Global Step: 74570 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:15:18,949-Speed 5198.63 samples/sec Loss 3.9772 LearningRate 0.0603 Epoch: 4 Global Step: 74580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:15:20,927-Speed 5178.71 samples/sec Loss 4.0084 LearningRate 0.0603 Epoch: 4 Global Step: 74590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:15:22,930-Speed 5111.57 samples/sec Loss 3.9807 LearningRate 0.0603 Epoch: 4 Global Step: 74600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:15:24,920-Speed 5147.48 samples/sec Loss 3.9331 LearningRate 0.0603 Epoch: 4 Global Step: 74610 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-04-11 04:15:26,899-Speed 5176.58 samples/sec Loss 4.0413 LearningRate 0.0603 Epoch: 4 Global Step: 74620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:15:28,891-Speed 5142.74 samples/sec Loss 3.9508 LearningRate 0.0603 Epoch: 4 Global Step: 74630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:15:30,866-Speed 5186.29 samples/sec Loss 3.9368 LearningRate 0.0603 Epoch: 4 Global Step: 74640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:15:32,845-Speed 5175.81 samples/sec Loss 3.9941 LearningRate 0.0603 Epoch: 4 Global Step: 74650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:15:34,855-Speed 5096.88 samples/sec Loss 3.9900 LearningRate 0.0603 Epoch: 4 Global Step: 74660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:15:36,824-Speed 5202.54 samples/sec Loss 3.9997 LearningRate 0.0603 Epoch: 4 Global Step: 
74670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:15:38,798-Speed 5190.10 samples/sec Loss 3.9597 LearningRate 0.0603 Epoch: 4 Global Step: 74680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:15:40,781-Speed 5165.81 samples/sec Loss 4.0062 LearningRate 0.0603 Epoch: 4 Global Step: 74690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:15:42,755-Speed 5187.01 samples/sec Loss 3.9435 LearningRate 0.0603 Epoch: 4 Global Step: 74700 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:15:44,726-Speed 5197.18 samples/sec Loss 3.9664 LearningRate 0.0602 Epoch: 4 Global Step: 74710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:15:46,711-Speed 5161.86 samples/sec Loss 3.9177 LearningRate 0.0602 Epoch: 4 Global Step: 74720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:15:48,698-Speed 5155.46 samples/sec Loss 3.9126 LearningRate 0.0602 Epoch: 4 Global Step: 74730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:15:50,692-Speed 5135.85 samples/sec Loss 3.8392 LearningRate 0.0602 Epoch: 4 Global Step: 74740 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:15:52,714-Speed 5065.99 samples/sec Loss 4.0127 LearningRate 0.0602 Epoch: 4 Global Step: 74750 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:15:54,691-Speed 5180.71 samples/sec Loss 4.0251 LearningRate 0.0602 Epoch: 4 Global Step: 74760 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:15:56,680-Speed 5152.74 samples/sec Loss 3.9862 LearningRate 0.0602 Epoch: 4 Global Step: 74770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:15:58,647-Speed 5205.25 samples/sec Loss 3.9813 LearningRate 0.0602 Epoch: 4 Global Step: 74780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:00,649-Speed 5118.00 samples/sec Loss 3.9526 LearningRate 0.0602 Epoch: 4 Global Step: 74790 Fp16 Grad Scale: 65536 Required: 17 hours 
Training: 2022-04-11 04:16:02,633-Speed 5161.10 samples/sec Loss 3.9396 LearningRate 0.0602 Epoch: 4 Global Step: 74800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:04,606-Speed 5192.66 samples/sec Loss 3.9371 LearningRate 0.0602 Epoch: 4 Global Step: 74810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:06,601-Speed 5134.51 samples/sec Loss 4.0516 LearningRate 0.0602 Epoch: 4 Global Step: 74820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:08,585-Speed 5162.35 samples/sec Loss 3.9832 LearningRate 0.0602 Epoch: 4 Global Step: 74830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:10,566-Speed 5171.20 samples/sec Loss 4.0178 LearningRate 0.0602 Epoch: 4 Global Step: 74840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:12,534-Speed 5205.84 samples/sec Loss 3.9593 LearningRate 0.0602 Epoch: 4 Global Step: 74850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:14,511-Speed 5181.90 samples/sec Loss 3.9353 LearningRate 0.0602 Epoch: 4 Global Step: 74860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:16,500-Speed 5149.63 samples/sec Loss 3.9925 LearningRate 0.0602 Epoch: 4 Global Step: 74870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:18,473-Speed 5192.76 samples/sec Loss 4.0435 LearningRate 0.0602 Epoch: 4 Global Step: 74880 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:16:20,449-Speed 5182.80 samples/sec Loss 3.9091 LearningRate 0.0602 Epoch: 4 Global Step: 74890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:16:22,458-Speed 5097.53 samples/sec Loss 4.0556 LearningRate 0.0602 Epoch: 4 Global Step: 74900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:24,457-Speed 5126.14 samples/sec Loss 4.0080 LearningRate 0.0602 Epoch: 4 Global Step: 74910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:26,473-Speed 5081.28 
samples/sec Loss 4.0445 LearningRate 0.0602 Epoch: 4 Global Step: 74920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:28,449-Speed 5182.32 samples/sec Loss 3.9861 LearningRate 0.0601 Epoch: 4 Global Step: 74930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:30,428-Speed 5175.35 samples/sec Loss 3.9877 LearningRate 0.0601 Epoch: 4 Global Step: 74940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:32,396-Speed 5205.23 samples/sec Loss 3.9219 LearningRate 0.0601 Epoch: 4 Global Step: 74950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:34,384-Speed 5155.20 samples/sec Loss 3.9443 LearningRate 0.0601 Epoch: 4 Global Step: 74960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:36,397-Speed 5088.69 samples/sec Loss 3.9917 LearningRate 0.0601 Epoch: 4 Global Step: 74970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:38,389-Speed 5140.57 samples/sec Loss 3.9338 LearningRate 0.0601 Epoch: 4 Global Step: 74980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:40,442-Speed 4989.57 samples/sec Loss 4.0057 LearningRate 0.0601 Epoch: 4 Global Step: 74990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:42,408-Speed 5209.27 samples/sec Loss 3.9972 LearningRate 0.0601 Epoch: 4 Global Step: 75000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:44,390-Speed 5169.78 samples/sec Loss 4.0907 LearningRate 0.0601 Epoch: 4 Global Step: 75010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:46,391-Speed 5117.48 samples/sec Loss 4.0186 LearningRate 0.0601 Epoch: 4 Global Step: 75020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:48,380-Speed 5151.73 samples/sec Loss 3.9801 LearningRate 0.0601 Epoch: 4 Global Step: 75030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:50,375-Speed 5132.52 samples/sec Loss 3.9880 LearningRate 0.0601 Epoch: 4 
Global Step: 75040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:52,387-Speed 5092.49 samples/sec Loss 3.9540 LearningRate 0.0601 Epoch: 4 Global Step: 75050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:54,368-Speed 5169.92 samples/sec Loss 3.9578 LearningRate 0.0601 Epoch: 4 Global Step: 75060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:56,352-Speed 5165.55 samples/sec Loss 3.8963 LearningRate 0.0601 Epoch: 4 Global Step: 75070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:16:58,347-Speed 5132.80 samples/sec Loss 3.8903 LearningRate 0.0601 Epoch: 4 Global Step: 75080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:17:00,341-Speed 5137.00 samples/sec Loss 4.0038 LearningRate 0.0601 Epoch: 4 Global Step: 75090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:17:02,316-Speed 5186.67 samples/sec Loss 3.9644 LearningRate 0.0601 Epoch: 4 Global Step: 75100 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:17:04,289-Speed 5190.87 samples/sec Loss 3.9890 LearningRate 0.0601 Epoch: 4 Global Step: 75110 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:17:06,280-Speed 5147.05 samples/sec Loss 3.9452 LearningRate 0.0601 Epoch: 4 Global Step: 75120 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:17:08,283-Speed 5111.92 samples/sec Loss 3.9955 LearningRate 0.0601 Epoch: 4 Global Step: 75130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:17:10,285-Speed 5118.88 samples/sec Loss 3.9901 LearningRate 0.0600 Epoch: 4 Global Step: 75140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:17:12,269-Speed 5162.41 samples/sec Loss 4.0317 LearningRate 0.0600 Epoch: 4 Global Step: 75150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:17:14,262-Speed 5140.20 samples/sec Loss 3.9573 LearningRate 0.0600 Epoch: 4 Global Step: 75160 Fp16 Grad Scale: 131072 
Required: 17 hours Training: 2022-04-11 04:17:16,243-Speed 5169.26 samples/sec Loss 3.9468 LearningRate 0.0600 Epoch: 4 Global Step: 75170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:17:18,216-Speed 5191.34 samples/sec Loss 3.9776 LearningRate 0.0600 Epoch: 4 Global Step: 75180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:17:20,201-Speed 5160.11 samples/sec Loss 3.9276 LearningRate 0.0600 Epoch: 4 Global Step: 75190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:17:22,178-Speed 5183.06 samples/sec Loss 4.0087 LearningRate 0.0600 Epoch: 4 Global Step: 75200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:17:24,185-Speed 5102.98 samples/sec Loss 3.9841 LearningRate 0.0600 Epoch: 4 Global Step: 75210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:17:26,160-Speed 5186.35 samples/sec Loss 4.0006 LearningRate 0.0600 Epoch: 4 Global Step: 75220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:17:28,157-Speed 5129.68 samples/sec Loss 3.9657 LearningRate 0.0600 Epoch: 4 Global Step: 75230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:17:30,128-Speed 5195.94 samples/sec Loss 3.9973 LearningRate 0.0600 Epoch: 4 Global Step: 75240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:17:32,099-Speed 5199.04 samples/sec Loss 3.9300 LearningRate 0.0600 Epoch: 4 Global Step: 75250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:17:34,094-Speed 5133.98 samples/sec Loss 3.8989 LearningRate 0.0600 Epoch: 4 Global Step: 75260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:17:36,109-Speed 5084.87 samples/sec Loss 3.9732 LearningRate 0.0600 Epoch: 4 Global Step: 75270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:17:38,120-Speed 5092.65 samples/sec Loss 3.9649 LearningRate 0.0600 Epoch: 4 Global Step: 75280 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 
04:17:40,107-Speed 5156.24 samples/sec Loss 4.0348 LearningRate 0.0600 Epoch: 4 Global Step: 75290 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:17:42,086-Speed 5175.11 samples/sec Loss 3.9923 LearningRate 0.0600 Epoch: 4 Global Step: 75300 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:17:44,076-Speed 5147.72 samples/sec Loss 3.9921 LearningRate 0.0600 Epoch: 4 Global Step: 75310 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:17:46,061-Speed 5160.81 samples/sec Loss 4.0384 LearningRate 0.0600 Epoch: 4 Global Step: 75320 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:17:48,049-Speed 5150.41 samples/sec Loss 3.9629 LearningRate 0.0600 Epoch: 4 Global Step: 75330 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:17:50,036-Speed 5156.53 samples/sec Loss 3.9481 LearningRate 0.0600 Epoch: 4 Global Step: 75340 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:17:52,009-Speed 5190.64 samples/sec Loss 3.9502 LearningRate 0.0600 Epoch: 4 Global Step: 75350 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:17:53,986-Speed 5182.55 samples/sec Loss 3.9799 LearningRate 0.0599 Epoch: 4 Global Step: 75360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:17:55,952-Speed 5210.26 samples/sec Loss 3.9357 LearningRate 0.0599 Epoch: 4 Global Step: 75370 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:17:57,939-Speed 5155.74 samples/sec Loss 3.9067 LearningRate 0.0599 Epoch: 4 Global Step: 75380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:17:59,936-Speed 5127.73 samples/sec Loss 3.9975 LearningRate 0.0599 Epoch: 4 Global Step: 75390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:18:01,920-Speed 5165.96 samples/sec Loss 3.9379 LearningRate 0.0599 Epoch: 4 Global Step: 75400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:18:03,919-Speed 5123.97 samples/sec Loss 4.0316 
LearningRate 0.0599 Epoch: 4 Global Step: 75410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:18:05,934-Speed 5081.67 samples/sec Loss 3.9462 LearningRate 0.0599 Epoch: 4 Global Step: 75420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:18:07,910-Speed 5185.21 samples/sec Loss 4.0030 LearningRate 0.0599 Epoch: 4 Global Step: 75430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:18:09,904-Speed 5137.76 samples/sec Loss 4.0941 LearningRate 0.0599 Epoch: 4 Global Step: 75440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:18:11,886-Speed 5165.65 samples/sec Loss 4.0358 LearningRate 0.0599 Epoch: 4 Global Step: 75450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:18:13,885-Speed 5126.42 samples/sec Loss 4.0364 LearningRate 0.0599 Epoch: 4 Global Step: 75460 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:18:15,863-Speed 5178.63 samples/sec Loss 4.0259 LearningRate 0.0599 Epoch: 4 Global Step: 75470 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:18:17,866-Speed 5114.44 samples/sec Loss 3.9758 LearningRate 0.0599 Epoch: 4 Global Step: 75480 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:18:19,842-Speed 5184.20 samples/sec Loss 3.9428 LearningRate 0.0599 Epoch: 4 Global Step: 75490 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:18:21,834-Speed 5140.38 samples/sec Loss 4.0285 LearningRate 0.0599 Epoch: 4 Global Step: 75500 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:18:23,816-Speed 5168.79 samples/sec Loss 3.9841 LearningRate 0.0599 Epoch: 4 Global Step: 75510 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:18:25,810-Speed 5136.58 samples/sec Loss 4.0019 LearningRate 0.0599 Epoch: 4 Global Step: 75520 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:18:27,801-Speed 5146.83 samples/sec Loss 3.9810 LearningRate 0.0599 Epoch: 4 Global Step: 75530 Fp16 
Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:18:29,777-Speed 5181.47 samples/sec Loss 3.9340 LearningRate 0.0599 Epoch: 4 Global Step: 75540 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:18:31,757-Speed 5175.48 samples/sec Loss 3.9199 LearningRate 0.0599 Epoch: 4 Global Step: 75550 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:18:33,749-Speed 5142.02 samples/sec Loss 4.0706 LearningRate 0.0599 Epoch: 4 Global Step: 75560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:18:35,722-Speed 5190.90 samples/sec Loss 3.9243 LearningRate 0.0598 Epoch: 4 Global Step: 75570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:18:37,700-Speed 5178.73 samples/sec Loss 3.8964 LearningRate 0.0598 Epoch: 4 Global Step: 75580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:18:39,705-Speed 5110.79 samples/sec Loss 4.0039 LearningRate 0.0598 Epoch: 4 Global Step: 75590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:18:41,681-Speed 5182.65 samples/sec Loss 4.0204 LearningRate 0.0598 Epoch: 4 Global Step: 75600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:18:43,664-Speed 5167.04 samples/sec Loss 3.8892 LearningRate 0.0598 Epoch: 4 Global Step: 75610 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:18:45,639-Speed 5185.98 samples/sec Loss 3.9646 LearningRate 0.0598 Epoch: 4 Global Step: 75620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:18:47,633-Speed 5136.88 samples/sec Loss 3.8909 LearningRate 0.0598 Epoch: 4 Global Step: 75630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:18:49,629-Speed 5130.69 samples/sec Loss 4.0090 LearningRate 0.0598 Epoch: 4 Global Step: 75640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:18:51,615-Speed 5158.25 samples/sec Loss 3.9748 LearningRate 0.0598 Epoch: 4 Global Step: 75650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 
2022-04-11 04:18:53,607-Speed 5143.45 samples/sec Loss 3.9922 LearningRate 0.0598 Epoch: 4 Global Step: 75660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:18:55,576-Speed 5201.67 samples/sec Loss 3.9507 LearningRate 0.0598 Epoch: 4 Global Step: 75670 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:18:57,574-Speed 5126.29 samples/sec Loss 3.9682 LearningRate 0.0598 Epoch: 4 Global Step: 75680 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:18:59,600-Speed 5057.18 samples/sec Loss 3.9522 LearningRate 0.0598 Epoch: 4 Global Step: 75690 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:19:01,578-Speed 5177.23 samples/sec Loss 3.9503 LearningRate 0.0598 Epoch: 4 Global Step: 75700 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:19:03,554-Speed 5184.82 samples/sec Loss 3.9651 LearningRate 0.0598 Epoch: 4 Global Step: 75710 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:19:05,531-Speed 5181.47 samples/sec Loss 3.9620 LearningRate 0.0598 Epoch: 4 Global Step: 75720 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:19:07,512-Speed 5169.02 samples/sec Loss 4.0308 LearningRate 0.0598 Epoch: 4 Global Step: 75730 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:19:09,518-Speed 5108.17 samples/sec Loss 3.9363 LearningRate 0.0598 Epoch: 4 Global Step: 75740 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:19:11,530-Speed 5090.02 samples/sec Loss 3.8509 LearningRate 0.0598 Epoch: 4 Global Step: 75750 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:19:13,550-Speed 5071.08 samples/sec Loss 3.9349 LearningRate 0.0598 Epoch: 4 Global Step: 75760 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:19:15,541-Speed 5144.48 samples/sec Loss 3.9536 LearningRate 0.0598 Epoch: 4 Global Step: 75770 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:19:17,530-Speed 5149.78 samples/sec Loss 
3.9991 LearningRate 0.0598 Epoch: 4 Global Step: 75780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:19:19,497-Speed 5209.42 samples/sec Loss 3.9377 LearningRate 0.0597 Epoch: 4 Global Step: 75790 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:19:21,485-Speed 5152.18 samples/sec Loss 3.9273 LearningRate 0.0597 Epoch: 4 Global Step: 75800 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:19:23,462-Speed 5180.47 samples/sec Loss 3.9423 LearningRate 0.0597 Epoch: 4 Global Step: 75810 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:19:25,448-Speed 5159.37 samples/sec Loss 3.9180 LearningRate 0.0597 Epoch: 4 Global Step: 75820 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:19:27,432-Speed 5160.83 samples/sec Loss 3.9594 LearningRate 0.0597 Epoch: 4 Global Step: 75830 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:19:29,420-Speed 5154.64 samples/sec Loss 3.9392 LearningRate 0.0597 Epoch: 4 Global Step: 75840 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:19:31,422-Speed 5116.62 samples/sec Loss 4.1160 LearningRate 0.0597 Epoch: 4 Global Step: 75850 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:19:33,415-Speed 5137.96 samples/sec Loss 3.9744 LearningRate 0.0597 Epoch: 4 Global Step: 75860 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:19:35,434-Speed 5073.98 samples/sec Loss 3.8902 LearningRate 0.0597 Epoch: 4 Global Step: 75870 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:19:37,436-Speed 5117.08 samples/sec Loss 3.9559 LearningRate 0.0597 Epoch: 4 Global Step: 75880 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:19:39,426-Speed 5149.14 samples/sec Loss 4.0420 LearningRate 0.0597 Epoch: 4 Global Step: 75890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:19:41,416-Speed 5147.81 samples/sec Loss 3.9321 LearningRate 0.0597 Epoch: 4 Global Step: 75900 
Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:19:43,387-Speed 5196.06 samples/sec Loss 3.9900 LearningRate 0.0597 Epoch: 4 Global Step: 75910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:19:45,382-Speed 5133.82 samples/sec Loss 3.9922 LearningRate 0.0597 Epoch: 4 Global Step: 75920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:19:47,361-Speed 5175.65 samples/sec Loss 3.8546 LearningRate 0.0597 Epoch: 4 Global Step: 75930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:19:49,356-Speed 5134.79 samples/sec Loss 3.8792 LearningRate 0.0597 Epoch: 4 Global Step: 75940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:19:51,406-Speed 4997.48 samples/sec Loss 3.8684 LearningRate 0.0597 Epoch: 4 Global Step: 75950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:19:53,421-Speed 5083.74 samples/sec Loss 3.9380 LearningRate 0.0597 Epoch: 4 Global Step: 75960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:19:55,418-Speed 5127.15 samples/sec Loss 3.9315 LearningRate 0.0597 Epoch: 4 Global Step: 75970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:19:57,418-Speed 5122.69 samples/sec Loss 3.9021 LearningRate 0.0597 Epoch: 4 Global Step: 75980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:19:59,421-Speed 5114.74 samples/sec Loss 4.0050 LearningRate 0.0597 Epoch: 4 Global Step: 75990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:20:01,413-Speed 5143.46 samples/sec Loss 3.8784 LearningRate 0.0596 Epoch: 4 Global Step: 76000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:20:28,035-[lfw][76000]XNorm: 23.440488 Training: 2022-04-11 04:20:28,036-[lfw][76000]Accuracy-Flip: 0.99700+-0.00407 Training: 2022-04-11 04:20:28,036-[lfw][76000]Accuracy-Highest: 0.99800 Training: 2022-04-11 04:20:59,033-[cfp_fp][76000]XNorm: 21.196331 Training: 2022-04-11 
04:20:59,034-[cfp_fp][76000]Accuracy-Flip: 0.98086+-0.00415 Training: 2022-04-11 04:20:59,034-[cfp_fp][76000]Accuracy-Highest: 0.98086 Training: 2022-04-11 04:21:25,627-[agedb_30][76000]XNorm: 23.167129 Training: 2022-04-11 04:21:25,628-[agedb_30][76000]Accuracy-Flip: 0.97733+-0.00593 Training: 2022-04-11 04:21:25,628-[agedb_30][76000]Accuracy-Highest: 0.97733 Training: 2022-04-11 04:21:27,610-Speed 118.80 samples/sec Loss 3.9513 LearningRate 0.0596 Epoch: 4 Global Step: 76010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:21:29,593-Speed 5163.37 samples/sec Loss 3.9659 LearningRate 0.0596 Epoch: 4 Global Step: 76020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:21:31,578-Speed 5159.84 samples/sec Loss 3.9754 LearningRate 0.0596 Epoch: 4 Global Step: 76030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:21:33,586-Speed 5103.26 samples/sec Loss 3.9252 LearningRate 0.0596 Epoch: 4 Global Step: 76040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:21:35,568-Speed 5166.57 samples/sec Loss 4.0147 LearningRate 0.0596 Epoch: 4 Global Step: 76050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:21:37,568-Speed 5121.26 samples/sec Loss 3.9422 LearningRate 0.0596 Epoch: 4 Global Step: 76060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:21:39,541-Speed 5193.40 samples/sec Loss 3.9624 LearningRate 0.0596 Epoch: 4 Global Step: 76070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:21:41,554-Speed 5087.45 samples/sec Loss 3.9506 LearningRate 0.0596 Epoch: 4 Global Step: 76080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:21:43,521-Speed 5208.58 samples/sec Loss 4.0462 LearningRate 0.0596 Epoch: 4 Global Step: 76090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:21:45,497-Speed 5183.02 samples/sec Loss 3.9927 LearningRate 0.0596 Epoch: 4 Global Step: 76100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 
2022-04-11 04:21:47,491-Speed 5137.66 samples/sec Loss 3.9722 LearningRate 0.0596 Epoch: 4 Global Step: 76110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:21:49,462-Speed 5195.29 samples/sec Loss 3.9567 LearningRate 0.0596 Epoch: 4 Global Step: 76120 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:21:51,448-Speed 5159.31 samples/sec Loss 3.9484 LearningRate 0.0596 Epoch: 4 Global Step: 76130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:21:53,415-Speed 5208.23 samples/sec Loss 3.9260 LearningRate 0.0596 Epoch: 4 Global Step: 76140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:21:55,383-Speed 5203.75 samples/sec Loss 4.0214 LearningRate 0.0596 Epoch: 4 Global Step: 76150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:21:57,365-Speed 5169.17 samples/sec Loss 3.8960 LearningRate 0.0596 Epoch: 4 Global Step: 76160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:21:59,361-Speed 5132.58 samples/sec Loss 3.9673 LearningRate 0.0596 Epoch: 4 Global Step: 76170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:22:01,344-Speed 5165.76 samples/sec Loss 3.9524 LearningRate 0.0596 Epoch: 4 Global Step: 76180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:22:03,345-Speed 5118.55 samples/sec Loss 4.0301 LearningRate 0.0596 Epoch: 4 Global Step: 76190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:22:05,321-Speed 5181.97 samples/sec Loss 3.9365 LearningRate 0.0596 Epoch: 4 Global Step: 76200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:22:07,321-Speed 5123.81 samples/sec Loss 3.9103 LearningRate 0.0596 Epoch: 4 Global Step: 76210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:22:09,301-Speed 5171.65 samples/sec Loss 3.9645 LearningRate 0.0595 Epoch: 4 Global Step: 76220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:22:11,326-Speed 5060.03 samples/sec 
Loss 3.9593 LearningRate 0.0595 Epoch: 4 Global Step: 76230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:22:13,307-Speed 5172.56 samples/sec Loss 3.9616 LearningRate 0.0595 Epoch: 4 Global Step: 76240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:22:15,293-Speed 5155.61 samples/sec Loss 3.9389 LearningRate 0.0595 Epoch: 4 Global Step: 76250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:22:17,271-Speed 5179.76 samples/sec Loss 3.9379 LearningRate 0.0595 Epoch: 4 Global Step: 76260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:22:19,250-Speed 5174.96 samples/sec Loss 4.0024 LearningRate 0.0595 Epoch: 4 Global Step: 76270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:22:21,243-Speed 5140.04 samples/sec Loss 3.9969 LearningRate 0.0595 Epoch: 4 Global Step: 76280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:22:23,236-Speed 5139.92 samples/sec Loss 3.8496 LearningRate 0.0595 Epoch: 4 Global Step: 76290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:22:25,219-Speed 5165.86 samples/sec Loss 3.9144 LearningRate 0.0595 Epoch: 4 Global Step: 76300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:22:27,185-Speed 5208.71 samples/sec Loss 3.8724 LearningRate 0.0595 Epoch: 4 Global Step: 76310 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:22:29,159-Speed 5188.61 samples/sec Loss 3.8973 LearningRate 0.0595 Epoch: 4 Global Step: 76320 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:22:31,135-Speed 5185.56 samples/sec Loss 3.9116 LearningRate 0.0595 Epoch: 4 Global Step: 76330 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:22:33,134-Speed 5123.90 samples/sec Loss 3.9491 LearningRate 0.0595 Epoch: 4 Global Step: 76340 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:22:35,123-Speed 5151.37 samples/sec Loss 3.8947 LearningRate 0.0595 Epoch: 4 Global 
Step: 76350 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:22:37,112-Speed 5151.34 samples/sec Loss 3.8751 LearningRate 0.0595 Epoch: 4 Global Step: 76360 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:22:39,127-Speed 5084.05 samples/sec Loss 3.8999 LearningRate 0.0595 Epoch: 4 Global Step: 76370 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:22:41,122-Speed 5133.28 samples/sec Loss 3.9066 LearningRate 0.0595 Epoch: 4 Global Step: 76380 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:22:43,098-Speed 5185.62 samples/sec Loss 3.8924 LearningRate 0.0595 Epoch: 4 Global Step: 76390 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:22:45,085-Speed 5155.02 samples/sec Loss 3.9763 LearningRate 0.0595 Epoch: 4 Global Step: 76400 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:22:47,072-Speed 5153.18 samples/sec Loss 3.9085 LearningRate 0.0595 Epoch: 4 Global Step: 76410 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:22:49,088-Speed 5081.65 samples/sec Loss 3.8613 LearningRate 0.0595 Epoch: 4 Global Step: 76420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:22:51,058-Speed 5200.71 samples/sec Loss 3.9518 LearningRate 0.0595 Epoch: 4 Global Step: 76430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:22:53,031-Speed 5191.73 samples/sec Loss 3.9552 LearningRate 0.0594 Epoch: 4 Global Step: 76440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:22:55,003-Speed 5194.11 samples/sec Loss 3.9164 LearningRate 0.0594 Epoch: 4 Global Step: 76450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:22:56,990-Speed 5155.00 samples/sec Loss 3.9992 LearningRate 0.0594 Epoch: 4 Global Step: 76460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:22:58,980-Speed 5146.24 samples/sec Loss 3.9820 LearningRate 0.0594 Epoch: 4 Global Step: 76470 Fp16 Grad Scale: 65536 Required: 17 hours 
Training: 2022-04-11 04:23:00,968-Speed 5153.63 samples/sec Loss 4.0324 LearningRate 0.0594 Epoch: 4 Global Step: 76480 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:23:02,944-Speed 5185.62 samples/sec Loss 3.9929 LearningRate 0.0594 Epoch: 4 Global Step: 76490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:23:04,917-Speed 5190.68 samples/sec Loss 3.9525 LearningRate 0.0594 Epoch: 4 Global Step: 76500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:23:06,887-Speed 5199.29 samples/sec Loss 3.9682 LearningRate 0.0594 Epoch: 4 Global Step: 76510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:23:08,880-Speed 5139.63 samples/sec Loss 3.9563 LearningRate 0.0594 Epoch: 4 Global Step: 76520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:23:10,847-Speed 5206.94 samples/sec Loss 3.9452 LearningRate 0.0594 Epoch: 4 Global Step: 76530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:23:12,846-Speed 5125.84 samples/sec Loss 3.9310 LearningRate 0.0594 Epoch: 4 Global Step: 76540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:23:14,834-Speed 5151.99 samples/sec Loss 3.9568 LearningRate 0.0594 Epoch: 4 Global Step: 76550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:23:16,838-Speed 5112.32 samples/sec Loss 3.9591 LearningRate 0.0594 Epoch: 4 Global Step: 76560 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:23:18,828-Speed 5148.65 samples/sec Loss 3.9769 LearningRate 0.0594 Epoch: 4 Global Step: 76570 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:23:20,800-Speed 5193.12 samples/sec Loss 3.9262 LearningRate 0.0594 Epoch: 4 Global Step: 76580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:23:22,787-Speed 5155.91 samples/sec Loss 3.9809 LearningRate 0.0594 Epoch: 4 Global Step: 76590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:23:24,793-Speed 5107.62 
samples/sec Loss 3.9280 LearningRate 0.0594 Epoch: 4 Global Step: 76600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:23:26,778-Speed 5162.03 samples/sec Loss 3.9756 LearningRate 0.0594 Epoch: 4 Global Step: 76610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:23:28,739-Speed 5221.44 samples/sec Loss 3.9421 LearningRate 0.0594 Epoch: 4 Global Step: 76620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:23:30,712-Speed 5191.91 samples/sec Loss 3.8523 LearningRate 0.0594 Epoch: 4 Global Step: 76630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:23:32,697-Speed 5160.62 samples/sec Loss 3.9698 LearningRate 0.0594 Epoch: 4 Global Step: 76640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:23:34,694-Speed 5130.67 samples/sec Loss 3.8488 LearningRate 0.0593 Epoch: 4 Global Step: 76650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:23:36,672-Speed 5179.70 samples/sec Loss 3.8363 LearningRate 0.0593 Epoch: 4 Global Step: 76660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:23:38,650-Speed 5178.94 samples/sec Loss 3.9259 LearningRate 0.0593 Epoch: 4 Global Step: 76670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:23:40,630-Speed 5172.28 samples/sec Loss 3.9335 LearningRate 0.0593 Epoch: 4 Global Step: 76680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:23:42,614-Speed 5161.60 samples/sec Loss 3.8935 LearningRate 0.0593 Epoch: 4 Global Step: 76690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:23:44,585-Speed 5198.86 samples/sec Loss 3.8633 LearningRate 0.0593 Epoch: 4 Global Step: 76700 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:23:46,559-Speed 5189.23 samples/sec Loss 3.9763 LearningRate 0.0593 Epoch: 4 Global Step: 76710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:23:48,543-Speed 5161.33 samples/sec Loss 3.9231 LearningRate 0.0593 Epoch: 4 
Global Step: 76720 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:23:50,520-Speed 5181.34 samples/sec Loss 3.9179 LearningRate 0.0593 Epoch: 4 Global Step: 76730 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:23:52,517-Speed 5129.97 samples/sec Loss 3.9662 LearningRate 0.0593 Epoch: 4 Global Step: 76740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:23:54,497-Speed 5173.67 samples/sec Loss 3.9542 LearningRate 0.0593 Epoch: 4 Global Step: 76750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:23:56,474-Speed 5181.39 samples/sec Loss 3.9026 LearningRate 0.0593 Epoch: 4 Global Step: 76760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:23:58,460-Speed 5158.93 samples/sec Loss 3.9553 LearningRate 0.0593 Epoch: 4 Global Step: 76770 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:24:00,432-Speed 5194.56 samples/sec Loss 3.9084 LearningRate 0.0593 Epoch: 4 Global Step: 76780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:24:02,410-Speed 5177.30 samples/sec Loss 4.0583 LearningRate 0.0593 Epoch: 4 Global Step: 76790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:24:04,381-Speed 5197.51 samples/sec Loss 3.9586 LearningRate 0.0593 Epoch: 4 Global Step: 76800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:24:06,358-Speed 5181.51 samples/sec Loss 4.0158 LearningRate 0.0593 Epoch: 4 Global Step: 76810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:24:08,326-Speed 5204.33 samples/sec Loss 3.9932 LearningRate 0.0593 Epoch: 4 Global Step: 76820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:24:10,324-Speed 5125.95 samples/sec Loss 3.8227 LearningRate 0.0593 Epoch: 4 Global Step: 76830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:24:12,309-Speed 5159.74 samples/sec Loss 4.0158 LearningRate 0.0593 Epoch: 4 Global Step: 76840 Fp16 Grad Scale: 131072 Required: 
17 hours Training: 2022-04-11 04:24:14,289-Speed 5174.98 samples/sec Loss 3.9015 LearningRate 0.0593 Epoch: 4 Global Step: 76850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:24:16,281-Speed 5141.49 samples/sec Loss 3.9911 LearningRate 0.0593 Epoch: 4 Global Step: 76860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:24:18,250-Speed 5204.70 samples/sec Loss 3.8835 LearningRate 0.0592 Epoch: 4 Global Step: 76870 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:24:20,218-Speed 5204.77 samples/sec Loss 4.0080 LearningRate 0.0592 Epoch: 4 Global Step: 76880 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:24:22,217-Speed 5123.32 samples/sec Loss 4.0319 LearningRate 0.0592 Epoch: 4 Global Step: 76890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:24:24,225-Speed 5100.69 samples/sec Loss 4.0182 LearningRate 0.0592 Epoch: 4 Global Step: 76900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:24:26,216-Speed 5144.93 samples/sec Loss 3.9182 LearningRate 0.0592 Epoch: 4 Global Step: 76910 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:24:28,219-Speed 5116.00 samples/sec Loss 3.9555 LearningRate 0.0592 Epoch: 4 Global Step: 76920 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:24:30,188-Speed 5201.01 samples/sec Loss 3.9633 LearningRate 0.0592 Epoch: 4 Global Step: 76930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:24:32,166-Speed 5177.38 samples/sec Loss 4.0079 LearningRate 0.0592 Epoch: 4 Global Step: 76940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:24:34,155-Speed 5150.37 samples/sec Loss 3.9192 LearningRate 0.0592 Epoch: 4 Global Step: 76950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:24:36,155-Speed 5122.00 samples/sec Loss 3.9719 LearningRate 0.0592 Epoch: 4 Global Step: 76960 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 
04:24:38,153-Speed 5126.71 samples/sec Loss 3.9766 LearningRate 0.0592 Epoch: 4 Global Step: 76970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:24:40,161-Speed 5102.96 samples/sec Loss 3.9830 LearningRate 0.0592 Epoch: 4 Global Step: 76980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:24:42,134-Speed 5190.38 samples/sec Loss 4.0193 LearningRate 0.0592 Epoch: 4 Global Step: 76990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:24:44,099-Speed 5212.90 samples/sec Loss 3.9640 LearningRate 0.0592 Epoch: 4 Global Step: 77000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:24:46,082-Speed 5166.61 samples/sec Loss 3.9071 LearningRate 0.0592 Epoch: 4 Global Step: 77010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:24:48,074-Speed 5142.68 samples/sec Loss 3.9655 LearningRate 0.0592 Epoch: 4 Global Step: 77020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:24:50,051-Speed 5181.17 samples/sec Loss 3.9833 LearningRate 0.0592 Epoch: 4 Global Step: 77030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:24:52,039-Speed 5151.55 samples/sec Loss 3.9587 LearningRate 0.0592 Epoch: 4 Global Step: 77040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:24:54,026-Speed 5156.35 samples/sec Loss 3.9789 LearningRate 0.0592 Epoch: 4 Global Step: 77050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:24:56,015-Speed 5149.97 samples/sec Loss 3.9186 LearningRate 0.0592 Epoch: 4 Global Step: 77060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:24:57,996-Speed 5170.92 samples/sec Loss 3.9022 LearningRate 0.0592 Epoch: 4 Global Step: 77070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:24:59,985-Speed 5150.78 samples/sec Loss 3.9372 LearningRate 0.0592 Epoch: 4 Global Step: 77080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:25:01,960-Speed 5186.46 samples/sec Loss 4.0168 
LearningRate 0.0591 Epoch: 4 Global Step: 77090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:25:03,936-Speed 5184.04 samples/sec Loss 4.0178 LearningRate 0.0591 Epoch: 4 Global Step: 77100 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:25:05,915-Speed 5176.31 samples/sec Loss 3.8914 LearningRate 0.0591 Epoch: 4 Global Step: 77110 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:25:07,912-Speed 5129.11 samples/sec Loss 3.9302 LearningRate 0.0591 Epoch: 4 Global Step: 77120 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:25:09,894-Speed 5168.59 samples/sec Loss 3.9042 LearningRate 0.0591 Epoch: 4 Global Step: 77130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:25:11,872-Speed 5177.57 samples/sec Loss 3.9671 LearningRate 0.0591 Epoch: 4 Global Step: 77140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:25:13,860-Speed 5151.14 samples/sec Loss 3.9893 LearningRate 0.0591 Epoch: 4 Global Step: 77150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:25:15,842-Speed 5170.22 samples/sec Loss 3.8581 LearningRate 0.0591 Epoch: 4 Global Step: 77160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:25:17,833-Speed 5145.11 samples/sec Loss 3.9923 LearningRate 0.0591 Epoch: 4 Global Step: 77170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:25:19,817-Speed 5162.50 samples/sec Loss 3.9995 LearningRate 0.0591 Epoch: 4 Global Step: 77180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:25:21,811-Speed 5136.75 samples/sec Loss 3.9826 LearningRate 0.0591 Epoch: 4 Global Step: 77190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:25:23,776-Speed 5212.95 samples/sec Loss 3.9466 LearningRate 0.0591 Epoch: 4 Global Step: 77200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:25:25,757-Speed 5171.86 samples/sec Loss 3.9461 LearningRate 0.0591 Epoch: 4 Global Step: 
77210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:25:27,720-Speed 5218.76 samples/sec Loss 3.9682 LearningRate 0.0591 Epoch: 4 Global Step: 77220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:25:29,692-Speed 5194.12 samples/sec Loss 4.0234 LearningRate 0.0591 Epoch: 4 Global Step: 77230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:25:31,662-Speed 5198.19 samples/sec Loss 3.9716 LearningRate 0.0591 Epoch: 4 Global Step: 77240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:25:33,644-Speed 5167.70 samples/sec Loss 3.9290 LearningRate 0.0591 Epoch: 4 Global Step: 77250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:25:35,630-Speed 5159.67 samples/sec Loss 4.0017 LearningRate 0.0591 Epoch: 4 Global Step: 77260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:25:37,611-Speed 5169.24 samples/sec Loss 3.8586 LearningRate 0.0591 Epoch: 4 Global Step: 77270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:25:39,592-Speed 5172.69 samples/sec Loss 3.9479 LearningRate 0.0591 Epoch: 4 Global Step: 77280 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:25:41,587-Speed 5134.98 samples/sec Loss 3.8957 LearningRate 0.0591 Epoch: 4 Global Step: 77290 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:25:43,557-Speed 5197.75 samples/sec Loss 3.9826 LearningRate 0.0590 Epoch: 4 Global Step: 77300 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:25:45,544-Speed 5155.65 samples/sec Loss 3.9386 LearningRate 0.0590 Epoch: 4 Global Step: 77310 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:25:47,521-Speed 5183.18 samples/sec Loss 4.0175 LearningRate 0.0590 Epoch: 4 Global Step: 77320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:25:49,530-Speed 5097.70 samples/sec Loss 3.9980 LearningRate 0.0590 Epoch: 4 Global Step: 77330 Fp16 Grad Scale: 131072 Required: 17 hours 
Training: 2022-04-11 04:25:51,506-Speed 5183.28 samples/sec Loss 3.9338 LearningRate 0.0590 Epoch: 4 Global Step: 77340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:25:53,522-Speed 5080.17 samples/sec Loss 3.9874 LearningRate 0.0590 Epoch: 4 Global Step: 77350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:25:55,511-Speed 5151.68 samples/sec Loss 3.9331 LearningRate 0.0590 Epoch: 4 Global Step: 77360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:25:57,499-Speed 5151.54 samples/sec Loss 3.9268 LearningRate 0.0590 Epoch: 4 Global Step: 77370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:25:59,473-Speed 5189.48 samples/sec Loss 3.8192 LearningRate 0.0590 Epoch: 4 Global Step: 77380 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:26:01,437-Speed 5216.59 samples/sec Loss 3.8999 LearningRate 0.0590 Epoch: 4 Global Step: 77390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:03,418-Speed 5171.79 samples/sec Loss 3.9589 LearningRate 0.0590 Epoch: 4 Global Step: 77400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:05,397-Speed 5173.67 samples/sec Loss 3.8982 LearningRate 0.0590 Epoch: 4 Global Step: 77410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:07,370-Speed 5191.46 samples/sec Loss 3.8652 LearningRate 0.0590 Epoch: 4 Global Step: 77420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:09,365-Speed 5136.02 samples/sec Loss 3.9711 LearningRate 0.0590 Epoch: 4 Global Step: 77430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:11,338-Speed 5190.84 samples/sec Loss 3.9048 LearningRate 0.0590 Epoch: 4 Global Step: 77440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:13,315-Speed 5180.85 samples/sec Loss 4.0044 LearningRate 0.0590 Epoch: 4 Global Step: 77450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:15,291-Speed 5185.07 
samples/sec Loss 4.0128 LearningRate 0.0590 Epoch: 4 Global Step: 77460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:17,268-Speed 5181.37 samples/sec Loss 4.0184 LearningRate 0.0590 Epoch: 4 Global Step: 77470 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:19,246-Speed 5180.03 samples/sec Loss 3.9260 LearningRate 0.0590 Epoch: 4 Global Step: 77480 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:21,224-Speed 5178.70 samples/sec Loss 3.9025 LearningRate 0.0590 Epoch: 4 Global Step: 77490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:23,212-Speed 5152.67 samples/sec Loss 3.8976 LearningRate 0.0590 Epoch: 4 Global Step: 77500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:25,208-Speed 5132.36 samples/sec Loss 3.9432 LearningRate 0.0590 Epoch: 4 Global Step: 77510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:27,190-Speed 5168.77 samples/sec Loss 3.9834 LearningRate 0.0589 Epoch: 4 Global Step: 77520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:29,163-Speed 5190.50 samples/sec Loss 3.9534 LearningRate 0.0589 Epoch: 4 Global Step: 77530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:31,150-Speed 5153.66 samples/sec Loss 4.0264 LearningRate 0.0589 Epoch: 4 Global Step: 77540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:33,125-Speed 5188.63 samples/sec Loss 4.0824 LearningRate 0.0589 Epoch: 4 Global Step: 77550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:35,118-Speed 5138.49 samples/sec Loss 3.9529 LearningRate 0.0589 Epoch: 4 Global Step: 77560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:37,127-Speed 5099.93 samples/sec Loss 3.9067 LearningRate 0.0589 Epoch: 4 Global Step: 77570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:39,103-Speed 5183.49 samples/sec Loss 3.8971 LearningRate 0.0589 Epoch: 4 
Global Step: 77580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:41,104-Speed 5119.64 samples/sec Loss 3.9421 LearningRate 0.0589 Epoch: 4 Global Step: 77590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:26:43,085-Speed 5169.50 samples/sec Loss 3.9462 LearningRate 0.0589 Epoch: 4 Global Step: 77600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:26:45,063-Speed 5178.68 samples/sec Loss 3.9248 LearningRate 0.0589 Epoch: 4 Global Step: 77610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:26:47,040-Speed 5181.69 samples/sec Loss 3.9914 LearningRate 0.0589 Epoch: 4 Global Step: 77620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:49,017-Speed 5182.83 samples/sec Loss 3.9324 LearningRate 0.0589 Epoch: 4 Global Step: 77630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:50,992-Speed 5185.71 samples/sec Loss 3.9193 LearningRate 0.0589 Epoch: 4 Global Step: 77640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:52,987-Speed 5134.67 samples/sec Loss 3.9369 LearningRate 0.0589 Epoch: 4 Global Step: 77650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:54,968-Speed 5171.45 samples/sec Loss 3.9366 LearningRate 0.0589 Epoch: 4 Global Step: 77660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:56,941-Speed 5191.26 samples/sec Loss 3.9132 LearningRate 0.0589 Epoch: 4 Global Step: 77670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:26:58,934-Speed 5139.16 samples/sec Loss 3.9443 LearningRate 0.0589 Epoch: 4 Global Step: 77680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:00,906-Speed 5194.85 samples/sec Loss 3.8564 LearningRate 0.0589 Epoch: 4 Global Step: 77690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:02,885-Speed 5175.54 samples/sec Loss 3.8963 LearningRate 0.0589 Epoch: 4 Global Step: 77700 Fp16 Grad Scale: 65536 Required: 
17 hours Training: 2022-04-11 04:27:04,861-Speed 5183.76 samples/sec Loss 3.8487 LearningRate 0.0589 Epoch: 4 Global Step: 77710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:06,849-Speed 5153.52 samples/sec Loss 3.8700 LearningRate 0.0589 Epoch: 4 Global Step: 77720 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:27:08,814-Speed 5212.10 samples/sec Loss 3.9177 LearningRate 0.0589 Epoch: 4 Global Step: 77730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:10,806-Speed 5142.50 samples/sec Loss 3.8799 LearningRate 0.0588 Epoch: 4 Global Step: 77740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:12,860-Speed 4987.76 samples/sec Loss 3.9438 LearningRate 0.0588 Epoch: 4 Global Step: 77750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:14,853-Speed 5139.12 samples/sec Loss 3.9687 LearningRate 0.0588 Epoch: 4 Global Step: 77760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:16,854-Speed 5119.23 samples/sec Loss 3.9546 LearningRate 0.0588 Epoch: 4 Global Step: 77770 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:18,857-Speed 5115.76 samples/sec Loss 3.9343 LearningRate 0.0588 Epoch: 4 Global Step: 77780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:20,845-Speed 5151.92 samples/sec Loss 3.9254 LearningRate 0.0588 Epoch: 4 Global Step: 77790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:22,835-Speed 5147.90 samples/sec Loss 3.8872 LearningRate 0.0588 Epoch: 4 Global Step: 77800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:24,827-Speed 5141.06 samples/sec Loss 3.9311 LearningRate 0.0588 Epoch: 4 Global Step: 77810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:26,846-Speed 5074.87 samples/sec Loss 3.9224 LearningRate 0.0588 Epoch: 4 Global Step: 77820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:28,827-Speed 
5169.45 samples/sec Loss 3.8511 LearningRate 0.0588 Epoch: 4 Global Step: 77830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:27:30,808-Speed 5171.29 samples/sec Loss 3.9674 LearningRate 0.0588 Epoch: 4 Global Step: 77840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:27:32,783-Speed 5187.17 samples/sec Loss 3.9383 LearningRate 0.0588 Epoch: 4 Global Step: 77850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:27:34,765-Speed 5166.71 samples/sec Loss 3.9333 LearningRate 0.0588 Epoch: 4 Global Step: 77860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:27:36,752-Speed 5156.55 samples/sec Loss 3.8664 LearningRate 0.0588 Epoch: 4 Global Step: 77870 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:27:38,742-Speed 5145.93 samples/sec Loss 3.8679 LearningRate 0.0588 Epoch: 4 Global Step: 77880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:40,731-Speed 5151.96 samples/sec Loss 3.8871 LearningRate 0.0588 Epoch: 4 Global Step: 77890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:42,718-Speed 5154.22 samples/sec Loss 3.8353 LearningRate 0.0588 Epoch: 4 Global Step: 77900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:44,694-Speed 5184.75 samples/sec Loss 3.8819 LearningRate 0.0588 Epoch: 4 Global Step: 77910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:46,674-Speed 5172.69 samples/sec Loss 3.9555 LearningRate 0.0588 Epoch: 4 Global Step: 77920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:48,656-Speed 5168.75 samples/sec Loss 3.9136 LearningRate 0.0588 Epoch: 4 Global Step: 77930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:50,654-Speed 5127.57 samples/sec Loss 3.8983 LearningRate 0.0588 Epoch: 4 Global Step: 77940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:52,666-Speed 5090.38 samples/sec Loss 3.9590 LearningRate 
0.0588 Epoch: 4 Global Step: 77950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:54,645-Speed 5175.23 samples/sec Loss 3.9157 LearningRate 0.0587 Epoch: 4 Global Step: 77960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:56,635-Speed 5146.49 samples/sec Loss 3.9548 LearningRate 0.0587 Epoch: 4 Global Step: 77970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:27:58,612-Speed 5182.11 samples/sec Loss 3.9098 LearningRate 0.0587 Epoch: 4 Global Step: 77980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:28:00,597-Speed 5160.73 samples/sec Loss 3.9160 LearningRate 0.0587 Epoch: 4 Global Step: 77990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:28:02,601-Speed 5111.72 samples/sec Loss 3.9509 LearningRate 0.0587 Epoch: 4 Global Step: 78000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:28:29,230-[lfw][78000]XNorm: 21.721800 Training: 2022-04-11 04:28:29,231-[lfw][78000]Accuracy-Flip: 0.99783+-0.00259 Training: 2022-04-11 04:28:29,231-[lfw][78000]Accuracy-Highest: 0.99800 Training: 2022-04-11 04:29:00,007-[cfp_fp][78000]XNorm: 20.030428 Training: 2022-04-11 04:29:00,007-[cfp_fp][78000]Accuracy-Flip: 0.98000+-0.00571 Training: 2022-04-11 04:29:00,008-[cfp_fp][78000]Accuracy-Highest: 0.98086 Training: 2022-04-11 04:29:26,568-[agedb_30][78000]XNorm: 21.696560 Training: 2022-04-11 04:29:26,569-[agedb_30][78000]Accuracy-Flip: 0.97900+-0.00757 Training: 2022-04-11 04:29:26,569-[agedb_30][78000]Accuracy-Highest: 0.97900 Training: 2022-04-11 04:29:28,559-Speed 119.13 samples/sec Loss 3.9094 LearningRate 0.0587 Epoch: 4 Global Step: 78010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:29:30,536-Speed 5180.57 samples/sec Loss 4.0212 LearningRate 0.0587 Epoch: 4 Global Step: 78020 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:29:32,527-Speed 5146.20 samples/sec Loss 3.8882 LearningRate 0.0587 Epoch: 4 Global Step: 78030 
Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:29:34,492-Speed 5212.16 samples/sec Loss 3.9216 LearningRate 0.0587 Epoch: 4 Global Step: 78040 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:29:36,460-Speed 5204.93 samples/sec Loss 3.9733 LearningRate 0.0587 Epoch: 4 Global Step: 78050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:29:38,436-Speed 5183.83 samples/sec Loss 3.9483 LearningRate 0.0587 Epoch: 4 Global Step: 78060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:29:40,395-Speed 5229.42 samples/sec Loss 3.8978 LearningRate 0.0587 Epoch: 4 Global Step: 78070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:29:42,358-Speed 5217.78 samples/sec Loss 3.9164 LearningRate 0.0587 Epoch: 4 Global Step: 78080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:29:44,321-Speed 5217.32 samples/sec Loss 3.8354 LearningRate 0.0587 Epoch: 4 Global Step: 78090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:29:46,282-Speed 5224.44 samples/sec Loss 3.8516 LearningRate 0.0587 Epoch: 4 Global Step: 78100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:29:48,254-Speed 5193.28 samples/sec Loss 4.1236 LearningRate 0.0587 Epoch: 4 Global Step: 78110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:29:50,252-Speed 5129.81 samples/sec Loss 3.9517 LearningRate 0.0587 Epoch: 4 Global Step: 78120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:29:52,227-Speed 5186.83 samples/sec Loss 4.0596 LearningRate 0.0587 Epoch: 4 Global Step: 78130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:29:54,191-Speed 5214.66 samples/sec Loss 3.9392 LearningRate 0.0587 Epoch: 4 Global Step: 78140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:29:56,159-Speed 5205.34 samples/sec Loss 3.9636 LearningRate 0.0587 Epoch: 4 Global Step: 78150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 
2022-04-11 04:29:58,137-Speed 5177.91 samples/sec Loss 3.9480 LearningRate 0.0587 Epoch: 4 Global Step: 78160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:00,119-Speed 5169.29 samples/sec Loss 3.9110 LearningRate 0.0586 Epoch: 4 Global Step: 78170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:02,120-Speed 5121.10 samples/sec Loss 3.9380 LearningRate 0.0586 Epoch: 4 Global Step: 78180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:04,095-Speed 5187.13 samples/sec Loss 3.8957 LearningRate 0.0586 Epoch: 4 Global Step: 78190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:06,066-Speed 5195.74 samples/sec Loss 3.9868 LearningRate 0.0586 Epoch: 4 Global Step: 78200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:08,059-Speed 5138.89 samples/sec Loss 3.8873 LearningRate 0.0586 Epoch: 4 Global Step: 78210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:10,036-Speed 5181.99 samples/sec Loss 3.9326 LearningRate 0.0586 Epoch: 4 Global Step: 78220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:12,013-Speed 5181.86 samples/sec Loss 3.8955 LearningRate 0.0586 Epoch: 4 Global Step: 78230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:13,986-Speed 5193.91 samples/sec Loss 3.9000 LearningRate 0.0586 Epoch: 4 Global Step: 78240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:15,955-Speed 5200.77 samples/sec Loss 3.9419 LearningRate 0.0586 Epoch: 4 Global Step: 78250 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-04-11 04:30:17,924-Speed 5202.38 samples/sec Loss 3.9723 LearningRate 0.0586 Epoch: 4 Global Step: 78260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:19,890-Speed 5210.78 samples/sec Loss 3.8879 LearningRate 0.0586 Epoch: 4 Global Step: 78270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:21,907-Speed 5077.38 
samples/sec Loss 3.8747 LearningRate 0.0586 Epoch: 4 Global Step: 78280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:23,870-Speed 5220.12 samples/sec Loss 3.9273 LearningRate 0.0586 Epoch: 4 Global Step: 78290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:25,862-Speed 5140.95 samples/sec Loss 4.0042 LearningRate 0.0586 Epoch: 4 Global Step: 78300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:27,829-Speed 5207.90 samples/sec Loss 3.9657 LearningRate 0.0586 Epoch: 4 Global Step: 78310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:29,793-Speed 5214.33 samples/sec Loss 3.8952 LearningRate 0.0586 Epoch: 4 Global Step: 78320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:31,776-Speed 5167.81 samples/sec Loss 3.8908 LearningRate 0.0586 Epoch: 4 Global Step: 78330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:33,739-Speed 5218.83 samples/sec Loss 3.8376 LearningRate 0.0586 Epoch: 4 Global Step: 78340 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:30:35,720-Speed 5171.25 samples/sec Loss 3.9614 LearningRate 0.0586 Epoch: 4 Global Step: 78350 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:30:37,689-Speed 5201.08 samples/sec Loss 3.9528 LearningRate 0.0586 Epoch: 4 Global Step: 78360 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:30:39,673-Speed 5164.31 samples/sec Loss 3.9017 LearningRate 0.0586 Epoch: 4 Global Step: 78370 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:30:41,692-Speed 5073.44 samples/sec Loss 4.0454 LearningRate 0.0586 Epoch: 4 Global Step: 78380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:30:43,661-Speed 5201.57 samples/sec Loss 3.9515 LearningRate 0.0585 Epoch: 4 Global Step: 78390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:30:45,640-Speed 5175.84 samples/sec Loss 3.8592 LearningRate 0.0585 
Epoch: 4 Global Step: 78400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:30:47,608-Speed 5204.81 samples/sec Loss 3.7669 LearningRate 0.0585 Epoch: 4 Global Step: 78410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:30:49,603-Speed 5132.73 samples/sec Loss 3.8718 LearningRate 0.0585 Epoch: 4 Global Step: 78420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:30:51,575-Speed 5195.92 samples/sec Loss 3.8699 LearningRate 0.0585 Epoch: 4 Global Step: 78430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:30:53,551-Speed 5184.41 samples/sec Loss 3.9557 LearningRate 0.0585 Epoch: 4 Global Step: 78440 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:55,518-Speed 5208.93 samples/sec Loss 3.9278 LearningRate 0.0585 Epoch: 4 Global Step: 78450 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:57,493-Speed 5186.52 samples/sec Loss 3.9954 LearningRate 0.0585 Epoch: 4 Global Step: 78460 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:30:59,492-Speed 5122.42 samples/sec Loss 3.7669 LearningRate 0.0585 Epoch: 4 Global Step: 78470 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:31:01,458-Speed 5210.27 samples/sec Loss 3.8223 LearningRate 0.0585 Epoch: 4 Global Step: 78480 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:31:03,430-Speed 5196.20 samples/sec Loss 3.9407 LearningRate 0.0585 Epoch: 4 Global Step: 78490 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:31:05,403-Speed 5189.61 samples/sec Loss 4.0019 LearningRate 0.0585 Epoch: 4 Global Step: 78500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:31:07,385-Speed 5170.14 samples/sec Loss 3.9451 LearningRate 0.0585 Epoch: 4 Global Step: 78510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:31:09,362-Speed 5179.60 samples/sec Loss 3.8978 LearningRate 0.0585 Epoch: 4 Global Step: 78520 Fp16 Grad Scale: 
131072 Required: 17 hours Training: 2022-04-11 04:31:11,351-Speed 5151.06 samples/sec Loss 4.0015 LearningRate 0.0585 Epoch: 4 Global Step: 78530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:31:13,316-Speed 5212.59 samples/sec Loss 3.8707 LearningRate 0.0585 Epoch: 4 Global Step: 78540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:31:15,281-Speed 5214.55 samples/sec Loss 3.9025 LearningRate 0.0585 Epoch: 4 Global Step: 78550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:31:17,248-Speed 5207.26 samples/sec Loss 3.8657 LearningRate 0.0585 Epoch: 4 Global Step: 78560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:31:19,212-Speed 5214.00 samples/sec Loss 3.9181 LearningRate 0.0585 Epoch: 4 Global Step: 78570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:31:21,181-Speed 5204.09 samples/sec Loss 4.0270 LearningRate 0.0585 Epoch: 4 Global Step: 78580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:31:23,148-Speed 5206.68 samples/sec Loss 4.0045 LearningRate 0.0585 Epoch: 4 Global Step: 78590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:31:25,123-Speed 5187.51 samples/sec Loss 3.9091 LearningRate 0.0585 Epoch: 4 Global Step: 78600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:31:27,101-Speed 5177.27 samples/sec Loss 3.8757 LearningRate 0.0584 Epoch: 4 Global Step: 78610 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:31:29,088-Speed 5156.00 samples/sec Loss 3.8818 LearningRate 0.0584 Epoch: 4 Global Step: 78620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:31:31,073-Speed 5158.42 samples/sec Loss 3.9641 LearningRate 0.0584 Epoch: 4 Global Step: 78630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:31:33,061-Speed 5154.68 samples/sec Loss 3.9070 LearningRate 0.0584 Epoch: 4 Global Step: 78640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 
04:31:35,052-Speed 5144.63 samples/sec Loss 4.0469 LearningRate 0.0584 Epoch: 4 Global Step: 78650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:31:37,022-Speed 5200.35 samples/sec Loss 3.9677 LearningRate 0.0584 Epoch: 4 Global Step: 78660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:31:39,028-Speed 5106.61 samples/sec Loss 3.8947 LearningRate 0.0584 Epoch: 4 Global Step: 78670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:31:41,002-Speed 5189.41 samples/sec Loss 3.8432 LearningRate 0.0584 Epoch: 4 Global Step: 78680 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:31:42,970-Speed 5203.17 samples/sec Loss 3.8918 LearningRate 0.0584 Epoch: 4 Global Step: 78690 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:31:44,947-Speed 5183.22 samples/sec Loss 3.8951 LearningRate 0.0584 Epoch: 4 Global Step: 78700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:31:46,927-Speed 5170.93 samples/sec Loss 3.8491 LearningRate 0.0584 Epoch: 4 Global Step: 78710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:31:48,902-Speed 5187.08 samples/sec Loss 3.9376 LearningRate 0.0584 Epoch: 4 Global Step: 78720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:31:50,903-Speed 5120.43 samples/sec Loss 3.9613 LearningRate 0.0584 Epoch: 4 Global Step: 78730 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:31:52,880-Speed 5178.83 samples/sec Loss 3.9206 LearningRate 0.0584 Epoch: 4 Global Step: 78740 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:31:54,847-Speed 5209.95 samples/sec Loss 3.8699 LearningRate 0.0584 Epoch: 4 Global Step: 78750 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:31:56,817-Speed 5200.98 samples/sec Loss 3.8897 LearningRate 0.0584 Epoch: 4 Global Step: 78760 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:31:58,803-Speed 5157.26 samples/sec Loss 
3.9437 LearningRate 0.0584 Epoch: 4 Global Step: 78770 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:32:00,775-Speed 5194.48 samples/sec Loss 3.9648 LearningRate 0.0584 Epoch: 4 Global Step: 78780 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:32:02,740-Speed 5211.39 samples/sec Loss 3.8882 LearningRate 0.0584 Epoch: 4 Global Step: 78790 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:32:04,731-Speed 5146.24 samples/sec Loss 3.8772 LearningRate 0.0584 Epoch: 4 Global Step: 78800 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:32:06,695-Speed 5215.01 samples/sec Loss 3.9866 LearningRate 0.0584 Epoch: 4 Global Step: 78810 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:32:08,670-Speed 5187.70 samples/sec Loss 3.9698 LearningRate 0.0584 Epoch: 4 Global Step: 78820 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:32:10,650-Speed 5173.65 samples/sec Loss 3.8667 LearningRate 0.0583 Epoch: 4 Global Step: 78830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:32:12,615-Speed 5213.03 samples/sec Loss 3.7963 LearningRate 0.0583 Epoch: 4 Global Step: 78840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:32:14,581-Speed 5208.34 samples/sec Loss 3.9519 LearningRate 0.0583 Epoch: 4 Global Step: 78850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:32:16,554-Speed 5192.59 samples/sec Loss 3.8325 LearningRate 0.0583 Epoch: 4 Global Step: 78860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:32:18,533-Speed 5175.84 samples/sec Loss 3.9519 LearningRate 0.0583 Epoch: 4 Global Step: 78870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:32:20,502-Speed 5203.41 samples/sec Loss 4.0132 LearningRate 0.0583 Epoch: 4 Global Step: 78880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:32:22,481-Speed 5174.22 samples/sec Loss 3.8952 LearningRate 0.0583 Epoch: 4 Global Step: 78890 
Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:32:24,449-Speed 5205.28 samples/sec Loss 3.8854 LearningRate 0.0583 Epoch: 4 Global Step: 78900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:32:26,437-Speed 5154.01 samples/sec Loss 4.0105 LearningRate 0.0583 Epoch: 4 Global Step: 78910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:32:28,415-Speed 5179.40 samples/sec Loss 3.9407 LearningRate 0.0583 Epoch: 4 Global Step: 78920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:32:30,382-Speed 5207.17 samples/sec Loss 3.9716 LearningRate 0.0583 Epoch: 4 Global Step: 78930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:32:32,355-Speed 5190.45 samples/sec Loss 3.8471 LearningRate 0.0583 Epoch: 4 Global Step: 78940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:32:34,329-Speed 5189.10 samples/sec Loss 3.8721 LearningRate 0.0583 Epoch: 4 Global Step: 78950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:32:36,296-Speed 5207.72 samples/sec Loss 3.9374 LearningRate 0.0583 Epoch: 4 Global Step: 78960 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:32:38,261-Speed 5214.68 samples/sec Loss 3.8418 LearningRate 0.0583 Epoch: 4 Global Step: 78970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:32:40,232-Speed 5196.95 samples/sec Loss 4.0156 LearningRate 0.0583 Epoch: 4 Global Step: 78980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:32:42,215-Speed 5164.02 samples/sec Loss 3.9223 LearningRate 0.0583 Epoch: 4 Global Step: 78990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:32:44,203-Speed 5152.37 samples/sec Loss 3.9647 LearningRate 0.0583 Epoch: 4 Global Step: 79000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:32:46,233-Speed 5047.96 samples/sec Loss 3.8671 LearningRate 0.0583 Epoch: 4 Global Step: 79010 Fp16 Grad Scale: 131072 Required: 17 hours 
Training: 2022-04-11 04:32:48,237-Speed 5111.49 samples/sec Loss 3.9167 LearningRate 0.0583 Epoch: 4 Global Step: 79020 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:32:50,230-Speed 5137.75 samples/sec Loss 3.9181 LearningRate 0.0583 Epoch: 4 Global Step: 79030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:32:52,200-Speed 5201.12 samples/sec Loss 3.9453 LearningRate 0.0583 Epoch: 4 Global Step: 79040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:32:54,176-Speed 5184.67 samples/sec Loss 3.8890 LearningRate 0.0582 Epoch: 4 Global Step: 79050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:32:56,142-Speed 5209.52 samples/sec Loss 3.9409 LearningRate 0.0582 Epoch: 4 Global Step: 79060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:32:58,120-Speed 5179.26 samples/sec Loss 3.9159 LearningRate 0.0582 Epoch: 4 Global Step: 79070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:33:00,092-Speed 5191.81 samples/sec Loss 3.9550 LearningRate 0.0582 Epoch: 4 Global Step: 79080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:33:02,072-Speed 5173.81 samples/sec Loss 3.8948 LearningRate 0.0582 Epoch: 4 Global Step: 79090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:33:04,076-Speed 5111.79 samples/sec Loss 3.8962 LearningRate 0.0582 Epoch: 4 Global Step: 79100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:33:06,065-Speed 5150.67 samples/sec Loss 3.8861 LearningRate 0.0582 Epoch: 4 Global Step: 79110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:33:08,047-Speed 5169.38 samples/sec Loss 3.9439 LearningRate 0.0582 Epoch: 4 Global Step: 79120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:33:10,032-Speed 5159.38 samples/sec Loss 3.9320 LearningRate 0.0582 Epoch: 4 Global Step: 79130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:33:12,006-Speed 5189.96 
samples/sec Loss 3.8924 LearningRate 0.0582 Epoch: 4 Global Step: 79140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:13,978-Speed 5194.54 samples/sec Loss 3.8975 LearningRate 0.0582 Epoch: 4 Global Step: 79150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:15,946-Speed 5203.20 samples/sec Loss 3.9164 LearningRate 0.0582 Epoch: 4 Global Step: 79160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:17,922-Speed 5184.77 samples/sec Loss 3.9722 LearningRate 0.0582 Epoch: 4 Global Step: 79170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:19,906-Speed 5163.12 samples/sec Loss 3.8735 LearningRate 0.0582 Epoch: 4 Global Step: 79180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:21,892-Speed 5158.40 samples/sec Loss 3.8913 LearningRate 0.0582 Epoch: 4 Global Step: 79190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:23,865-Speed 5191.06 samples/sec Loss 3.8585 LearningRate 0.0582 Epoch: 4 Global Step: 79200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:25,831-Speed 5209.90 samples/sec Loss 3.8386 LearningRate 0.0582 Epoch: 4 Global Step: 79210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:27,800-Speed 5203.53 samples/sec Loss 3.9765 LearningRate 0.0582 Epoch: 4 Global Step: 79220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:29,771-Speed 5196.12 samples/sec Loss 3.8700 LearningRate 0.0582 Epoch: 4 Global Step: 79230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:31,734-Speed 5218.85 samples/sec Loss 3.8384 LearningRate 0.0582 Epoch: 4 Global Step: 79240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:33,717-Speed 5168.52 samples/sec Loss 3.9368 LearningRate 0.0582 Epoch: 4 Global Step: 79250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:35,684-Speed 5207.51 samples/sec Loss 3.9117 LearningRate 0.0582 
Epoch: 4 Global Step: 79260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:37,673-Speed 5149.30 samples/sec Loss 3.8339 LearningRate 0.0581 Epoch: 4 Global Step: 79270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:39,660-Speed 5155.89 samples/sec Loss 3.8802 LearningRate 0.0581 Epoch: 4 Global Step: 79280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:41,648-Speed 5153.38 samples/sec Loss 3.8667 LearningRate 0.0581 Epoch: 4 Global Step: 79290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:43,641-Speed 5138.94 samples/sec Loss 3.9005 LearningRate 0.0581 Epoch: 4 Global Step: 79300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:45,631-Speed 5148.85 samples/sec Loss 3.9619 LearningRate 0.0581 Epoch: 4 Global Step: 79310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:47,599-Speed 5204.62 samples/sec Loss 3.8856 LearningRate 0.0581 Epoch: 4 Global Step: 79320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:49,571-Speed 5194.83 samples/sec Loss 3.8748 LearningRate 0.0581 Epoch: 4 Global Step: 79330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:33:51,542-Speed 5194.92 samples/sec Loss 3.8991 LearningRate 0.0581 Epoch: 4 Global Step: 79340 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-04-11 04:33:53,498-Speed 5236.77 samples/sec Loss 3.9026 LearningRate 0.0581 Epoch: 4 Global Step: 79350 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:33:55,465-Speed 5209.35 samples/sec Loss 3.8466 LearningRate 0.0581 Epoch: 4 Global Step: 79360 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:33:57,447-Speed 5167.22 samples/sec Loss 3.9274 LearningRate 0.0581 Epoch: 4 Global Step: 79370 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:33:59,425-Speed 5178.60 samples/sec Loss 4.0068 LearningRate 0.0581 Epoch: 4 Global Step: 79380 Fp16 Grad Scale: 
32768 Required: 17 hours Training: 2022-04-11 04:34:01,409-Speed 5163.06 samples/sec Loss 3.9196 LearningRate 0.0581 Epoch: 4 Global Step: 79390 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:34:03,395-Speed 5158.43 samples/sec Loss 3.9601 LearningRate 0.0581 Epoch: 4 Global Step: 79400 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:34:05,375-Speed 5172.94 samples/sec Loss 3.9553 LearningRate 0.0581 Epoch: 4 Global Step: 79410 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:34:07,364-Speed 5150.66 samples/sec Loss 3.9340 LearningRate 0.0581 Epoch: 4 Global Step: 79420 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:34:09,339-Speed 5186.50 samples/sec Loss 3.9292 LearningRate 0.0581 Epoch: 4 Global Step: 79430 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:34:11,323-Speed 5163.30 samples/sec Loss 3.8637 LearningRate 0.0581 Epoch: 4 Global Step: 79440 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:34:13,288-Speed 5211.30 samples/sec Loss 3.8600 LearningRate 0.0581 Epoch: 4 Global Step: 79450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:34:15,268-Speed 5174.00 samples/sec Loss 3.9301 LearningRate 0.0581 Epoch: 4 Global Step: 79460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:34:17,250-Speed 5169.48 samples/sec Loss 3.8377 LearningRate 0.0581 Epoch: 4 Global Step: 79470 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:34:19,227-Speed 5180.95 samples/sec Loss 3.8768 LearningRate 0.0581 Epoch: 4 Global Step: 79480 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:34:21,218-Speed 5143.38 samples/sec Loss 3.9116 LearningRate 0.0580 Epoch: 4 Global Step: 79490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:34:23,208-Speed 5148.51 samples/sec Loss 3.9683 LearningRate 0.0580 Epoch: 4 Global Step: 79500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 
04:34:25,192-Speed 5163.94 samples/sec Loss 3.9364 LearningRate 0.0580 Epoch: 4 Global Step: 79510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:34:27,189-Speed 5129.39 samples/sec Loss 3.9011 LearningRate 0.0580 Epoch: 4 Global Step: 79520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:34:29,161-Speed 5193.31 samples/sec Loss 3.9386 LearningRate 0.0580 Epoch: 4 Global Step: 79530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:34:31,128-Speed 5207.32 samples/sec Loss 3.8630 LearningRate 0.0580 Epoch: 4 Global Step: 79540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:34:33,109-Speed 5171.46 samples/sec Loss 3.8421 LearningRate 0.0580 Epoch: 4 Global Step: 79550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:34:35,088-Speed 5176.13 samples/sec Loss 3.9463 LearningRate 0.0580 Epoch: 4 Global Step: 79560 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:34:37,083-Speed 5134.11 samples/sec Loss 3.8998 LearningRate 0.0580 Epoch: 4 Global Step: 79570 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:34:39,072-Speed 5149.94 samples/sec Loss 3.8069 LearningRate 0.0580 Epoch: 4 Global Step: 79580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:34:41,058-Speed 5156.72 samples/sec Loss 3.8891 LearningRate 0.0580 Epoch: 4 Global Step: 79590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:34:43,027-Speed 5203.59 samples/sec Loss 3.8585 LearningRate 0.0580 Epoch: 4 Global Step: 79600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:34:45,002-Speed 5187.47 samples/sec Loss 3.9723 LearningRate 0.0580 Epoch: 4 Global Step: 79610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:34:46,981-Speed 5176.10 samples/sec Loss 3.9085 LearningRate 0.0580 Epoch: 4 Global Step: 79620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:34:48,945-Speed 5214.84 samples/sec Loss 
3.9726 LearningRate 0.0580 Epoch: 4 Global Step: 79630 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:34:50,941-Speed 5131.86 samples/sec Loss 3.8531 LearningRate 0.0580 Epoch: 4 Global Step: 79640 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:34:52,913-Speed 5193.79 samples/sec Loss 4.0133 LearningRate 0.0580 Epoch: 4 Global Step: 79650 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:34:54,888-Speed 5187.61 samples/sec Loss 3.8117 LearningRate 0.0580 Epoch: 4 Global Step: 79660 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:34:56,860-Speed 5192.75 samples/sec Loss 3.8872 LearningRate 0.0580 Epoch: 4 Global Step: 79670 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:34:58,834-Speed 5190.22 samples/sec Loss 3.9365 LearningRate 0.0580 Epoch: 4 Global Step: 79680 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:35:00,807-Speed 5191.05 samples/sec Loss 3.9319 LearningRate 0.0580 Epoch: 4 Global Step: 79690 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:35:02,789-Speed 5169.22 samples/sec Loss 3.8588 LearningRate 0.0579 Epoch: 4 Global Step: 79700 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:35:04,777-Speed 5152.29 samples/sec Loss 3.9232 LearningRate 0.0579 Epoch: 4 Global Step: 79710 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:35:06,763-Speed 5157.67 samples/sec Loss 3.9637 LearningRate 0.0579 Epoch: 4 Global Step: 79720 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:35:08,740-Speed 5182.62 samples/sec Loss 3.9194 LearningRate 0.0579 Epoch: 4 Global Step: 79730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:10,736-Speed 5130.17 samples/sec Loss 3.9500 LearningRate 0.0579 Epoch: 4 Global Step: 79740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:12,718-Speed 5169.23 samples/sec Loss 3.9656 LearningRate 0.0579 Epoch: 4 Global Step: 79750 
Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:14,695-Speed 5179.74 samples/sec Loss 3.8949 LearningRate 0.0579 Epoch: 4 Global Step: 79760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:16,677-Speed 5169.52 samples/sec Loss 3.9504 LearningRate 0.0579 Epoch: 4 Global Step: 79770 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:18,669-Speed 5141.68 samples/sec Loss 3.9538 LearningRate 0.0579 Epoch: 4 Global Step: 79780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:20,642-Speed 5192.25 samples/sec Loss 3.9301 LearningRate 0.0579 Epoch: 4 Global Step: 79790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:22,618-Speed 5182.90 samples/sec Loss 3.8193 LearningRate 0.0579 Epoch: 4 Global Step: 79800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:24,594-Speed 5185.08 samples/sec Loss 3.7733 LearningRate 0.0579 Epoch: 4 Global Step: 79810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:26,568-Speed 5188.36 samples/sec Loss 3.8705 LearningRate 0.0579 Epoch: 4 Global Step: 79820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:28,571-Speed 5113.52 samples/sec Loss 3.9039 LearningRate 0.0579 Epoch: 4 Global Step: 79830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:35:30,546-Speed 5188.11 samples/sec Loss 3.8915 LearningRate 0.0579 Epoch: 4 Global Step: 79840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:35:32,524-Speed 5177.52 samples/sec Loss 3.9231 LearningRate 0.0579 Epoch: 4 Global Step: 79850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:35:34,502-Speed 5179.66 samples/sec Loss 3.9109 LearningRate 0.0579 Epoch: 4 Global Step: 79860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:35:36,471-Speed 5202.54 samples/sec Loss 3.9152 LearningRate 0.0579 Epoch: 4 Global Step: 79870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 
2022-04-11 04:35:38,461-Speed 5148.20 samples/sec Loss 3.8129 LearningRate 0.0579 Epoch: 4 Global Step: 79880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:40,442-Speed 5169.47 samples/sec Loss 3.9027 LearningRate 0.0579 Epoch: 4 Global Step: 79890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:42,414-Speed 5195.39 samples/sec Loss 3.8709 LearningRate 0.0579 Epoch: 4 Global Step: 79900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:44,394-Speed 5171.22 samples/sec Loss 3.8768 LearningRate 0.0579 Epoch: 4 Global Step: 79910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:46,386-Speed 5143.03 samples/sec Loss 3.9614 LearningRate 0.0578 Epoch: 4 Global Step: 79920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:48,389-Speed 5115.44 samples/sec Loss 3.8758 LearningRate 0.0578 Epoch: 4 Global Step: 79930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:50,368-Speed 5174.85 samples/sec Loss 3.8480 LearningRate 0.0578 Epoch: 4 Global Step: 79940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:52,341-Speed 5193.66 samples/sec Loss 3.9227 LearningRate 0.0578 Epoch: 4 Global Step: 79950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:54,319-Speed 5177.65 samples/sec Loss 3.9218 LearningRate 0.0578 Epoch: 4 Global Step: 79960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:35:56,288-Speed 5202.50 samples/sec Loss 3.7917 LearningRate 0.0578 Epoch: 4 Global Step: 79970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:35:58,272-Speed 5162.92 samples/sec Loss 3.9098 LearningRate 0.0578 Epoch: 4 Global Step: 79980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:36:00,257-Speed 5161.41 samples/sec Loss 3.9441 LearningRate 0.0578 Epoch: 4 Global Step: 79990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:36:02,234-Speed 5181.37 samples/sec 
Loss 3.9626 LearningRate 0.0578 Epoch: 4 Global Step: 80000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:36:28,877-[lfw][80000]XNorm: 23.566116 Training: 2022-04-11 04:36:28,878-[lfw][80000]Accuracy-Flip: 0.99750+-0.00327 Training: 2022-04-11 04:36:28,878-[lfw][80000]Accuracy-Highest: 0.99800 Training: 2022-04-11 04:36:59,693-[cfp_fp][80000]XNorm: 21.534797 Training: 2022-04-11 04:36:59,694-[cfp_fp][80000]Accuracy-Flip: 0.97743+-0.00585 Training: 2022-04-11 04:36:59,694-[cfp_fp][80000]Accuracy-Highest: 0.98086 Training: 2022-04-11 04:37:26,239-[agedb_30][80000]XNorm: 23.947264 Training: 2022-04-11 04:37:26,240-[agedb_30][80000]Accuracy-Flip: 0.97683+-0.00790 Training: 2022-04-11 04:37:26,240-[agedb_30][80000]Accuracy-Highest: 0.97900 Training: 2022-04-11 04:37:28,223-Speed 119.09 samples/sec Loss 3.8440 LearningRate 0.0578 Epoch: 4 Global Step: 80010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:37:30,179-Speed 5236.40 samples/sec Loss 3.9735 LearningRate 0.0578 Epoch: 4 Global Step: 80020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:37:32,143-Speed 5216.90 samples/sec Loss 3.8922 LearningRate 0.0578 Epoch: 4 Global Step: 80030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:37:34,117-Speed 5187.90 samples/sec Loss 3.8208 LearningRate 0.0578 Epoch: 4 Global Step: 80040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:37:36,086-Speed 5203.53 samples/sec Loss 3.9147 LearningRate 0.0578 Epoch: 4 Global Step: 80050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:37:38,056-Speed 5198.35 samples/sec Loss 3.7865 LearningRate 0.0578 Epoch: 4 Global Step: 80060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:37:40,037-Speed 5172.16 samples/sec Loss 3.8599 LearningRate 0.0578 Epoch: 4 Global Step: 80070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:37:42,011-Speed 5187.98 samples/sec Loss 3.8892 LearningRate 0.0578 Epoch: 4 
Global Step: 80080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:37:43,967-Speed 5236.84 samples/sec Loss 3.8877 LearningRate 0.0578 Epoch: 4 Global Step: 80090 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:37:45,933-Speed 5211.61 samples/sec Loss 3.9654 LearningRate 0.0578 Epoch: 4 Global Step: 80100 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:37:47,908-Speed 5184.65 samples/sec Loss 3.8851 LearningRate 0.0578 Epoch: 4 Global Step: 80110 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:37:49,875-Speed 5209.75 samples/sec Loss 3.8712 LearningRate 0.0578 Epoch: 4 Global Step: 80120 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:37:51,845-Speed 5198.94 samples/sec Loss 3.8656 LearningRate 0.0578 Epoch: 4 Global Step: 80130 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:37:53,820-Speed 5184.80 samples/sec Loss 3.7546 LearningRate 0.0577 Epoch: 4 Global Step: 80140 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:37:55,792-Speed 5196.18 samples/sec Loss 3.9040 LearningRate 0.0577 Epoch: 4 Global Step: 80150 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:37:57,776-Speed 5163.97 samples/sec Loss 3.8907 LearningRate 0.0577 Epoch: 4 Global Step: 80160 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:37:59,755-Speed 5175.90 samples/sec Loss 3.7915 LearningRate 0.0577 Epoch: 4 Global Step: 80170 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:38:01,741-Speed 5158.06 samples/sec Loss 3.8400 LearningRate 0.0577 Epoch: 4 Global Step: 80180 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:38:03,716-Speed 5185.28 samples/sec Loss 3.9302 LearningRate 0.0577 Epoch: 4 Global Step: 80190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:38:05,700-Speed 5163.04 samples/sec Loss 3.8766 LearningRate 0.0577 Epoch: 4 Global Step: 80200 Fp16 Grad Scale: 65536 Required: 17 
hours Training: 2022-04-11 04:38:07,664-Speed 5216.49 samples/sec Loss 3.9104 LearningRate 0.0577 Epoch: 4 Global Step: 80210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:38:09,630-Speed 5210.91 samples/sec Loss 3.8195 LearningRate 0.0577 Epoch: 4 Global Step: 80220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:38:11,612-Speed 5167.42 samples/sec Loss 3.9415 LearningRate 0.0577 Epoch: 4 Global Step: 80230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:38:13,577-Speed 5211.54 samples/sec Loss 3.9271 LearningRate 0.0577 Epoch: 4 Global Step: 80240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:38:15,550-Speed 5192.87 samples/sec Loss 3.8648 LearningRate 0.0577 Epoch: 4 Global Step: 80250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:38:17,519-Speed 5202.20 samples/sec Loss 3.8865 LearningRate 0.0577 Epoch: 4 Global Step: 80260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:38:19,481-Speed 5220.66 samples/sec Loss 3.8137 LearningRate 0.0577 Epoch: 4 Global Step: 80270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:38:21,466-Speed 5160.73 samples/sec Loss 3.9228 LearningRate 0.0577 Epoch: 4 Global Step: 80280 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:38:23,438-Speed 5195.81 samples/sec Loss 3.9426 LearningRate 0.0577 Epoch: 4 Global Step: 80290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:38:25,418-Speed 5173.14 samples/sec Loss 3.8771 LearningRate 0.0577 Epoch: 4 Global Step: 80300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:38:27,398-Speed 5171.95 samples/sec Loss 3.9631 LearningRate 0.0577 Epoch: 4 Global Step: 80310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:38:29,377-Speed 5176.07 samples/sec Loss 3.8777 LearningRate 0.0577 Epoch: 4 Global Step: 80320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:38:31,343-Speed 
5212.33 samples/sec Loss 3.8125 LearningRate 0.0577 Epoch: 4 Global Step: 80330 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:38:33,312-Speed 5201.42 samples/sec Loss 3.8611 LearningRate 0.0577 Epoch: 4 Global Step: 80340 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:38:35,285-Speed 5192.37 samples/sec Loss 3.8070 LearningRate 0.0577 Epoch: 4 Global Step: 80350 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:38:37,259-Speed 5188.25 samples/sec Loss 3.9121 LearningRate 0.0576 Epoch: 4 Global Step: 80360 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:38:39,235-Speed 5185.92 samples/sec Loss 3.9585 LearningRate 0.0576 Epoch: 4 Global Step: 80370 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:38:41,223-Speed 5150.69 samples/sec Loss 3.8889 LearningRate 0.0576 Epoch: 4 Global Step: 80380 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:38:43,190-Speed 5209.48 samples/sec Loss 3.8023 LearningRate 0.0576 Epoch: 4 Global Step: 80390 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:38:45,155-Speed 5213.16 samples/sec Loss 3.9065 LearningRate 0.0576 Epoch: 4 Global Step: 80400 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:38:47,125-Speed 5199.21 samples/sec Loss 3.9927 LearningRate 0.0576 Epoch: 4 Global Step: 80410 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:38:49,096-Speed 5195.89 samples/sec Loss 3.9107 LearningRate 0.0576 Epoch: 4 Global Step: 80420 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:38:51,088-Speed 5142.62 samples/sec Loss 3.7983 LearningRate 0.0576 Epoch: 4 Global Step: 80430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:38:53,058-Speed 5199.12 samples/sec Loss 3.8490 LearningRate 0.0576 Epoch: 4 Global Step: 80440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:38:55,025-Speed 5207.91 samples/sec Loss 3.8889 LearningRate 0.0576 
Epoch: 4 Global Step: 80450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:38:56,999-Speed 5190.28 samples/sec Loss 3.9353 LearningRate 0.0576 Epoch: 4 Global Step: 80460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:38:58,988-Speed 5148.16 samples/sec Loss 3.8951 LearningRate 0.0576 Epoch: 4 Global Step: 80470 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:00,974-Speed 5157.78 samples/sec Loss 3.8565 LearningRate 0.0576 Epoch: 4 Global Step: 80480 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:02,966-Speed 5144.32 samples/sec Loss 3.9029 LearningRate 0.0576 Epoch: 4 Global Step: 80490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:04,952-Speed 5158.43 samples/sec Loss 3.8527 LearningRate 0.0576 Epoch: 4 Global Step: 80500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:06,927-Speed 5185.79 samples/sec Loss 3.8213 LearningRate 0.0576 Epoch: 4 Global Step: 80510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:08,899-Speed 5192.95 samples/sec Loss 3.7683 LearningRate 0.0576 Epoch: 4 Global Step: 80520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:10,888-Speed 5150.57 samples/sec Loss 3.8120 LearningRate 0.0576 Epoch: 4 Global Step: 80530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:39:12,851-Speed 5219.44 samples/sec Loss 3.9014 LearningRate 0.0576 Epoch: 4 Global Step: 80540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:14,821-Speed 5200.19 samples/sec Loss 3.8758 LearningRate 0.0576 Epoch: 4 Global Step: 80550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:16,796-Speed 5184.83 samples/sec Loss 3.8285 LearningRate 0.0576 Epoch: 4 Global Step: 80560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:18,776-Speed 5173.58 samples/sec Loss 3.8597 LearningRate 0.0576 Epoch: 4 Global Step: 80570 Fp16 Grad Scale: 65536 
Required: 17 hours Training: 2022-04-11 04:39:20,749-Speed 5191.40 samples/sec Loss 3.9161 LearningRate 0.0575 Epoch: 4 Global Step: 80580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:22,720-Speed 5197.74 samples/sec Loss 3.8241 LearningRate 0.0575 Epoch: 4 Global Step: 80590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:24,692-Speed 5195.88 samples/sec Loss 3.8942 LearningRate 0.0575 Epoch: 4 Global Step: 80600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:26,656-Speed 5214.84 samples/sec Loss 3.9379 LearningRate 0.0575 Epoch: 4 Global Step: 80610 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:28,646-Speed 5146.37 samples/sec Loss 3.8063 LearningRate 0.0575 Epoch: 4 Global Step: 80620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:30,610-Speed 5215.06 samples/sec Loss 3.9292 LearningRate 0.0575 Epoch: 4 Global Step: 80630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:32,573-Speed 5220.78 samples/sec Loss 3.7940 LearningRate 0.0575 Epoch: 4 Global Step: 80640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:39:34,535-Speed 5220.40 samples/sec Loss 3.6952 LearningRate 0.0575 Epoch: 4 Global Step: 80650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:39:36,531-Speed 5130.72 samples/sec Loss 3.7949 LearningRate 0.0575 Epoch: 4 Global Step: 80660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:39:38,523-Speed 5141.37 samples/sec Loss 3.8151 LearningRate 0.0575 Epoch: 4 Global Step: 80670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:40,491-Speed 5205.84 samples/sec Loss 3.8474 LearningRate 0.0575 Epoch: 4 Global Step: 80680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:42,465-Speed 5189.34 samples/sec Loss 3.8244 LearningRate 0.0575 Epoch: 4 Global Step: 80690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 
04:39:44,431-Speed 5212.46 samples/sec Loss 3.8303 LearningRate 0.0575 Epoch: 4 Global Step: 80700 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:46,420-Speed 5147.71 samples/sec Loss 3.8931 LearningRate 0.0575 Epoch: 4 Global Step: 80710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:48,381-Speed 5223.87 samples/sec Loss 3.8268 LearningRate 0.0575 Epoch: 4 Global Step: 80720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:50,346-Speed 5213.84 samples/sec Loss 3.8886 LearningRate 0.0575 Epoch: 4 Global Step: 80730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:52,316-Speed 5199.86 samples/sec Loss 4.0060 LearningRate 0.0575 Epoch: 4 Global Step: 80740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:54,293-Speed 5181.30 samples/sec Loss 3.8839 LearningRate 0.0575 Epoch: 4 Global Step: 80750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:56,259-Speed 5210.52 samples/sec Loss 3.8745 LearningRate 0.0575 Epoch: 4 Global Step: 80760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:39:58,227-Speed 5202.95 samples/sec Loss 3.8349 LearningRate 0.0575 Epoch: 4 Global Step: 80770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:40:00,196-Speed 5203.70 samples/sec Loss 3.8975 LearningRate 0.0575 Epoch: 4 Global Step: 80780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:40:02,161-Speed 5213.75 samples/sec Loss 3.7999 LearningRate 0.0575 Epoch: 4 Global Step: 80790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:04,146-Speed 5159.74 samples/sec Loss 3.8719 LearningRate 0.0574 Epoch: 4 Global Step: 80800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:06,128-Speed 5169.40 samples/sec Loss 3.8268 LearningRate 0.0574 Epoch: 4 Global Step: 80810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:08,101-Speed 5191.12 samples/sec Loss 3.8878 
LearningRate 0.0574 Epoch: 4 Global Step: 80820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:10,073-Speed 5194.14 samples/sec Loss 3.9267 LearningRate 0.0574 Epoch: 4 Global Step: 80830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:12,048-Speed 5187.26 samples/sec Loss 3.8689 LearningRate 0.0574 Epoch: 4 Global Step: 80840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:14,007-Speed 5226.59 samples/sec Loss 3.8737 LearningRate 0.0574 Epoch: 4 Global Step: 80850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:15,978-Speed 5197.94 samples/sec Loss 3.8631 LearningRate 0.0574 Epoch: 4 Global Step: 80860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:17,957-Speed 5176.18 samples/sec Loss 3.8619 LearningRate 0.0574 Epoch: 4 Global Step: 80870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:19,925-Speed 5204.99 samples/sec Loss 3.9378 LearningRate 0.0574 Epoch: 4 Global Step: 80880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:21,942-Speed 5077.44 samples/sec Loss 3.9450 LearningRate 0.0574 Epoch: 4 Global Step: 80890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:40:23,909-Speed 5208.06 samples/sec Loss 3.8696 LearningRate 0.0574 Epoch: 4 Global Step: 80900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:40:25,898-Speed 5151.40 samples/sec Loss 3.8385 LearningRate 0.0574 Epoch: 4 Global Step: 80910 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:40:27,889-Speed 5143.97 samples/sec Loss 3.9155 LearningRate 0.0574 Epoch: 4 Global Step: 80920 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:40:29,854-Speed 5212.12 samples/sec Loss 3.9791 LearningRate 0.0574 Epoch: 4 Global Step: 80930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:40:31,830-Speed 5186.59 samples/sec Loss 3.8269 LearningRate 0.0574 Epoch: 4 Global Step: 80940 
Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:40:33,821-Speed 5144.73 samples/sec Loss 3.8235 LearningRate 0.0574 Epoch: 4 Global Step: 80950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:40:35,807-Speed 5156.21 samples/sec Loss 3.8984 LearningRate 0.0574 Epoch: 4 Global Step: 80960 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:40:37,780-Speed 5191.28 samples/sec Loss 3.8333 LearningRate 0.0574 Epoch: 4 Global Step: 80970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:39,768-Speed 5154.64 samples/sec Loss 3.8288 LearningRate 0.0574 Epoch: 4 Global Step: 80980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:41,743-Speed 5184.56 samples/sec Loss 3.8826 LearningRate 0.0574 Epoch: 4 Global Step: 80990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:43,727-Speed 5165.99 samples/sec Loss 3.8639 LearningRate 0.0574 Epoch: 4 Global Step: 81000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:45,708-Speed 5169.63 samples/sec Loss 3.8720 LearningRate 0.0574 Epoch: 4 Global Step: 81010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:47,673-Speed 5213.99 samples/sec Loss 3.8414 LearningRate 0.0573 Epoch: 4 Global Step: 81020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:49,664-Speed 5146.22 samples/sec Loss 3.9018 LearningRate 0.0573 Epoch: 4 Global Step: 81030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:51,633-Speed 5201.75 samples/sec Loss 3.8039 LearningRate 0.0573 Epoch: 4 Global Step: 81040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:53,601-Speed 5204.76 samples/sec Loss 3.8445 LearningRate 0.0573 Epoch: 4 Global Step: 81050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:40:55,564-Speed 5218.28 samples/sec Loss 3.8928 LearningRate 0.0573 Epoch: 4 Global Step: 81060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 
2022-04-11 04:40:57,527-Speed 5218.54 samples/sec Loss 3.8237 LearningRate 0.0573 Epoch: 4 Global Step: 81070 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:40:59,492-Speed 5210.92 samples/sec Loss 3.9441 LearningRate 0.0573 Epoch: 4 Global Step: 81080 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:01,457-Speed 5213.70 samples/sec Loss 3.8845 LearningRate 0.0573 Epoch: 4 Global Step: 81090 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:03,427-Speed 5199.85 samples/sec Loss 3.7003 LearningRate 0.0573 Epoch: 4 Global Step: 81100 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:05,401-Speed 5189.52 samples/sec Loss 3.9062 LearningRate 0.0573 Epoch: 4 Global Step: 81110 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:07,368-Speed 5207.88 samples/sec Loss 3.7801 LearningRate 0.0573 Epoch: 4 Global Step: 81120 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:09,340-Speed 5196.00 samples/sec Loss 3.8659 LearningRate 0.0573 Epoch: 4 Global Step: 81130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:11,322-Speed 5167.71 samples/sec Loss 3.8656 LearningRate 0.0573 Epoch: 4 Global Step: 81140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:13,301-Speed 5175.89 samples/sec Loss 3.7851 LearningRate 0.0573 Epoch: 4 Global Step: 81150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:15,285-Speed 5163.06 samples/sec Loss 3.9168 LearningRate 0.0573 Epoch: 4 Global Step: 81160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:17,248-Speed 5216.17 samples/sec Loss 3.9188 LearningRate 0.0573 Epoch: 4 Global Step: 81170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:19,213-Speed 5213.42 samples/sec Loss 3.8666 LearningRate 0.0573 Epoch: 4 Global Step: 81180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:41:21,196-Speed 5165.64 
samples/sec Loss 3.9283 LearningRate 0.0573 Epoch: 4 Global Step: 81190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:41:23,171-Speed 5187.00 samples/sec Loss 3.8595 LearningRate 0.0573 Epoch: 4 Global Step: 81200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:41:25,153-Speed 5168.21 samples/sec Loss 3.8819 LearningRate 0.0573 Epoch: 4 Global Step: 81210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:41:27,123-Speed 5201.15 samples/sec Loss 3.8189 LearningRate 0.0573 Epoch: 4 Global Step: 81220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:41:29,097-Speed 5189.04 samples/sec Loss 3.8273 LearningRate 0.0573 Epoch: 4 Global Step: 81230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:41:31,077-Speed 5173.40 samples/sec Loss 3.8799 LearningRate 0.0572 Epoch: 4 Global Step: 81240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:41:33,047-Speed 5199.60 samples/sec Loss 3.8323 LearningRate 0.0572 Epoch: 4 Global Step: 81250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:41:35,020-Speed 5191.03 samples/sec Loss 3.9466 LearningRate 0.0572 Epoch: 4 Global Step: 81260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:41:36,998-Speed 5178.25 samples/sec Loss 3.7583 LearningRate 0.0572 Epoch: 4 Global Step: 81270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:41:38,969-Speed 5196.97 samples/sec Loss 3.8747 LearningRate 0.0572 Epoch: 4 Global Step: 81280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:40,957-Speed 5151.63 samples/sec Loss 3.8662 LearningRate 0.0572 Epoch: 4 Global Step: 81290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:42,948-Speed 5145.62 samples/sec Loss 3.8637 LearningRate 0.0572 Epoch: 4 Global Step: 81300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:44,946-Speed 5127.59 samples/sec Loss 3.8751 LearningRate 0.0572 Epoch: 4 
Global Step: 81310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:46,921-Speed 5185.33 samples/sec Loss 3.8640 LearningRate 0.0572 Epoch: 4 Global Step: 81320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:48,925-Speed 5112.52 samples/sec Loss 3.8294 LearningRate 0.0572 Epoch: 4 Global Step: 81330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:50,896-Speed 5197.10 samples/sec Loss 3.8450 LearningRate 0.0572 Epoch: 4 Global Step: 81340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:52,865-Speed 5202.71 samples/sec Loss 3.9309 LearningRate 0.0572 Epoch: 4 Global Step: 81350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:54,844-Speed 5176.24 samples/sec Loss 3.8321 LearningRate 0.0572 Epoch: 4 Global Step: 81360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:56,812-Speed 5204.29 samples/sec Loss 3.8941 LearningRate 0.0572 Epoch: 4 Global Step: 81370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:41:58,775-Speed 5217.66 samples/sec Loss 3.8372 LearningRate 0.0572 Epoch: 4 Global Step: 81380 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:42:00,762-Speed 5154.50 samples/sec Loss 3.7979 LearningRate 0.0572 Epoch: 4 Global Step: 81390 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:42:02,739-Speed 5182.67 samples/sec Loss 3.8531 LearningRate 0.0572 Epoch: 4 Global Step: 81400 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:42:04,720-Speed 5171.29 samples/sec Loss 3.8322 LearningRate 0.0572 Epoch: 4 Global Step: 81410 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:42:06,689-Speed 5201.23 samples/sec Loss 3.8190 LearningRate 0.0572 Epoch: 4 Global Step: 81420 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:42:08,659-Speed 5199.55 samples/sec Loss 3.8700 LearningRate 0.0572 Epoch: 4 Global Step: 81430 Fp16 Grad Scale: 131072 
Required: 17 hours Training: 2022-04-11 04:42:10,627-Speed 5208.28 samples/sec Loss 3.7976 LearningRate 0.0572 Epoch: 4 Global Step: 81440 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:42:12,612-Speed 5158.91 samples/sec Loss 3.8403 LearningRate 0.0572 Epoch: 4 Global Step: 81450 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:42:14,593-Speed 5170.58 samples/sec Loss 3.9075 LearningRate 0.0572 Epoch: 4 Global Step: 81460 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:42:16,569-Speed 5185.63 samples/sec Loss 3.8422 LearningRate 0.0571 Epoch: 4 Global Step: 81470 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:42:18,543-Speed 5187.39 samples/sec Loss 3.8251 LearningRate 0.0571 Epoch: 4 Global Step: 81480 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:42:20,518-Speed 5187.64 samples/sec Loss 3.8176 LearningRate 0.0571 Epoch: 4 Global Step: 81490 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:42:22,492-Speed 5188.17 samples/sec Loss 3.7960 LearningRate 0.0571 Epoch: 4 Global Step: 81500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:42:24,484-Speed 5142.70 samples/sec Loss 3.9280 LearningRate 0.0571 Epoch: 4 Global Step: 81510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:42:26,466-Speed 5169.40 samples/sec Loss 3.8736 LearningRate 0.0571 Epoch: 4 Global Step: 81520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:42:28,441-Speed 5186.54 samples/sec Loss 3.8256 LearningRate 0.0571 Epoch: 4 Global Step: 81530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:42:30,420-Speed 5176.59 samples/sec Loss 3.7832 LearningRate 0.0571 Epoch: 4 Global Step: 81540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:42:32,391-Speed 5198.23 samples/sec Loss 3.8432 LearningRate 0.0571 Epoch: 4 Global Step: 81550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 
04:42:34,357-Speed 5208.97 samples/sec Loss 3.7895 LearningRate 0.0571 Epoch: 4 Global Step: 81560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:42:36,358-Speed 5117.98 samples/sec Loss 3.9261 LearningRate 0.0571 Epoch: 4 Global Step: 81570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:42:38,351-Speed 5139.66 samples/sec Loss 3.8080 LearningRate 0.0571 Epoch: 4 Global Step: 81580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:42:40,341-Speed 5149.49 samples/sec Loss 3.8374 LearningRate 0.0571 Epoch: 4 Global Step: 81590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:42:42,317-Speed 5182.10 samples/sec Loss 3.9089 LearningRate 0.0571 Epoch: 4 Global Step: 81600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:42:44,284-Speed 5207.91 samples/sec Loss 3.9136 LearningRate 0.0571 Epoch: 4 Global Step: 81610 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:42:46,255-Speed 5196.72 samples/sec Loss 3.8382 LearningRate 0.0571 Epoch: 4 Global Step: 81620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:42:48,250-Speed 5135.22 samples/sec Loss 3.9177 LearningRate 0.0571 Epoch: 4 Global Step: 81630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:42:50,225-Speed 5186.85 samples/sec Loss 3.7933 LearningRate 0.0571 Epoch: 4 Global Step: 81640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:42:52,204-Speed 5174.98 samples/sec Loss 3.7790 LearningRate 0.0571 Epoch: 4 Global Step: 81650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:42:54,172-Speed 5207.16 samples/sec Loss 3.9008 LearningRate 0.0571 Epoch: 4 Global Step: 81660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:42:56,133-Speed 5221.18 samples/sec Loss 3.8861 LearningRate 0.0571 Epoch: 4 Global Step: 81670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:42:58,126-Speed 5141.64 samples/sec Loss 3.9195 
LearningRate 0.0571 Epoch: 4 Global Step: 81680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:00,099-Speed 5190.42 samples/sec Loss 3.9723 LearningRate 0.0570 Epoch: 4 Global Step: 81690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:02,077-Speed 5179.98 samples/sec Loss 3.8760 LearningRate 0.0570 Epoch: 4 Global Step: 81700 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:04,051-Speed 5187.61 samples/sec Loss 3.8238 LearningRate 0.0570 Epoch: 4 Global Step: 81710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:06,027-Speed 5183.40 samples/sec Loss 3.8213 LearningRate 0.0570 Epoch: 4 Global Step: 81720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:07,996-Speed 5204.05 samples/sec Loss 3.8084 LearningRate 0.0570 Epoch: 4 Global Step: 81730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:09,965-Speed 5201.48 samples/sec Loss 3.8029 LearningRate 0.0570 Epoch: 4 Global Step: 81740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:11,945-Speed 5174.69 samples/sec Loss 3.9061 LearningRate 0.0570 Epoch: 4 Global Step: 81750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:13,923-Speed 5180.65 samples/sec Loss 3.8305 LearningRate 0.0570 Epoch: 4 Global Step: 81760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:15,905-Speed 5166.49 samples/sec Loss 3.8546 LearningRate 0.0570 Epoch: 4 Global Step: 81770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:43:17,874-Speed 5202.82 samples/sec Loss 3.7992 LearningRate 0.0570 Epoch: 4 Global Step: 81780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:43:19,852-Speed 5177.43 samples/sec Loss 3.9228 LearningRate 0.0570 Epoch: 4 Global Step: 81790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:21,821-Speed 5203.51 samples/sec Loss 3.8181 LearningRate 0.0570 Epoch: 4 Global Step: 81800 Fp16 
Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:23,791-Speed 5199.43 samples/sec Loss 3.9085 LearningRate 0.0570 Epoch: 4 Global Step: 81810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:25,766-Speed 5187.22 samples/sec Loss 3.7703 LearningRate 0.0570 Epoch: 4 Global Step: 81820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:27,777-Speed 5093.54 samples/sec Loss 3.8766 LearningRate 0.0570 Epoch: 4 Global Step: 81830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:29,777-Speed 5119.88 samples/sec Loss 3.8844 LearningRate 0.0570 Epoch: 4 Global Step: 81840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:31,743-Speed 5211.85 samples/sec Loss 3.9071 LearningRate 0.0570 Epoch: 4 Global Step: 81850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:33,719-Speed 5183.74 samples/sec Loss 3.8173 LearningRate 0.0570 Epoch: 4 Global Step: 81860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:35,706-Speed 5156.57 samples/sec Loss 3.8445 LearningRate 0.0570 Epoch: 4 Global Step: 81870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:37,689-Speed 5166.00 samples/sec Loss 3.8126 LearningRate 0.0570 Epoch: 4 Global Step: 81880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:39,669-Speed 5173.55 samples/sec Loss 3.8047 LearningRate 0.0570 Epoch: 4 Global Step: 81890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:43:41,637-Speed 5204.08 samples/sec Loss 3.8118 LearningRate 0.0570 Epoch: 4 Global Step: 81900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:43:43,606-Speed 5202.26 samples/sec Loss 3.9508 LearningRate 0.0569 Epoch: 4 Global Step: 81910 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:43:45,583-Speed 5180.33 samples/sec Loss 3.9356 LearningRate 0.0569 Epoch: 4 Global Step: 81920 Fp16 Grad Scale: 131072 Required: 17 hours Training: 
2022-04-11 04:43:47,554-Speed 5196.48 samples/sec Loss 3.9106 LearningRate 0.0569 Epoch: 4 Global Step: 81930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:43:49,523-Speed 5203.06 samples/sec Loss 3.8519 LearningRate 0.0569 Epoch: 4 Global Step: 81940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:43:51,493-Speed 5199.97 samples/sec Loss 3.8148 LearningRate 0.0569 Epoch: 4 Global Step: 81950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:43:53,456-Speed 5218.33 samples/sec Loss 3.7789 LearningRate 0.0569 Epoch: 4 Global Step: 81960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:55,440-Speed 5162.97 samples/sec Loss 3.8437 LearningRate 0.0569 Epoch: 4 Global Step: 81970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:57,411-Speed 5197.04 samples/sec Loss 3.8598 LearningRate 0.0569 Epoch: 4 Global Step: 81980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:43:59,406-Speed 5135.56 samples/sec Loss 3.8893 LearningRate 0.0569 Epoch: 4 Global Step: 81990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:44:01,380-Speed 5188.98 samples/sec Loss 3.8202 LearningRate 0.0569 Epoch: 4 Global Step: 82000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:44:28,135-[lfw][82000]XNorm: 22.644816 Training: 2022-04-11 04:44:28,136-[lfw][82000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 04:44:28,136-[lfw][82000]Accuracy-Highest: 0.99817 Training: 2022-04-11 04:44:58,974-[cfp_fp][82000]XNorm: 20.535962 Training: 2022-04-11 04:44:58,975-[cfp_fp][82000]Accuracy-Flip: 0.97943+-0.00500 Training: 2022-04-11 04:44:58,975-[cfp_fp][82000]Accuracy-Highest: 0.98086 Training: 2022-04-11 04:45:25,481-[agedb_30][82000]XNorm: 22.635246 Training: 2022-04-11 04:45:25,482-[agedb_30][82000]Accuracy-Flip: 0.97850+-0.00705 Training: 2022-04-11 04:45:25,482-[agedb_30][82000]Accuracy-Highest: 0.97900 Training: 2022-04-11 04:45:27,466-Speed 118.95 
samples/sec Loss 3.7395 LearningRate 0.0569 Epoch: 4 Global Step: 82010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:45:29,429-Speed 5219.11 samples/sec Loss 3.8193 LearningRate 0.0569 Epoch: 4 Global Step: 82020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:45:31,397-Speed 5202.94 samples/sec Loss 3.9357 LearningRate 0.0569 Epoch: 4 Global Step: 82030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:45:33,377-Speed 5174.77 samples/sec Loss 3.7675 LearningRate 0.0569 Epoch: 4 Global Step: 82040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:45:35,342-Speed 5212.83 samples/sec Loss 3.8553 LearningRate 0.0569 Epoch: 4 Global Step: 82050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:45:37,314-Speed 5193.42 samples/sec Loss 3.8526 LearningRate 0.0569 Epoch: 4 Global Step: 82060 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:45:39,291-Speed 5181.58 samples/sec Loss 3.8530 LearningRate 0.0569 Epoch: 4 Global Step: 82070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:45:41,260-Speed 5203.31 samples/sec Loss 3.8415 LearningRate 0.0569 Epoch: 4 Global Step: 82080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:45:43,232-Speed 5195.41 samples/sec Loss 3.8834 LearningRate 0.0569 Epoch: 4 Global Step: 82090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:45:45,197-Speed 5212.65 samples/sec Loss 3.8893 LearningRate 0.0569 Epoch: 4 Global Step: 82100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:45:47,176-Speed 5174.66 samples/sec Loss 3.8432 LearningRate 0.0569 Epoch: 4 Global Step: 82110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:45:49,148-Speed 5194.06 samples/sec Loss 3.8650 LearningRate 0.0569 Epoch: 4 Global Step: 82120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:45:51,129-Speed 5172.56 samples/sec Loss 3.8789 LearningRate 0.0568 Epoch: 4 
Global Step: 82130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:45:53,098-Speed 5200.25 samples/sec Loss 3.8617 LearningRate 0.0568 Epoch: 4 Global Step: 82140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:45:55,079-Speed 5173.42 samples/sec Loss 3.8245 LearningRate 0.0568 Epoch: 4 Global Step: 82150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:45:57,068-Speed 5149.52 samples/sec Loss 3.8979 LearningRate 0.0568 Epoch: 4 Global Step: 82160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:45:59,045-Speed 5181.13 samples/sec Loss 3.9331 LearningRate 0.0568 Epoch: 4 Global Step: 82170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:46:01,045-Speed 5123.40 samples/sec Loss 3.8331 LearningRate 0.0568 Epoch: 4 Global Step: 82180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:46:03,029-Speed 5161.97 samples/sec Loss 3.8110 LearningRate 0.0568 Epoch: 4 Global Step: 82190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:46:05,020-Speed 5144.30 samples/sec Loss 3.7926 LearningRate 0.0568 Epoch: 4 Global Step: 82200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:46:06,998-Speed 5180.83 samples/sec Loss 3.7855 LearningRate 0.0568 Epoch: 4 Global Step: 82210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:46:08,986-Speed 5150.66 samples/sec Loss 3.8643 LearningRate 0.0568 Epoch: 4 Global Step: 82220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:46:10,972-Speed 5160.26 samples/sec Loss 3.7970 LearningRate 0.0568 Epoch: 4 Global Step: 82230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:46:12,953-Speed 5170.37 samples/sec Loss 3.8687 LearningRate 0.0568 Epoch: 4 Global Step: 82240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:46:14,938-Speed 5159.23 samples/sec Loss 3.8277 LearningRate 0.0568 Epoch: 4 Global Step: 82250 Fp16 Grad Scale: 65536 Required: 17 
hours Training: 2022-04-11 04:46:16,938-Speed 5121.39 samples/sec Loss 3.8483 LearningRate 0.0568 Epoch: 4 Global Step: 82260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:46:18,919-Speed 5172.11 samples/sec Loss 3.8432 LearningRate 0.0568 Epoch: 4 Global Step: 82270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:46:20,906-Speed 5154.14 samples/sec Loss 3.8236 LearningRate 0.0568 Epoch: 4 Global Step: 82280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:46:22,888-Speed 5169.11 samples/sec Loss 3.8367 LearningRate 0.0568 Epoch: 4 Global Step: 82290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:46:24,897-Speed 5098.00 samples/sec Loss 3.8675 LearningRate 0.0568 Epoch: 4 Global Step: 82300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:46:26,886-Speed 5152.08 samples/sec Loss 3.8235 LearningRate 0.0568 Epoch: 4 Global Step: 82310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:46:28,873-Speed 5154.79 samples/sec Loss 3.8141 LearningRate 0.0568 Epoch: 4 Global Step: 82320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:46:30,852-Speed 5175.28 samples/sec Loss 3.8244 LearningRate 0.0568 Epoch: 4 Global Step: 82330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:46:32,860-Speed 5102.26 samples/sec Loss 3.8499 LearningRate 0.0568 Epoch: 4 Global Step: 82340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:46:34,856-Speed 5131.64 samples/sec Loss 3.7571 LearningRate 0.0567 Epoch: 4 Global Step: 82350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:46:36,845-Speed 5148.64 samples/sec Loss 3.7906 LearningRate 0.0567 Epoch: 4 Global Step: 82360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:46:38,821-Speed 5183.20 samples/sec Loss 3.8794 LearningRate 0.0567 Epoch: 4 Global Step: 82370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 
04:46:40,828-Speed 5104.93 samples/sec Loss 3.7739 LearningRate 0.0567 Epoch: 4 Global Step: 82380 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:46:42,826-Speed 5125.94 samples/sec Loss 3.8635 LearningRate 0.0567 Epoch: 4 Global Step: 82390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:46:44,815-Speed 5152.79 samples/sec Loss 3.7702 LearningRate 0.0567 Epoch: 4 Global Step: 82400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:46:46,796-Speed 5170.39 samples/sec Loss 3.7979 LearningRate 0.0567 Epoch: 4 Global Step: 82410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:46:48,785-Speed 5148.90 samples/sec Loss 3.8775 LearningRate 0.0567 Epoch: 4 Global Step: 82420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:46:50,773-Speed 5152.48 samples/sec Loss 3.7948 LearningRate 0.0567 Epoch: 4 Global Step: 82430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:46:52,762-Speed 5151.13 samples/sec Loss 3.8382 LearningRate 0.0567 Epoch: 4 Global Step: 82440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:46:54,738-Speed 5181.98 samples/sec Loss 3.8778 LearningRate 0.0567 Epoch: 4 Global Step: 82450 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:46:56,720-Speed 5170.25 samples/sec Loss 3.9041 LearningRate 0.0567 Epoch: 4 Global Step: 82460 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:46:58,702-Speed 5167.14 samples/sec Loss 3.8345 LearningRate 0.0567 Epoch: 4 Global Step: 82470 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:47:00,692-Speed 5146.12 samples/sec Loss 3.8102 LearningRate 0.0567 Epoch: 4 Global Step: 82480 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:47:02,683-Speed 5145.27 samples/sec Loss 3.8395 LearningRate 0.0567 Epoch: 4 Global Step: 82490 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:47:04,670-Speed 5155.54 samples/sec Loss 3.8581 
LearningRate 0.0567 Epoch: 4 Global Step: 82500 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:47:06,646-Speed 5188.45 samples/sec Loss 3.8262 LearningRate 0.0567 Epoch: 4 Global Step: 82510 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:47:08,616-Speed 5198.53 samples/sec Loss 3.9023 LearningRate 0.0567 Epoch: 4 Global Step: 82520 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:47:10,596-Speed 5174.84 samples/sec Loss 3.7861 LearningRate 0.0567 Epoch: 4 Global Step: 82530 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:47:12,571-Speed 5186.72 samples/sec Loss 3.8463 LearningRate 0.0567 Epoch: 4 Global Step: 82540 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:47:14,542-Speed 5195.87 samples/sec Loss 3.8201 LearningRate 0.0567 Epoch: 4 Global Step: 82550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:47:16,529-Speed 5155.00 samples/sec Loss 3.7759 LearningRate 0.0567 Epoch: 4 Global Step: 82560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:47:18,513-Speed 5161.84 samples/sec Loss 3.8452 LearningRate 0.0566 Epoch: 4 Global Step: 82570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:47:20,511-Speed 5127.56 samples/sec Loss 3.9005 LearningRate 0.0566 Epoch: 4 Global Step: 82580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:47:22,496-Speed 5160.84 samples/sec Loss 3.8097 LearningRate 0.0566 Epoch: 4 Global Step: 82590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:47:24,481-Speed 5160.52 samples/sec Loss 3.7613 LearningRate 0.0566 Epoch: 4 Global Step: 82600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:47:26,471-Speed 5146.12 samples/sec Loss 3.8287 LearningRate 0.0566 Epoch: 4 Global Step: 82610 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:47:28,474-Speed 5115.98 samples/sec Loss 3.7654 LearningRate 0.0566 Epoch: 4 Global Step: 82620 Fp16 
Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:47:30,452-Speed 5179.00 samples/sec Loss 3.8511 LearningRate 0.0566 Epoch: 4 Global Step: 82630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:47:32,420-Speed 5203.29 samples/sec Loss 3.9100 LearningRate 0.0566 Epoch: 4 Global Step: 82640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:47:34,386-Speed 5211.84 samples/sec Loss 3.7788 LearningRate 0.0566 Epoch: 4 Global Step: 82650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:47:36,375-Speed 5148.66 samples/sec Loss 3.8229 LearningRate 0.0566 Epoch: 4 Global Step: 82660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:47:38,353-Speed 5179.87 samples/sec Loss 3.7887 LearningRate 0.0566 Epoch: 4 Global Step: 82670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:47:40,325-Speed 5195.76 samples/sec Loss 3.7090 LearningRate 0.0566 Epoch: 4 Global Step: 82680 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:47:42,292-Speed 5206.85 samples/sec Loss 3.7897 LearningRate 0.0566 Epoch: 4 Global Step: 82690 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:47:44,259-Speed 5206.04 samples/sec Loss 3.8269 LearningRate 0.0566 Epoch: 4 Global Step: 82700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:47:46,232-Speed 5193.87 samples/sec Loss 3.8302 LearningRate 0.0566 Epoch: 4 Global Step: 82710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:47:48,198-Speed 5209.01 samples/sec Loss 3.8736 LearningRate 0.0566 Epoch: 4 Global Step: 82720 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:47:50,167-Speed 5201.51 samples/sec Loss 3.7678 LearningRate 0.0566 Epoch: 4 Global Step: 82730 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:47:52,136-Speed 5201.96 samples/sec Loss 3.9414 LearningRate 0.0566 Epoch: 4 Global Step: 82740 Fp16 Grad Scale: 131072 Required: 17 hours 
Training: 2022-04-11 04:47:54,117-Speed 5171.05 samples/sec Loss 3.8730 LearningRate 0.0566 Epoch: 4 Global Step: 82750 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:47:56,112-Speed 5135.24 samples/sec Loss 3.8430 LearningRate 0.0566 Epoch: 4 Global Step: 82760 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:47:58,086-Speed 5190.53 samples/sec Loss 3.8822 LearningRate 0.0566 Epoch: 4 Global Step: 82770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:48:00,058-Speed 5193.02 samples/sec Loss 3.7914 LearningRate 0.0566 Epoch: 4 Global Step: 82780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:48:02,035-Speed 5181.21 samples/sec Loss 3.8019 LearningRate 0.0565 Epoch: 4 Global Step: 82790 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:48:04,017-Speed 5170.62 samples/sec Loss 3.8698 LearningRate 0.0565 Epoch: 4 Global Step: 82800 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:48:05,999-Speed 5168.06 samples/sec Loss 3.8774 LearningRate 0.0565 Epoch: 4 Global Step: 82810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:07,976-Speed 5181.13 samples/sec Loss 3.8132 LearningRate 0.0565 Epoch: 4 Global Step: 82820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:09,944-Speed 5203.85 samples/sec Loss 3.7939 LearningRate 0.0565 Epoch: 4 Global Step: 82830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:11,919-Speed 5186.56 samples/sec Loss 3.8448 LearningRate 0.0565 Epoch: 4 Global Step: 82840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:13,888-Speed 5203.58 samples/sec Loss 3.9578 LearningRate 0.0565 Epoch: 4 Global Step: 82850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:15,873-Speed 5159.89 samples/sec Loss 3.8296 LearningRate 0.0565 Epoch: 4 Global Step: 82860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:17,850-Speed 5181.91 
samples/sec Loss 3.7329 LearningRate 0.0565 Epoch: 4 Global Step: 82870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:19,816-Speed 5210.13 samples/sec Loss 3.8326 LearningRate 0.0565 Epoch: 4 Global Step: 82880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:21,786-Speed 5197.84 samples/sec Loss 3.8690 LearningRate 0.0565 Epoch: 4 Global Step: 82890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:23,760-Speed 5188.85 samples/sec Loss 3.8360 LearningRate 0.0565 Epoch: 4 Global Step: 82900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:25,729-Speed 5202.63 samples/sec Loss 3.9393 LearningRate 0.0565 Epoch: 4 Global Step: 82910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:27,697-Speed 5204.94 samples/sec Loss 3.7556 LearningRate 0.0565 Epoch: 4 Global Step: 82920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:29,668-Speed 5197.42 samples/sec Loss 3.7899 LearningRate 0.0565 Epoch: 4 Global Step: 82930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:31,643-Speed 5186.22 samples/sec Loss 3.7828 LearningRate 0.0565 Epoch: 4 Global Step: 82940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:33,625-Speed 5169.66 samples/sec Loss 3.7528 LearningRate 0.0565 Epoch: 4 Global Step: 82950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:35,619-Speed 5135.48 samples/sec Loss 3.9357 LearningRate 0.0565 Epoch: 4 Global Step: 82960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:37,612-Speed 5140.08 samples/sec Loss 3.8291 LearningRate 0.0565 Epoch: 4 Global Step: 82970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:39,619-Speed 5105.98 samples/sec Loss 3.8650 LearningRate 0.0565 Epoch: 4 Global Step: 82980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:41,599-Speed 5171.91 samples/sec Loss 3.8694 LearningRate 0.0565 Epoch: 4 
Global Step: 82990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:43,572-Speed 5192.57 samples/sec Loss 3.8304 LearningRate 0.0565 Epoch: 4 Global Step: 83000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:45,563-Speed 5144.15 samples/sec Loss 3.8639 LearningRate 0.0565 Epoch: 4 Global Step: 83010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:48:47,536-Speed 5193.05 samples/sec Loss 3.7357 LearningRate 0.0564 Epoch: 4 Global Step: 83020 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:48:49,529-Speed 5138.04 samples/sec Loss 3.8730 LearningRate 0.0564 Epoch: 4 Global Step: 83030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:48:51,493-Speed 5216.39 samples/sec Loss 3.7097 LearningRate 0.0564 Epoch: 4 Global Step: 83040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:53,465-Speed 5193.55 samples/sec Loss 3.8743 LearningRate 0.0564 Epoch: 4 Global Step: 83050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:55,446-Speed 5171.32 samples/sec Loss 3.8297 LearningRate 0.0564 Epoch: 4 Global Step: 83060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:57,419-Speed 5190.68 samples/sec Loss 3.8708 LearningRate 0.0564 Epoch: 4 Global Step: 83070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:48:59,427-Speed 5101.99 samples/sec Loss 3.8361 LearningRate 0.0564 Epoch: 4 Global Step: 83080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:49:01,420-Speed 5141.28 samples/sec Loss 3.8132 LearningRate 0.0564 Epoch: 4 Global Step: 83090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:49:03,397-Speed 5180.24 samples/sec Loss 3.9363 LearningRate 0.0564 Epoch: 4 Global Step: 83100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:49:05,377-Speed 5173.58 samples/sec Loss 3.8269 LearningRate 0.0564 Epoch: 4 Global Step: 83110 Fp16 Grad Scale: 65536 Required: 
17 hours Training: 2022-04-11 04:49:07,355-Speed 5179.02 samples/sec Loss 3.7988 LearningRate 0.0564 Epoch: 4 Global Step: 83120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:49:09,327-Speed 5193.93 samples/sec Loss 3.8509 LearningRate 0.0564 Epoch: 4 Global Step: 83130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:49:11,297-Speed 5202.96 samples/sec Loss 3.7945 LearningRate 0.0564 Epoch: 4 Global Step: 83140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:49:13,266-Speed 5200.00 samples/sec Loss 3.7955 LearningRate 0.0564 Epoch: 4 Global Step: 83150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:49:15,259-Speed 5139.59 samples/sec Loss 3.9295 LearningRate 0.0564 Epoch: 4 Global Step: 83160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:49:17,229-Speed 5201.21 samples/sec Loss 3.7275 LearningRate 0.0564 Epoch: 4 Global Step: 83170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:49:19,215-Speed 5158.07 samples/sec Loss 3.8079 LearningRate 0.0564 Epoch: 4 Global Step: 83180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:49:21,203-Speed 5153.75 samples/sec Loss 3.8988 LearningRate 0.0564 Epoch: 4 Global Step: 83190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:49:23,192-Speed 5148.44 samples/sec Loss 3.8397 LearningRate 0.0564 Epoch: 4 Global Step: 83200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:49:25,161-Speed 5203.17 samples/sec Loss 3.8048 LearningRate 0.0564 Epoch: 4 Global Step: 83210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:49:27,130-Speed 5202.76 samples/sec Loss 3.7873 LearningRate 0.0564 Epoch: 4 Global Step: 83220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:49:29,099-Speed 5201.38 samples/sec Loss 3.7512 LearningRate 0.0564 Epoch: 4 Global Step: 83230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 
04:49:31,096-Speed 5130.64 samples/sec Loss 3.7603 LearningRate 0.0563 Epoch: 4 Global Step: 83240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:49:33,074-Speed 5177.68 samples/sec Loss 3.7939 LearningRate 0.0563 Epoch: 4 Global Step: 83250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:49:35,065-Speed 5145.67 samples/sec Loss 3.8407 LearningRate 0.0563 Epoch: 4 Global Step: 83260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:49:37,057-Speed 5141.15 samples/sec Loss 3.8106 LearningRate 0.0563 Epoch: 4 Global Step: 83270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:49:39,034-Speed 5182.52 samples/sec Loss 3.7822 LearningRate 0.0563 Epoch: 4 Global Step: 83280 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:49:41,029-Speed 5133.87 samples/sec Loss 3.7554 LearningRate 0.0563 Epoch: 4 Global Step: 83290 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:49:43,004-Speed 5187.78 samples/sec Loss 3.7136 LearningRate 0.0563 Epoch: 4 Global Step: 83300 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:49:44,976-Speed 5193.62 samples/sec Loss 3.6901 LearningRate 0.0563 Epoch: 4 Global Step: 83310 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:49:46,949-Speed 5192.33 samples/sec Loss 3.8173 LearningRate 0.0563 Epoch: 4 Global Step: 83320 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:49:48,929-Speed 5172.32 samples/sec Loss 3.7994 LearningRate 0.0563 Epoch: 4 Global Step: 83330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:49:50,903-Speed 5188.30 samples/sec Loss 3.8417 LearningRate 0.0563 Epoch: 4 Global Step: 83340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:49:52,881-Speed 5181.07 samples/sec Loss 3.7349 LearningRate 0.0563 Epoch: 4 Global Step: 83350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:49:54,853-Speed 5194.48 samples/sec Loss 3.8137 
LearningRate 0.0563 Epoch: 4 Global Step: 83360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:49:56,837-Speed 5161.41 samples/sec Loss 3.8019 LearningRate 0.0563 Epoch: 4 Global Step: 83370 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:49:58,843-Speed 5106.72 samples/sec Loss 3.7686 LearningRate 0.0563 Epoch: 4 Global Step: 83380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:50:00,840-Speed 5129.33 samples/sec Loss 3.7824 LearningRate 0.0563 Epoch: 4 Global Step: 83390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:50:02,842-Speed 5117.25 samples/sec Loss 3.8319 LearningRate 0.0563 Epoch: 4 Global Step: 83400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:50:04,809-Speed 5207.43 samples/sec Loss 3.7691 LearningRate 0.0563 Epoch: 4 Global Step: 83410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:50:06,797-Speed 5152.78 samples/sec Loss 3.7698 LearningRate 0.0563 Epoch: 4 Global Step: 83420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:50:08,766-Speed 5203.96 samples/sec Loss 3.7935 LearningRate 0.0563 Epoch: 4 Global Step: 83430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:50:10,741-Speed 5185.25 samples/sec Loss 3.7581 LearningRate 0.0563 Epoch: 4 Global Step: 83440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:50:13,195-Speed 4174.04 samples/sec Loss 3.8245 LearningRate 0.0563 Epoch: 4 Global Step: 83450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:50:43,921-Speed 333.28 samples/sec Loss 3.5854 LearningRate 0.0562 Epoch: 5 Global Step: 83460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:50:45,902-Speed 5172.79 samples/sec Loss 3.2682 LearningRate 0.0562 Epoch: 5 Global Step: 83470 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:50:47,902-Speed 5122.18 samples/sec Loss 3.2021 LearningRate 0.0562 Epoch: 5 Global Step: 83480 Fp16 
Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:50:49,875-Speed 5193.61 samples/sec Loss 3.1697 LearningRate 0.0562 Epoch: 5 Global Step: 83490 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:50:52,317-Speed 4193.89 samples/sec Loss 3.1992 LearningRate 0.0562 Epoch: 5 Global Step: 83500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:50:54,340-Speed 5065.15 samples/sec Loss 3.0829 LearningRate 0.0562 Epoch: 5 Global Step: 83510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:50:56,317-Speed 5182.02 samples/sec Loss 3.1640 LearningRate 0.0562 Epoch: 5 Global Step: 83520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:50:58,300-Speed 5163.92 samples/sec Loss 3.1321 LearningRate 0.0562 Epoch: 5 Global Step: 83530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:51:00,291-Speed 5149.77 samples/sec Loss 3.1840 LearningRate 0.0562 Epoch: 5 Global Step: 83540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:51:02,277-Speed 5155.60 samples/sec Loss 3.2262 LearningRate 0.0562 Epoch: 5 Global Step: 83550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:51:04,265-Speed 5153.82 samples/sec Loss 3.1853 LearningRate 0.0562 Epoch: 5 Global Step: 83560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:51:06,261-Speed 5132.79 samples/sec Loss 3.1900 LearningRate 0.0562 Epoch: 5 Global Step: 83570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:51:08,238-Speed 5181.99 samples/sec Loss 3.1506 LearningRate 0.0562 Epoch: 5 Global Step: 83580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:51:10,214-Speed 5182.46 samples/sec Loss 3.1157 LearningRate 0.0562 Epoch: 5 Global Step: 83590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:51:12,195-Speed 5171.98 samples/sec Loss 3.1679 LearningRate 0.0562 Epoch: 5 Global Step: 83600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 
2022-04-11 04:51:14,194-Speed 5123.40 samples/sec Loss 3.1973 LearningRate 0.0562 Epoch: 5 Global Step: 83610 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:51:16,165-Speed 5197.93 samples/sec Loss 3.1938 LearningRate 0.0562 Epoch: 5 Global Step: 83620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:51:18,156-Speed 5146.70 samples/sec Loss 3.2232 LearningRate 0.0562 Epoch: 5 Global Step: 83630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:51:20,130-Speed 5188.21 samples/sec Loss 3.2161 LearningRate 0.0562 Epoch: 5 Global Step: 83640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:51:22,108-Speed 5178.65 samples/sec Loss 3.1310 LearningRate 0.0562 Epoch: 5 Global Step: 83650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:51:24,087-Speed 5176.52 samples/sec Loss 3.2082 LearningRate 0.0562 Epoch: 5 Global Step: 83660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:51:26,093-Speed 5105.72 samples/sec Loss 3.1482 LearningRate 0.0562 Epoch: 5 Global Step: 83670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:51:28,070-Speed 5179.20 samples/sec Loss 3.1784 LearningRate 0.0561 Epoch: 5 Global Step: 83680 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:51:30,038-Speed 5208.43 samples/sec Loss 3.2135 LearningRate 0.0561 Epoch: 5 Global Step: 83690 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:51:32,010-Speed 5192.81 samples/sec Loss 3.1458 LearningRate 0.0561 Epoch: 5 Global Step: 83700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:51:33,985-Speed 5188.24 samples/sec Loss 3.1462 LearningRate 0.0561 Epoch: 5 Global Step: 83710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:51:35,972-Speed 5154.49 samples/sec Loss 3.1584 LearningRate 0.0561 Epoch: 5 Global Step: 83720 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:51:37,960-Speed 5153.12 
samples/sec Loss 3.0680 LearningRate 0.0561 Epoch: 5 Global Step: 83730 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-04-11 04:51:39,932-Speed 5194.26 samples/sec Loss 3.2060 LearningRate 0.0561 Epoch: 5 Global Step: 83740 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:51:41,911-Speed 5174.56 samples/sec Loss 3.2088 LearningRate 0.0561 Epoch: 5 Global Step: 83750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:51:43,896-Speed 5162.33 samples/sec Loss 3.2256 LearningRate 0.0561 Epoch: 5 Global Step: 83760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:51:45,881-Speed 5159.87 samples/sec Loss 3.1320 LearningRate 0.0561 Epoch: 5 Global Step: 83770 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:51:47,888-Speed 5102.89 samples/sec Loss 3.1990 LearningRate 0.0561 Epoch: 5 Global Step: 83780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:51:49,866-Speed 5180.49 samples/sec Loss 3.1799 LearningRate 0.0561 Epoch: 5 Global Step: 83790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:51:51,856-Speed 5146.42 samples/sec Loss 3.2176 LearningRate 0.0561 Epoch: 5 Global Step: 83800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:51:53,828-Speed 5194.66 samples/sec Loss 3.2152 LearningRate 0.0561 Epoch: 5 Global Step: 83810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:51:55,819-Speed 5144.39 samples/sec Loss 3.1917 LearningRate 0.0561 Epoch: 5 Global Step: 83820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:51:57,807-Speed 5154.05 samples/sec Loss 3.2238 LearningRate 0.0561 Epoch: 5 Global Step: 83830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:51:59,791-Speed 5162.45 samples/sec Loss 3.1703 LearningRate 0.0561 Epoch: 5 Global Step: 83840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:52:01,783-Speed 5142.55 samples/sec Loss 3.2022 LearningRate 0.0561 Epoch: 5 
Global Step: 83850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:52:03,763-Speed 5174.78 samples/sec Loss 3.1789 LearningRate 0.0561 Epoch: 5 Global Step: 83860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:52:05,747-Speed 5162.75 samples/sec Loss 3.2777 LearningRate 0.0561 Epoch: 5 Global Step: 83870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:52:07,744-Speed 5129.81 samples/sec Loss 3.2278 LearningRate 0.0561 Epoch: 5 Global Step: 83880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:52:09,734-Speed 5147.52 samples/sec Loss 3.2719 LearningRate 0.0561 Epoch: 5 Global Step: 83890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:52:11,729-Speed 5133.36 samples/sec Loss 3.2219 LearningRate 0.0561 Epoch: 5 Global Step: 83900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:52:13,713-Speed 5162.65 samples/sec Loss 3.1892 LearningRate 0.0560 Epoch: 5 Global Step: 83910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:52:15,694-Speed 5172.50 samples/sec Loss 3.2930 LearningRate 0.0560 Epoch: 5 Global Step: 83920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:52:17,684-Speed 5146.07 samples/sec Loss 3.2096 LearningRate 0.0560 Epoch: 5 Global Step: 83930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:52:19,672-Speed 5154.16 samples/sec Loss 3.1606 LearningRate 0.0560 Epoch: 5 Global Step: 83940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:52:21,667-Speed 5133.74 samples/sec Loss 3.1889 LearningRate 0.0560 Epoch: 5 Global Step: 83950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:52:23,679-Speed 5091.50 samples/sec Loss 3.2498 LearningRate 0.0560 Epoch: 5 Global Step: 83960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:52:25,672-Speed 5140.75 samples/sec Loss 3.2250 LearningRate 0.0560 Epoch: 5 Global Step: 83970 Fp16 Grad Scale: 131072 Required: 
17 hours Training: 2022-04-11 04:52:27,661-Speed 5148.33 samples/sec Loss 3.3112 LearningRate 0.0560 Epoch: 5 Global Step: 83980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:52:29,640-Speed 5175.62 samples/sec Loss 3.2178 LearningRate 0.0560 Epoch: 5 Global Step: 83990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:52:31,652-Speed 5091.94 samples/sec Loss 3.2576 LearningRate 0.0560 Epoch: 5 Global Step: 84000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:52:58,363-[lfw][84000]XNorm: 23.851377 Training: 2022-04-11 04:52:58,364-[lfw][84000]Accuracy-Flip: 0.99733+-0.00309 Training: 2022-04-11 04:52:58,364-[lfw][84000]Accuracy-Highest: 0.99817 Training: 2022-04-11 04:53:29,239-[cfp_fp][84000]XNorm: 21.555157 Training: 2022-04-11 04:53:29,240-[cfp_fp][84000]Accuracy-Flip: 0.97657+-0.00850 Training: 2022-04-11 04:53:29,240-[cfp_fp][84000]Accuracy-Highest: 0.98086 Training: 2022-04-11 04:53:55,859-[agedb_30][84000]XNorm: 23.609117 Training: 2022-04-11 04:53:55,859-[agedb_30][84000]Accuracy-Flip: 0.97867+-0.00878 Training: 2022-04-11 04:53:55,860-[agedb_30][84000]Accuracy-Highest: 0.97900 Training: 2022-04-11 04:53:57,856-Speed 118.79 samples/sec Loss 3.2922 LearningRate 0.0560 Epoch: 5 Global Step: 84010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:53:59,973-Speed 4839.49 samples/sec Loss 3.2403 LearningRate 0.0560 Epoch: 5 Global Step: 84020 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:01,960-Speed 5154.04 samples/sec Loss 3.2439 LearningRate 0.0560 Epoch: 5 Global Step: 84030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:03,932-Speed 5193.59 samples/sec Loss 3.2285 LearningRate 0.0560 Epoch: 5 Global Step: 84040 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:05,898-Speed 5211.03 samples/sec Loss 3.2485 LearningRate 0.0560 Epoch: 5 Global Step: 84050 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 
04:54:07,861-Speed 5221.28 samples/sec Loss 3.2841 LearningRate 0.0560 Epoch: 5 Global Step: 84060 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:10,100-Speed 4573.07 samples/sec Loss 3.3089 LearningRate 0.0560 Epoch: 5 Global Step: 84070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:54:12,075-Speed 5187.97 samples/sec Loss 3.2782 LearningRate 0.0560 Epoch: 5 Global Step: 84080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:54:14,044-Speed 5201.83 samples/sec Loss 3.2970 LearningRate 0.0560 Epoch: 5 Global Step: 84090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:54:16,013-Speed 5201.20 samples/sec Loss 3.2763 LearningRate 0.0560 Epoch: 5 Global Step: 84100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:54:17,989-Speed 5184.13 samples/sec Loss 3.2668 LearningRate 0.0560 Epoch: 5 Global Step: 84110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:54:19,970-Speed 5172.30 samples/sec Loss 3.2097 LearningRate 0.0560 Epoch: 5 Global Step: 84120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:54:21,943-Speed 5191.42 samples/sec Loss 3.3409 LearningRate 0.0559 Epoch: 5 Global Step: 84130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:54:23,910-Speed 5206.55 samples/sec Loss 3.3497 LearningRate 0.0559 Epoch: 5 Global Step: 84140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:54:25,894-Speed 5163.94 samples/sec Loss 3.2769 LearningRate 0.0559 Epoch: 5 Global Step: 84150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:54:27,866-Speed 5192.74 samples/sec Loss 3.3052 LearningRate 0.0559 Epoch: 5 Global Step: 84160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:54:29,849-Speed 5166.12 samples/sec Loss 3.3062 LearningRate 0.0559 Epoch: 5 Global Step: 84170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:31,818-Speed 5202.87 samples/sec Loss 3.1697 
LearningRate 0.0559 Epoch: 5 Global Step: 84180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:33,796-Speed 5179.25 samples/sec Loss 3.2928 LearningRate 0.0559 Epoch: 5 Global Step: 84190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:35,771-Speed 5187.01 samples/sec Loss 3.3237 LearningRate 0.0559 Epoch: 5 Global Step: 84200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:37,748-Speed 5181.40 samples/sec Loss 3.2834 LearningRate 0.0559 Epoch: 5 Global Step: 84210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:39,729-Speed 5169.16 samples/sec Loss 3.2528 LearningRate 0.0559 Epoch: 5 Global Step: 84220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:41,710-Speed 5172.67 samples/sec Loss 3.2997 LearningRate 0.0559 Epoch: 5 Global Step: 84230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:43,675-Speed 5211.17 samples/sec Loss 3.2811 LearningRate 0.0559 Epoch: 5 Global Step: 84240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:45,642-Speed 5209.06 samples/sec Loss 3.2599 LearningRate 0.0559 Epoch: 5 Global Step: 84250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:47,613-Speed 5196.81 samples/sec Loss 3.2688 LearningRate 0.0559 Epoch: 5 Global Step: 84260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:49,577-Speed 5216.60 samples/sec Loss 3.3130 LearningRate 0.0559 Epoch: 5 Global Step: 84270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:51,567-Speed 5147.58 samples/sec Loss 3.2705 LearningRate 0.0559 Epoch: 5 Global Step: 84280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:53,562-Speed 5134.10 samples/sec Loss 3.3012 LearningRate 0.0559 Epoch: 5 Global Step: 84290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:55,528-Speed 5210.24 samples/sec Loss 3.3301 LearningRate 0.0559 Epoch: 5 Global Step: 
84300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:57,497-Speed 5200.43 samples/sec Loss 3.3025 LearningRate 0.0559 Epoch: 5 Global Step: 84310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:54:59,475-Speed 5179.38 samples/sec Loss 3.2647 LearningRate 0.0559 Epoch: 5 Global Step: 84320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:55:01,446-Speed 5197.86 samples/sec Loss 3.2776 LearningRate 0.0559 Epoch: 5 Global Step: 84330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:55:03,428-Speed 5167.70 samples/sec Loss 3.2343 LearningRate 0.0559 Epoch: 5 Global Step: 84340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:55:05,402-Speed 5189.47 samples/sec Loss 3.2747 LearningRate 0.0558 Epoch: 5 Global Step: 84350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:55:07,382-Speed 5172.39 samples/sec Loss 3.3075 LearningRate 0.0558 Epoch: 5 Global Step: 84360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:55:09,340-Speed 5232.85 samples/sec Loss 3.2878 LearningRate 0.0558 Epoch: 5 Global Step: 84370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:55:11,327-Speed 5155.36 samples/sec Loss 3.2691 LearningRate 0.0558 Epoch: 5 Global Step: 84380 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:55:13,323-Speed 5132.15 samples/sec Loss 3.3283 LearningRate 0.0558 Epoch: 5 Global Step: 84390 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:55:15,301-Speed 5179.00 samples/sec Loss 3.2641 LearningRate 0.0558 Epoch: 5 Global Step: 84400 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:55:17,270-Speed 5201.89 samples/sec Loss 3.2124 LearningRate 0.0558 Epoch: 5 Global Step: 84410 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:55:19,237-Speed 5207.95 samples/sec Loss 3.2801 LearningRate 0.0558 Epoch: 5 Global Step: 84420 Fp16 Grad Scale: 131072 Required: 17 
hours Training: 2022-04-11 04:55:21,218-Speed 5169.27 samples/sec Loss 3.2991 LearningRate 0.0558 Epoch: 5 Global Step: 84430 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:55:23,187-Speed 5204.05 samples/sec Loss 3.3109 LearningRate 0.0558 Epoch: 5 Global Step: 84440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:55:25,156-Speed 5200.34 samples/sec Loss 3.3864 LearningRate 0.0558 Epoch: 5 Global Step: 84450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:55:27,161-Speed 5111.43 samples/sec Loss 3.3254 LearningRate 0.0558 Epoch: 5 Global Step: 84460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:55:29,139-Speed 5178.86 samples/sec Loss 3.3293 LearningRate 0.0558 Epoch: 5 Global Step: 84470 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:55:31,110-Speed 5196.72 samples/sec Loss 3.2810 LearningRate 0.0558 Epoch: 5 Global Step: 84480 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:55:33,100-Speed 5146.15 samples/sec Loss 3.2995 LearningRate 0.0558 Epoch: 5 Global Step: 84490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:55:35,084-Speed 5164.54 samples/sec Loss 3.2593 LearningRate 0.0558 Epoch: 5 Global Step: 84500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:55:37,102-Speed 5075.45 samples/sec Loss 3.4082 LearningRate 0.0558 Epoch: 5 Global Step: 84510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:55:39,106-Speed 5112.41 samples/sec Loss 3.2835 LearningRate 0.0558 Epoch: 5 Global Step: 84520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:55:41,086-Speed 5173.22 samples/sec Loss 3.2890 LearningRate 0.0558 Epoch: 5 Global Step: 84530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:55:43,073-Speed 5153.01 samples/sec Loss 3.2623 LearningRate 0.0558 Epoch: 5 Global Step: 84540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:55:45,045-Speed 5196.18 
samples/sec Loss 3.4174 LearningRate 0.0558 Epoch: 5 Global Step: 84550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:55:47,018-Speed 5192.78 samples/sec Loss 3.2999 LearningRate 0.0558 Epoch: 5 Global Step: 84560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:55:48,992-Speed 5188.91 samples/sec Loss 3.3992 LearningRate 0.0558 Epoch: 5 Global Step: 84570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:55:50,986-Speed 5135.35 samples/sec Loss 3.2612 LearningRate 0.0557 Epoch: 5 Global Step: 84580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:55:52,954-Speed 5205.68 samples/sec Loss 3.3284 LearningRate 0.0557 Epoch: 5 Global Step: 84590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:55:54,928-Speed 5189.88 samples/sec Loss 3.3656 LearningRate 0.0557 Epoch: 5 Global Step: 84600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:55:56,915-Speed 5153.18 samples/sec Loss 3.3234 LearningRate 0.0557 Epoch: 5 Global Step: 84610 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:55:58,903-Speed 5152.35 samples/sec Loss 3.3353 LearningRate 0.0557 Epoch: 5 Global Step: 84620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:56:00,899-Speed 5132.79 samples/sec Loss 3.3385 LearningRate 0.0557 Epoch: 5 Global Step: 84630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:56:02,904-Speed 5110.05 samples/sec Loss 3.3452 LearningRate 0.0557 Epoch: 5 Global Step: 84640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:56:04,896-Speed 5142.85 samples/sec Loss 3.3538 LearningRate 0.0557 Epoch: 5 Global Step: 84650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:56:06,879-Speed 5164.29 samples/sec Loss 3.1837 LearningRate 0.0557 Epoch: 5 Global Step: 84660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:56:08,842-Speed 5219.16 samples/sec Loss 3.2819 LearningRate 0.0557 Epoch: 5 
Global Step: 84670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:56:10,822-Speed 5174.35 samples/sec Loss 3.2802 LearningRate 0.0557 Epoch: 5 Global Step: 84680 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:56:12,786-Speed 5213.28 samples/sec Loss 3.3271 LearningRate 0.0557 Epoch: 5 Global Step: 84690 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:56:14,756-Speed 5200.53 samples/sec Loss 3.2877 LearningRate 0.0557 Epoch: 5 Global Step: 84700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:56:16,757-Speed 5120.23 samples/sec Loss 3.3164 LearningRate 0.0557 Epoch: 5 Global Step: 84710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:56:18,717-Speed 5226.14 samples/sec Loss 3.3603 LearningRate 0.0557 Epoch: 5 Global Step: 84720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:56:20,684-Speed 5206.98 samples/sec Loss 3.3923 LearningRate 0.0557 Epoch: 5 Global Step: 84730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:56:22,664-Speed 5174.41 samples/sec Loss 3.4565 LearningRate 0.0557 Epoch: 5 Global Step: 84740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:56:24,637-Speed 5192.69 samples/sec Loss 3.3334 LearningRate 0.0557 Epoch: 5 Global Step: 84750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:56:26,626-Speed 5149.09 samples/sec Loss 3.3778 LearningRate 0.0557 Epoch: 5 Global Step: 84760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:56:28,615-Speed 5148.73 samples/sec Loss 3.3713 LearningRate 0.0557 Epoch: 5 Global Step: 84770 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:56:30,581-Speed 5210.62 samples/sec Loss 3.3710 LearningRate 0.0557 Epoch: 5 Global Step: 84780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:56:32,554-Speed 5192.18 samples/sec Loss 3.3664 LearningRate 0.0557 Epoch: 5 Global Step: 84790 Fp16 Grad Scale: 65536 
Required: 17 hours Training: 2022-04-11 04:56:34,528-Speed 5188.71 samples/sec Loss 3.3363 LearningRate 0.0556 Epoch: 5 Global Step: 84800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:56:36,516-Speed 5154.02 samples/sec Loss 3.3383 LearningRate 0.0556 Epoch: 5 Global Step: 84810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:56:38,500-Speed 5165.12 samples/sec Loss 3.3407 LearningRate 0.0556 Epoch: 5 Global Step: 84820 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:56:40,497-Speed 5126.93 samples/sec Loss 3.3478 LearningRate 0.0556 Epoch: 5 Global Step: 84830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:56:42,475-Speed 5179.32 samples/sec Loss 3.2842 LearningRate 0.0556 Epoch: 5 Global Step: 84840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:56:44,454-Speed 5176.95 samples/sec Loss 3.3588 LearningRate 0.0556 Epoch: 5 Global Step: 84850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:56:46,432-Speed 5178.71 samples/sec Loss 3.3514 LearningRate 0.0556 Epoch: 5 Global Step: 84860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:56:48,408-Speed 5185.48 samples/sec Loss 3.3515 LearningRate 0.0556 Epoch: 5 Global Step: 84870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:56:50,384-Speed 5181.49 samples/sec Loss 3.3951 LearningRate 0.0556 Epoch: 5 Global Step: 84880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:56:52,400-Speed 5082.95 samples/sec Loss 3.3565 LearningRate 0.0556 Epoch: 5 Global Step: 84890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:56:54,367-Speed 5205.76 samples/sec Loss 3.4281 LearningRate 0.0556 Epoch: 5 Global Step: 84900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:56:56,377-Speed 5098.16 samples/sec Loss 3.2550 LearningRate 0.0556 Epoch: 5 Global Step: 84910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 
04:56:58,370-Speed 5138.81 samples/sec Loss 3.4071 LearningRate 0.0556 Epoch: 5 Global Step: 84920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:57:00,361-Speed 5145.61 samples/sec Loss 3.4131 LearningRate 0.0556 Epoch: 5 Global Step: 84930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:57:02,383-Speed 5065.22 samples/sec Loss 3.3572 LearningRate 0.0556 Epoch: 5 Global Step: 84940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:57:04,368-Speed 5160.58 samples/sec Loss 3.3872 LearningRate 0.0556 Epoch: 5 Global Step: 84950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:57:06,357-Speed 5149.16 samples/sec Loss 3.3754 LearningRate 0.0556 Epoch: 5 Global Step: 84960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:57:08,369-Speed 5092.20 samples/sec Loss 3.3909 LearningRate 0.0556 Epoch: 5 Global Step: 84970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:10,353-Speed 5163.65 samples/sec Loss 3.4841 LearningRate 0.0556 Epoch: 5 Global Step: 84980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:12,325-Speed 5192.42 samples/sec Loss 3.3628 LearningRate 0.0556 Epoch: 5 Global Step: 84990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:14,311-Speed 5158.79 samples/sec Loss 3.4673 LearningRate 0.0556 Epoch: 5 Global Step: 85000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:16,284-Speed 5193.43 samples/sec Loss 3.3879 LearningRate 0.0556 Epoch: 5 Global Step: 85010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:18,255-Speed 5195.90 samples/sec Loss 3.3697 LearningRate 0.0555 Epoch: 5 Global Step: 85020 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:20,233-Speed 5180.04 samples/sec Loss 3.3796 LearningRate 0.0555 Epoch: 5 Global Step: 85030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:22,217-Speed 5161.81 samples/sec Loss 
3.3814 LearningRate 0.0555 Epoch: 5 Global Step: 85040 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:24,224-Speed 5105.23 samples/sec Loss 3.3684 LearningRate 0.0555 Epoch: 5 Global Step: 85050 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:26,208-Speed 5163.83 samples/sec Loss 3.3937 LearningRate 0.0555 Epoch: 5 Global Step: 85060 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:28,180-Speed 5194.09 samples/sec Loss 3.3984 LearningRate 0.0555 Epoch: 5 Global Step: 85070 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:30,183-Speed 5114.37 samples/sec Loss 3.4264 LearningRate 0.0555 Epoch: 5 Global Step: 85080 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:32,164-Speed 5170.27 samples/sec Loss 3.3166 LearningRate 0.0555 Epoch: 5 Global Step: 85090 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:34,148-Speed 5165.43 samples/sec Loss 3.3917 LearningRate 0.0555 Epoch: 5 Global Step: 85100 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:36,119-Speed 5195.82 samples/sec Loss 3.3160 LearningRate 0.0555 Epoch: 5 Global Step: 85110 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:38,116-Speed 5129.13 samples/sec Loss 3.4649 LearningRate 0.0555 Epoch: 5 Global Step: 85120 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:40,111-Speed 5135.80 samples/sec Loss 3.5286 LearningRate 0.0555 Epoch: 5 Global Step: 85130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:42,079-Speed 5204.41 samples/sec Loss 3.4242 LearningRate 0.0555 Epoch: 5 Global Step: 85140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:44,053-Speed 5190.71 samples/sec Loss 3.4158 LearningRate 0.0555 Epoch: 5 Global Step: 85150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:46,035-Speed 5168.99 samples/sec Loss 3.4576 LearningRate 0.0555 Epoch: 5 Global 
Step: 85160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:48,008-Speed 5191.48 samples/sec Loss 3.4026 LearningRate 0.0555 Epoch: 5 Global Step: 85170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:49,993-Speed 5158.92 samples/sec Loss 3.3736 LearningRate 0.0555 Epoch: 5 Global Step: 85180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:51,959-Speed 5210.90 samples/sec Loss 3.3781 LearningRate 0.0555 Epoch: 5 Global Step: 85190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:53,951-Speed 5142.64 samples/sec Loss 3.3451 LearningRate 0.0555 Epoch: 5 Global Step: 85200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:55,921-Speed 5199.11 samples/sec Loss 3.4754 LearningRate 0.0555 Epoch: 5 Global Step: 85210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:57,891-Speed 5199.11 samples/sec Loss 3.4345 LearningRate 0.0555 Epoch: 5 Global Step: 85220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:57:59,898-Speed 5104.80 samples/sec Loss 3.3810 LearningRate 0.0555 Epoch: 5 Global Step: 85230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:58:01,869-Speed 5196.84 samples/sec Loss 3.3282 LearningRate 0.0555 Epoch: 5 Global Step: 85240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:03,841-Speed 5193.43 samples/sec Loss 3.3293 LearningRate 0.0554 Epoch: 5 Global Step: 85250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:05,810-Speed 5202.63 samples/sec Loss 3.3705 LearningRate 0.0554 Epoch: 5 Global Step: 85260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:07,789-Speed 5176.61 samples/sec Loss 3.4310 LearningRate 0.0554 Epoch: 5 Global Step: 85270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:09,764-Speed 5185.88 samples/sec Loss 3.4456 LearningRate 0.0554 Epoch: 5 Global Step: 85280 Fp16 Grad Scale: 65536 Required: 17 
hours Training: 2022-04-11 04:58:11,764-Speed 5122.68 samples/sec Loss 3.4657 LearningRate 0.0554 Epoch: 5 Global Step: 85290 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:13,759-Speed 5135.67 samples/sec Loss 3.4389 LearningRate 0.0554 Epoch: 5 Global Step: 85300 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:15,781-Speed 5065.60 samples/sec Loss 3.4305 LearningRate 0.0554 Epoch: 5 Global Step: 85310 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:17,771-Speed 5146.77 samples/sec Loss 3.3457 LearningRate 0.0554 Epoch: 5 Global Step: 85320 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:19,746-Speed 5187.48 samples/sec Loss 3.3718 LearningRate 0.0554 Epoch: 5 Global Step: 85330 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:21,718-Speed 5194.08 samples/sec Loss 3.4233 LearningRate 0.0554 Epoch: 5 Global Step: 85340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:58:23,713-Speed 5135.67 samples/sec Loss 3.3529 LearningRate 0.0554 Epoch: 5 Global Step: 85350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:58:25,710-Speed 5127.82 samples/sec Loss 3.4540 LearningRate 0.0554 Epoch: 5 Global Step: 85360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:58:27,686-Speed 5182.75 samples/sec Loss 3.4011 LearningRate 0.0554 Epoch: 5 Global Step: 85370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:58:29,664-Speed 5179.55 samples/sec Loss 3.3423 LearningRate 0.0554 Epoch: 5 Global Step: 85380 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:58:31,654-Speed 5145.83 samples/sec Loss 3.3897 LearningRate 0.0554 Epoch: 5 Global Step: 85390 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:58:33,625-Speed 5198.52 samples/sec Loss 3.3599 LearningRate 0.0554 Epoch: 5 Global Step: 85400 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:58:35,603-Speed 
5180.18 samples/sec Loss 3.4004 LearningRate 0.0554 Epoch: 5 Global Step: 85410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:37,643-Speed 5020.88 samples/sec Loss 3.4063 LearningRate 0.0554 Epoch: 5 Global Step: 85420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:39,644-Speed 5119.14 samples/sec Loss 3.4222 LearningRate 0.0554 Epoch: 5 Global Step: 85430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:41,630-Speed 5156.03 samples/sec Loss 3.3754 LearningRate 0.0554 Epoch: 5 Global Step: 85440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:43,602-Speed 5195.83 samples/sec Loss 3.3820 LearningRate 0.0554 Epoch: 5 Global Step: 85450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:45,572-Speed 5199.79 samples/sec Loss 3.4354 LearningRate 0.0554 Epoch: 5 Global Step: 85460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:47,584-Speed 5091.68 samples/sec Loss 3.4346 LearningRate 0.0553 Epoch: 5 Global Step: 85470 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:49,591-Speed 5102.02 samples/sec Loss 3.4138 LearningRate 0.0553 Epoch: 5 Global Step: 85480 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:51,575-Speed 5165.86 samples/sec Loss 3.4062 LearningRate 0.0553 Epoch: 5 Global Step: 85490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:53,585-Speed 5095.23 samples/sec Loss 3.3280 LearningRate 0.0553 Epoch: 5 Global Step: 85500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:58:55,559-Speed 5190.75 samples/sec Loss 3.4042 LearningRate 0.0553 Epoch: 5 Global Step: 85510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:58:57,531-Speed 5193.82 samples/sec Loss 3.4270 LearningRate 0.0553 Epoch: 5 Global Step: 85520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:58:59,503-Speed 5192.31 samples/sec Loss 3.4292 LearningRate 0.0553 
Epoch: 5 Global Step: 85530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:59:01,479-Speed 5185.95 samples/sec Loss 3.4183 LearningRate 0.0553 Epoch: 5 Global Step: 85540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:59:03,455-Speed 5183.38 samples/sec Loss 3.3405 LearningRate 0.0553 Epoch: 5 Global Step: 85550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:59:05,425-Speed 5201.26 samples/sec Loss 3.4612 LearningRate 0.0553 Epoch: 5 Global Step: 85560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:59:07,402-Speed 5181.08 samples/sec Loss 3.4037 LearningRate 0.0553 Epoch: 5 Global Step: 85570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:59:09,389-Speed 5156.19 samples/sec Loss 3.4064 LearningRate 0.0553 Epoch: 5 Global Step: 85580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:59:11,377-Speed 5152.70 samples/sec Loss 3.3620 LearningRate 0.0553 Epoch: 5 Global Step: 85590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:59:13,364-Speed 5154.94 samples/sec Loss 3.3881 LearningRate 0.0553 Epoch: 5 Global Step: 85600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:59:15,355-Speed 5145.52 samples/sec Loss 3.4314 LearningRate 0.0553 Epoch: 5 Global Step: 85610 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:59:17,346-Speed 5145.02 samples/sec Loss 3.4025 LearningRate 0.0553 Epoch: 5 Global Step: 85620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:59:19,320-Speed 5189.72 samples/sec Loss 3.4453 LearningRate 0.0553 Epoch: 5 Global Step: 85630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:59:21,316-Speed 5131.72 samples/sec Loss 3.3611 LearningRate 0.0553 Epoch: 5 Global Step: 85640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:59:23,289-Speed 5192.20 samples/sec Loss 3.3860 LearningRate 0.0553 Epoch: 5 Global Step: 85650 Fp16 Grad Scale: 65536 
Required: 17 hours Training: 2022-04-11 04:59:25,277-Speed 5152.77 samples/sec Loss 3.4552 LearningRate 0.0553 Epoch: 5 Global Step: 85660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 04:59:27,260-Speed 5167.00 samples/sec Loss 3.3997 LearningRate 0.0553 Epoch: 5 Global Step: 85670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:59:29,253-Speed 5140.19 samples/sec Loss 3.4371 LearningRate 0.0553 Epoch: 5 Global Step: 85680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:59:31,235-Speed 5167.44 samples/sec Loss 3.4550 LearningRate 0.0553 Epoch: 5 Global Step: 85690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:59:33,232-Speed 5128.85 samples/sec Loss 3.4363 LearningRate 0.0552 Epoch: 5 Global Step: 85700 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:59:35,239-Speed 5103.01 samples/sec Loss 3.4738 LearningRate 0.0552 Epoch: 5 Global Step: 85710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:59:37,231-Speed 5143.02 samples/sec Loss 3.5228 LearningRate 0.0552 Epoch: 5 Global Step: 85720 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:59:39,210-Speed 5177.29 samples/sec Loss 3.4675 LearningRate 0.0552 Epoch: 5 Global Step: 85730 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:59:41,181-Speed 5195.84 samples/sec Loss 3.4326 LearningRate 0.0552 Epoch: 5 Global Step: 85740 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:59:43,160-Speed 5176.70 samples/sec Loss 3.4421 LearningRate 0.0552 Epoch: 5 Global Step: 85750 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:59:45,144-Speed 5164.04 samples/sec Loss 3.4748 LearningRate 0.0552 Epoch: 5 Global Step: 85760 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:59:47,163-Speed 5073.29 samples/sec Loss 3.4410 LearningRate 0.0552 Epoch: 5 Global Step: 85770 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 
04:59:49,173-Speed 5096.81 samples/sec Loss 3.2930 LearningRate 0.0552 Epoch: 5 Global Step: 85780 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:59:51,165-Speed 5141.31 samples/sec Loss 3.4833 LearningRate 0.0552 Epoch: 5 Global Step: 85790 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:59:53,157-Speed 5142.60 samples/sec Loss 3.4075 LearningRate 0.0552 Epoch: 5 Global Step: 85800 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:59:55,136-Speed 5175.92 samples/sec Loss 3.4233 LearningRate 0.0552 Epoch: 5 Global Step: 85810 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-11 04:59:57,113-Speed 5182.87 samples/sec Loss 3.4243 LearningRate 0.0552 Epoch: 5 Global Step: 85820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 04:59:59,098-Speed 5157.72 samples/sec Loss 3.4774 LearningRate 0.0552 Epoch: 5 Global Step: 85830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:00:01,101-Speed 5114.29 samples/sec Loss 3.4407 LearningRate 0.0552 Epoch: 5 Global Step: 85840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:00:03,085-Speed 5164.33 samples/sec Loss 3.5235 LearningRate 0.0552 Epoch: 5 Global Step: 85850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:00:05,069-Speed 5164.13 samples/sec Loss 3.4629 LearningRate 0.0552 Epoch: 5 Global Step: 85860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:00:07,071-Speed 5116.27 samples/sec Loss 3.4152 LearningRate 0.0552 Epoch: 5 Global Step: 85870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:00:09,053-Speed 5168.71 samples/sec Loss 3.4210 LearningRate 0.0552 Epoch: 5 Global Step: 85880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:00:11,060-Speed 5102.98 samples/sec Loss 3.3709 LearningRate 0.0552 Epoch: 5 Global Step: 85890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:00:13,038-Speed 5179.46 samples/sec Loss 3.3672 
LearningRate 0.0552 Epoch: 5 Global Step: 85900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:00:15,018-Speed 5173.80 samples/sec Loss 3.4207 LearningRate 0.0552 Epoch: 5 Global Step: 85910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:00:17,005-Speed 5155.97 samples/sec Loss 3.4500 LearningRate 0.0551 Epoch: 5 Global Step: 85920 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:00:18,985-Speed 5173.85 samples/sec Loss 3.4335 LearningRate 0.0551 Epoch: 5 Global Step: 85930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:00:20,973-Speed 5152.01 samples/sec Loss 3.3641 LearningRate 0.0551 Epoch: 5 Global Step: 85940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:00:22,953-Speed 5172.69 samples/sec Loss 3.3689 LearningRate 0.0551 Epoch: 5 Global Step: 85950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:00:24,943-Speed 5147.57 samples/sec Loss 3.5082 LearningRate 0.0551 Epoch: 5 Global Step: 85960 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:00:26,935-Speed 5142.21 samples/sec Loss 3.4729 LearningRate 0.0551 Epoch: 5 Global Step: 85970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:00:28,921-Speed 5158.24 samples/sec Loss 3.3949 LearningRate 0.0551 Epoch: 5 Global Step: 85980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:00:30,908-Speed 5154.73 samples/sec Loss 3.4089 LearningRate 0.0551 Epoch: 5 Global Step: 85990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:00:32,883-Speed 5187.21 samples/sec Loss 3.4693 LearningRate 0.0551 Epoch: 5 Global Step: 86000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:00:59,684-[lfw][86000]XNorm: 22.369343 Training: 2022-04-11 05:00:59,685-[lfw][86000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-04-11 05:00:59,685-[lfw][86000]Accuracy-Highest: 0.99817 Training: 2022-04-11 05:01:30,338-[cfp_fp][86000]XNorm: 20.194462 
Training: 2022-04-11 05:01:30,339-[cfp_fp][86000]Accuracy-Flip: 0.97900+-0.00636 Training: 2022-04-11 05:01:30,339-[cfp_fp][86000]Accuracy-Highest: 0.98086 Training: 2022-04-11 05:01:56,757-[agedb_30][86000]XNorm: 22.463669 Training: 2022-04-11 05:01:56,758-[agedb_30][86000]Accuracy-Flip: 0.97833+-0.00587 Training: 2022-04-11 05:01:56,758-[agedb_30][86000]Accuracy-Highest: 0.97900 Training: 2022-04-11 05:01:58,756-Speed 119.25 samples/sec Loss 3.4310 LearningRate 0.0551 Epoch: 5 Global Step: 86010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:02:00,715-Speed 5229.63 samples/sec Loss 3.4710 LearningRate 0.0551 Epoch: 5 Global Step: 86020 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:02:02,696-Speed 5169.63 samples/sec Loss 3.4536 LearningRate 0.0551 Epoch: 5 Global Step: 86030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:02:04,659-Speed 5220.26 samples/sec Loss 3.5802 LearningRate 0.0551 Epoch: 5 Global Step: 86040 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:02:06,637-Speed 5177.23 samples/sec Loss 3.4862 LearningRate 0.0551 Epoch: 5 Global Step: 86050 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:02:08,608-Speed 5197.36 samples/sec Loss 3.4511 LearningRate 0.0551 Epoch: 5 Global Step: 86060 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:02:10,573-Speed 5214.81 samples/sec Loss 3.4437 LearningRate 0.0551 Epoch: 5 Global Step: 86070 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:02:12,537-Speed 5214.37 samples/sec Loss 3.4694 LearningRate 0.0551 Epoch: 5 Global Step: 86080 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:02:14,522-Speed 5160.15 samples/sec Loss 3.4162 LearningRate 0.0551 Epoch: 5 Global Step: 86090 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:02:16,529-Speed 5105.11 samples/sec Loss 3.4297 LearningRate 0.0551 Epoch: 5 Global Step: 86100 Fp16 Grad Scale: 131072 
Required: 17 hours Training: 2022-04-11 05:02:18,494-Speed 5211.90 samples/sec Loss 3.5302 LearningRate 0.0551 Epoch: 5 Global Step: 86110 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:02:20,461-Speed 5206.75 samples/sec Loss 3.3985 LearningRate 0.0551 Epoch: 5 Global Step: 86120 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-04-11 05:02:22,452-Speed 5146.03 samples/sec Loss 3.4901 LearningRate 0.0551 Epoch: 5 Global Step: 86130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:02:24,465-Speed 5089.56 samples/sec Loss 3.4070 LearningRate 0.0550 Epoch: 5 Global Step: 86140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:02:26,468-Speed 5112.66 samples/sec Loss 3.4208 LearningRate 0.0550 Epoch: 5 Global Step: 86150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:02:28,477-Speed 5098.39 samples/sec Loss 3.4511 LearningRate 0.0550 Epoch: 5 Global Step: 86160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:02:30,467-Speed 5147.95 samples/sec Loss 3.4308 LearningRate 0.0550 Epoch: 5 Global Step: 86170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:02:32,429-Speed 5221.13 samples/sec Loss 3.4653 LearningRate 0.0550 Epoch: 5 Global Step: 86180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:02:34,410-Speed 5169.38 samples/sec Loss 3.5030 LearningRate 0.0550 Epoch: 5 Global Step: 86190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:02:36,429-Speed 5075.68 samples/sec Loss 3.4209 LearningRate 0.0550 Epoch: 5 Global Step: 86200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:02:38,423-Speed 5137.35 samples/sec Loss 3.4564 LearningRate 0.0550 Epoch: 5 Global Step: 86210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:02:40,426-Speed 5113.76 samples/sec Loss 3.4024 LearningRate 0.0550 Epoch: 5 Global Step: 86220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 
05:02:42,462-Speed 5029.70 samples/sec Loss 3.4247 LearningRate 0.0550 Epoch: 5 Global Step: 86230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:02:44,449-Speed 5156.25 samples/sec Loss 3.5428 LearningRate 0.0550 Epoch: 5 Global Step: 86240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:02:46,474-Speed 5059.02 samples/sec Loss 3.4614 LearningRate 0.0550 Epoch: 5 Global Step: 86250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:02:48,483-Speed 5099.23 samples/sec Loss 3.4922 LearningRate 0.0550 Epoch: 5 Global Step: 86260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:02:50,479-Speed 5132.24 samples/sec Loss 3.4053 LearningRate 0.0550 Epoch: 5 Global Step: 86270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:02:52,471-Speed 5144.97 samples/sec Loss 3.3773 LearningRate 0.0550 Epoch: 5 Global Step: 86280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:02:54,448-Speed 5179.20 samples/sec Loss 3.4161 LearningRate 0.0550 Epoch: 5 Global Step: 86290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:02:56,420-Speed 5196.17 samples/sec Loss 3.5420 LearningRate 0.0550 Epoch: 5 Global Step: 86300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:02:58,409-Speed 5149.54 samples/sec Loss 3.5257 LearningRate 0.0550 Epoch: 5 Global Step: 86310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:03:00,386-Speed 5180.90 samples/sec Loss 3.5432 LearningRate 0.0550 Epoch: 5 Global Step: 86320 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:02,354-Speed 5206.30 samples/sec Loss 3.5384 LearningRate 0.0550 Epoch: 5 Global Step: 86330 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:04,327-Speed 5191.16 samples/sec Loss 3.5082 LearningRate 0.0550 Epoch: 5 Global Step: 86340 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:06,301-Speed 5188.81 samples/sec Loss 3.4953 
LearningRate 0.0550 Epoch: 5 Global Step: 86350 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:08,275-Speed 5191.01 samples/sec Loss 3.5896 LearningRate 0.0550 Epoch: 5 Global Step: 86360 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:10,272-Speed 5129.28 samples/sec Loss 3.4879 LearningRate 0.0549 Epoch: 5 Global Step: 86370 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:12,248-Speed 5184.86 samples/sec Loss 3.4294 LearningRate 0.0549 Epoch: 5 Global Step: 86380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:14,231-Speed 5164.90 samples/sec Loss 3.4286 LearningRate 0.0549 Epoch: 5 Global Step: 86390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:16,241-Speed 5095.46 samples/sec Loss 3.4142 LearningRate 0.0549 Epoch: 5 Global Step: 86400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:18,211-Speed 5199.26 samples/sec Loss 3.4822 LearningRate 0.0549 Epoch: 5 Global Step: 86410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:20,197-Speed 5159.22 samples/sec Loss 3.6259 LearningRate 0.0549 Epoch: 5 Global Step: 86420 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:03:22,170-Speed 5191.70 samples/sec Loss 3.4517 LearningRate 0.0549 Epoch: 5 Global Step: 86430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:24,176-Speed 5108.03 samples/sec Loss 3.4735 LearningRate 0.0549 Epoch: 5 Global Step: 86440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:26,143-Speed 5208.04 samples/sec Loss 3.4025 LearningRate 0.0549 Epoch: 5 Global Step: 86450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:28,109-Speed 5207.61 samples/sec Loss 3.4517 LearningRate 0.0549 Epoch: 5 Global Step: 86460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:30,092-Speed 5167.27 samples/sec Loss 3.5252 LearningRate 0.0549 Epoch: 5 Global Step: 86470 Fp16 
Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:32,079-Speed 5155.85 samples/sec Loss 3.4773 LearningRate 0.0549 Epoch: 5 Global Step: 86480 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:34,050-Speed 5196.66 samples/sec Loss 3.5307 LearningRate 0.0549 Epoch: 5 Global Step: 86490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:36,032-Speed 5167.45 samples/sec Loss 3.4615 LearningRate 0.0549 Epoch: 5 Global Step: 86500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:38,023-Speed 5145.82 samples/sec Loss 3.4278 LearningRate 0.0549 Epoch: 5 Global Step: 86510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:40,013-Speed 5147.80 samples/sec Loss 3.4922 LearningRate 0.0549 Epoch: 5 Global Step: 86520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:41,979-Speed 5211.07 samples/sec Loss 3.4694 LearningRate 0.0549 Epoch: 5 Global Step: 86530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:03:43,946-Speed 5207.80 samples/sec Loss 3.4931 LearningRate 0.0549 Epoch: 5 Global Step: 86540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:03:45,939-Speed 5139.89 samples/sec Loss 3.4893 LearningRate 0.0549 Epoch: 5 Global Step: 86550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:03:47,954-Speed 5081.87 samples/sec Loss 3.4773 LearningRate 0.0549 Epoch: 5 Global Step: 86560 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:03:49,946-Speed 5141.70 samples/sec Loss 3.4764 LearningRate 0.0549 Epoch: 5 Global Step: 86570 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:03:51,930-Speed 5163.23 samples/sec Loss 3.3915 LearningRate 0.0549 Epoch: 5 Global Step: 86580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:03:53,905-Speed 5187.29 samples/sec Loss 3.5416 LearningRate 0.0549 Epoch: 5 Global Step: 86590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 
2022-04-11 05:03:55,871-Speed 5210.09 samples/sec Loss 3.4934 LearningRate 0.0548 Epoch: 5 Global Step: 86600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:57,861-Speed 5147.74 samples/sec Loss 3.5219 LearningRate 0.0548 Epoch: 5 Global Step: 86610 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:03:59,835-Speed 5190.26 samples/sec Loss 3.5512 LearningRate 0.0548 Epoch: 5 Global Step: 86620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:04:01,844-Speed 5097.54 samples/sec Loss 3.4497 LearningRate 0.0548 Epoch: 5 Global Step: 86630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:04:03,822-Speed 5178.12 samples/sec Loss 3.5424 LearningRate 0.0548 Epoch: 5 Global Step: 86640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:04:05,792-Speed 5200.24 samples/sec Loss 3.4297 LearningRate 0.0548 Epoch: 5 Global Step: 86650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:04:07,770-Speed 5178.18 samples/sec Loss 3.5445 LearningRate 0.0548 Epoch: 5 Global Step: 86660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:04:09,739-Speed 5202.06 samples/sec Loss 3.4633 LearningRate 0.0548 Epoch: 5 Global Step: 86670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:04:11,724-Speed 5160.23 samples/sec Loss 3.4798 LearningRate 0.0548 Epoch: 5 Global Step: 86680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:04:13,718-Speed 5137.82 samples/sec Loss 3.4910 LearningRate 0.0548 Epoch: 5 Global Step: 86690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:04:15,691-Speed 5192.42 samples/sec Loss 3.5162 LearningRate 0.0548 Epoch: 5 Global Step: 86700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:04:17,661-Speed 5200.14 samples/sec Loss 3.5099 LearningRate 0.0548 Epoch: 5 Global Step: 86710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:04:19,638-Speed 5180.92 samples/sec 
Loss 3.4262 LearningRate 0.0548 Epoch: 5 Global Step: 86720 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:04:21,618-Speed 5174.11 samples/sec Loss 3.4332 LearningRate 0.0548 Epoch: 5 Global Step: 86730 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:04:23,616-Speed 5128.04 samples/sec Loss 3.3766 LearningRate 0.0548 Epoch: 5 Global Step: 86740 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:04:25,595-Speed 5175.85 samples/sec Loss 3.4033 LearningRate 0.0548 Epoch: 5 Global Step: 86750 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:04:27,563-Speed 5204.82 samples/sec Loss 3.4025 LearningRate 0.0548 Epoch: 5 Global Step: 86760 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:04:29,533-Speed 5199.86 samples/sec Loss 3.4628 LearningRate 0.0548 Epoch: 5 Global Step: 86770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:04:31,501-Speed 5206.18 samples/sec Loss 3.5539 LearningRate 0.0548 Epoch: 5 Global Step: 86780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:04:33,461-Speed 5224.79 samples/sec Loss 3.4996 LearningRate 0.0548 Epoch: 5 Global Step: 86790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:04:35,454-Speed 5140.58 samples/sec Loss 3.5323 LearningRate 0.0548 Epoch: 5 Global Step: 86800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:04:37,439-Speed 5161.32 samples/sec Loss 3.5620 LearningRate 0.0548 Epoch: 5 Global Step: 86810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:04:39,432-Speed 5138.45 samples/sec Loss 3.4993 LearningRate 0.0547 Epoch: 5 Global Step: 86820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:04:41,407-Speed 5185.73 samples/sec Loss 3.4815 LearningRate 0.0547 Epoch: 5 Global Step: 86830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:04:43,380-Speed 5192.40 samples/sec Loss 3.4943 LearningRate 0.0547 Epoch: 5 Global 
Step: 86840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:04:45,379-Speed 5123.82 samples/sec Loss 3.5882 LearningRate 0.0547 Epoch: 5 Global Step: 86850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:04:47,387-Speed 5102.60 samples/sec Loss 3.4232 LearningRate 0.0547 Epoch: 5 Global Step: 86860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:04:49,389-Speed 5115.64 samples/sec Loss 3.5190 LearningRate 0.0547 Epoch: 5 Global Step: 86870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:04:51,373-Speed 5164.32 samples/sec Loss 3.4374 LearningRate 0.0547 Epoch: 5 Global Step: 86880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:04:53,353-Speed 5173.98 samples/sec Loss 3.5343 LearningRate 0.0547 Epoch: 5 Global Step: 86890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:04:55,322-Speed 5201.80 samples/sec Loss 3.4639 LearningRate 0.0547 Epoch: 5 Global Step: 86900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:04:57,293-Speed 5197.34 samples/sec Loss 3.4688 LearningRate 0.0547 Epoch: 5 Global Step: 86910 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:04:59,268-Speed 5186.81 samples/sec Loss 3.5861 LearningRate 0.0547 Epoch: 5 Global Step: 86920 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:05:01,259-Speed 5143.03 samples/sec Loss 3.5789 LearningRate 0.0547 Epoch: 5 Global Step: 86930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:05:03,242-Speed 5165.92 samples/sec Loss 3.5512 LearningRate 0.0547 Epoch: 5 Global Step: 86940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:05:05,228-Speed 5158.71 samples/sec Loss 3.4397 LearningRate 0.0547 Epoch: 5 Global Step: 86950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:05:07,206-Speed 5178.15 samples/sec Loss 3.5706 LearningRate 0.0547 Epoch: 5 Global Step: 86960 Fp16 Grad Scale: 131072 Required: 17 
hours Training: 2022-04-11 05:05:09,185-Speed 5176.26 samples/sec Loss 3.5349 LearningRate 0.0547 Epoch: 5 Global Step: 86970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:05:11,175-Speed 5148.19 samples/sec Loss 3.4980 LearningRate 0.0547 Epoch: 5 Global Step: 86980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:13,167-Speed 5142.78 samples/sec Loss 3.5775 LearningRate 0.0547 Epoch: 5 Global Step: 86990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:15,156-Speed 5159.30 samples/sec Loss 3.5560 LearningRate 0.0547 Epoch: 5 Global Step: 87000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:17,135-Speed 5175.65 samples/sec Loss 3.5087 LearningRate 0.0547 Epoch: 5 Global Step: 87010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:19,108-Speed 5193.43 samples/sec Loss 3.4658 LearningRate 0.0547 Epoch: 5 Global Step: 87020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:21,097-Speed 5148.13 samples/sec Loss 3.4885 LearningRate 0.0547 Epoch: 5 Global Step: 87030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:23,093-Speed 5132.86 samples/sec Loss 3.5668 LearningRate 0.0547 Epoch: 5 Global Step: 87040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:25,069-Speed 5185.90 samples/sec Loss 3.5564 LearningRate 0.0546 Epoch: 5 Global Step: 87050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:27,066-Speed 5129.74 samples/sec Loss 3.4525 LearningRate 0.0546 Epoch: 5 Global Step: 87060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:29,051-Speed 5160.26 samples/sec Loss 3.6104 LearningRate 0.0546 Epoch: 5 Global Step: 87070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:31,030-Speed 5175.47 samples/sec Loss 3.4952 LearningRate 0.0546 Epoch: 5 Global Step: 87080 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:05:33,043-Speed 5088.68 
samples/sec Loss 3.5674 LearningRate 0.0546 Epoch: 5 Global Step: 87090 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:05:35,016-Speed 5192.21 samples/sec Loss 3.4996 LearningRate 0.0546 Epoch: 5 Global Step: 87100 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:05:36,980-Speed 5216.75 samples/sec Loss 3.4793 LearningRate 0.0546 Epoch: 5 Global Step: 87110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:38,952-Speed 5192.77 samples/sec Loss 3.6057 LearningRate 0.0546 Epoch: 5 Global Step: 87120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:40,924-Speed 5194.71 samples/sec Loss 3.4955 LearningRate 0.0546 Epoch: 5 Global Step: 87130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:42,899-Speed 5187.66 samples/sec Loss 3.4770 LearningRate 0.0546 Epoch: 5 Global Step: 87140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:44,868-Speed 5201.41 samples/sec Loss 3.4674 LearningRate 0.0546 Epoch: 5 Global Step: 87150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:46,845-Speed 5181.00 samples/sec Loss 3.5062 LearningRate 0.0546 Epoch: 5 Global Step: 87160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:48,826-Speed 5172.09 samples/sec Loss 3.5865 LearningRate 0.0546 Epoch: 5 Global Step: 87170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:50,807-Speed 5168.56 samples/sec Loss 3.5607 LearningRate 0.0546 Epoch: 5 Global Step: 87180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:52,777-Speed 5202.82 samples/sec Loss 3.5099 LearningRate 0.0546 Epoch: 5 Global Step: 87190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:54,751-Speed 5188.73 samples/sec Loss 3.5027 LearningRate 0.0546 Epoch: 5 Global Step: 87200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:05:56,747-Speed 5131.20 samples/sec Loss 3.5624 LearningRate 0.0546 Epoch: 5 
Global Step: 87210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:05:58,719-Speed 5196.11 samples/sec Loss 3.4936 LearningRate 0.0546 Epoch: 5 Global Step: 87220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:06:00,687-Speed 5204.37 samples/sec Loss 3.4333 LearningRate 0.0546 Epoch: 5 Global Step: 87230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:06:02,667-Speed 5171.94 samples/sec Loss 3.4883 LearningRate 0.0546 Epoch: 5 Global Step: 87240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:04,649-Speed 5168.88 samples/sec Loss 3.5425 LearningRate 0.0546 Epoch: 5 Global Step: 87250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:06,632-Speed 5165.91 samples/sec Loss 3.4963 LearningRate 0.0546 Epoch: 5 Global Step: 87260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:08,626-Speed 5135.85 samples/sec Loss 3.5697 LearningRate 0.0545 Epoch: 5 Global Step: 87270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:10,620-Speed 5139.16 samples/sec Loss 3.5787 LearningRate 0.0545 Epoch: 5 Global Step: 87280 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:12,614-Speed 5137.22 samples/sec Loss 3.5263 LearningRate 0.0545 Epoch: 5 Global Step: 87290 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:14,590-Speed 5185.01 samples/sec Loss 3.4781 LearningRate 0.0545 Epoch: 5 Global Step: 87300 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:16,583-Speed 5138.64 samples/sec Loss 3.4889 LearningRate 0.0545 Epoch: 5 Global Step: 87310 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:18,562-Speed 5176.95 samples/sec Loss 3.5009 LearningRate 0.0545 Epoch: 5 Global Step: 87320 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:20,538-Speed 5183.59 samples/sec Loss 3.3981 LearningRate 0.0545 Epoch: 5 Global Step: 87330 Fp16 Grad Scale: 65536 Required: 
17 hours Training: 2022-04-11 05:06:22,538-Speed 5120.88 samples/sec Loss 3.4855 LearningRate 0.0545 Epoch: 5 Global Step: 87340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:06:24,539-Speed 5119.83 samples/sec Loss 3.6022 LearningRate 0.0545 Epoch: 5 Global Step: 87350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:06:26,531-Speed 5141.94 samples/sec Loss 3.4957 LearningRate 0.0545 Epoch: 5 Global Step: 87360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:06:28,525-Speed 5138.74 samples/sec Loss 3.5387 LearningRate 0.0545 Epoch: 5 Global Step: 87370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:06:30,511-Speed 5158.30 samples/sec Loss 3.4915 LearningRate 0.0545 Epoch: 5 Global Step: 87380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:32,495-Speed 5163.15 samples/sec Loss 3.4863 LearningRate 0.0545 Epoch: 5 Global Step: 87390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:34,473-Speed 5176.76 samples/sec Loss 3.5259 LearningRate 0.0545 Epoch: 5 Global Step: 87400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:36,449-Speed 5184.27 samples/sec Loss 3.6453 LearningRate 0.0545 Epoch: 5 Global Step: 87410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:38,437-Speed 5153.73 samples/sec Loss 3.5765 LearningRate 0.0545 Epoch: 5 Global Step: 87420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:40,415-Speed 5177.33 samples/sec Loss 3.5624 LearningRate 0.0545 Epoch: 5 Global Step: 87430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:42,389-Speed 5189.50 samples/sec Loss 3.6829 LearningRate 0.0545 Epoch: 5 Global Step: 87440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:44,364-Speed 5188.52 samples/sec Loss 3.5590 LearningRate 0.0545 Epoch: 5 Global Step: 87450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:46,340-Speed 
5182.61 samples/sec Loss 3.5621 LearningRate 0.0545 Epoch: 5 Global Step: 87460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:48,326-Speed 5159.43 samples/sec Loss 3.4904 LearningRate 0.0545 Epoch: 5 Global Step: 87470 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:06:50,323-Speed 5128.62 samples/sec Loss 3.5421 LearningRate 0.0545 Epoch: 5 Global Step: 87480 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:06:52,297-Speed 5189.87 samples/sec Loss 3.5032 LearningRate 0.0545 Epoch: 5 Global Step: 87490 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:06:54,274-Speed 5182.59 samples/sec Loss 3.5252 LearningRate 0.0544 Epoch: 5 Global Step: 87500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:06:56,256-Speed 5167.60 samples/sec Loss 3.5025 LearningRate 0.0544 Epoch: 5 Global Step: 87510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:06:58,239-Speed 5166.39 samples/sec Loss 3.5708 LearningRate 0.0544 Epoch: 5 Global Step: 87520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:07:00,220-Speed 5172.29 samples/sec Loss 3.4451 LearningRate 0.0544 Epoch: 5 Global Step: 87530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:07:02,200-Speed 5172.91 samples/sec Loss 3.4830 LearningRate 0.0544 Epoch: 5 Global Step: 87540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:07:04,186-Speed 5155.97 samples/sec Loss 3.5024 LearningRate 0.0544 Epoch: 5 Global Step: 87550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:07:06,163-Speed 5182.17 samples/sec Loss 3.5372 LearningRate 0.0544 Epoch: 5 Global Step: 87560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:08,136-Speed 5191.65 samples/sec Loss 3.5631 LearningRate 0.0544 Epoch: 5 Global Step: 87570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:10,116-Speed 5172.29 samples/sec Loss 3.4811 LearningRate 
0.0544 Epoch: 5 Global Step: 87580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:12,102-Speed 5158.39 samples/sec Loss 3.4861 LearningRate 0.0544 Epoch: 5 Global Step: 87590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:14,091-Speed 5152.21 samples/sec Loss 3.4346 LearningRate 0.0544 Epoch: 5 Global Step: 87600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:16,086-Speed 5133.34 samples/sec Loss 3.5566 LearningRate 0.0544 Epoch: 5 Global Step: 87610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:18,076-Speed 5148.78 samples/sec Loss 3.5453 LearningRate 0.0544 Epoch: 5 Global Step: 87620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:20,048-Speed 5196.63 samples/sec Loss 3.4900 LearningRate 0.0544 Epoch: 5 Global Step: 87630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:22,023-Speed 5184.80 samples/sec Loss 3.5516 LearningRate 0.0544 Epoch: 5 Global Step: 87640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:24,020-Speed 5131.99 samples/sec Loss 3.5454 LearningRate 0.0544 Epoch: 5 Global Step: 87650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:26,029-Speed 5097.39 samples/sec Loss 3.5192 LearningRate 0.0544 Epoch: 5 Global Step: 87660 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:07:28,015-Speed 5160.23 samples/sec Loss 3.4736 LearningRate 0.0544 Epoch: 5 Global Step: 87670 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:07:29,990-Speed 5187.39 samples/sec Loss 3.5108 LearningRate 0.0544 Epoch: 5 Global Step: 87680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:07:31,968-Speed 5176.43 samples/sec Loss 3.4850 LearningRate 0.0544 Epoch: 5 Global Step: 87690 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:07:33,964-Speed 5131.60 samples/sec Loss 3.5677 LearningRate 0.0544 Epoch: 5 Global Step: 87700 Fp16 Grad Scale: 
131072 Required: 16 hours Training: 2022-04-11 05:07:35,951-Speed 5156.62 samples/sec Loss 3.4666 LearningRate 0.0544 Epoch: 5 Global Step: 87710 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:07:37,921-Speed 5200.17 samples/sec Loss 3.5168 LearningRate 0.0543 Epoch: 5 Global Step: 87720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:39,912-Speed 5143.73 samples/sec Loss 3.5517 LearningRate 0.0543 Epoch: 5 Global Step: 87730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:41,938-Speed 5058.89 samples/sec Loss 3.5336 LearningRate 0.0543 Epoch: 5 Global Step: 87740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:43,912-Speed 5187.91 samples/sec Loss 3.6306 LearningRate 0.0543 Epoch: 5 Global Step: 87750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:45,908-Speed 5132.39 samples/sec Loss 3.4482 LearningRate 0.0543 Epoch: 5 Global Step: 87760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:47,889-Speed 5170.96 samples/sec Loss 3.5736 LearningRate 0.0543 Epoch: 5 Global Step: 87770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:49,876-Speed 5157.69 samples/sec Loss 3.5791 LearningRate 0.0543 Epoch: 5 Global Step: 87780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:51,855-Speed 5175.04 samples/sec Loss 3.6437 LearningRate 0.0543 Epoch: 5 Global Step: 87790 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:53,840-Speed 5159.46 samples/sec Loss 3.5884 LearningRate 0.0543 Epoch: 5 Global Step: 87800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:55,831-Speed 5144.32 samples/sec Loss 3.5731 LearningRate 0.0543 Epoch: 5 Global Step: 87810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:07:57,819-Speed 5154.94 samples/sec Loss 3.5912 LearningRate 0.0543 Epoch: 5 Global Step: 87820 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 
05:07:59,807-Speed 5152.87 samples/sec Loss 3.5201 LearningRate 0.0543 Epoch: 5 Global Step: 87830 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:08:01,785-Speed 5179.19 samples/sec Loss 3.5126 LearningRate 0.0543 Epoch: 5 Global Step: 87840 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:08:03,764-Speed 5175.01 samples/sec Loss 3.5368 LearningRate 0.0543 Epoch: 5 Global Step: 87850 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:08:05,734-Speed 5199.02 samples/sec Loss 3.5738 LearningRate 0.0543 Epoch: 5 Global Step: 87860 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:08:07,709-Speed 5187.61 samples/sec Loss 3.5116 LearningRate 0.0543 Epoch: 5 Global Step: 87870 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:08:09,681-Speed 5193.05 samples/sec Loss 3.5797 LearningRate 0.0543 Epoch: 5 Global Step: 87880 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:08:11,673-Speed 5142.36 samples/sec Loss 3.6883 LearningRate 0.0543 Epoch: 5 Global Step: 87890 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:08:13,644-Speed 5196.43 samples/sec Loss 3.5368 LearningRate 0.0543 Epoch: 5 Global Step: 87900 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:08:15,624-Speed 5175.24 samples/sec Loss 3.5089 LearningRate 0.0543 Epoch: 5 Global Step: 87910 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:08:17,601-Speed 5180.31 samples/sec Loss 3.5597 LearningRate 0.0543 Epoch: 5 Global Step: 87920 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:08:19,578-Speed 5182.17 samples/sec Loss 3.4983 LearningRate 0.0543 Epoch: 5 Global Step: 87930 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:08:21,566-Speed 5152.23 samples/sec Loss 3.5632 LearningRate 0.0543 Epoch: 5 Global Step: 87940 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:08:23,554-Speed 5152.43 samples/sec Loss 
3.5196 LearningRate 0.0542 Epoch: 5 Global Step: 87950 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:08:25,548-Speed 5137.05 samples/sec Loss 3.4710 LearningRate 0.0542 Epoch: 5 Global Step: 87960 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:08:27,543-Speed 5135.91 samples/sec Loss 3.5775 LearningRate 0.0542 Epoch: 5 Global Step: 87970 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:08:29,542-Speed 5123.53 samples/sec Loss 3.6088 LearningRate 0.0542 Epoch: 5 Global Step: 87980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:08:31,518-Speed 5183.75 samples/sec Loss 3.4908 LearningRate 0.0542 Epoch: 5 Global Step: 87990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:08:33,511-Speed 5140.61 samples/sec Loss 3.5591 LearningRate 0.0542 Epoch: 5 Global Step: 88000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:08:59,978-[lfw][88000]XNorm: 23.528808 Training: 2022-04-11 05:08:59,979-[lfw][88000]Accuracy-Flip: 0.99750+-0.00261 Training: 2022-04-11 05:08:59,979-[lfw][88000]Accuracy-Highest: 0.99817 Training: 2022-04-11 05:09:30,729-[cfp_fp][88000]XNorm: 21.488591 Training: 2022-04-11 05:09:30,730-[cfp_fp][88000]Accuracy-Flip: 0.97800+-0.00597 Training: 2022-04-11 05:09:30,730-[cfp_fp][88000]Accuracy-Highest: 0.98086 Training: 2022-04-11 05:09:57,254-[agedb_30][88000]XNorm: 23.599850 Training: 2022-04-11 05:09:57,255-[agedb_30][88000]Accuracy-Flip: 0.97850+-0.00560 Training: 2022-04-11 05:09:57,255-[agedb_30][88000]Accuracy-Highest: 0.97900 Training: 2022-04-11 05:09:59,246-Speed 119.44 samples/sec Loss 3.5258 LearningRate 0.0542 Epoch: 5 Global Step: 88010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:10:01,218-Speed 5193.68 samples/sec Loss 3.6000 LearningRate 0.0542 Epoch: 5 Global Step: 88020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:10:03,221-Speed 5112.62 samples/sec Loss 3.5341 LearningRate 0.0542 Epoch: 5 
Global Step: 88030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:10:05,188-Speed 5209.24 samples/sec Loss 3.5957 LearningRate 0.0542 Epoch: 5 Global Step: 88040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:10:07,151-Speed 5218.51 samples/sec Loss 3.6175 LearningRate 0.0542 Epoch: 5 Global Step: 88050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:10:09,127-Speed 5182.38 samples/sec Loss 3.6120 LearningRate 0.0542 Epoch: 5 Global Step: 88060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:10:11,088-Speed 5224.66 samples/sec Loss 3.5604 LearningRate 0.0542 Epoch: 5 Global Step: 88070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:10:13,050-Speed 5221.14 samples/sec Loss 3.6270 LearningRate 0.0542 Epoch: 5 Global Step: 88080 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:10:15,030-Speed 5172.08 samples/sec Loss 3.6332 LearningRate 0.0542 Epoch: 5 Global Step: 88090 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-11 05:10:17,034-Speed 5111.71 samples/sec Loss 3.6273 LearningRate 0.0542 Epoch: 5 Global Step: 88100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:10:19,000-Speed 5211.49 samples/sec Loss 3.6451 LearningRate 0.0542 Epoch: 5 Global Step: 88110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:10:20,993-Speed 5138.97 samples/sec Loss 3.5783 LearningRate 0.0542 Epoch: 5 Global Step: 88120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:10:22,967-Speed 5188.12 samples/sec Loss 3.5589 LearningRate 0.0542 Epoch: 5 Global Step: 88130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:10:24,940-Speed 5192.73 samples/sec Loss 3.6207 LearningRate 0.0542 Epoch: 5 Global Step: 88140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:10:26,925-Speed 5160.61 samples/sec Loss 3.5134 LearningRate 0.0542 Epoch: 5 Global Step: 88150 Fp16 Grad Scale: 65536 Required: 
17 hours Training: 2022-04-11 05:10:28,895-Speed 5199.99 samples/sec Loss 3.5471 LearningRate 0.0542 Epoch: 5 Global Step: 88160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:10:30,882-Speed 5155.87 samples/sec Loss 3.5510 LearningRate 0.0542 Epoch: 5 Global Step: 88170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:10:32,861-Speed 5177.54 samples/sec Loss 3.5889 LearningRate 0.0541 Epoch: 5 Global Step: 88180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:10:34,835-Speed 5187.85 samples/sec Loss 3.5818 LearningRate 0.0541 Epoch: 5 Global Step: 88190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-11 05:10:36,821-Speed 5160.19 samples/sec Loss 3.5279 LearningRate 0.0541 Epoch: 5 Global Step: 88200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:10:38,791-Speed 5199.37 samples/sec Loss 3.5452 LearningRate 0.0541 Epoch: 5 Global Step: 88210 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:10:40,756-Speed 5211.44 samples/sec Loss 3.4713 LearningRate 0.0541 Epoch: 5 Global Step: 88220 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:10:42,743-Speed 5156.44 samples/sec Loss 3.5627 LearningRate 0.0541 Epoch: 5 Global Step: 88230 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:10:44,729-Speed 5158.15 samples/sec Loss 3.5181 LearningRate 0.0541 Epoch: 5 Global Step: 88240 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:10:46,701-Speed 5194.85 samples/sec Loss 3.4754 LearningRate 0.0541 Epoch: 5 Global Step: 88250 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:10:48,673-Speed 5195.63 samples/sec Loss 3.5082 LearningRate 0.0541 Epoch: 5 Global Step: 88260 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:10:50,647-Speed 5186.90 samples/sec Loss 3.5894 LearningRate 0.0541 Epoch: 5 Global Step: 88270 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:10:52,618-Speed 
5198.50 samples/sec Loss 3.5664 LearningRate 0.0541 Epoch: 5 Global Step: 88280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:10:54,589-Speed 5196.47 samples/sec Loss 3.4649 LearningRate 0.0541 Epoch: 5 Global Step: 88290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:10:56,558-Speed 5201.09 samples/sec Loss 3.5275 LearningRate 0.0541 Epoch: 5 Global Step: 88300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:10:58,541-Speed 5165.95 samples/sec Loss 3.5897 LearningRate 0.0541 Epoch: 5 Global Step: 88310 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:00,509-Speed 5205.16 samples/sec Loss 3.5281 LearningRate 0.0541 Epoch: 5 Global Step: 88320 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:02,482-Speed 5193.73 samples/sec Loss 3.5315 LearningRate 0.0541 Epoch: 5 Global Step: 88330 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:04,456-Speed 5188.17 samples/sec Loss 3.5188 LearningRate 0.0541 Epoch: 5 Global Step: 88340 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:06,431-Speed 5187.30 samples/sec Loss 3.5214 LearningRate 0.0541 Epoch: 5 Global Step: 88350 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:08,468-Speed 5029.42 samples/sec Loss 3.5159 LearningRate 0.0541 Epoch: 5 Global Step: 88360 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:10,458-Speed 5148.15 samples/sec Loss 3.5432 LearningRate 0.0541 Epoch: 5 Global Step: 88370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:12,443-Speed 5159.41 samples/sec Loss 3.5106 LearningRate 0.0541 Epoch: 5 Global Step: 88380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:11:14,464-Speed 5069.36 samples/sec Loss 3.5319 LearningRate 0.0541 Epoch: 5 Global Step: 88390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:11:16,432-Speed 5204.15 samples/sec Loss 3.5339 LearningRate 
0.0540 Epoch: 5 Global Step: 88400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:11:18,412-Speed 5175.25 samples/sec Loss 3.5823 LearningRate 0.0540 Epoch: 5 Global Step: 88410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:11:20,389-Speed 5179.99 samples/sec Loss 3.5137 LearningRate 0.0540 Epoch: 5 Global Step: 88420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:11:22,380-Speed 5145.66 samples/sec Loss 3.4966 LearningRate 0.0540 Epoch: 5 Global Step: 88430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:11:24,357-Speed 5179.61 samples/sec Loss 3.6320 LearningRate 0.0540 Epoch: 5 Global Step: 88440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:11:26,366-Speed 5100.14 samples/sec Loss 3.5767 LearningRate 0.0540 Epoch: 5 Global Step: 88450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:11:28,359-Speed 5139.60 samples/sec Loss 3.5859 LearningRate 0.0540 Epoch: 5 Global Step: 88460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:11:30,327-Speed 5204.62 samples/sec Loss 3.5250 LearningRate 0.0540 Epoch: 5 Global Step: 88470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:11:32,297-Speed 5200.10 samples/sec Loss 3.5699 LearningRate 0.0540 Epoch: 5 Global Step: 88480 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:34,305-Speed 5101.24 samples/sec Loss 3.4645 LearningRate 0.0540 Epoch: 5 Global Step: 88490 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:36,292-Speed 5155.89 samples/sec Loss 3.6152 LearningRate 0.0540 Epoch: 5 Global Step: 88500 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:38,292-Speed 5121.58 samples/sec Loss 3.5646 LearningRate 0.0540 Epoch: 5 Global Step: 88510 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:40,333-Speed 5019.32 samples/sec Loss 3.5909 LearningRate 0.0540 Epoch: 5 Global Step: 88520 Fp16 Grad Scale: 
131072 Required: 16 hours Training: 2022-04-11 05:11:42,358-Speed 5058.60 samples/sec Loss 3.5286 LearningRate 0.0540 Epoch: 5 Global Step: 88530 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:44,324-Speed 5209.43 samples/sec Loss 3.5043 LearningRate 0.0540 Epoch: 5 Global Step: 88540 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:46,314-Speed 5148.16 samples/sec Loss 3.6149 LearningRate 0.0540 Epoch: 5 Global Step: 88550 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:48,316-Speed 5117.45 samples/sec Loss 3.5914 LearningRate 0.0540 Epoch: 5 Global Step: 88560 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:50,288-Speed 5194.11 samples/sec Loss 3.5873 LearningRate 0.0540 Epoch: 5 Global Step: 88570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:52,258-Speed 5200.33 samples/sec Loss 3.4753 LearningRate 0.0540 Epoch: 5 Global Step: 88580 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:54,235-Speed 5180.38 samples/sec Loss 3.5240 LearningRate 0.0540 Epoch: 5 Global Step: 88590 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:56,218-Speed 5166.78 samples/sec Loss 3.6290 LearningRate 0.0540 Epoch: 5 Global Step: 88600 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:11:58,192-Speed 5190.44 samples/sec Loss 3.6048 LearningRate 0.0540 Epoch: 5 Global Step: 88610 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:00,159-Speed 5205.68 samples/sec Loss 3.6226 LearningRate 0.0540 Epoch: 5 Global Step: 88620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:12:02,140-Speed 5172.45 samples/sec Loss 3.6347 LearningRate 0.0539 Epoch: 5 Global Step: 88630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:12:04,122-Speed 5166.54 samples/sec Loss 3.6339 LearningRate 0.0539 Epoch: 5 Global Step: 88640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 
05:12:06,109-Speed 5156.66 samples/sec Loss 3.5935 LearningRate 0.0539 Epoch: 5 Global Step: 88650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:12:08,090-Speed 5170.83 samples/sec Loss 3.6067 LearningRate 0.0539 Epoch: 5 Global Step: 88660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:12:10,058-Speed 5205.79 samples/sec Loss 3.5702 LearningRate 0.0539 Epoch: 5 Global Step: 88670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:12:12,044-Speed 5157.41 samples/sec Loss 3.5737 LearningRate 0.0539 Epoch: 5 Global Step: 88680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:12:14,031-Speed 5155.78 samples/sec Loss 3.5617 LearningRate 0.0539 Epoch: 5 Global Step: 88690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:12:16,006-Speed 5186.70 samples/sec Loss 3.6378 LearningRate 0.0539 Epoch: 5 Global Step: 88700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:12:17,971-Speed 5212.76 samples/sec Loss 3.5383 LearningRate 0.0539 Epoch: 5 Global Step: 88710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:12:19,933-Speed 5218.75 samples/sec Loss 3.5101 LearningRate 0.0539 Epoch: 5 Global Step: 88720 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:21,917-Speed 5163.01 samples/sec Loss 3.5973 LearningRate 0.0539 Epoch: 5 Global Step: 88730 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:23,894-Speed 5182.92 samples/sec Loss 3.5359 LearningRate 0.0539 Epoch: 5 Global Step: 88740 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:25,875-Speed 5171.42 samples/sec Loss 3.5014 LearningRate 0.0539 Epoch: 5 Global Step: 88750 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:27,857-Speed 5166.93 samples/sec Loss 3.5824 LearningRate 0.0539 Epoch: 5 Global Step: 88760 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:29,826-Speed 5202.10 samples/sec Loss 3.5570 
LearningRate 0.0539 Epoch: 5 Global Step: 88770 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:31,790-Speed 5215.37 samples/sec Loss 3.5460 LearningRate 0.0539 Epoch: 5 Global Step: 88780 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:33,763-Speed 5192.98 samples/sec Loss 3.5531 LearningRate 0.0539 Epoch: 5 Global Step: 88790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:35,724-Speed 5223.73 samples/sec Loss 3.5868 LearningRate 0.0539 Epoch: 5 Global Step: 88800 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:37,720-Speed 5130.94 samples/sec Loss 3.4703 LearningRate 0.0539 Epoch: 5 Global Step: 88810 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:39,709-Speed 5149.94 samples/sec Loss 3.5329 LearningRate 0.0539 Epoch: 5 Global Step: 88820 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:41,699-Speed 5148.38 samples/sec Loss 3.5664 LearningRate 0.0539 Epoch: 5 Global Step: 88830 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:43,669-Speed 5199.24 samples/sec Loss 3.5692 LearningRate 0.0539 Epoch: 5 Global Step: 88840 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:45,651-Speed 5170.00 samples/sec Loss 3.5295 LearningRate 0.0539 Epoch: 5 Global Step: 88850 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:47,656-Speed 5108.99 samples/sec Loss 3.5403 LearningRate 0.0538 Epoch: 5 Global Step: 88860 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:49,626-Speed 5199.06 samples/sec Loss 3.6065 LearningRate 0.0538 Epoch: 5 Global Step: 88870 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:51,618-Speed 5141.04 samples/sec Loss 3.5102 LearningRate 0.0538 Epoch: 5 Global Step: 88880 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:53,606-Speed 5152.08 samples/sec Loss 3.5454 LearningRate 0.0538 Epoch: 5 Global Step: 
88890 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:55,592-Speed 5157.91 samples/sec Loss 3.4908 LearningRate 0.0538 Epoch: 5 Global Step: 88900 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:57,565-Speed 5192.37 samples/sec Loss 3.5795 LearningRate 0.0538 Epoch: 5 Global Step: 88910 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:12:59,539-Speed 5190.68 samples/sec Loss 3.5813 LearningRate 0.0538 Epoch: 5 Global Step: 88920 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:01,510-Speed 5196.19 samples/sec Loss 3.6350 LearningRate 0.0538 Epoch: 5 Global Step: 88930 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:03,491-Speed 5171.65 samples/sec Loss 3.5717 LearningRate 0.0538 Epoch: 5 Global Step: 88940 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:05,475-Speed 5161.84 samples/sec Loss 3.6531 LearningRate 0.0538 Epoch: 5 Global Step: 88950 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:07,445-Speed 5200.39 samples/sec Loss 3.6349 LearningRate 0.0538 Epoch: 5 Global Step: 88960 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:09,437-Speed 5142.98 samples/sec Loss 3.5955 LearningRate 0.0538 Epoch: 5 Global Step: 88970 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:11,407-Speed 5198.20 samples/sec Loss 3.5679 LearningRate 0.0538 Epoch: 5 Global Step: 88980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:13,383-Speed 5183.84 samples/sec Loss 3.5810 LearningRate 0.0538 Epoch: 5 Global Step: 88990 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:15,387-Speed 5114.04 samples/sec Loss 3.5441 LearningRate 0.0538 Epoch: 5 Global Step: 89000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:13:17,353-Speed 5210.30 samples/sec Loss 3.6273 LearningRate 0.0538 Epoch: 5 Global Step: 89010 Fp16 Grad Scale: 65536 Required: 16 
hours Training: 2022-04-11 05:13:19,314-Speed 5222.24 samples/sec Loss 3.5257 LearningRate 0.0538 Epoch: 5 Global Step: 89020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:13:21,286-Speed 5195.04 samples/sec Loss 3.6540 LearningRate 0.0538 Epoch: 5 Global Step: 89030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:13:23,253-Speed 5206.21 samples/sec Loss 3.4619 LearningRate 0.0538 Epoch: 5 Global Step: 89040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:13:25,235-Speed 5169.23 samples/sec Loss 3.6023 LearningRate 0.0538 Epoch: 5 Global Step: 89050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:13:27,210-Speed 5184.55 samples/sec Loss 3.5922 LearningRate 0.0538 Epoch: 5 Global Step: 89060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:13:29,191-Speed 5172.35 samples/sec Loss 3.5964 LearningRate 0.0538 Epoch: 5 Global Step: 89070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:13:31,154-Speed 5218.80 samples/sec Loss 3.5983 LearningRate 0.0538 Epoch: 5 Global Step: 89080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:13:33,137-Speed 5164.21 samples/sec Loss 3.5362 LearningRate 0.0537 Epoch: 5 Global Step: 89090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:13:35,107-Speed 5200.73 samples/sec Loss 3.5560 LearningRate 0.0537 Epoch: 5 Global Step: 89100 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:37,084-Speed 5181.61 samples/sec Loss 3.6049 LearningRate 0.0537 Epoch: 5 Global Step: 89110 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:39,071-Speed 5155.25 samples/sec Loss 3.5223 LearningRate 0.0537 Epoch: 5 Global Step: 89120 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:41,037-Speed 5209.48 samples/sec Loss 3.5772 LearningRate 0.0537 Epoch: 5 Global Step: 89130 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:43,002-Speed 
5212.58 samples/sec Loss 3.5812 LearningRate 0.0537 Epoch: 5 Global Step: 89140 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:44,976-Speed 5188.84 samples/sec Loss 3.6092 LearningRate 0.0537 Epoch: 5 Global Step: 89150 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:46,958-Speed 5168.21 samples/sec Loss 3.6015 LearningRate 0.0537 Epoch: 5 Global Step: 89160 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:48,937-Speed 5175.88 samples/sec Loss 3.5738 LearningRate 0.0537 Epoch: 5 Global Step: 89170 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:50,923-Speed 5158.46 samples/sec Loss 3.4832 LearningRate 0.0537 Epoch: 5 Global Step: 89180 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:52,899-Speed 5185.43 samples/sec Loss 3.5556 LearningRate 0.0537 Epoch: 5 Global Step: 89190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:54,861-Speed 5220.37 samples/sec Loss 3.5857 LearningRate 0.0537 Epoch: 5 Global Step: 89200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:56,854-Speed 5140.27 samples/sec Loss 3.6015 LearningRate 0.0537 Epoch: 5 Global Step: 89210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:13:58,846-Speed 5143.21 samples/sec Loss 3.5209 LearningRate 0.0537 Epoch: 5 Global Step: 89220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:14:00,818-Speed 5192.99 samples/sec Loss 3.5414 LearningRate 0.0537 Epoch: 5 Global Step: 89230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:14:02,796-Speed 5178.97 samples/sec Loss 3.5541 LearningRate 0.0537 Epoch: 5 Global Step: 89240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:14:04,765-Speed 5202.30 samples/sec Loss 3.5996 LearningRate 0.0537 Epoch: 5 Global Step: 89250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:14:06,737-Speed 5195.00 samples/sec Loss 3.5722 
LearningRate 0.0537 Epoch: 5 Global Step: 89260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:14:08,717-Speed 5172.41 samples/sec Loss 3.5097 LearningRate 0.0537 Epoch: 5 Global Step: 89270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:14:10,695-Speed 5178.79 samples/sec Loss 3.5902 LearningRate 0.0537 Epoch: 5 Global Step: 89280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:12,693-Speed 5127.51 samples/sec Loss 3.5634 LearningRate 0.0537 Epoch: 5 Global Step: 89290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:14,667-Speed 5189.43 samples/sec Loss 3.6152 LearningRate 0.0537 Epoch: 5 Global Step: 89300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:16,654-Speed 5155.77 samples/sec Loss 3.6186 LearningRate 0.0536 Epoch: 5 Global Step: 89310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:18,635-Speed 5170.55 samples/sec Loss 3.6948 LearningRate 0.0536 Epoch: 5 Global Step: 89320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:20,599-Speed 5214.23 samples/sec Loss 3.5614 LearningRate 0.0536 Epoch: 5 Global Step: 89330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:22,581-Speed 5170.34 samples/sec Loss 3.5043 LearningRate 0.0536 Epoch: 5 Global Step: 89340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:24,559-Speed 5177.85 samples/sec Loss 3.5700 LearningRate 0.0536 Epoch: 5 Global Step: 89350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:26,582-Speed 5062.85 samples/sec Loss 3.5995 LearningRate 0.0536 Epoch: 5 Global Step: 89360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:28,574-Speed 5143.49 samples/sec Loss 3.5860 LearningRate 0.0536 Epoch: 5 Global Step: 89370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:30,560-Speed 5156.31 samples/sec Loss 3.6275 LearningRate 0.0536 Epoch: 5 Global Step: 89380 Fp16 
Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:14:32,526-Speed 5211.90 samples/sec Loss 3.5577 LearningRate 0.0536 Epoch: 5 Global Step: 89390 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:14:34,518-Speed 5141.33 samples/sec Loss 3.5757 LearningRate 0.0536 Epoch: 5 Global Step: 89400 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:14:36,481-Speed 5218.35 samples/sec Loss 3.5614 LearningRate 0.0536 Epoch: 5 Global Step: 89410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:38,474-Speed 5139.44 samples/sec Loss 3.7136 LearningRate 0.0536 Epoch: 5 Global Step: 89420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:40,466-Speed 5143.38 samples/sec Loss 3.6870 LearningRate 0.0536 Epoch: 5 Global Step: 89430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:42,434-Speed 5205.41 samples/sec Loss 3.5223 LearningRate 0.0536 Epoch: 5 Global Step: 89440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:44,402-Speed 5204.49 samples/sec Loss 3.5553 LearningRate 0.0536 Epoch: 5 Global Step: 89450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:46,375-Speed 5191.59 samples/sec Loss 3.5451 LearningRate 0.0536 Epoch: 5 Global Step: 89460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:48,365-Speed 5145.77 samples/sec Loss 3.6077 LearningRate 0.0536 Epoch: 5 Global Step: 89470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:50,346-Speed 5172.72 samples/sec Loss 3.5171 LearningRate 0.0536 Epoch: 5 Global Step: 89480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:52,321-Speed 5186.51 samples/sec Loss 3.5307 LearningRate 0.0536 Epoch: 5 Global Step: 89490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:14:54,292-Speed 5195.52 samples/sec Loss 3.6206 LearningRate 0.0536 Epoch: 5 Global Step: 89500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 
2022-04-11 05:14:56,262-Speed 5201.30 samples/sec Loss 3.6120 LearningRate 0.0536 Epoch: 5 Global Step: 89510 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:14:58,257-Speed 5133.88 samples/sec Loss 3.5998 LearningRate 0.0536 Epoch: 5 Global Step: 89520 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:15:00,262-Speed 5109.30 samples/sec Loss 3.5603 LearningRate 0.0536 Epoch: 5 Global Step: 89530 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:15:02,230-Speed 5205.03 samples/sec Loss 3.5340 LearningRate 0.0535 Epoch: 5 Global Step: 89540 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:15:04,223-Speed 5138.54 samples/sec Loss 3.5350 LearningRate 0.0535 Epoch: 5 Global Step: 89550 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:15:06,191-Speed 5203.97 samples/sec Loss 3.6231 LearningRate 0.0535 Epoch: 5 Global Step: 89560 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:15:08,165-Speed 5191.22 samples/sec Loss 3.6876 LearningRate 0.0535 Epoch: 5 Global Step: 89570 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:15:10,165-Speed 5120.77 samples/sec Loss 3.6432 LearningRate 0.0535 Epoch: 5 Global Step: 89580 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:15:12,136-Speed 5198.52 samples/sec Loss 3.5194 LearningRate 0.0535 Epoch: 5 Global Step: 89590 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:15:14,108-Speed 5193.00 samples/sec Loss 3.5938 LearningRate 0.0535 Epoch: 5 Global Step: 89600 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:15:16,087-Speed 5177.08 samples/sec Loss 3.5986 LearningRate 0.0535 Epoch: 5 Global Step: 89610 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:15:18,071-Speed 5162.57 samples/sec Loss 3.5990 LearningRate 0.0535 Epoch: 5 Global Step: 89620 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:15:20,042-Speed 5197.97 samples/sec 
Loss 3.5572 LearningRate 0.0535 Epoch: 5 Global Step: 89630 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:15:22,017-Speed 5184.85 samples/sec Loss 3.6101 LearningRate 0.0535 Epoch: 5 Global Step: 89640 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:15:24,008-Speed 5146.49 samples/sec Loss 3.5642 LearningRate 0.0535 Epoch: 5 Global Step: 89650 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:15:25,985-Speed 5181.27 samples/sec Loss 3.5094 LearningRate 0.0535 Epoch: 5 Global Step: 89660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:15:27,994-Speed 5096.71 samples/sec Loss 3.5997 LearningRate 0.0535 Epoch: 5 Global Step: 89670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:15:30,011-Speed 5079.16 samples/sec Loss 3.5712 LearningRate 0.0535 Epoch: 5 Global Step: 89680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:15:31,980-Speed 5201.50 samples/sec Loss 3.6127 LearningRate 0.0535 Epoch: 5 Global Step: 89690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:15:33,962-Speed 5169.70 samples/sec Loss 3.5300 LearningRate 0.0535 Epoch: 5 Global Step: 89700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:15:35,949-Speed 5157.10 samples/sec Loss 3.5698 LearningRate 0.0535 Epoch: 5 Global Step: 89710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:15:37,952-Speed 5114.58 samples/sec Loss 3.6085 LearningRate 0.0535 Epoch: 5 Global Step: 89720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:15:39,929-Speed 5180.36 samples/sec Loss 3.5376 LearningRate 0.0535 Epoch: 5 Global Step: 89730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:15:41,901-Speed 5192.75 samples/sec Loss 3.6329 LearningRate 0.0535 Epoch: 5 Global Step: 89740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:15:43,884-Speed 5165.78 samples/sec Loss 3.6445 LearningRate 0.0535 Epoch: 5 Global Step: 
89750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:15:45,871-Speed 5156.64 samples/sec Loss 3.4693 LearningRate 0.0535 Epoch: 5 Global Step: 89760 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:15:47,834-Speed 5216.92 samples/sec Loss 3.5084 LearningRate 0.0534 Epoch: 5 Global Step: 89770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:15:49,827-Speed 5140.53 samples/sec Loss 3.6120 LearningRate 0.0534 Epoch: 5 Global Step: 89780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:15:51,807-Speed 5173.47 samples/sec Loss 3.6208 LearningRate 0.0534 Epoch: 5 Global Step: 89790 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:15:53,778-Speed 5196.70 samples/sec Loss 3.5892 LearningRate 0.0534 Epoch: 5 Global Step: 89800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:15:55,744-Speed 5210.12 samples/sec Loss 3.5528 LearningRate 0.0534 Epoch: 5 Global Step: 89810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:15:57,716-Speed 5195.65 samples/sec Loss 3.6258 LearningRate 0.0534 Epoch: 5 Global Step: 89820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:15:59,705-Speed 5149.86 samples/sec Loss 3.6074 LearningRate 0.0534 Epoch: 5 Global Step: 89830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:16:01,678-Speed 5190.55 samples/sec Loss 3.5954 LearningRate 0.0534 Epoch: 5 Global Step: 89840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:16:03,659-Speed 5172.27 samples/sec Loss 3.5474 LearningRate 0.0534 Epoch: 5 Global Step: 89850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:16:05,659-Speed 5121.32 samples/sec Loss 3.6235 LearningRate 0.0534 Epoch: 5 Global Step: 89860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:16:07,642-Speed 5164.53 samples/sec Loss 3.5795 LearningRate 0.0534 Epoch: 5 Global Step: 89870 Fp16 Grad Scale: 131072 Required: 16 hours 
Training: 2022-04-11 05:16:09,615-Speed 5191.43 samples/sec Loss 3.6201 LearningRate 0.0534 Epoch: 5 Global Step: 89880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:16:11,600-Speed 5162.11 samples/sec Loss 3.5731 LearningRate 0.0534 Epoch: 5 Global Step: 89890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:16:13,577-Speed 5181.67 samples/sec Loss 3.5621 LearningRate 0.0534 Epoch: 5 Global Step: 89900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:16:15,578-Speed 5120.27 samples/sec Loss 3.5468 LearningRate 0.0534 Epoch: 5 Global Step: 89910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:16:17,577-Speed 5122.35 samples/sec Loss 3.5692 LearningRate 0.0534 Epoch: 5 Global Step: 89920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:16:19,551-Speed 5190.57 samples/sec Loss 3.6341 LearningRate 0.0534 Epoch: 5 Global Step: 89930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:16:21,544-Speed 5137.51 samples/sec Loss 3.6134 LearningRate 0.0534 Epoch: 5 Global Step: 89940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:16:23,521-Speed 5181.42 samples/sec Loss 3.6263 LearningRate 0.0534 Epoch: 5 Global Step: 89950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:16:25,497-Speed 5184.71 samples/sec Loss 3.5755 LearningRate 0.0534 Epoch: 5 Global Step: 89960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:16:27,483-Speed 5156.20 samples/sec Loss 3.5058 LearningRate 0.0534 Epoch: 5 Global Step: 89970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:16:29,477-Speed 5139.34 samples/sec Loss 3.4883 LearningRate 0.0534 Epoch: 5 Global Step: 89980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:16:31,451-Speed 5188.86 samples/sec Loss 3.5331 LearningRate 0.0534 Epoch: 5 Global Step: 89990 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:16:33,459-Speed 5101.76 
samples/sec Loss 3.4377 LearningRate 0.0533 Epoch: 5 Global Step: 90000 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:16:59,987-[lfw][90000]XNorm: 22.066125 Training: 2022-04-11 05:16:59,988-[lfw][90000]Accuracy-Flip: 0.99783+-0.00308 Training: 2022-04-11 05:16:59,988-[lfw][90000]Accuracy-Highest: 0.99817 Training: 2022-04-11 05:17:30,697-[cfp_fp][90000]XNorm: 20.543083 Training: 2022-04-11 05:17:30,697-[cfp_fp][90000]Accuracy-Flip: 0.97800+-0.00586 Training: 2022-04-11 05:17:30,698-[cfp_fp][90000]Accuracy-Highest: 0.98086 Training: 2022-04-11 05:17:57,232-[agedb_30][90000]XNorm: 21.693062 Training: 2022-04-11 05:17:57,232-[agedb_30][90000]Accuracy-Flip: 0.97783+-0.00785 Training: 2022-04-11 05:17:57,233-[agedb_30][90000]Accuracy-Highest: 0.97900 Training: 2022-04-11 05:17:59,223-Speed 119.40 samples/sec Loss 3.5631 LearningRate 0.0533 Epoch: 5 Global Step: 90010 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:18:01,184-Speed 5222.18 samples/sec Loss 3.5573 LearningRate 0.0533 Epoch: 5 Global Step: 90020 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:18:03,151-Speed 5208.54 samples/sec Loss 3.6139 LearningRate 0.0533 Epoch: 5 Global Step: 90030 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:18:05,110-Speed 5228.75 samples/sec Loss 3.5870 LearningRate 0.0533 Epoch: 5 Global Step: 90040 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:18:07,080-Speed 5198.51 samples/sec Loss 3.6437 LearningRate 0.0533 Epoch: 5 Global Step: 90050 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:18:09,037-Speed 5234.44 samples/sec Loss 3.5870 LearningRate 0.0533 Epoch: 5 Global Step: 90060 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:18:10,997-Speed 5226.96 samples/sec Loss 3.6629 LearningRate 0.0533 Epoch: 5 Global Step: 90070 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:18:12,955-Speed 5231.52 samples/sec Loss 3.5008 
LearningRate 0.0533 Epoch: 5 Global Step: 90080 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:18:14,916-Speed 5223.91 samples/sec Loss 3.5424 LearningRate 0.0533 Epoch: 5 Global Step: 90090 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:18:16,886-Speed 5200.08 samples/sec Loss 3.6493 LearningRate 0.0533 Epoch: 5 Global Step: 90100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:18:18,862-Speed 5181.69 samples/sec Loss 3.5304 LearningRate 0.0533 Epoch: 5 Global Step: 90110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:18:20,833-Speed 5197.22 samples/sec Loss 3.4891 LearningRate 0.0533 Epoch: 5 Global Step: 90120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:18:22,818-Speed 5162.25 samples/sec Loss 3.5175 LearningRate 0.0533 Epoch: 5 Global Step: 90130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:18:24,802-Speed 5161.75 samples/sec Loss 3.5477 LearningRate 0.0533 Epoch: 5 Global Step: 90140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:18:26,772-Speed 5200.21 samples/sec Loss 3.5733 LearningRate 0.0533 Epoch: 5 Global Step: 90150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:18:28,747-Speed 5186.89 samples/sec Loss 3.5672 LearningRate 0.0533 Epoch: 5 Global Step: 90160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:18:30,713-Speed 5210.41 samples/sec Loss 3.5436 LearningRate 0.0533 Epoch: 5 Global Step: 90170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:18:32,689-Speed 5182.27 samples/sec Loss 3.4700 LearningRate 0.0533 Epoch: 5 Global Step: 90180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:18:34,649-Speed 5227.77 samples/sec Loss 3.6043 LearningRate 0.0533 Epoch: 5 Global Step: 90190 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:18:36,611-Speed 5221.84 samples/sec Loss 3.5484 LearningRate 0.0533 Epoch: 5 Global Step: 90200 Fp16 
Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:18:38,581-Speed 5199.00 samples/sec Loss 3.5639 LearningRate 0.0533 Epoch: 5 Global Step: 90210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:18:40,557-Speed 5183.24 samples/sec Loss 3.5782 LearningRate 0.0533 Epoch: 5 Global Step: 90220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:18:42,533-Speed 5183.30 samples/sec Loss 3.6063 LearningRate 0.0532 Epoch: 5 Global Step: 90230 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:18:44,497-Speed 5216.97 samples/sec Loss 3.6544 LearningRate 0.0532 Epoch: 5 Global Step: 90240 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:18:46,485-Speed 5153.29 samples/sec Loss 3.5674 LearningRate 0.0532 Epoch: 5 Global Step: 90250 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:18:48,486-Speed 5119.93 samples/sec Loss 3.5201 LearningRate 0.0532 Epoch: 5 Global Step: 90260 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:18:50,495-Speed 5099.41 samples/sec Loss 3.6587 LearningRate 0.0532 Epoch: 5 Global Step: 90270 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:18:52,489-Speed 5134.81 samples/sec Loss 3.6622 LearningRate 0.0532 Epoch: 5 Global Step: 90280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:18:54,463-Speed 5191.18 samples/sec Loss 3.6162 LearningRate 0.0532 Epoch: 5 Global Step: 90290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:18:56,426-Speed 5216.71 samples/sec Loss 3.5421 LearningRate 0.0532 Epoch: 5 Global Step: 90300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:18:58,418-Speed 5143.12 samples/sec Loss 3.5966 LearningRate 0.0532 Epoch: 5 Global Step: 90310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:00,387-Speed 5201.18 samples/sec Loss 3.6099 LearningRate 0.0532 Epoch: 5 Global Step: 90320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 
2022-04-11 05:19:02,357-Speed 5198.96 samples/sec Loss 3.5828 LearningRate 0.0532 Epoch: 5 Global Step: 90330 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:19:04,353-Speed 5134.20 samples/sec Loss 3.5672 LearningRate 0.0532 Epoch: 5 Global Step: 90340 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:19:06,328-Speed 5185.59 samples/sec Loss 3.5346 LearningRate 0.0532 Epoch: 5 Global Step: 90350 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:19:08,312-Speed 5163.07 samples/sec Loss 3.5788 LearningRate 0.0532 Epoch: 5 Global Step: 90360 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:19:10,289-Speed 5182.31 samples/sec Loss 3.5685 LearningRate 0.0532 Epoch: 5 Global Step: 90370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:19:12,251-Speed 5220.77 samples/sec Loss 3.5884 LearningRate 0.0532 Epoch: 5 Global Step: 90380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:14,244-Speed 5139.91 samples/sec Loss 3.6310 LearningRate 0.0532 Epoch: 5 Global Step: 90390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:16,215-Speed 5196.52 samples/sec Loss 3.5743 LearningRate 0.0532 Epoch: 5 Global Step: 90400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:18,181-Speed 5210.87 samples/sec Loss 3.5704 LearningRate 0.0532 Epoch: 5 Global Step: 90410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:20,189-Speed 5101.71 samples/sec Loss 3.5966 LearningRate 0.0532 Epoch: 5 Global Step: 90420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:22,164-Speed 5187.20 samples/sec Loss 3.6245 LearningRate 0.0532 Epoch: 5 Global Step: 90430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:24,189-Speed 5058.05 samples/sec Loss 3.5749 LearningRate 0.0532 Epoch: 5 Global Step: 90440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:26,159-Speed 5201.00 samples/sec 
Loss 3.5826 LearningRate 0.0532 Epoch: 5 Global Step: 90450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:28,131-Speed 5194.87 samples/sec Loss 3.6188 LearningRate 0.0531 Epoch: 5 Global Step: 90460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:30,108-Speed 5181.61 samples/sec Loss 3.5756 LearningRate 0.0531 Epoch: 5 Global Step: 90470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:32,074-Speed 5212.14 samples/sec Loss 3.7031 LearningRate 0.0531 Epoch: 5 Global Step: 90480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:34,042-Speed 5203.99 samples/sec Loss 3.5549 LearningRate 0.0531 Epoch: 5 Global Step: 90490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:36,020-Speed 5179.27 samples/sec Loss 3.6436 LearningRate 0.0531 Epoch: 5 Global Step: 90500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:37,993-Speed 5191.32 samples/sec Loss 3.5679 LearningRate 0.0531 Epoch: 5 Global Step: 90510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:40,002-Speed 5097.73 samples/sec Loss 3.5896 LearningRate 0.0531 Epoch: 5 Global Step: 90520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:41,969-Speed 5209.77 samples/sec Loss 3.5943 LearningRate 0.0531 Epoch: 5 Global Step: 90530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:43,933-Speed 5214.73 samples/sec Loss 3.6192 LearningRate 0.0531 Epoch: 5 Global Step: 90540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:45,896-Speed 5217.79 samples/sec Loss 3.6178 LearningRate 0.0531 Epoch: 5 Global Step: 90550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:47,862-Speed 5210.60 samples/sec Loss 3.5821 LearningRate 0.0531 Epoch: 5 Global Step: 90560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:49,827-Speed 5212.49 samples/sec Loss 3.5291 LearningRate 0.0531 Epoch: 5 Global Step: 
90570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:19:51,802-Speed 5187.78 samples/sec Loss 3.5299 LearningRate 0.0531 Epoch: 5 Global Step: 90580 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:19:53,775-Speed 5190.38 samples/sec Loss 3.4835 LearningRate 0.0531 Epoch: 5 Global Step: 90590 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:19:55,740-Speed 5213.52 samples/sec Loss 3.5592 LearningRate 0.0531 Epoch: 5 Global Step: 90600 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:19:57,720-Speed 5174.42 samples/sec Loss 3.5856 LearningRate 0.0531 Epoch: 5 Global Step: 90610 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:19:59,705-Speed 5161.04 samples/sec Loss 3.6050 LearningRate 0.0531 Epoch: 5 Global Step: 90620 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:20:01,663-Speed 5230.02 samples/sec Loss 3.5568 LearningRate 0.0531 Epoch: 5 Global Step: 90630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:03,643-Speed 5174.13 samples/sec Loss 3.5576 LearningRate 0.0531 Epoch: 5 Global Step: 90640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:05,611-Speed 5204.83 samples/sec Loss 3.6325 LearningRate 0.0531 Epoch: 5 Global Step: 90650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:07,581-Speed 5199.49 samples/sec Loss 3.5353 LearningRate 0.0531 Epoch: 5 Global Step: 90660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:09,559-Speed 5179.43 samples/sec Loss 3.6676 LearningRate 0.0531 Epoch: 5 Global Step: 90670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:11,525-Speed 5209.56 samples/sec Loss 3.5993 LearningRate 0.0531 Epoch: 5 Global Step: 90680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:13,492-Speed 5208.20 samples/sec Loss 3.5231 LearningRate 0.0530 Epoch: 5 Global Step: 90690 Fp16 Grad Scale: 65536 Required: 16 hours 
Training: 2022-04-11 05:20:15,494-Speed 5115.97 samples/sec Loss 3.5374 LearningRate 0.0530 Epoch: 5 Global Step: 90700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:17,495-Speed 5119.98 samples/sec Loss 3.5430 LearningRate 0.0530 Epoch: 5 Global Step: 90710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:19,464-Speed 5203.60 samples/sec Loss 3.5737 LearningRate 0.0530 Epoch: 5 Global Step: 90720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:21,436-Speed 5194.34 samples/sec Loss 3.6211 LearningRate 0.0530 Epoch: 5 Global Step: 90730 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:20:23,405-Speed 5200.86 samples/sec Loss 3.6214 LearningRate 0.0530 Epoch: 5 Global Step: 90740 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:20:25,369-Speed 5215.55 samples/sec Loss 3.5984 LearningRate 0.0530 Epoch: 5 Global Step: 90750 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:20:27,344-Speed 5187.76 samples/sec Loss 3.5584 LearningRate 0.0530 Epoch: 5 Global Step: 90760 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:20:29,321-Speed 5180.30 samples/sec Loss 3.5724 LearningRate 0.0530 Epoch: 5 Global Step: 90770 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:20:31,290-Speed 5203.65 samples/sec Loss 3.6045 LearningRate 0.0530 Epoch: 5 Global Step: 90780 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:20:33,259-Speed 5199.88 samples/sec Loss 3.5793 LearningRate 0.0530 Epoch: 5 Global Step: 90790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:20:35,254-Speed 5135.21 samples/sec Loss 3.6851 LearningRate 0.0530 Epoch: 5 Global Step: 90800 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:20:37,230-Speed 5183.52 samples/sec Loss 3.5193 LearningRate 0.0530 Epoch: 5 Global Step: 90810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:39,197-Speed 5209.78 
samples/sec Loss 3.5279 LearningRate 0.0530 Epoch: 5 Global Step: 90820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:41,163-Speed 5209.87 samples/sec Loss 3.4726 LearningRate 0.0530 Epoch: 5 Global Step: 90830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:43,126-Speed 5217.37 samples/sec Loss 3.5481 LearningRate 0.0530 Epoch: 5 Global Step: 90840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:45,107-Speed 5170.82 samples/sec Loss 3.5884 LearningRate 0.0530 Epoch: 5 Global Step: 90850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:47,099-Speed 5144.29 samples/sec Loss 3.5444 LearningRate 0.0530 Epoch: 5 Global Step: 90860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:49,090-Speed 5143.13 samples/sec Loss 3.5734 LearningRate 0.0530 Epoch: 5 Global Step: 90870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:51,064-Speed 5190.96 samples/sec Loss 3.5634 LearningRate 0.0530 Epoch: 5 Global Step: 90880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:53,043-Speed 5176.76 samples/sec Loss 3.5665 LearningRate 0.0530 Epoch: 5 Global Step: 90890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:55,033-Speed 5148.28 samples/sec Loss 3.4988 LearningRate 0.0530 Epoch: 5 Global Step: 90900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:20:57,002-Speed 5202.03 samples/sec Loss 3.5007 LearningRate 0.0530 Epoch: 5 Global Step: 90910 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:20:58,993-Speed 5145.18 samples/sec Loss 3.5241 LearningRate 0.0529 Epoch: 5 Global Step: 90920 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:21:00,973-Speed 5172.09 samples/sec Loss 3.5448 LearningRate 0.0529 Epoch: 5 Global Step: 90930 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:21:02,960-Speed 5154.27 samples/sec Loss 3.5536 LearningRate 0.0529 Epoch: 5 
Global Step: 90940 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:21:04,931-Speed 5196.76 samples/sec Loss 3.6361 LearningRate 0.0529 Epoch: 5 Global Step: 90950 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:21:06,896-Speed 5213.70 samples/sec Loss 3.6019 LearningRate 0.0529 Epoch: 5 Global Step: 90960 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:21:08,886-Speed 5148.81 samples/sec Loss 3.5996 LearningRate 0.0529 Epoch: 5 Global Step: 90970 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:21:10,884-Speed 5126.46 samples/sec Loss 3.5492 LearningRate 0.0529 Epoch: 5 Global Step: 90980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:21:12,849-Speed 5212.19 samples/sec Loss 3.6484 LearningRate 0.0529 Epoch: 5 Global Step: 90990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:21:14,835-Speed 5158.90 samples/sec Loss 3.6260 LearningRate 0.0529 Epoch: 5 Global Step: 91000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:21:16,802-Speed 5206.42 samples/sec Loss 3.5337 LearningRate 0.0529 Epoch: 5 Global Step: 91010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:21:18,770-Speed 5206.72 samples/sec Loss 3.5902 LearningRate 0.0529 Epoch: 5 Global Step: 91020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:21:20,753-Speed 5163.45 samples/sec Loss 3.5738 LearningRate 0.0529 Epoch: 5 Global Step: 91030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:21:22,745-Speed 5143.16 samples/sec Loss 3.5907 LearningRate 0.0529 Epoch: 5 Global Step: 91040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:21:24,718-Speed 5191.00 samples/sec Loss 3.5878 LearningRate 0.0529 Epoch: 5 Global Step: 91050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:21:26,684-Speed 5212.37 samples/sec Loss 3.6052 LearningRate 0.0529 Epoch: 5 Global Step: 91060 Fp16 Grad Scale: 65536 
Required: 16 hours Training: 2022-04-11 05:21:28,673-Speed 5149.57 samples/sec Loss 3.5863 LearningRate 0.0529 Epoch: 5 Global Step: 91070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:21:30,639-Speed 5210.52 samples/sec Loss 3.6546 LearningRate 0.0529 Epoch: 5 Global Step: 91080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:21:32,603-Speed 5215.70 samples/sec Loss 3.5445 LearningRate 0.0529 Epoch: 5 Global Step: 91090 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:21:34,591-Speed 5152.69 samples/sec Loss 3.5493 LearningRate 0.0529 Epoch: 5 Global Step: 91100 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:21:36,569-Speed 5177.54 samples/sec Loss 3.6337 LearningRate 0.0529 Epoch: 5 Global Step: 91110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:21:38,537-Speed 5205.15 samples/sec Loss 3.5301 LearningRate 0.0529 Epoch: 5 Global Step: 91120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:21:40,572-Speed 5034.07 samples/sec Loss 3.5495 LearningRate 0.0529 Epoch: 5 Global Step: 91130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:21:42,562-Speed 5146.59 samples/sec Loss 3.6131 LearningRate 0.0528 Epoch: 5 Global Step: 91140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:21:44,543-Speed 5170.29 samples/sec Loss 3.5387 LearningRate 0.0528 Epoch: 5 Global Step: 91150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:21:46,509-Speed 5210.54 samples/sec Loss 3.5770 LearningRate 0.0528 Epoch: 5 Global Step: 91160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:21:48,469-Speed 5227.32 samples/sec Loss 3.5071 LearningRate 0.0528 Epoch: 5 Global Step: 91170 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:21:50,451-Speed 5168.06 samples/sec Loss 3.6567 LearningRate 0.0528 Epoch: 5 Global Step: 91180 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 
05:21:52,417-Speed 5211.17 samples/sec Loss 3.5513 LearningRate 0.0528 Epoch: 5 Global Step: 91190 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:21:54,392-Speed 5186.58 samples/sec Loss 3.6051 LearningRate 0.0528 Epoch: 5 Global Step: 91200 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:21:56,366-Speed 5187.80 samples/sec Loss 3.5763 LearningRate 0.0528 Epoch: 5 Global Step: 91210 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:21:58,344-Speed 5178.44 samples/sec Loss 3.5817 LearningRate 0.0528 Epoch: 5 Global Step: 91220 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:22:00,335-Speed 5144.55 samples/sec Loss 3.5709 LearningRate 0.0528 Epoch: 5 Global Step: 91230 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:22:02,303-Speed 5207.16 samples/sec Loss 3.5369 LearningRate 0.0528 Epoch: 5 Global Step: 91240 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:22:04,271-Speed 5203.66 samples/sec Loss 3.6065 LearningRate 0.0528 Epoch: 5 Global Step: 91250 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:22:06,247-Speed 5184.03 samples/sec Loss 3.5865 LearningRate 0.0528 Epoch: 5 Global Step: 91260 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:22:08,215-Speed 5206.76 samples/sec Loss 3.5841 LearningRate 0.0528 Epoch: 5 Global Step: 91270 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:10,186-Speed 5195.20 samples/sec Loss 3.5724 LearningRate 0.0528 Epoch: 5 Global Step: 91280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:12,158-Speed 5196.38 samples/sec Loss 3.5676 LearningRate 0.0528 Epoch: 5 Global Step: 91290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:14,153-Speed 5132.06 samples/sec Loss 3.6643 LearningRate 0.0528 Epoch: 5 Global Step: 91300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:16,141-Speed 5153.43 samples/sec Loss 3.6079 
LearningRate 0.0528 Epoch: 5 Global Step: 91310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:18,121-Speed 5174.82 samples/sec Loss 3.6087 LearningRate 0.0528 Epoch: 5 Global Step: 91320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:20,096-Speed 5184.95 samples/sec Loss 3.6502 LearningRate 0.0528 Epoch: 5 Global Step: 91330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:22,088-Speed 5143.68 samples/sec Loss 3.5337 LearningRate 0.0528 Epoch: 5 Global Step: 91340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:24,078-Speed 5145.24 samples/sec Loss 3.6258 LearningRate 0.0528 Epoch: 5 Global Step: 91350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:26,106-Speed 5053.33 samples/sec Loss 3.5714 LearningRate 0.0528 Epoch: 5 Global Step: 91360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:28,106-Speed 5119.84 samples/sec Loss 3.6411 LearningRate 0.0527 Epoch: 5 Global Step: 91370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:22:30,073-Speed 5209.77 samples/sec Loss 3.4862 LearningRate 0.0527 Epoch: 5 Global Step: 91380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:32,046-Speed 5190.11 samples/sec Loss 3.5753 LearningRate 0.0527 Epoch: 5 Global Step: 91390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:34,014-Speed 5207.17 samples/sec Loss 3.6463 LearningRate 0.0527 Epoch: 5 Global Step: 91400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:35,995-Speed 5168.74 samples/sec Loss 3.5363 LearningRate 0.0527 Epoch: 5 Global Step: 91410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:37,968-Speed 5191.38 samples/sec Loss 3.6205 LearningRate 0.0527 Epoch: 5 Global Step: 91420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:39,943-Speed 5187.09 samples/sec Loss 3.6250 LearningRate 0.0527 Epoch: 5 Global Step: 91430 Fp16 
Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:41,916-Speed 5192.40 samples/sec Loss 3.5589 LearningRate 0.0527 Epoch: 5 Global Step: 91440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:43,904-Speed 5151.70 samples/sec Loss 3.5861 LearningRate 0.0527 Epoch: 5 Global Step: 91450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:45,886-Speed 5167.54 samples/sec Loss 3.6440 LearningRate 0.0527 Epoch: 5 Global Step: 91460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:47,869-Speed 5167.58 samples/sec Loss 3.6456 LearningRate 0.0527 Epoch: 5 Global Step: 91470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:49,851-Speed 5168.65 samples/sec Loss 3.5709 LearningRate 0.0527 Epoch: 5 Global Step: 91480 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:22:51,822-Speed 5198.31 samples/sec Loss 3.6060 LearningRate 0.0527 Epoch: 5 Global Step: 91490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:53,795-Speed 5192.00 samples/sec Loss 3.7154 LearningRate 0.0527 Epoch: 5 Global Step: 91500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:55,768-Speed 5191.51 samples/sec Loss 3.6037 LearningRate 0.0527 Epoch: 5 Global Step: 91510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:57,759-Speed 5143.37 samples/sec Loss 3.6821 LearningRate 0.0527 Epoch: 5 Global Step: 91520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:22:59,733-Speed 5191.49 samples/sec Loss 3.7010 LearningRate 0.0527 Epoch: 5 Global Step: 91530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:23:01,707-Speed 5186.65 samples/sec Loss 3.5366 LearningRate 0.0527 Epoch: 5 Global Step: 91540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:23:03,691-Speed 5163.95 samples/sec Loss 3.5757 LearningRate 0.0527 Epoch: 5 Global Step: 91550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 
2022-04-11 05:23:05,678-Speed 5155.83 samples/sec Loss 3.6046 LearningRate 0.0527 Epoch: 5 Global Step: 91560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:23:07,669-Speed 5143.28 samples/sec Loss 3.5673 LearningRate 0.0527 Epoch: 5 Global Step: 91570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:23:09,663-Speed 5140.00 samples/sec Loss 3.5759 LearningRate 0.0527 Epoch: 5 Global Step: 91580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:23:11,637-Speed 5187.29 samples/sec Loss 3.5762 LearningRate 0.0527 Epoch: 5 Global Step: 91590 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:23:13,617-Speed 5174.55 samples/sec Loss 3.5297 LearningRate 0.0526 Epoch: 5 Global Step: 91600 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:23:15,590-Speed 5191.95 samples/sec Loss 3.6741 LearningRate 0.0526 Epoch: 5 Global Step: 91610 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:23:17,557-Speed 5208.22 samples/sec Loss 3.6898 LearningRate 0.0526 Epoch: 5 Global Step: 91620 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:23:19,523-Speed 5208.41 samples/sec Loss 3.6029 LearningRate 0.0526 Epoch: 5 Global Step: 91630 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:23:21,489-Speed 5211.47 samples/sec Loss 3.5883 LearningRate 0.0526 Epoch: 5 Global Step: 91640 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:23:23,494-Speed 5108.62 samples/sec Loss 3.5673 LearningRate 0.0526 Epoch: 5 Global Step: 91650 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:23:25,485-Speed 5143.57 samples/sec Loss 3.6452 LearningRate 0.0526 Epoch: 5 Global Step: 91660 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:23:27,452-Speed 5207.85 samples/sec Loss 3.6354 LearningRate 0.0526 Epoch: 5 Global Step: 91670 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:23:29,422-Speed 5202.06 
samples/sec Loss 3.6015 LearningRate 0.0526 Epoch: 5 Global Step: 91680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:23:31,381-Speed 5227.96 samples/sec Loss 3.6080 LearningRate 0.0526 Epoch: 5 Global Step: 91690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:23:33,365-Speed 5163.11 samples/sec Loss 3.6042 LearningRate 0.0526 Epoch: 5 Global Step: 91700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:23:35,336-Speed 5196.29 samples/sec Loss 3.6359 LearningRate 0.0526 Epoch: 5 Global Step: 91710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:23:37,320-Speed 5163.69 samples/sec Loss 3.5934 LearningRate 0.0526 Epoch: 5 Global Step: 91720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:23:39,310-Speed 5146.83 samples/sec Loss 3.5252 LearningRate 0.0526 Epoch: 5 Global Step: 91730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:23:41,290-Speed 5174.77 samples/sec Loss 3.5914 LearningRate 0.0526 Epoch: 5 Global Step: 91740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:23:43,265-Speed 5185.20 samples/sec Loss 3.6535 LearningRate 0.0526 Epoch: 5 Global Step: 91750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:23:45,230-Speed 5214.12 samples/sec Loss 3.5803 LearningRate 0.0526 Epoch: 5 Global Step: 91760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:23:47,216-Speed 5156.25 samples/sec Loss 3.5522 LearningRate 0.0526 Epoch: 5 Global Step: 91770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:23:49,189-Speed 5192.28 samples/sec Loss 3.5725 LearningRate 0.0526 Epoch: 5 Global Step: 91780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:23:51,160-Speed 5197.42 samples/sec Loss 3.5473 LearningRate 0.0526 Epoch: 5 Global Step: 91790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:23:53,145-Speed 5160.75 samples/sec Loss 3.4820 LearningRate 0.0526 Epoch: 5 
Global Step: 91800 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:23:55,124-Speed 5177.23 samples/sec Loss 3.5588 LearningRate 0.0526 Epoch: 5 Global Step: 91810 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:23:57,095-Speed 5195.61 samples/sec Loss 3.5838 LearningRate 0.0526 Epoch: 5 Global Step: 91820 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:23:59,080-Speed 5161.79 samples/sec Loss 3.6003 LearningRate 0.0525 Epoch: 5 Global Step: 91830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:24:01,058-Speed 5178.22 samples/sec Loss 3.5395 LearningRate 0.0525 Epoch: 5 Global Step: 91840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:24:03,040-Speed 5166.62 samples/sec Loss 3.6193 LearningRate 0.0525 Epoch: 5 Global Step: 91850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:24:05,026-Speed 5159.36 samples/sec Loss 3.5560 LearningRate 0.0525 Epoch: 5 Global Step: 91860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:24:06,992-Speed 5209.37 samples/sec Loss 3.5588 LearningRate 0.0525 Epoch: 5 Global Step: 91870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:24:08,962-Speed 5200.75 samples/sec Loss 3.5502 LearningRate 0.0525 Epoch: 5 Global Step: 91880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:24:10,934-Speed 5193.41 samples/sec Loss 3.5314 LearningRate 0.0525 Epoch: 5 Global Step: 91890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:24:12,903-Speed 5204.11 samples/sec Loss 3.5167 LearningRate 0.0525 Epoch: 5 Global Step: 91900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:24:14,882-Speed 5174.55 samples/sec Loss 3.6019 LearningRate 0.0525 Epoch: 5 Global Step: 91910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:24:16,854-Speed 5194.91 samples/sec Loss 3.5723 LearningRate 0.0525 Epoch: 5 Global Step: 91920 Fp16 Grad Scale: 65536 Required: 
16 hours Training: 2022-04-11 05:24:18,824-Speed 5199.66 samples/sec Loss 3.5445 LearningRate 0.0525 Epoch: 5 Global Step: 91930 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:24:20,810-Speed 5158.54 samples/sec Loss 3.6230 LearningRate 0.0525 Epoch: 5 Global Step: 91940 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:24:22,787-Speed 5180.05 samples/sec Loss 3.5960 LearningRate 0.0525 Epoch: 5 Global Step: 91950 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:24:24,782-Speed 5135.28 samples/sec Loss 3.6371 LearningRate 0.0525 Epoch: 5 Global Step: 91960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:24:26,767-Speed 5161.45 samples/sec Loss 3.5617 LearningRate 0.0525 Epoch: 5 Global Step: 91970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:24:28,731-Speed 5215.23 samples/sec Loss 3.5251 LearningRate 0.0525 Epoch: 5 Global Step: 91980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:24:30,705-Speed 5189.23 samples/sec Loss 3.5620 LearningRate 0.0525 Epoch: 5 Global Step: 91990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:24:32,673-Speed 5203.91 samples/sec Loss 3.5609 LearningRate 0.0525 Epoch: 5 Global Step: 92000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:24:59,175-[lfw][92000]XNorm: 23.286742 Training: 2022-04-11 05:24:59,176-[lfw][92000]Accuracy-Flip: 0.99800+-0.00277 Training: 2022-04-11 05:24:59,176-[lfw][92000]Accuracy-Highest: 0.99817 Training: 2022-04-11 05:25:29,884-[cfp_fp][92000]XNorm: 21.048078 Training: 2022-04-11 05:25:29,885-[cfp_fp][92000]Accuracy-Flip: 0.97986+-0.00476 Training: 2022-04-11 05:25:29,885-[cfp_fp][92000]Accuracy-Highest: 0.98086 Training: 2022-04-11 05:25:56,590-[agedb_30][92000]XNorm: 22.271978 Training: 2022-04-11 05:25:56,591-[agedb_30][92000]Accuracy-Flip: 0.97767+-0.00834 Training: 2022-04-11 05:25:56,591-[agedb_30][92000]Accuracy-Highest: 0.97900 Training: 2022-04-11 
05:25:58,576-Speed 119.21 samples/sec Loss 3.5216 LearningRate 0.0525 Epoch: 5 Global Step: 92010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:26:00,544-Speed 5203.56 samples/sec Loss 3.6902 LearningRate 0.0525 Epoch: 5 Global Step: 92020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:26:02,522-Speed 5177.28 samples/sec Loss 3.5998 LearningRate 0.0525 Epoch: 5 Global Step: 92030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:26:04,493-Speed 5197.37 samples/sec Loss 3.5829 LearningRate 0.0525 Epoch: 5 Global Step: 92040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:26:06,462-Speed 5202.50 samples/sec Loss 3.6885 LearningRate 0.0525 Epoch: 5 Global Step: 92050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:26:08,428-Speed 5212.25 samples/sec Loss 3.5461 LearningRate 0.0524 Epoch: 5 Global Step: 92060 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:26:10,412-Speed 5162.68 samples/sec Loss 3.5564 LearningRate 0.0524 Epoch: 5 Global Step: 92070 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:26:12,385-Speed 5191.15 samples/sec Loss 3.7375 LearningRate 0.0524 Epoch: 5 Global Step: 92080 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:26:14,381-Speed 5131.22 samples/sec Loss 3.6208 LearningRate 0.0524 Epoch: 5 Global Step: 92090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:26:16,357-Speed 5184.93 samples/sec Loss 3.6288 LearningRate 0.0524 Epoch: 5 Global Step: 92100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:26:18,336-Speed 5175.55 samples/sec Loss 3.5741 LearningRate 0.0524 Epoch: 5 Global Step: 92110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:26:20,325-Speed 5152.22 samples/sec Loss 3.5119 LearningRate 0.0524 Epoch: 5 Global Step: 92120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:26:22,309-Speed 5161.11 samples/sec Loss 3.6270 
LearningRate 0.0524 Epoch: 5 Global Step: 92130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:26:24,329-Speed 5071.30 samples/sec Loss 3.6233 LearningRate 0.0524 Epoch: 5 Global Step: 92140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:26:26,320-Speed 5146.97 samples/sec Loss 3.5678 LearningRate 0.0524 Epoch: 5 Global Step: 92150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:26:28,308-Speed 5152.97 samples/sec Loss 3.6033 LearningRate 0.0524 Epoch: 5 Global Step: 92160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:26:30,284-Speed 5183.44 samples/sec Loss 3.5465 LearningRate 0.0524 Epoch: 5 Global Step: 92170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:26:32,265-Speed 5170.07 samples/sec Loss 3.5678 LearningRate 0.0524 Epoch: 5 Global Step: 92180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:26:34,285-Speed 5071.01 samples/sec Loss 3.5311 LearningRate 0.0524 Epoch: 5 Global Step: 92190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:26:36,287-Speed 5116.37 samples/sec Loss 3.6067 LearningRate 0.0524 Epoch: 5 Global Step: 92200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:26:38,275-Speed 5153.86 samples/sec Loss 3.5974 LearningRate 0.0524 Epoch: 5 Global Step: 92210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:26:40,253-Speed 5177.90 samples/sec Loss 3.6088 LearningRate 0.0524 Epoch: 5 Global Step: 92220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:26:42,241-Speed 5153.18 samples/sec Loss 3.5231 LearningRate 0.0524 Epoch: 5 Global Step: 92230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:26:44,219-Speed 5179.11 samples/sec Loss 3.6190 LearningRate 0.0524 Epoch: 5 Global Step: 92240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:26:46,203-Speed 5161.72 samples/sec Loss 3.5696 LearningRate 0.0524 Epoch: 5 Global Step: 92250 
Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:26:48,170-Speed 5208.89 samples/sec Loss 3.6430 LearningRate 0.0524 Epoch: 5 Global Step: 92260 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:26:50,167-Speed 5128.91 samples/sec Loss 3.6172 LearningRate 0.0524 Epoch: 5 Global Step: 92270 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:26:52,162-Speed 5134.09 samples/sec Loss 3.6151 LearningRate 0.0524 Epoch: 5 Global Step: 92280 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:26:54,144-Speed 5169.43 samples/sec Loss 3.5312 LearningRate 0.0524 Epoch: 5 Global Step: 92290 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:26:56,144-Speed 5122.72 samples/sec Loss 3.4711 LearningRate 0.0523 Epoch: 5 Global Step: 92300 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:26:58,125-Speed 5169.58 samples/sec Loss 3.6749 LearningRate 0.0523 Epoch: 5 Global Step: 92310 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:27:00,106-Speed 5169.99 samples/sec Loss 3.5910 LearningRate 0.0523 Epoch: 5 Global Step: 92320 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:27:02,086-Speed 5172.74 samples/sec Loss 3.5291 LearningRate 0.0523 Epoch: 5 Global Step: 92330 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:27:04,065-Speed 5177.65 samples/sec Loss 3.6612 LearningRate 0.0523 Epoch: 5 Global Step: 92340 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:27:06,075-Speed 5094.56 samples/sec Loss 3.5809 LearningRate 0.0523 Epoch: 5 Global Step: 92350 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:27:08,062-Speed 5154.23 samples/sec Loss 3.5246 LearningRate 0.0523 Epoch: 5 Global Step: 92360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:10,057-Speed 5135.96 samples/sec Loss 3.4781 LearningRate 0.0523 Epoch: 5 Global Step: 92370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 
2022-04-11 05:27:12,045-Speed 5152.48 samples/sec Loss 3.5525 LearningRate 0.0523 Epoch: 5 Global Step: 92380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:14,035-Speed 5147.96 samples/sec Loss 3.6171 LearningRate 0.0523 Epoch: 5 Global Step: 92390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:16,028-Speed 5139.19 samples/sec Loss 3.6524 LearningRate 0.0523 Epoch: 5 Global Step: 92400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:18,009-Speed 5172.20 samples/sec Loss 3.6879 LearningRate 0.0523 Epoch: 5 Global Step: 92410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:19,987-Speed 5179.65 samples/sec Loss 3.4987 LearningRate 0.0523 Epoch: 5 Global Step: 92420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:21,959-Speed 5193.13 samples/sec Loss 3.6189 LearningRate 0.0523 Epoch: 5 Global Step: 92430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:23,936-Speed 5180.53 samples/sec Loss 3.5725 LearningRate 0.0523 Epoch: 5 Global Step: 92440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:25,913-Speed 5190.60 samples/sec Loss 3.6377 LearningRate 0.0523 Epoch: 5 Global Step: 92450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:27,887-Speed 5188.65 samples/sec Loss 3.6084 LearningRate 0.0523 Epoch: 5 Global Step: 92460 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:27:29,881-Speed 5135.69 samples/sec Loss 3.6138 LearningRate 0.0523 Epoch: 5 Global Step: 92470 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:27:31,860-Speed 5177.76 samples/sec Loss 3.6289 LearningRate 0.0523 Epoch: 5 Global Step: 92480 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:27:33,847-Speed 5153.92 samples/sec Loss 3.6528 LearningRate 0.0523 Epoch: 5 Global Step: 92490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:35,839-Speed 5143.18 samples/sec 
Loss 3.5880 LearningRate 0.0523 Epoch: 5 Global Step: 92500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:37,820-Speed 5170.41 samples/sec Loss 3.4979 LearningRate 0.0523 Epoch: 5 Global Step: 92510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:39,809-Speed 5149.86 samples/sec Loss 3.6228 LearningRate 0.0523 Epoch: 5 Global Step: 92520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:41,779-Speed 5201.41 samples/sec Loss 3.5010 LearningRate 0.0522 Epoch: 5 Global Step: 92530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:43,757-Speed 5177.26 samples/sec Loss 3.6035 LearningRate 0.0522 Epoch: 5 Global Step: 92540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:45,727-Speed 5198.89 samples/sec Loss 3.6026 LearningRate 0.0522 Epoch: 5 Global Step: 92550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:47,700-Speed 5192.26 samples/sec Loss 3.6096 LearningRate 0.0522 Epoch: 5 Global Step: 92560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:49,684-Speed 5164.65 samples/sec Loss 3.5400 LearningRate 0.0522 Epoch: 5 Global Step: 92570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:51,711-Speed 5052.45 samples/sec Loss 3.5895 LearningRate 0.0522 Epoch: 5 Global Step: 92580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:53,672-Speed 5223.87 samples/sec Loss 3.5663 LearningRate 0.0522 Epoch: 5 Global Step: 92590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:55,640-Speed 5206.54 samples/sec Loss 3.5809 LearningRate 0.0522 Epoch: 5 Global Step: 92600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:57,610-Speed 5198.60 samples/sec Loss 3.5608 LearningRate 0.0522 Epoch: 5 Global Step: 92610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:27:59,599-Speed 5150.43 samples/sec Loss 3.6016 LearningRate 0.0522 Epoch: 5 Global Step: 
92620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:01,584-Speed 5159.59 samples/sec Loss 3.6146 LearningRate 0.0522 Epoch: 5 Global Step: 92630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:03,596-Speed 5091.51 samples/sec Loss 3.5244 LearningRate 0.0522 Epoch: 5 Global Step: 92640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:05,568-Speed 5194.03 samples/sec Loss 3.5562 LearningRate 0.0522 Epoch: 5 Global Step: 92650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:07,554-Speed 5156.49 samples/sec Loss 3.5571 LearningRate 0.0522 Epoch: 5 Global Step: 92660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:09,541-Speed 5154.94 samples/sec Loss 3.5497 LearningRate 0.0522 Epoch: 5 Global Step: 92670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:11,526-Speed 5162.58 samples/sec Loss 3.5708 LearningRate 0.0522 Epoch: 5 Global Step: 92680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:13,500-Speed 5188.73 samples/sec Loss 3.6484 LearningRate 0.0522 Epoch: 5 Global Step: 92690 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:28:15,479-Speed 5174.75 samples/sec Loss 3.5622 LearningRate 0.0522 Epoch: 5 Global Step: 92700 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:28:17,448-Speed 5202.10 samples/sec Loss 3.5929 LearningRate 0.0522 Epoch: 5 Global Step: 92710 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:28:19,414-Speed 5212.14 samples/sec Loss 3.5674 LearningRate 0.0522 Epoch: 5 Global Step: 92720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:21,379-Speed 5211.31 samples/sec Loss 3.5622 LearningRate 0.0522 Epoch: 5 Global Step: 92730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:23,353-Speed 5190.75 samples/sec Loss 3.6105 LearningRate 0.0522 Epoch: 5 Global Step: 92740 Fp16 Grad Scale: 65536 Required: 16 hours 
Training: 2022-04-11 05:28:25,339-Speed 5157.33 samples/sec Loss 3.5623 LearningRate 0.0522 Epoch: 5 Global Step: 92750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:27,336-Speed 5129.79 samples/sec Loss 3.4986 LearningRate 0.0521 Epoch: 5 Global Step: 92760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:29,350-Speed 5085.74 samples/sec Loss 3.5579 LearningRate 0.0521 Epoch: 5 Global Step: 92770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:31,334-Speed 5164.54 samples/sec Loss 3.6178 LearningRate 0.0521 Epoch: 5 Global Step: 92780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:33,298-Speed 5214.49 samples/sec Loss 3.6106 LearningRate 0.0521 Epoch: 5 Global Step: 92790 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:35,267-Speed 5202.64 samples/sec Loss 3.6342 LearningRate 0.0521 Epoch: 5 Global Step: 92800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:37,251-Speed 5162.89 samples/sec Loss 3.6126 LearningRate 0.0521 Epoch: 5 Global Step: 92810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:39,233-Speed 5167.18 samples/sec Loss 3.5423 LearningRate 0.0521 Epoch: 5 Global Step: 92820 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:28:41,226-Speed 5141.49 samples/sec Loss 3.5783 LearningRate 0.0521 Epoch: 5 Global Step: 92830 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:28:43,203-Speed 5181.10 samples/sec Loss 3.6931 LearningRate 0.0521 Epoch: 5 Global Step: 92840 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:28:45,177-Speed 5187.40 samples/sec Loss 3.6103 LearningRate 0.0521 Epoch: 5 Global Step: 92850 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:28:47,157-Speed 5174.47 samples/sec Loss 3.5458 LearningRate 0.0521 Epoch: 5 Global Step: 92860 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:28:49,131-Speed 5189.53 
samples/sec Loss 3.5857 LearningRate 0.0521 Epoch: 5 Global Step: 92870 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:28:51,099-Speed 5203.79 samples/sec Loss 3.4531 LearningRate 0.0521 Epoch: 5 Global Step: 92880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:53,111-Speed 5093.85 samples/sec Loss 3.5353 LearningRate 0.0521 Epoch: 5 Global Step: 92890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:55,094-Speed 5164.58 samples/sec Loss 3.6259 LearningRate 0.0521 Epoch: 5 Global Step: 92900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:57,077-Speed 5165.10 samples/sec Loss 3.5546 LearningRate 0.0521 Epoch: 5 Global Step: 92910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:28:59,047-Speed 5200.96 samples/sec Loss 3.5954 LearningRate 0.0521 Epoch: 5 Global Step: 92920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:01,023-Speed 5182.14 samples/sec Loss 3.6388 LearningRate 0.0521 Epoch: 5 Global Step: 92930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:02,992-Speed 5203.19 samples/sec Loss 3.5211 LearningRate 0.0521 Epoch: 5 Global Step: 92940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:04,964-Speed 5193.91 samples/sec Loss 3.6343 LearningRate 0.0521 Epoch: 5 Global Step: 92950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:06,942-Speed 5178.55 samples/sec Loss 3.6299 LearningRate 0.0521 Epoch: 5 Global Step: 92960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:08,908-Speed 5210.81 samples/sec Loss 3.5720 LearningRate 0.0521 Epoch: 5 Global Step: 92970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:10,885-Speed 5180.73 samples/sec Loss 3.5697 LearningRate 0.0521 Epoch: 5 Global Step: 92980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:29:12,879-Speed 5137.98 samples/sec Loss 3.4309 LearningRate 0.0520 Epoch: 5 
Global Step: 92990 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:29:14,860-Speed 5169.70 samples/sec Loss 3.5985 LearningRate 0.0520 Epoch: 5 Global Step: 93000 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:29:16,840-Speed 5175.31 samples/sec Loss 3.5188 LearningRate 0.0520 Epoch: 5 Global Step: 93010 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:29:18,811-Speed 5196.81 samples/sec Loss 3.5793 LearningRate 0.0520 Epoch: 5 Global Step: 93020 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:29:20,783-Speed 5192.65 samples/sec Loss 3.5711 LearningRate 0.0520 Epoch: 5 Global Step: 93030 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:29:22,748-Speed 5215.20 samples/sec Loss 3.6533 LearningRate 0.0520 Epoch: 5 Global Step: 93040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:24,717-Speed 5200.87 samples/sec Loss 3.6315 LearningRate 0.0520 Epoch: 5 Global Step: 93050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:26,698-Speed 5171.40 samples/sec Loss 3.4720 LearningRate 0.0520 Epoch: 5 Global Step: 93060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:28,692-Speed 5137.40 samples/sec Loss 3.5305 LearningRate 0.0520 Epoch: 5 Global Step: 93070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:30,658-Speed 5208.49 samples/sec Loss 3.4677 LearningRate 0.0520 Epoch: 5 Global Step: 93080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:32,635-Speed 5182.26 samples/sec Loss 3.5117 LearningRate 0.0520 Epoch: 5 Global Step: 93090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:34,622-Speed 5154.87 samples/sec Loss 3.6186 LearningRate 0.0520 Epoch: 5 Global Step: 93100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:36,608-Speed 5159.93 samples/sec Loss 3.6500 LearningRate 0.0520 Epoch: 5 Global Step: 93110 Fp16 Grad Scale: 65536 
Required: 16 hours Training: 2022-04-11 05:29:38,589-Speed 5170.32 samples/sec Loss 3.5189 LearningRate 0.0520 Epoch: 5 Global Step: 93120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:40,577-Speed 5151.71 samples/sec Loss 3.6494 LearningRate 0.0520 Epoch: 5 Global Step: 93130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:42,565-Speed 5154.04 samples/sec Loss 3.6308 LearningRate 0.0520 Epoch: 5 Global Step: 93140 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:29:44,537-Speed 5192.80 samples/sec Loss 3.5864 LearningRate 0.0520 Epoch: 5 Global Step: 93150 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:29:46,511-Speed 5189.66 samples/sec Loss 3.5808 LearningRate 0.0520 Epoch: 5 Global Step: 93160 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:29:48,504-Speed 5139.79 samples/sec Loss 3.6203 LearningRate 0.0520 Epoch: 5 Global Step: 93170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:50,478-Speed 5189.09 samples/sec Loss 3.6015 LearningRate 0.0520 Epoch: 5 Global Step: 93180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:52,455-Speed 5180.47 samples/sec Loss 3.6412 LearningRate 0.0520 Epoch: 5 Global Step: 93190 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:54,447-Speed 5142.47 samples/sec Loss 3.5512 LearningRate 0.0520 Epoch: 5 Global Step: 93200 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:56,416-Speed 5203.76 samples/sec Loss 3.5468 LearningRate 0.0520 Epoch: 5 Global Step: 93210 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:29:58,397-Speed 5169.82 samples/sec Loss 3.5905 LearningRate 0.0519 Epoch: 5 Global Step: 93220 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:00,397-Speed 5121.87 samples/sec Loss 3.6178 LearningRate 0.0519 Epoch: 5 Global Step: 93230 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 
05:30:02,379-Speed 5169.69 samples/sec Loss 3.5980 LearningRate 0.0519 Epoch: 5 Global Step: 93240 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:04,350-Speed 5196.44 samples/sec Loss 3.6084 LearningRate 0.0519 Epoch: 5 Global Step: 93250 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:06,326-Speed 5182.96 samples/sec Loss 3.4978 LearningRate 0.0519 Epoch: 5 Global Step: 93260 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:08,305-Speed 5174.53 samples/sec Loss 3.6474 LearningRate 0.0519 Epoch: 5 Global Step: 93270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:30:10,282-Speed 5182.56 samples/sec Loss 3.6265 LearningRate 0.0519 Epoch: 5 Global Step: 93280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:30:12,263-Speed 5170.96 samples/sec Loss 3.6088 LearningRate 0.0519 Epoch: 5 Global Step: 93290 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:30:14,248-Speed 5159.44 samples/sec Loss 3.6050 LearningRate 0.0519 Epoch: 5 Global Step: 93300 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:30:16,217-Speed 5204.84 samples/sec Loss 3.4932 LearningRate 0.0519 Epoch: 5 Global Step: 93310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:18,209-Speed 5140.73 samples/sec Loss 3.5845 LearningRate 0.0519 Epoch: 5 Global Step: 93320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:20,179-Speed 5201.55 samples/sec Loss 3.5974 LearningRate 0.0519 Epoch: 5 Global Step: 93330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:22,170-Speed 5144.55 samples/sec Loss 3.5466 LearningRate 0.0519 Epoch: 5 Global Step: 93340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:24,143-Speed 5191.46 samples/sec Loss 3.5740 LearningRate 0.0519 Epoch: 5 Global Step: 93350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:26,135-Speed 5141.79 samples/sec Loss 3.5958 
LearningRate 0.0519 Epoch: 5 Global Step: 93360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:28,111-Speed 5183.99 samples/sec Loss 3.4996 LearningRate 0.0519 Epoch: 5 Global Step: 93370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:30,078-Speed 5205.82 samples/sec Loss 3.5739 LearningRate 0.0519 Epoch: 5 Global Step: 93380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:32,053-Speed 5187.48 samples/sec Loss 3.6166 LearningRate 0.0519 Epoch: 5 Global Step: 93390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:34,040-Speed 5155.31 samples/sec Loss 3.6208 LearningRate 0.0519 Epoch: 5 Global Step: 93400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:36,017-Speed 5182.41 samples/sec Loss 3.5582 LearningRate 0.0519 Epoch: 5 Global Step: 93410 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:30:38,019-Speed 5114.88 samples/sec Loss 3.6017 LearningRate 0.0519 Epoch: 5 Global Step: 93420 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:30:39,996-Speed 5183.58 samples/sec Loss 3.6210 LearningRate 0.0519 Epoch: 5 Global Step: 93430 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:30:41,983-Speed 5154.24 samples/sec Loss 3.5579 LearningRate 0.0519 Epoch: 5 Global Step: 93440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:43,956-Speed 5190.32 samples/sec Loss 3.5974 LearningRate 0.0518 Epoch: 5 Global Step: 93450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:45,924-Speed 5204.89 samples/sec Loss 3.5770 LearningRate 0.0518 Epoch: 5 Global Step: 93460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:47,898-Speed 5189.68 samples/sec Loss 3.5829 LearningRate 0.0518 Epoch: 5 Global Step: 93470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:49,874-Speed 5183.37 samples/sec Loss 3.5816 LearningRate 0.0518 Epoch: 5 Global Step: 93480 Fp16 
Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:51,866-Speed 5142.00 samples/sec Loss 3.5522 LearningRate 0.0518 Epoch: 5 Global Step: 93490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:53,837-Speed 5199.09 samples/sec Loss 3.5803 LearningRate 0.0518 Epoch: 5 Global Step: 93500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:55,805-Speed 5204.44 samples/sec Loss 3.6202 LearningRate 0.0518 Epoch: 5 Global Step: 93510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:57,779-Speed 5189.47 samples/sec Loss 3.5843 LearningRate 0.0518 Epoch: 5 Global Step: 93520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:30:59,757-Speed 5179.39 samples/sec Loss 3.5407 LearningRate 0.0518 Epoch: 5 Global Step: 93530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:01,768-Speed 5094.01 samples/sec Loss 3.5411 LearningRate 0.0518 Epoch: 5 Global Step: 93540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:03,744-Speed 5182.80 samples/sec Loss 3.5511 LearningRate 0.0518 Epoch: 5 Global Step: 93550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:05,723-Speed 5176.15 samples/sec Loss 3.5734 LearningRate 0.0518 Epoch: 5 Global Step: 93560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:07,693-Speed 5199.70 samples/sec Loss 3.5868 LearningRate 0.0518 Epoch: 5 Global Step: 93570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:09,672-Speed 5175.97 samples/sec Loss 3.5387 LearningRate 0.0518 Epoch: 5 Global Step: 93580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:11,651-Speed 5174.65 samples/sec Loss 3.6512 LearningRate 0.0518 Epoch: 5 Global Step: 93590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:13,638-Speed 5155.95 samples/sec Loss 3.5684 LearningRate 0.0518 Epoch: 5 Global Step: 93600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 
2022-04-11 05:31:15,616-Speed 5179.79 samples/sec Loss 3.6215 LearningRate 0.0518 Epoch: 5 Global Step: 93610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:17,624-Speed 5099.53 samples/sec Loss 3.5122 LearningRate 0.0518 Epoch: 5 Global Step: 93620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:19,598-Speed 5190.72 samples/sec Loss 3.6901 LearningRate 0.0518 Epoch: 5 Global Step: 93630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:21,567-Speed 5203.00 samples/sec Loss 3.5822 LearningRate 0.0518 Epoch: 5 Global Step: 93640 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:31:23,546-Speed 5176.00 samples/sec Loss 3.5475 LearningRate 0.0518 Epoch: 5 Global Step: 93650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:25,541-Speed 5133.86 samples/sec Loss 3.4233 LearningRate 0.0518 Epoch: 5 Global Step: 93660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:27,527-Speed 5157.01 samples/sec Loss 3.5125 LearningRate 0.0518 Epoch: 5 Global Step: 93670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:29,499-Speed 5194.77 samples/sec Loss 3.5692 LearningRate 0.0517 Epoch: 5 Global Step: 93680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:31,475-Speed 5184.80 samples/sec Loss 3.5447 LearningRate 0.0517 Epoch: 5 Global Step: 93690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:33,446-Speed 5197.12 samples/sec Loss 3.5248 LearningRate 0.0517 Epoch: 5 Global Step: 93700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:35,416-Speed 5199.88 samples/sec Loss 3.5887 LearningRate 0.0517 Epoch: 5 Global Step: 93710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:37,401-Speed 5159.16 samples/sec Loss 3.6076 LearningRate 0.0517 Epoch: 5 Global Step: 93720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:39,383-Speed 5167.59 samples/sec Loss 
3.5734 LearningRate 0.0517 Epoch: 5 Global Step: 93730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:41,381-Speed 5128.83 samples/sec Loss 3.6364 LearningRate 0.0517 Epoch: 5 Global Step: 93740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:43,362-Speed 5171.57 samples/sec Loss 3.6112 LearningRate 0.0517 Epoch: 5 Global Step: 93750 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:31:45,338-Speed 5183.60 samples/sec Loss 3.5138 LearningRate 0.0517 Epoch: 5 Global Step: 93760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:47,316-Speed 5178.03 samples/sec Loss 3.6509 LearningRate 0.0517 Epoch: 5 Global Step: 93770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:49,297-Speed 5169.78 samples/sec Loss 3.5864 LearningRate 0.0517 Epoch: 5 Global Step: 93780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:51,274-Speed 5181.40 samples/sec Loss 3.6219 LearningRate 0.0517 Epoch: 5 Global Step: 93790 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:53,244-Speed 5200.42 samples/sec Loss 3.6008 LearningRate 0.0517 Epoch: 5 Global Step: 93800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:55,215-Speed 5196.02 samples/sec Loss 3.5366 LearningRate 0.0517 Epoch: 5 Global Step: 93810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:57,188-Speed 5193.66 samples/sec Loss 3.5925 LearningRate 0.0517 Epoch: 5 Global Step: 93820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:31:59,206-Speed 5075.04 samples/sec Loss 3.5900 LearningRate 0.0517 Epoch: 5 Global Step: 93830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:32:01,196-Speed 5149.49 samples/sec Loss 3.5574 LearningRate 0.0517 Epoch: 5 Global Step: 93840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:32:03,177-Speed 5170.51 samples/sec Loss 3.5929 LearningRate 0.0517 Epoch: 5 Global Step: 93850 
Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:32:05,149-Speed 5192.74 samples/sec Loss 3.5894 LearningRate 0.0517 Epoch: 5 Global Step: 93860 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:32:07,112-Speed 5217.83 samples/sec Loss 3.5942 LearningRate 0.0517 Epoch: 5 Global Step: 93870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:32:09,090-Speed 5179.04 samples/sec Loss 3.6315 LearningRate 0.0517 Epoch: 5 Global Step: 93880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:32:11,072-Speed 5167.64 samples/sec Loss 3.6077 LearningRate 0.0517 Epoch: 5 Global Step: 93890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:32:13,055-Speed 5167.37 samples/sec Loss 3.6032 LearningRate 0.0517 Epoch: 5 Global Step: 93900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:32:15,048-Speed 5137.69 samples/sec Loss 3.6532 LearningRate 0.0517 Epoch: 5 Global Step: 93910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:32:17,025-Speed 5181.64 samples/sec Loss 3.6221 LearningRate 0.0516 Epoch: 5 Global Step: 93920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:32:19,027-Speed 5118.22 samples/sec Loss 3.6221 LearningRate 0.0516 Epoch: 5 Global Step: 93930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:32:21,016-Speed 5148.39 samples/sec Loss 3.5883 LearningRate 0.0516 Epoch: 5 Global Step: 93940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:32:23,000-Speed 5164.19 samples/sec Loss 3.5599 LearningRate 0.0516 Epoch: 5 Global Step: 93950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:32:24,999-Speed 5125.42 samples/sec Loss 3.6266 LearningRate 0.0516 Epoch: 5 Global Step: 93960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:32:26,978-Speed 5174.58 samples/sec Loss 3.6563 LearningRate 0.0516 Epoch: 5 Global Step: 93970 Fp16 Grad Scale: 131072 Required: 16 hours Training: 
2022-04-11 05:32:28,977-Speed 5124.62 samples/sec Loss 3.5858 LearningRate 0.0516 Epoch: 5 Global Step: 93980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:32:30,947-Speed 5200.60 samples/sec Loss 3.6262 LearningRate 0.0516 Epoch: 5 Global Step: 93990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:32:32,920-Speed 5190.92 samples/sec Loss 3.5290 LearningRate 0.0516 Epoch: 5 Global Step: 94000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:32:59,584-[lfw][94000]XNorm: 22.557776 Training: 2022-04-11 05:32:59,585-[lfw][94000]Accuracy-Flip: 0.99767+-0.00300 Training: 2022-04-11 05:32:59,585-[lfw][94000]Accuracy-Highest: 0.99817 Training: 2022-04-11 05:33:30,577-[cfp_fp][94000]XNorm: 20.450550 Training: 2022-04-11 05:33:30,578-[cfp_fp][94000]Accuracy-Flip: 0.97743+-0.00651 Training: 2022-04-11 05:33:30,578-[cfp_fp][94000]Accuracy-Highest: 0.98086 Training: 2022-04-11 05:33:57,163-[agedb_30][94000]XNorm: 22.048120 Training: 2022-04-11 05:33:57,163-[agedb_30][94000]Accuracy-Flip: 0.97600+-0.00708 Training: 2022-04-11 05:33:57,164-[agedb_30][94000]Accuracy-Highest: 0.97900 Training: 2022-04-11 05:33:59,152-Speed 118.75 samples/sec Loss 3.4911 LearningRate 0.0516 Epoch: 5 Global Step: 94010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:34:01,127-Speed 5187.32 samples/sec Loss 3.6238 LearningRate 0.0516 Epoch: 5 Global Step: 94020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:34:03,149-Speed 5066.64 samples/sec Loss 3.6488 LearningRate 0.0516 Epoch: 5 Global Step: 94030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:34:05,166-Speed 5076.62 samples/sec Loss 3.6192 LearningRate 0.0516 Epoch: 5 Global Step: 94040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:34:07,138-Speed 5194.21 samples/sec Loss 3.5194 LearningRate 0.0516 Epoch: 5 Global Step: 94050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:34:09,113-Speed 5187.73 
samples/sec Loss 3.4744 LearningRate 0.0516 Epoch: 5 Global Step: 94060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:34:11,087-Speed 5189.88 samples/sec Loss 3.5234 LearningRate 0.0516 Epoch: 5 Global Step: 94070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:34:13,068-Speed 5171.07 samples/sec Loss 3.5463 LearningRate 0.0516 Epoch: 5 Global Step: 94080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:34:15,036-Speed 5204.79 samples/sec Loss 3.5600 LearningRate 0.0516 Epoch: 5 Global Step: 94090 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:34:17,027-Speed 5144.47 samples/sec Loss 3.5796 LearningRate 0.0516 Epoch: 5 Global Step: 94100 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:34:19,006-Speed 5174.56 samples/sec Loss 3.6070 LearningRate 0.0516 Epoch: 5 Global Step: 94110 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:34:20,985-Speed 5176.13 samples/sec Loss 3.5172 LearningRate 0.0516 Epoch: 5 Global Step: 94120 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:34:23,028-Speed 5015.14 samples/sec Loss 3.6118 LearningRate 0.0516 Epoch: 5 Global Step: 94130 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:34:25,017-Speed 5148.01 samples/sec Loss 3.5596 LearningRate 0.0516 Epoch: 5 Global Step: 94140 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:34:27,000-Speed 5166.48 samples/sec Loss 3.5134 LearningRate 0.0515 Epoch: 5 Global Step: 94150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:34:28,994-Speed 5135.85 samples/sec Loss 3.6112 LearningRate 0.0515 Epoch: 5 Global Step: 94160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:34:30,983-Speed 5150.71 samples/sec Loss 3.5700 LearningRate 0.0515 Epoch: 5 Global Step: 94170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:34:32,963-Speed 5174.54 samples/sec Loss 3.6843 LearningRate 0.0515 
Epoch: 5 Global Step: 94180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:34:34,955-Speed 5143.83 samples/sec Loss 3.6486 LearningRate 0.0515 Epoch: 5 Global Step: 94190 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:34:36,928-Speed 5190.19 samples/sec Loss 3.6866 LearningRate 0.0515 Epoch: 5 Global Step: 94200 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:34:38,935-Speed 5104.87 samples/sec Loss 3.6109 LearningRate 0.0515 Epoch: 5 Global Step: 94210 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:34:40,933-Speed 5126.40 samples/sec Loss 3.5523 LearningRate 0.0515 Epoch: 5 Global Step: 94220 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:34:42,919-Speed 5156.24 samples/sec Loss 3.4551 LearningRate 0.0515 Epoch: 5 Global Step: 94230 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:34:44,914-Speed 5136.48 samples/sec Loss 3.6383 LearningRate 0.0515 Epoch: 5 Global Step: 94240 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:34:46,908-Speed 5134.99 samples/sec Loss 3.5537 LearningRate 0.0515 Epoch: 5 Global Step: 94250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:34:48,903-Speed 5135.51 samples/sec Loss 3.5098 LearningRate 0.0515 Epoch: 5 Global Step: 94260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:34:50,914-Speed 5094.31 samples/sec Loss 3.5377 LearningRate 0.0515 Epoch: 5 Global Step: 94270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:34:52,903-Speed 5149.52 samples/sec Loss 3.6189 LearningRate 0.0515 Epoch: 5 Global Step: 94280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:34:54,884-Speed 5171.49 samples/sec Loss 3.5710 LearningRate 0.0515 Epoch: 5 Global Step: 94290 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:34:56,859-Speed 5186.30 samples/sec Loss 3.5532 LearningRate 0.0515 Epoch: 5 Global Step: 94300 Fp16 Grad Scale: 65536 
Required: 16 hours Training: 2022-04-11 05:34:58,856-Speed 5129.85 samples/sec Loss 3.4652 LearningRate 0.0515 Epoch: 5 Global Step: 94310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:35:00,847-Speed 5143.19 samples/sec Loss 3.5849 LearningRate 0.0515 Epoch: 5 Global Step: 94320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:35:02,829-Speed 5168.23 samples/sec Loss 3.6432 LearningRate 0.0515 Epoch: 5 Global Step: 94330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:35:04,813-Speed 5164.35 samples/sec Loss 3.5354 LearningRate 0.0515 Epoch: 5 Global Step: 94340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:35:06,794-Speed 5170.87 samples/sec Loss 3.6834 LearningRate 0.0515 Epoch: 5 Global Step: 94350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:35:08,804-Speed 5095.46 samples/sec Loss 3.5110 LearningRate 0.0515 Epoch: 5 Global Step: 94360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:35:10,792-Speed 5153.26 samples/sec Loss 3.5709 LearningRate 0.0515 Epoch: 5 Global Step: 94370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:35:12,780-Speed 5151.58 samples/sec Loss 3.6463 LearningRate 0.0514 Epoch: 5 Global Step: 94380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:35:14,798-Speed 5077.71 samples/sec Loss 3.6289 LearningRate 0.0514 Epoch: 5 Global Step: 94390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:35:16,779-Speed 5171.00 samples/sec Loss 3.5585 LearningRate 0.0514 Epoch: 5 Global Step: 94400 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:35:18,756-Speed 5179.92 samples/sec Loss 3.6116 LearningRate 0.0514 Epoch: 5 Global Step: 94410 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:35:20,739-Speed 5165.77 samples/sec Loss 3.5987 LearningRate 0.0514 Epoch: 5 Global Step: 94420 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 
05:35:22,713-Speed 5188.35 samples/sec Loss 3.6267 LearningRate 0.0514 Epoch: 5 Global Step: 94430 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:35:24,687-Speed 5190.89 samples/sec Loss 3.5465 LearningRate 0.0514 Epoch: 5 Global Step: 94440 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:35:26,667-Speed 5173.28 samples/sec Loss 3.5898 LearningRate 0.0514 Epoch: 5 Global Step: 94450 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:35:28,639-Speed 5194.36 samples/sec Loss 3.5341 LearningRate 0.0514 Epoch: 5 Global Step: 94460 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:35:30,615-Speed 5183.23 samples/sec Loss 3.5811 LearningRate 0.0514 Epoch: 5 Global Step: 94470 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:35:32,588-Speed 5192.65 samples/sec Loss 3.5378 LearningRate 0.0514 Epoch: 5 Global Step: 94480 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:35:34,562-Speed 5189.55 samples/sec Loss 3.6406 LearningRate 0.0514 Epoch: 5 Global Step: 94490 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:35:36,536-Speed 5190.26 samples/sec Loss 3.5009 LearningRate 0.0514 Epoch: 5 Global Step: 94500 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:35:38,529-Speed 5138.41 samples/sec Loss 3.5120 LearningRate 0.0514 Epoch: 5 Global Step: 94510 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:35:40,505-Speed 5184.67 samples/sec Loss 3.4937 LearningRate 0.0514 Epoch: 5 Global Step: 94520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:35:42,503-Speed 5125.39 samples/sec Loss 3.5249 LearningRate 0.0514 Epoch: 5 Global Step: 94530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:35:44,477-Speed 5189.55 samples/sec Loss 3.6080 LearningRate 0.0514 Epoch: 5 Global Step: 94540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:35:46,475-Speed 5127.78 samples/sec Loss 
3.6384 LearningRate 0.0514 Epoch: 5 Global Step: 94550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:35:48,453-Speed 5177.22 samples/sec Loss 3.5112 LearningRate 0.0514 Epoch: 5 Global Step: 94560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:35:50,426-Speed 5190.93 samples/sec Loss 3.6277 LearningRate 0.0514 Epoch: 5 Global Step: 94570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:35:52,402-Speed 5184.81 samples/sec Loss 3.6114 LearningRate 0.0514 Epoch: 5 Global Step: 94580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:35:54,375-Speed 5191.51 samples/sec Loss 3.6108 LearningRate 0.0514 Epoch: 5 Global Step: 94590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:35:56,344-Speed 5204.44 samples/sec Loss 3.5425 LearningRate 0.0514 Epoch: 5 Global Step: 94600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:35:58,336-Speed 5140.94 samples/sec Loss 3.5909 LearningRate 0.0513 Epoch: 5 Global Step: 94610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:36:00,311-Speed 5186.72 samples/sec Loss 3.5225 LearningRate 0.0513 Epoch: 5 Global Step: 94620 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:36:02,285-Speed 5189.69 samples/sec Loss 3.5581 LearningRate 0.0513 Epoch: 5 Global Step: 94630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:36:04,258-Speed 5192.41 samples/sec Loss 3.5998 LearningRate 0.0513 Epoch: 5 Global Step: 94640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:36:06,233-Speed 5187.33 samples/sec Loss 3.6517 LearningRate 0.0513 Epoch: 5 Global Step: 94650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:36:08,199-Speed 5210.58 samples/sec Loss 3.6310 LearningRate 0.0513 Epoch: 5 Global Step: 94660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:36:10,173-Speed 5189.71 samples/sec Loss 3.5221 LearningRate 0.0513 Epoch: 5 Global Step: 94670 
Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:36:12,151-Speed 5178.23 samples/sec Loss 3.5792 LearningRate 0.0513 Epoch: 5 Global Step: 94680 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:36:14,136-Speed 5160.44 samples/sec Loss 3.5166 LearningRate 0.0513 Epoch: 5 Global Step: 94690 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:36:16,103-Speed 5206.31 samples/sec Loss 3.5783 LearningRate 0.0513 Epoch: 5 Global Step: 94700 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:36:18,079-Speed 5184.39 samples/sec Loss 3.5123 LearningRate 0.0513 Epoch: 5 Global Step: 94710 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:36:20,045-Speed 5210.95 samples/sec Loss 3.5070 LearningRate 0.0513 Epoch: 5 Global Step: 94720 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:36:22,011-Speed 5210.63 samples/sec Loss 3.5152 LearningRate 0.0513 Epoch: 5 Global Step: 94730 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:36:24,023-Speed 5090.23 samples/sec Loss 3.5577 LearningRate 0.0513 Epoch: 5 Global Step: 94740 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:36:26,000-Speed 5181.37 samples/sec Loss 3.5274 LearningRate 0.0513 Epoch: 5 Global Step: 94750 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:36:27,989-Speed 5149.81 samples/sec Loss 3.5908 LearningRate 0.0513 Epoch: 5 Global Step: 94760 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:36:29,965-Speed 5184.93 samples/sec Loss 3.5674 LearningRate 0.0513 Epoch: 5 Global Step: 94770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:36:31,952-Speed 5154.66 samples/sec Loss 3.6225 LearningRate 0.0513 Epoch: 5 Global Step: 94780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:36:33,929-Speed 5180.74 samples/sec Loss 3.4843 LearningRate 0.0513 Epoch: 5 Global Step: 94790 Fp16 Grad Scale: 65536 Required: 16 hours Training: 
2022-04-11 05:36:35,909-Speed 5175.81 samples/sec Loss 3.5895 LearningRate 0.0513 Epoch: 5 Global Step: 94800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:36:37,886-Speed 5181.02 samples/sec Loss 3.4901 LearningRate 0.0513 Epoch: 5 Global Step: 94810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:36:39,862-Speed 5184.10 samples/sec Loss 3.5833 LearningRate 0.0513 Epoch: 5 Global Step: 94820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:36:41,866-Speed 5110.22 samples/sec Loss 3.5962 LearningRate 0.0513 Epoch: 5 Global Step: 94830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:36:43,840-Speed 5188.41 samples/sec Loss 3.5394 LearningRate 0.0513 Epoch: 5 Global Step: 94840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:36:45,818-Speed 5178.76 samples/sec Loss 3.6109 LearningRate 0.0512 Epoch: 5 Global Step: 94850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:36:47,787-Speed 5202.45 samples/sec Loss 3.5599 LearningRate 0.0512 Epoch: 5 Global Step: 94860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:36:49,766-Speed 5177.41 samples/sec Loss 3.5003 LearningRate 0.0512 Epoch: 5 Global Step: 94870 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:36:51,743-Speed 5180.71 samples/sec Loss 3.6083 LearningRate 0.0512 Epoch: 5 Global Step: 94880 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:36:53,713-Speed 5199.64 samples/sec Loss 3.5595 LearningRate 0.0512 Epoch: 5 Global Step: 94890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:36:55,685-Speed 5194.81 samples/sec Loss 3.4975 LearningRate 0.0512 Epoch: 5 Global Step: 94900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:36:57,683-Speed 5125.84 samples/sec Loss 3.6357 LearningRate 0.0512 Epoch: 5 Global Step: 94910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:36:59,661-Speed 5179.28 samples/sec 
Loss 3.5983 LearningRate 0.0512 Epoch: 5 Global Step: 94920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:01,647-Speed 5158.85 samples/sec Loss 3.4355 LearningRate 0.0512 Epoch: 5 Global Step: 94930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:03,632-Speed 5161.50 samples/sec Loss 3.4993 LearningRate 0.0512 Epoch: 5 Global Step: 94940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:05,614-Speed 5166.42 samples/sec Loss 3.6136 LearningRate 0.0512 Epoch: 5 Global Step: 94950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:07,586-Speed 5195.13 samples/sec Loss 3.5652 LearningRate 0.0512 Epoch: 5 Global Step: 94960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:09,564-Speed 5178.93 samples/sec Loss 3.4864 LearningRate 0.0512 Epoch: 5 Global Step: 94970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:11,534-Speed 5200.27 samples/sec Loss 3.5348 LearningRate 0.0512 Epoch: 5 Global Step: 94980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:13,503-Speed 5201.64 samples/sec Loss 3.4824 LearningRate 0.0512 Epoch: 5 Global Step: 94990 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:37:15,483-Speed 5171.28 samples/sec Loss 3.5622 LearningRate 0.0512 Epoch: 5 Global Step: 95000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:17,472-Speed 5150.46 samples/sec Loss 3.5281 LearningRate 0.0512 Epoch: 5 Global Step: 95010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:19,463-Speed 5145.57 samples/sec Loss 3.5782 LearningRate 0.0512 Epoch: 5 Global Step: 95020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:21,437-Speed 5189.57 samples/sec Loss 3.5275 LearningRate 0.0512 Epoch: 5 Global Step: 95030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:23,410-Speed 5191.84 samples/sec Loss 3.5619 LearningRate 0.0512 Epoch: 5 Global Step: 
95040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:25,384-Speed 5190.01 samples/sec Loss 3.6035 LearningRate 0.0512 Epoch: 5 Global Step: 95050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:27,365-Speed 5169.67 samples/sec Loss 3.5388 LearningRate 0.0512 Epoch: 5 Global Step: 95060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:29,355-Speed 5146.19 samples/sec Loss 3.6179 LearningRate 0.0512 Epoch: 5 Global Step: 95070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:31,328-Speed 5193.85 samples/sec Loss 3.5660 LearningRate 0.0511 Epoch: 5 Global Step: 95080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:33,310-Speed 5168.55 samples/sec Loss 3.5814 LearningRate 0.0511 Epoch: 5 Global Step: 95090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:35,310-Speed 5120.16 samples/sec Loss 3.6092 LearningRate 0.0511 Epoch: 5 Global Step: 95100 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:37:37,301-Speed 5144.47 samples/sec Loss 3.5772 LearningRate 0.0511 Epoch: 5 Global Step: 95110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:39,286-Speed 5161.20 samples/sec Loss 3.5713 LearningRate 0.0511 Epoch: 5 Global Step: 95120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:41,289-Speed 5114.95 samples/sec Loss 3.5496 LearningRate 0.0511 Epoch: 5 Global Step: 95130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:43,267-Speed 5177.39 samples/sec Loss 3.4551 LearningRate 0.0511 Epoch: 5 Global Step: 95140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:45,248-Speed 5171.65 samples/sec Loss 3.5727 LearningRate 0.0511 Epoch: 5 Global Step: 95150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:47,224-Speed 5184.27 samples/sec Loss 3.4896 LearningRate 0.0511 Epoch: 5 Global Step: 95160 Fp16 Grad Scale: 65536 Required: 16 hours 
Training: 2022-04-11 05:37:49,194-Speed 5199.01 samples/sec Loss 3.5072 LearningRate 0.0511 Epoch: 5 Global Step: 95170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:51,183-Speed 5149.20 samples/sec Loss 3.4859 LearningRate 0.0511 Epoch: 5 Global Step: 95180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:53,155-Speed 5193.58 samples/sec Loss 3.5321 LearningRate 0.0511 Epoch: 5 Global Step: 95190 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:55,135-Speed 5175.29 samples/sec Loss 3.5481 LearningRate 0.0511 Epoch: 5 Global Step: 95200 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:37:57,123-Speed 5152.55 samples/sec Loss 3.5543 LearningRate 0.0511 Epoch: 5 Global Step: 95210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:37:59,106-Speed 5164.50 samples/sec Loss 3.5198 LearningRate 0.0511 Epoch: 5 Global Step: 95220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:38:01,119-Speed 5089.58 samples/sec Loss 3.5497 LearningRate 0.0511 Epoch: 5 Global Step: 95230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:38:03,136-Speed 5079.92 samples/sec Loss 3.5198 LearningRate 0.0511 Epoch: 5 Global Step: 95240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:38:05,106-Speed 5199.35 samples/sec Loss 3.5043 LearningRate 0.0511 Epoch: 5 Global Step: 95250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:38:07,074-Speed 5204.41 samples/sec Loss 3.5157 LearningRate 0.0511 Epoch: 5 Global Step: 95260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:38:09,047-Speed 5191.96 samples/sec Loss 3.5932 LearningRate 0.0511 Epoch: 5 Global Step: 95270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:38:11,022-Speed 5185.14 samples/sec Loss 3.5195 LearningRate 0.0511 Epoch: 5 Global Step: 95280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:13,005-Speed 5166.64 
samples/sec Loss 3.5373 LearningRate 0.0511 Epoch: 5 Global Step: 95290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:14,987-Speed 5166.85 samples/sec Loss 3.5495 LearningRate 0.0511 Epoch: 5 Global Step: 95300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:16,973-Speed 5158.75 samples/sec Loss 3.6036 LearningRate 0.0510 Epoch: 5 Global Step: 95310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:18,950-Speed 5181.24 samples/sec Loss 3.5603 LearningRate 0.0510 Epoch: 5 Global Step: 95320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:20,942-Speed 5140.88 samples/sec Loss 3.5370 LearningRate 0.0510 Epoch: 5 Global Step: 95330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:22,921-Speed 5175.69 samples/sec Loss 3.5067 LearningRate 0.0510 Epoch: 5 Global Step: 95340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:24,914-Speed 5141.69 samples/sec Loss 3.4973 LearningRate 0.0510 Epoch: 5 Global Step: 95350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:26,890-Speed 5182.91 samples/sec Loss 3.6048 LearningRate 0.0510 Epoch: 5 Global Step: 95360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:28,870-Speed 5174.70 samples/sec Loss 3.5489 LearningRate 0.0510 Epoch: 5 Global Step: 95370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:30,848-Speed 5176.61 samples/sec Loss 3.5877 LearningRate 0.0510 Epoch: 5 Global Step: 95380 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:38:32,817-Speed 5204.87 samples/sec Loss 3.6073 LearningRate 0.0510 Epoch: 5 Global Step: 95390 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:38:34,784-Speed 5207.60 samples/sec Loss 3.5243 LearningRate 0.0510 Epoch: 5 Global Step: 95400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:36,761-Speed 5179.89 samples/sec Loss 3.4319 LearningRate 0.0510 Epoch: 5 
Global Step: 95410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:38,732-Speed 5196.59 samples/sec Loss 3.4976 LearningRate 0.0510 Epoch: 5 Global Step: 95420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:40,734-Speed 5117.40 samples/sec Loss 3.5460 LearningRate 0.0510 Epoch: 5 Global Step: 95430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:42,718-Speed 5163.14 samples/sec Loss 3.5518 LearningRate 0.0510 Epoch: 5 Global Step: 95440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:44,704-Speed 5158.38 samples/sec Loss 3.5396 LearningRate 0.0510 Epoch: 5 Global Step: 95450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:46,676-Speed 5194.51 samples/sec Loss 3.4842 LearningRate 0.0510 Epoch: 5 Global Step: 95460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:48,682-Speed 5104.89 samples/sec Loss 3.5905 LearningRate 0.0510 Epoch: 5 Global Step: 95470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:50,664-Speed 5169.74 samples/sec Loss 3.5829 LearningRate 0.0510 Epoch: 5 Global Step: 95480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:52,654-Speed 5147.58 samples/sec Loss 3.5652 LearningRate 0.0510 Epoch: 5 Global Step: 95490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:38:54,631-Speed 5181.84 samples/sec Loss 3.5631 LearningRate 0.0510 Epoch: 5 Global Step: 95500 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:38:56,612-Speed 5170.64 samples/sec Loss 3.5047 LearningRate 0.0510 Epoch: 5 Global Step: 95510 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:38:58,580-Speed 5202.28 samples/sec Loss 3.5123 LearningRate 0.0510 Epoch: 5 Global Step: 95520 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:39:00,545-Speed 5212.70 samples/sec Loss 3.5985 LearningRate 0.0510 Epoch: 5 Global Step: 95530 Fp16 Grad Scale: 65536 Required: 
16 hours Training: 2022-04-11 05:39:02,526-Speed 5173.61 samples/sec Loss 3.4761 LearningRate 0.0510 Epoch: 5 Global Step: 95540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:39:04,493-Speed 5208.43 samples/sec Loss 3.5705 LearningRate 0.0509 Epoch: 5 Global Step: 95550 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:39:06,463-Speed 5199.38 samples/sec Loss 3.5540 LearningRate 0.0509 Epoch: 5 Global Step: 95560 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:39:08,433-Speed 5198.70 samples/sec Loss 3.5623 LearningRate 0.0509 Epoch: 5 Global Step: 95570 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:39:10,426-Speed 5138.96 samples/sec Loss 3.4511 LearningRate 0.0509 Epoch: 5 Global Step: 95580 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:39:12,403-Speed 5183.04 samples/sec Loss 3.6388 LearningRate 0.0509 Epoch: 5 Global Step: 95590 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:39:14,378-Speed 5185.37 samples/sec Loss 3.5384 LearningRate 0.0509 Epoch: 5 Global Step: 95600 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:39:16,360-Speed 5168.04 samples/sec Loss 3.5764 LearningRate 0.0509 Epoch: 5 Global Step: 95610 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:39:18,332-Speed 5194.71 samples/sec Loss 3.5900 LearningRate 0.0509 Epoch: 5 Global Step: 95620 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:39:20,305-Speed 5191.11 samples/sec Loss 3.6473 LearningRate 0.0509 Epoch: 5 Global Step: 95630 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:39:22,288-Speed 5164.90 samples/sec Loss 3.5795 LearningRate 0.0509 Epoch: 5 Global Step: 95640 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:39:24,268-Speed 5173.93 samples/sec Loss 3.6208 LearningRate 0.0509 Epoch: 5 Global Step: 95650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:39:26,249-Speed 
5172.69 samples/sec Loss 3.6271 LearningRate 0.0509 Epoch: 5 Global Step: 95660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:39:28,223-Speed 5188.10 samples/sec Loss 3.5065 LearningRate 0.0509 Epoch: 5 Global Step: 95670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:39:30,196-Speed 5192.78 samples/sec Loss 3.4753 LearningRate 0.0509 Epoch: 5 Global Step: 95680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:39:32,163-Speed 5205.63 samples/sec Loss 3.5728 LearningRate 0.0509 Epoch: 5 Global Step: 95690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:39:34,133-Speed 5200.53 samples/sec Loss 3.5738 LearningRate 0.0509 Epoch: 5 Global Step: 95700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:39:36,108-Speed 5188.20 samples/sec Loss 3.4904 LearningRate 0.0509 Epoch: 5 Global Step: 95710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:39:38,103-Speed 5133.53 samples/sec Loss 3.5767 LearningRate 0.0509 Epoch: 5 Global Step: 95720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:39:40,074-Speed 5196.82 samples/sec Loss 3.5212 LearningRate 0.0509 Epoch: 5 Global Step: 95730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:39:42,047-Speed 5192.93 samples/sec Loss 3.5324 LearningRate 0.0509 Epoch: 5 Global Step: 95740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:39:44,011-Speed 5215.17 samples/sec Loss 3.5289 LearningRate 0.0509 Epoch: 5 Global Step: 95750 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:39:45,986-Speed 5187.43 samples/sec Loss 3.5980 LearningRate 0.0509 Epoch: 5 Global Step: 95760 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:39:47,966-Speed 5171.52 samples/sec Loss 3.4843 LearningRate 0.0509 Epoch: 5 Global Step: 95770 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:39:49,949-Speed 5167.31 samples/sec Loss 3.5575 LearningRate 0.0508 
Epoch: 5 Global Step: 95780 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:39:51,932-Speed 5164.32 samples/sec Loss 3.5345 LearningRate 0.0508 Epoch: 5 Global Step: 95790 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:39:53,931-Speed 5124.84 samples/sec Loss 3.5495 LearningRate 0.0508 Epoch: 5 Global Step: 95800 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:39:55,904-Speed 5189.81 samples/sec Loss 3.5312 LearningRate 0.0508 Epoch: 5 Global Step: 95810 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:39:57,876-Speed 5194.13 samples/sec Loss 3.5430 LearningRate 0.0508 Epoch: 5 Global Step: 95820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:39:59,845-Speed 5202.14 samples/sec Loss 3.5249 LearningRate 0.0508 Epoch: 5 Global Step: 95830 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:40:01,826-Speed 5172.39 samples/sec Loss 3.4798 LearningRate 0.0508 Epoch: 5 Global Step: 95840 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:40:03,814-Speed 5153.55 samples/sec Loss 3.5593 LearningRate 0.0508 Epoch: 5 Global Step: 95850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:40:05,804-Speed 5147.51 samples/sec Loss 3.4523 LearningRate 0.0508 Epoch: 5 Global Step: 95860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:40:07,794-Speed 5147.15 samples/sec Loss 3.5315 LearningRate 0.0508 Epoch: 5 Global Step: 95870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:40:09,776-Speed 5168.28 samples/sec Loss 3.5217 LearningRate 0.0508 Epoch: 5 Global Step: 95880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:40:11,752-Speed 5185.04 samples/sec Loss 3.5847 LearningRate 0.0508 Epoch: 5 Global Step: 95890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:40:13,740-Speed 5153.75 samples/sec Loss 3.5611 LearningRate 0.0508 Epoch: 5 Global Step: 95900 Fp16 Grad Scale: 65536 
Required: 16 hours Training: 2022-04-11 05:40:15,716-Speed 5183.39 samples/sec Loss 3.5808 LearningRate 0.0508 Epoch: 5 Global Step: 95910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:40:17,708-Speed 5141.49 samples/sec Loss 3.5269 LearningRate 0.0508 Epoch: 5 Global Step: 95920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:40:19,681-Speed 5191.97 samples/sec Loss 3.5358 LearningRate 0.0508 Epoch: 5 Global Step: 95930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:40:21,684-Speed 5113.92 samples/sec Loss 3.6015 LearningRate 0.0508 Epoch: 5 Global Step: 95940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:40:23,691-Speed 5105.13 samples/sec Loss 3.5499 LearningRate 0.0508 Epoch: 5 Global Step: 95950 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:40:25,686-Speed 5134.83 samples/sec Loss 3.5389 LearningRate 0.0508 Epoch: 5 Global Step: 95960 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:40:27,669-Speed 5165.70 samples/sec Loss 3.5522 LearningRate 0.0508 Epoch: 5 Global Step: 95970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:40:29,650-Speed 5169.24 samples/sec Loss 3.4526 LearningRate 0.0508 Epoch: 5 Global Step: 95980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:40:31,621-Speed 5197.52 samples/sec Loss 3.5752 LearningRate 0.0508 Epoch: 5 Global Step: 95990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:40:33,610-Speed 5149.97 samples/sec Loss 3.5233 LearningRate 0.0508 Epoch: 5 Global Step: 96000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:41:00,207-[lfw][96000]XNorm: 23.012314 Training: 2022-04-11 05:41:00,207-[lfw][96000]Accuracy-Flip: 0.99800+-0.00287 Training: 2022-04-11 05:41:00,207-[lfw][96000]Accuracy-Highest: 0.99817 Training: 2022-04-11 05:41:30,961-[cfp_fp][96000]XNorm: 21.309848 Training: 2022-04-11 05:41:30,962-[cfp_fp][96000]Accuracy-Flip: 
0.97957+-0.00636 Training: 2022-04-11 05:41:30,962-[cfp_fp][96000]Accuracy-Highest: 0.98086 Training: 2022-04-11 05:41:57,527-[agedb_30][96000]XNorm: 22.875894 Training: 2022-04-11 05:41:57,528-[agedb_30][96000]Accuracy-Flip: 0.97950+-0.00820 Training: 2022-04-11 05:41:57,528-[agedb_30][96000]Accuracy-Highest: 0.97950 Training: 2022-04-11 05:41:59,526-Speed 119.19 samples/sec Loss 3.5074 LearningRate 0.0507 Epoch: 5 Global Step: 96010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:01,501-Speed 5187.43 samples/sec Loss 3.5949 LearningRate 0.0507 Epoch: 5 Global Step: 96020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:03,481-Speed 5172.44 samples/sec Loss 3.4929 LearningRate 0.0507 Epoch: 5 Global Step: 96030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:05,447-Speed 5209.58 samples/sec Loss 3.5151 LearningRate 0.0507 Epoch: 5 Global Step: 96040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:07,414-Speed 5209.11 samples/sec Loss 3.4730 LearningRate 0.0507 Epoch: 5 Global Step: 96050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:09,382-Speed 5205.52 samples/sec Loss 3.5276 LearningRate 0.0507 Epoch: 5 Global Step: 96060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:11,354-Speed 5193.54 samples/sec Loss 3.6438 LearningRate 0.0507 Epoch: 5 Global Step: 96070 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:42:13,329-Speed 5187.41 samples/sec Loss 3.5851 LearningRate 0.0507 Epoch: 5 Global Step: 96080 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:42:15,300-Speed 5196.00 samples/sec Loss 3.5355 LearningRate 0.0507 Epoch: 5 Global Step: 96090 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:42:17,277-Speed 5180.70 samples/sec Loss 3.4690 LearningRate 0.0507 Epoch: 5 Global Step: 96100 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:42:19,255-Speed 5179.60 
samples/sec Loss 3.5224 LearningRate 0.0507 Epoch: 5 Global Step: 96110 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:42:21,237-Speed 5167.77 samples/sec Loss 3.4753 LearningRate 0.0507 Epoch: 5 Global Step: 96120 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:42:23,226-Speed 5150.36 samples/sec Loss 3.5065 LearningRate 0.0507 Epoch: 5 Global Step: 96130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:25,213-Speed 5154.77 samples/sec Loss 3.5946 LearningRate 0.0507 Epoch: 5 Global Step: 96140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:27,206-Speed 5140.20 samples/sec Loss 3.4843 LearningRate 0.0507 Epoch: 5 Global Step: 96150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:29,180-Speed 5187.17 samples/sec Loss 3.5679 LearningRate 0.0507 Epoch: 5 Global Step: 96160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:31,156-Speed 5185.66 samples/sec Loss 3.4987 LearningRate 0.0507 Epoch: 5 Global Step: 96170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:33,143-Speed 5155.45 samples/sec Loss 3.5608 LearningRate 0.0507 Epoch: 5 Global Step: 96180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:35,117-Speed 5189.62 samples/sec Loss 3.6499 LearningRate 0.0507 Epoch: 5 Global Step: 96190 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:37,098-Speed 5170.17 samples/sec Loss 3.4903 LearningRate 0.0507 Epoch: 5 Global Step: 96200 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:39,080-Speed 5168.01 samples/sec Loss 3.5557 LearningRate 0.0507 Epoch: 5 Global Step: 96210 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:41,063-Speed 5166.47 samples/sec Loss 3.5835 LearningRate 0.0507 Epoch: 5 Global Step: 96220 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:43,060-Speed 5128.12 samples/sec Loss 3.5487 LearningRate 0.0507 Epoch: 5 
Global Step: 96230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:42:45,053-Speed 5141.04 samples/sec Loss 3.5737 LearningRate 0.0507 Epoch: 5 Global Step: 96240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:42:47,066-Speed 5089.14 samples/sec Loss 3.5107 LearningRate 0.0506 Epoch: 5 Global Step: 96250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:42:49,072-Speed 5105.15 samples/sec Loss 3.5765 LearningRate 0.0506 Epoch: 5 Global Step: 96260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:42:51,064-Speed 5143.12 samples/sec Loss 3.5691 LearningRate 0.0506 Epoch: 5 Global Step: 96270 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:53,079-Speed 5082.39 samples/sec Loss 3.4932 LearningRate 0.0506 Epoch: 5 Global Step: 96280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:55,096-Speed 5079.60 samples/sec Loss 3.4824 LearningRate 0.0506 Epoch: 5 Global Step: 96290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:57,070-Speed 5188.49 samples/sec Loss 3.5530 LearningRate 0.0506 Epoch: 5 Global Step: 96300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:42:59,054-Speed 5164.58 samples/sec Loss 3.4921 LearningRate 0.0506 Epoch: 5 Global Step: 96310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:43:01,033-Speed 5174.45 samples/sec Loss 3.5093 LearningRate 0.0506 Epoch: 5 Global Step: 96320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:43:03,014-Speed 5171.76 samples/sec Loss 3.6167 LearningRate 0.0506 Epoch: 5 Global Step: 96330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:43:05,004-Speed 5147.24 samples/sec Loss 3.5382 LearningRate 0.0506 Epoch: 5 Global Step: 96340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:43:06,992-Speed 5152.66 samples/sec Loss 3.5196 LearningRate 0.0506 Epoch: 5 Global Step: 96350 Fp16 Grad Scale: 65536 Required: 
16 hours Training: 2022-04-11 05:43:08,978-Speed 5157.96 samples/sec Loss 3.5560 LearningRate 0.0506 Epoch: 5 Global Step: 96360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:43:10,959-Speed 5168.66 samples/sec Loss 3.5297 LearningRate 0.0506 Epoch: 5 Global Step: 96370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:43:12,943-Speed 5165.23 samples/sec Loss 3.6028 LearningRate 0.0506 Epoch: 5 Global Step: 96380 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:43:14,953-Speed 5096.98 samples/sec Loss 3.5333 LearningRate 0.0506 Epoch: 5 Global Step: 96390 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:43:16,935-Speed 5167.99 samples/sec Loss 3.5484 LearningRate 0.0506 Epoch: 5 Global Step: 96400 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:43:18,917-Speed 5167.91 samples/sec Loss 3.5224 LearningRate 0.0506 Epoch: 5 Global Step: 96410 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:43:20,910-Speed 5138.55 samples/sec Loss 3.4982 LearningRate 0.0506 Epoch: 5 Global Step: 96420 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:43:22,897-Speed 5156.01 samples/sec Loss 3.5825 LearningRate 0.0506 Epoch: 5 Global Step: 96430 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:43:24,881-Speed 5163.31 samples/sec Loss 3.4876 LearningRate 0.0506 Epoch: 5 Global Step: 96440 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:43:26,853-Speed 5192.70 samples/sec Loss 3.5340 LearningRate 0.0506 Epoch: 5 Global Step: 96450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:43:28,845-Speed 5142.08 samples/sec Loss 3.4889 LearningRate 0.0506 Epoch: 5 Global Step: 96460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:43:30,832-Speed 5155.98 samples/sec Loss 3.5540 LearningRate 0.0506 Epoch: 5 Global Step: 96470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 
05:43:32,822-Speed 5148.37 samples/sec Loss 3.5456 LearningRate 0.0505 Epoch: 5 Global Step: 96480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:43:34,808-Speed 5157.80 samples/sec Loss 3.5114 LearningRate 0.0505 Epoch: 5 Global Step: 96490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:43:36,802-Speed 5137.91 samples/sec Loss 3.5291 LearningRate 0.0505 Epoch: 5 Global Step: 96500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:43:38,788-Speed 5157.51 samples/sec Loss 3.6077 LearningRate 0.0505 Epoch: 5 Global Step: 96510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:43:40,772-Speed 5162.54 samples/sec Loss 3.5771 LearningRate 0.0505 Epoch: 5 Global Step: 96520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:43:42,759-Speed 5155.71 samples/sec Loss 3.5274 LearningRate 0.0505 Epoch: 5 Global Step: 96530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:43:44,739-Speed 5173.93 samples/sec Loss 3.4774 LearningRate 0.0505 Epoch: 5 Global Step: 96540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:43:46,732-Speed 5137.35 samples/sec Loss 3.5213 LearningRate 0.0505 Epoch: 5 Global Step: 96550 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:43:48,723-Speed 5145.89 samples/sec Loss 3.5935 LearningRate 0.0505 Epoch: 5 Global Step: 96560 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:43:50,736-Speed 5086.91 samples/sec Loss 3.5878 LearningRate 0.0505 Epoch: 5 Global Step: 96570 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:43:52,724-Speed 5153.00 samples/sec Loss 3.5664 LearningRate 0.0505 Epoch: 5 Global Step: 96580 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:43:54,711-Speed 5156.06 samples/sec Loss 3.5214 LearningRate 0.0505 Epoch: 5 Global Step: 96590 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:43:56,691-Speed 5174.88 samples/sec Loss 3.5857 
LearningRate 0.0505 Epoch: 5 Global Step: 96600 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:43:58,669-Speed 5176.88 samples/sec Loss 3.4840 LearningRate 0.0505 Epoch: 5 Global Step: 96610 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:44:00,661-Speed 5142.74 samples/sec Loss 3.6235 LearningRate 0.0505 Epoch: 5 Global Step: 96620 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:44:02,653-Speed 5141.56 samples/sec Loss 3.4975 LearningRate 0.0505 Epoch: 5 Global Step: 96630 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:44:04,642-Speed 5150.91 samples/sec Loss 3.5708 LearningRate 0.0505 Epoch: 5 Global Step: 96640 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:44:06,624-Speed 5168.89 samples/sec Loss 3.5768 LearningRate 0.0505 Epoch: 5 Global Step: 96650 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:44:08,600-Speed 5182.98 samples/sec Loss 3.5594 LearningRate 0.0505 Epoch: 5 Global Step: 96660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:44:10,591-Speed 5143.94 samples/sec Loss 3.4608 LearningRate 0.0505 Epoch: 5 Global Step: 96670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:44:12,583-Speed 5142.23 samples/sec Loss 3.5944 LearningRate 0.0505 Epoch: 5 Global Step: 96680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:44:14,572-Speed 5150.35 samples/sec Loss 3.6018 LearningRate 0.0505 Epoch: 5 Global Step: 96690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:44:16,558-Speed 5158.98 samples/sec Loss 3.5626 LearningRate 0.0505 Epoch: 5 Global Step: 96700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:44:18,539-Speed 5172.35 samples/sec Loss 3.5628 LearningRate 0.0505 Epoch: 5 Global Step: 96710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:44:20,514-Speed 5186.09 samples/sec Loss 3.5337 LearningRate 0.0504 Epoch: 5 Global Step: 96720 Fp16 
Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:44:22,501-Speed 5153.50 samples/sec Loss 3.4959 LearningRate 0.0504 Epoch: 5 Global Step: 96730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:44:24,476-Speed 5186.99 samples/sec Loss 3.4729 LearningRate 0.0504 Epoch: 5 Global Step: 96740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:44:26,466-Speed 5146.96 samples/sec Loss 3.5134 LearningRate 0.0504 Epoch: 5 Global Step: 96750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:44:28,444-Speed 5178.06 samples/sec Loss 3.5771 LearningRate 0.0504 Epoch: 5 Global Step: 96760 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:44:30,421-Speed 5181.85 samples/sec Loss 3.5839 LearningRate 0.0504 Epoch: 5 Global Step: 96770 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:44:32,396-Speed 5186.86 samples/sec Loss 3.4869 LearningRate 0.0504 Epoch: 5 Global Step: 96780 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:44:34,377-Speed 5170.23 samples/sec Loss 3.4727 LearningRate 0.0504 Epoch: 5 Global Step: 96790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:44:36,356-Speed 5174.99 samples/sec Loss 3.5400 LearningRate 0.0504 Epoch: 5 Global Step: 96800 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:44:38,341-Speed 5161.30 samples/sec Loss 3.4973 LearningRate 0.0504 Epoch: 5 Global Step: 96810 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:44:40,323-Speed 5168.56 samples/sec Loss 3.5946 LearningRate 0.0504 Epoch: 5 Global Step: 96820 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:44:42,300-Speed 5181.50 samples/sec Loss 3.5395 LearningRate 0.0504 Epoch: 5 Global Step: 96830 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:44:44,279-Speed 5175.73 samples/sec Loss 3.5599 LearningRate 0.0504 Epoch: 5 Global Step: 96840 Fp16 Grad Scale: 131072 Required: 16 hours Training: 
2022-04-11 05:44:46,296-Speed 5078.91 samples/sec Loss 3.4868 LearningRate 0.0504 Epoch: 5 Global Step: 96850 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:44:48,286-Speed 5148.42 samples/sec Loss 3.5933 LearningRate 0.0504 Epoch: 5 Global Step: 96860 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:44:50,281-Speed 5133.10 samples/sec Loss 3.5733 LearningRate 0.0504 Epoch: 5 Global Step: 96870 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:44:52,286-Speed 5111.22 samples/sec Loss 3.5721 LearningRate 0.0504 Epoch: 5 Global Step: 96880 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:44:54,276-Speed 5147.12 samples/sec Loss 3.5681 LearningRate 0.0504 Epoch: 5 Global Step: 96890 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:44:56,271-Speed 5132.67 samples/sec Loss 3.5778 LearningRate 0.0504 Epoch: 5 Global Step: 96900 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:44:58,277-Speed 5107.46 samples/sec Loss 3.5291 LearningRate 0.0504 Epoch: 5 Global Step: 96910 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:45:00,279-Speed 5117.39 samples/sec Loss 3.5074 LearningRate 0.0504 Epoch: 5 Global Step: 96920 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:45:02,274-Speed 5134.98 samples/sec Loss 3.5649 LearningRate 0.0504 Epoch: 5 Global Step: 96930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:04,253-Speed 5174.64 samples/sec Loss 3.4500 LearningRate 0.0504 Epoch: 5 Global Step: 96940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:06,235-Speed 5167.96 samples/sec Loss 3.4876 LearningRate 0.0503 Epoch: 5 Global Step: 96950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:08,213-Speed 5179.23 samples/sec Loss 3.4931 LearningRate 0.0503 Epoch: 5 Global Step: 96960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:10,199-Speed 5157.07 
samples/sec Loss 3.5769 LearningRate 0.0503 Epoch: 5 Global Step: 96970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:12,187-Speed 5152.97 samples/sec Loss 3.4612 LearningRate 0.0503 Epoch: 5 Global Step: 96980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:14,177-Speed 5148.57 samples/sec Loss 3.5226 LearningRate 0.0503 Epoch: 5 Global Step: 96990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:16,160-Speed 5165.15 samples/sec Loss 3.5536 LearningRate 0.0503 Epoch: 5 Global Step: 97000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:18,152-Speed 5143.78 samples/sec Loss 3.5765 LearningRate 0.0503 Epoch: 5 Global Step: 97010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:20,136-Speed 5162.28 samples/sec Loss 3.5505 LearningRate 0.0503 Epoch: 5 Global Step: 97020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:22,133-Speed 5128.32 samples/sec Loss 3.5975 LearningRate 0.0503 Epoch: 5 Global Step: 97030 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:45:24,107-Speed 5189.45 samples/sec Loss 3.6017 LearningRate 0.0503 Epoch: 5 Global Step: 97040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:26,130-Speed 5064.11 samples/sec Loss 3.4945 LearningRate 0.0503 Epoch: 5 Global Step: 97050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:28,128-Speed 5127.23 samples/sec Loss 3.4864 LearningRate 0.0503 Epoch: 5 Global Step: 97060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:30,121-Speed 5137.90 samples/sec Loss 3.5280 LearningRate 0.0503 Epoch: 5 Global Step: 97070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:32,116-Speed 5135.77 samples/sec Loss 3.4765 LearningRate 0.0503 Epoch: 5 Global Step: 97080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:34,113-Speed 5130.46 samples/sec Loss 3.5027 LearningRate 0.0503 Epoch: 5 
Global Step: 97090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:36,093-Speed 5173.05 samples/sec Loss 3.4300 LearningRate 0.0503 Epoch: 5 Global Step: 97100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:38,080-Speed 5153.93 samples/sec Loss 3.5186 LearningRate 0.0503 Epoch: 5 Global Step: 97110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:40,074-Speed 5138.78 samples/sec Loss 3.5392 LearningRate 0.0503 Epoch: 5 Global Step: 97120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:42,049-Speed 5186.53 samples/sec Loss 3.5643 LearningRate 0.0503 Epoch: 5 Global Step: 97130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:44,029-Speed 5173.41 samples/sec Loss 3.5985 LearningRate 0.0503 Epoch: 5 Global Step: 97140 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:45:46,012-Speed 5164.39 samples/sec Loss 3.5807 LearningRate 0.0503 Epoch: 5 Global Step: 97150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:47,996-Speed 5164.65 samples/sec Loss 3.5936 LearningRate 0.0503 Epoch: 5 Global Step: 97160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:49,999-Speed 5111.74 samples/sec Loss 3.6235 LearningRate 0.0503 Epoch: 5 Global Step: 97170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:51,983-Speed 5165.36 samples/sec Loss 3.5495 LearningRate 0.0503 Epoch: 5 Global Step: 97180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:53,971-Speed 5151.68 samples/sec Loss 3.5799 LearningRate 0.0502 Epoch: 5 Global Step: 97190 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:55,945-Speed 5188.77 samples/sec Loss 3.5752 LearningRate 0.0502 Epoch: 5 Global Step: 97200 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:45:57,944-Speed 5126.40 samples/sec Loss 3.4629 LearningRate 0.0502 Epoch: 5 Global Step: 97210 Fp16 Grad Scale: 65536 Required: 16 
hours Training: 2022-04-11 05:45:59,927-Speed 5164.54 samples/sec Loss 3.5371 LearningRate 0.0502 Epoch: 5 Global Step: 97220 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:46:01,906-Speed 5174.51 samples/sec Loss 3.6193 LearningRate 0.0502 Epoch: 5 Global Step: 97230 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:46:03,900-Speed 5138.10 samples/sec Loss 3.5329 LearningRate 0.0502 Epoch: 5 Global Step: 97240 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:46:05,886-Speed 5158.78 samples/sec Loss 3.5494 LearningRate 0.0502 Epoch: 5 Global Step: 97250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:46:07,861-Speed 5184.91 samples/sec Loss 3.5751 LearningRate 0.0502 Epoch: 5 Global Step: 97260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:46:09,844-Speed 5167.08 samples/sec Loss 3.5610 LearningRate 0.0502 Epoch: 5 Global Step: 97270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:46:11,823-Speed 5174.68 samples/sec Loss 3.4303 LearningRate 0.0502 Epoch: 5 Global Step: 97280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:46:13,808-Speed 5160.10 samples/sec Loss 3.5775 LearningRate 0.0502 Epoch: 5 Global Step: 97290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:46:15,788-Speed 5173.63 samples/sec Loss 3.4530 LearningRate 0.0502 Epoch: 5 Global Step: 97300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:46:17,777-Speed 5151.78 samples/sec Loss 3.5058 LearningRate 0.0502 Epoch: 5 Global Step: 97310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:46:19,751-Speed 5186.83 samples/sec Loss 3.4603 LearningRate 0.0502 Epoch: 5 Global Step: 97320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:46:21,741-Speed 5148.83 samples/sec Loss 3.5548 LearningRate 0.0502 Epoch: 5 Global Step: 97330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:46:23,723-Speed 
5167.37 samples/sec Loss 3.5089 LearningRate 0.0502 Epoch: 5 Global Step: 97340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:46:25,698-Speed 5186.10 samples/sec Loss 3.5181 LearningRate 0.0502 Epoch: 5 Global Step: 97350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:46:27,695-Speed 5129.34 samples/sec Loss 3.4252 LearningRate 0.0502 Epoch: 5 Global Step: 97360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:46:29,692-Speed 5130.49 samples/sec Loss 3.5394 LearningRate 0.0502 Epoch: 5 Global Step: 97370 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:46:31,680-Speed 5152.45 samples/sec Loss 3.4758 LearningRate 0.0502 Epoch: 5 Global Step: 97380 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:46:33,672-Speed 5141.95 samples/sec Loss 3.5625 LearningRate 0.0502 Epoch: 5 Global Step: 97390 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:46:35,675-Speed 5113.09 samples/sec Loss 3.4923 LearningRate 0.0502 Epoch: 5 Global Step: 97400 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:46:37,660-Speed 5162.16 samples/sec Loss 3.4488 LearningRate 0.0502 Epoch: 5 Global Step: 97410 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:46:39,642-Speed 5168.54 samples/sec Loss 3.5117 LearningRate 0.0501 Epoch: 5 Global Step: 97420 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:46:41,632-Speed 5146.35 samples/sec Loss 3.5584 LearningRate 0.0501 Epoch: 5 Global Step: 97430 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:46:43,607-Speed 5187.51 samples/sec Loss 3.4700 LearningRate 0.0501 Epoch: 5 Global Step: 97440 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:46:45,599-Speed 5142.14 samples/sec Loss 3.5587 LearningRate 0.0501 Epoch: 5 Global Step: 97450 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:46:47,578-Speed 5176.11 samples/sec Loss 3.5099 LearningRate 0.0501 
Epoch: 5 Global Step: 97460 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:46:49,566-Speed 5150.89 samples/sec Loss 3.4387 LearningRate 0.0501 Epoch: 5 Global Step: 97470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:46:51,568-Speed 5119.65 samples/sec Loss 3.5628 LearningRate 0.0501 Epoch: 5 Global Step: 97480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:46:53,549-Speed 5171.77 samples/sec Loss 3.5544 LearningRate 0.0501 Epoch: 5 Global Step: 97490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:46:55,526-Speed 5179.35 samples/sec Loss 3.5061 LearningRate 0.0501 Epoch: 5 Global Step: 97500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:46:57,520-Speed 5139.58 samples/sec Loss 3.4967 LearningRate 0.0501 Epoch: 5 Global Step: 97510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:46:59,495-Speed 5187.21 samples/sec Loss 3.4981 LearningRate 0.0501 Epoch: 5 Global Step: 97520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:01,477-Speed 5167.17 samples/sec Loss 3.4527 LearningRate 0.0501 Epoch: 5 Global Step: 97530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:03,468-Speed 5145.30 samples/sec Loss 3.5264 LearningRate 0.0501 Epoch: 5 Global Step: 97540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:05,444-Speed 5182.68 samples/sec Loss 3.4862 LearningRate 0.0501 Epoch: 5 Global Step: 97550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:07,425-Speed 5172.27 samples/sec Loss 3.5804 LearningRate 0.0501 Epoch: 5 Global Step: 97560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:09,413-Speed 5152.28 samples/sec Loss 3.5047 LearningRate 0.0501 Epoch: 5 Global Step: 97570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:11,407-Speed 5136.55 samples/sec Loss 3.5546 LearningRate 0.0501 Epoch: 5 Global Step: 97580 Fp16 Grad Scale: 65536 
Required: 16 hours Training: 2022-04-11 05:47:13,396-Speed 5149.28 samples/sec Loss 3.4925 LearningRate 0.0501 Epoch: 5 Global Step: 97590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:15,407-Speed 5094.64 samples/sec Loss 3.5579 LearningRate 0.0501 Epoch: 5 Global Step: 97600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:17,386-Speed 5174.67 samples/sec Loss 3.4953 LearningRate 0.0501 Epoch: 5 Global Step: 97610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:19,373-Speed 5156.89 samples/sec Loss 3.5219 LearningRate 0.0501 Epoch: 5 Global Step: 97620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:21,370-Speed 5128.25 samples/sec Loss 3.5463 LearningRate 0.0501 Epoch: 5 Global Step: 97630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:23,347-Speed 5181.68 samples/sec Loss 3.5557 LearningRate 0.0501 Epoch: 5 Global Step: 97640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:25,328-Speed 5171.66 samples/sec Loss 3.5399 LearningRate 0.0501 Epoch: 5 Global Step: 97650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:27,306-Speed 5177.03 samples/sec Loss 3.5211 LearningRate 0.0500 Epoch: 5 Global Step: 97660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:29,287-Speed 5172.08 samples/sec Loss 3.5092 LearningRate 0.0500 Epoch: 5 Global Step: 97670 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:47:31,268-Speed 5170.25 samples/sec Loss 3.5843 LearningRate 0.0500 Epoch: 5 Global Step: 97680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:47:33,243-Speed 5186.42 samples/sec Loss 3.4315 LearningRate 0.0500 Epoch: 5 Global Step: 97690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:35,248-Speed 5110.53 samples/sec Loss 3.4857 LearningRate 0.0500 Epoch: 5 Global Step: 97700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 
05:47:37,229-Speed 5170.83 samples/sec Loss 3.5632 LearningRate 0.0500 Epoch: 5 Global Step: 97710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:39,241-Speed 5089.49 samples/sec Loss 3.5300 LearningRate 0.0500 Epoch: 5 Global Step: 97720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:41,227-Speed 5158.50 samples/sec Loss 3.5291 LearningRate 0.0500 Epoch: 5 Global Step: 97730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:43,211-Speed 5161.84 samples/sec Loss 3.5258 LearningRate 0.0500 Epoch: 5 Global Step: 97740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:45,197-Speed 5160.38 samples/sec Loss 3.4022 LearningRate 0.0500 Epoch: 5 Global Step: 97750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:47,187-Speed 5146.55 samples/sec Loss 3.5040 LearningRate 0.0500 Epoch: 5 Global Step: 97760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:49,202-Speed 5083.35 samples/sec Loss 3.5909 LearningRate 0.0500 Epoch: 5 Global Step: 97770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:51,194-Speed 5144.07 samples/sec Loss 3.5900 LearningRate 0.0500 Epoch: 5 Global Step: 97780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:53,185-Speed 5143.70 samples/sec Loss 3.5894 LearningRate 0.0500 Epoch: 5 Global Step: 97790 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:55,161-Speed 5184.67 samples/sec Loss 3.5360 LearningRate 0.0500 Epoch: 5 Global Step: 97800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:57,142-Speed 5171.49 samples/sec Loss 3.5472 LearningRate 0.0500 Epoch: 5 Global Step: 97810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:47:59,118-Speed 5182.77 samples/sec Loss 3.5361 LearningRate 0.0500 Epoch: 5 Global Step: 97820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:48:01,099-Speed 5169.60 samples/sec Loss 3.5557 
LearningRate 0.0500 Epoch: 5 Global Step: 97830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:48:03,089-Speed 5148.38 samples/sec Loss 3.5585 LearningRate 0.0500 Epoch: 5 Global Step: 97840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:48:05,073-Speed 5163.19 samples/sec Loss 3.5391 LearningRate 0.0500 Epoch: 5 Global Step: 97850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:48:07,056-Speed 5165.48 samples/sec Loss 3.5714 LearningRate 0.0500 Epoch: 5 Global Step: 97860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:48:09,053-Speed 5131.35 samples/sec Loss 3.5578 LearningRate 0.0500 Epoch: 5 Global Step: 97870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:48:11,052-Speed 5122.28 samples/sec Loss 3.4999 LearningRate 0.0500 Epoch: 5 Global Step: 97880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:48:13,029-Speed 5181.16 samples/sec Loss 3.4448 LearningRate 0.0500 Epoch: 5 Global Step: 97890 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:48:15,012-Speed 5167.41 samples/sec Loss 3.6213 LearningRate 0.0499 Epoch: 5 Global Step: 97900 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:48:16,992-Speed 5171.88 samples/sec Loss 3.5298 LearningRate 0.0499 Epoch: 5 Global Step: 97910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:48:18,967-Speed 5185.95 samples/sec Loss 3.4864 LearningRate 0.0499 Epoch: 5 Global Step: 97920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:48:20,947-Speed 5174.71 samples/sec Loss 3.4299 LearningRate 0.0499 Epoch: 5 Global Step: 97930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:48:22,925-Speed 5178.54 samples/sec Loss 3.5493 LearningRate 0.0499 Epoch: 5 Global Step: 97940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:48:24,911-Speed 5158.42 samples/sec Loss 3.4405 LearningRate 0.0499 Epoch: 5 Global Step: 97950 Fp16 
Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:48:26,895-Speed 5162.49 samples/sec Loss 3.5121 LearningRate 0.0499 Epoch: 5 Global Step: 97960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:48:28,878-Speed 5164.75 samples/sec Loss 3.5293 LearningRate 0.0499 Epoch: 5 Global Step: 97970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:48:30,856-Speed 5180.93 samples/sec Loss 3.5025 LearningRate 0.0499 Epoch: 5 Global Step: 97980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:48:32,834-Speed 5176.34 samples/sec Loss 3.5715 LearningRate 0.0499 Epoch: 5 Global Step: 97990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:48:34,818-Speed 5164.98 samples/sec Loss 3.4965 LearningRate 0.0499 Epoch: 5 Global Step: 98000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:49:01,498-[lfw][98000]XNorm: 22.178783 Training: 2022-04-11 05:49:01,498-[lfw][98000]Accuracy-Flip: 0.99750+-0.00300 Training: 2022-04-11 05:49:01,499-[lfw][98000]Accuracy-Highest: 0.99817 Training: 2022-04-11 05:49:32,252-[cfp_fp][98000]XNorm: 20.005678 Training: 2022-04-11 05:49:32,253-[cfp_fp][98000]Accuracy-Flip: 0.97700+-0.00624 Training: 2022-04-11 05:49:32,253-[cfp_fp][98000]Accuracy-Highest: 0.98086 Training: 2022-04-11 05:49:58,797-[agedb_30][98000]XNorm: 21.588698 Training: 2022-04-11 05:49:58,798-[agedb_30][98000]Accuracy-Flip: 0.97667+-0.00882 Training: 2022-04-11 05:49:58,798-[agedb_30][98000]Accuracy-Highest: 0.97950 Training: 2022-04-11 05:50:00,786-Speed 119.11 samples/sec Loss 3.4690 LearningRate 0.0499 Epoch: 5 Global Step: 98010 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:50:02,773-Speed 5155.16 samples/sec Loss 3.4583 LearningRate 0.0499 Epoch: 5 Global Step: 98020 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:50:04,786-Speed 5087.44 samples/sec Loss 3.5129 LearningRate 0.0499 Epoch: 5 Global Step: 98030 Fp16 Grad Scale: 131072 Required: 16 hours 
Training: 2022-04-11 05:50:06,790-Speed 5112.20 samples/sec Loss 3.4850 LearningRate 0.0499 Epoch: 5 Global Step: 98040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:08,769-Speed 5175.33 samples/sec Loss 3.5433 LearningRate 0.0499 Epoch: 5 Global Step: 98050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:10,736-Speed 5208.45 samples/sec Loss 3.4726 LearningRate 0.0499 Epoch: 5 Global Step: 98060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:12,707-Speed 5195.83 samples/sec Loss 3.5021 LearningRate 0.0499 Epoch: 5 Global Step: 98070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:14,676-Speed 5202.41 samples/sec Loss 3.5210 LearningRate 0.0499 Epoch: 5 Global Step: 98080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:16,647-Speed 5198.07 samples/sec Loss 3.5608 LearningRate 0.0499 Epoch: 5 Global Step: 98090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:18,618-Speed 5196.56 samples/sec Loss 3.4582 LearningRate 0.0499 Epoch: 5 Global Step: 98100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:20,590-Speed 5195.44 samples/sec Loss 3.4312 LearningRate 0.0499 Epoch: 5 Global Step: 98110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:22,575-Speed 5160.50 samples/sec Loss 3.5635 LearningRate 0.0499 Epoch: 5 Global Step: 98120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:24,571-Speed 5132.61 samples/sec Loss 3.4569 LearningRate 0.0498 Epoch: 5 Global Step: 98130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:26,542-Speed 5196.47 samples/sec Loss 3.5890 LearningRate 0.0498 Epoch: 5 Global Step: 98140 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:50:28,506-Speed 5216.60 samples/sec Loss 3.4508 LearningRate 0.0498 Epoch: 5 Global Step: 98150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:30,495-Speed 5149.95 
samples/sec Loss 3.5078 LearningRate 0.0498 Epoch: 5 Global Step: 98160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:32,468-Speed 5191.04 samples/sec Loss 3.4122 LearningRate 0.0498 Epoch: 5 Global Step: 98170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:34,449-Speed 5171.62 samples/sec Loss 3.5738 LearningRate 0.0498 Epoch: 5 Global Step: 98180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:36,437-Speed 5152.22 samples/sec Loss 3.5290 LearningRate 0.0498 Epoch: 5 Global Step: 98190 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:38,433-Speed 5130.83 samples/sec Loss 3.5115 LearningRate 0.0498 Epoch: 5 Global Step: 98200 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:40,429-Speed 5131.90 samples/sec Loss 3.4628 LearningRate 0.0498 Epoch: 5 Global Step: 98210 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:42,422-Speed 5140.88 samples/sec Loss 3.5086 LearningRate 0.0498 Epoch: 5 Global Step: 98220 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:44,394-Speed 5195.16 samples/sec Loss 3.4946 LearningRate 0.0498 Epoch: 5 Global Step: 98230 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:50:46,372-Speed 5178.83 samples/sec Loss 3.4416 LearningRate 0.0498 Epoch: 5 Global Step: 98240 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:50:48,368-Speed 5131.53 samples/sec Loss 3.5430 LearningRate 0.0498 Epoch: 5 Global Step: 98250 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:50:50,386-Speed 5074.56 samples/sec Loss 3.5362 LearningRate 0.0498 Epoch: 5 Global Step: 98260 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:50:52,361-Speed 5187.19 samples/sec Loss 3.5657 LearningRate 0.0498 Epoch: 5 Global Step: 98270 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:50:54,335-Speed 5188.92 samples/sec Loss 3.4873 LearningRate 0.0498 Epoch: 5 
Global Step: 98280 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:50:56,309-Speed 5190.40 samples/sec Loss 3.5374 LearningRate 0.0498 Epoch: 5 Global Step: 98290 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:50:58,296-Speed 5154.93 samples/sec Loss 3.4998 LearningRate 0.0498 Epoch: 5 Global Step: 98300 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:51:00,277-Speed 5168.62 samples/sec Loss 3.5031 LearningRate 0.0498 Epoch: 5 Global Step: 98310 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:51:02,259-Speed 5170.19 samples/sec Loss 3.4972 LearningRate 0.0498 Epoch: 5 Global Step: 98320 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:51:04,273-Speed 5086.56 samples/sec Loss 3.5442 LearningRate 0.0498 Epoch: 5 Global Step: 98330 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:51:06,250-Speed 5182.34 samples/sec Loss 3.4748 LearningRate 0.0498 Epoch: 5 Global Step: 98340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:08,228-Speed 5176.71 samples/sec Loss 3.5799 LearningRate 0.0498 Epoch: 5 Global Step: 98350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:10,227-Speed 5124.79 samples/sec Loss 3.4915 LearningRate 0.0498 Epoch: 5 Global Step: 98360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:12,242-Speed 5082.43 samples/sec Loss 3.4622 LearningRate 0.0497 Epoch: 5 Global Step: 98370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:14,237-Speed 5136.17 samples/sec Loss 3.5201 LearningRate 0.0497 Epoch: 5 Global Step: 98380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:16,212-Speed 5185.27 samples/sec Loss 3.5488 LearningRate 0.0497 Epoch: 5 Global Step: 98390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:18,191-Speed 5175.76 samples/sec Loss 3.3727 LearningRate 0.0497 Epoch: 5 Global Step: 98400 Fp16 Grad Scale: 65536 Required: 16 
hours Training: 2022-04-11 05:51:20,180-Speed 5152.42 samples/sec Loss 3.5002 LearningRate 0.0497 Epoch: 5 Global Step: 98410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:22,159-Speed 5174.71 samples/sec Loss 3.4583 LearningRate 0.0497 Epoch: 5 Global Step: 98420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:24,151-Speed 5142.36 samples/sec Loss 3.4669 LearningRate 0.0497 Epoch: 5 Global Step: 98430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:26,132-Speed 5171.90 samples/sec Loss 3.4403 LearningRate 0.0497 Epoch: 5 Global Step: 98440 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:51:28,114-Speed 5167.53 samples/sec Loss 3.5077 LearningRate 0.0497 Epoch: 5 Global Step: 98450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:30,102-Speed 5152.62 samples/sec Loss 3.4730 LearningRate 0.0497 Epoch: 5 Global Step: 98460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:32,079-Speed 5182.77 samples/sec Loss 3.5553 LearningRate 0.0497 Epoch: 5 Global Step: 98470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:34,100-Speed 5067.01 samples/sec Loss 3.4862 LearningRate 0.0497 Epoch: 5 Global Step: 98480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:36,098-Speed 5126.52 samples/sec Loss 3.5678 LearningRate 0.0497 Epoch: 5 Global Step: 98490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:38,116-Speed 5076.75 samples/sec Loss 3.5240 LearningRate 0.0497 Epoch: 5 Global Step: 98500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:40,101-Speed 5160.21 samples/sec Loss 3.5342 LearningRate 0.0497 Epoch: 5 Global Step: 98510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:42,087-Speed 5157.00 samples/sec Loss 3.4811 LearningRate 0.0497 Epoch: 5 Global Step: 98520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:44,065-Speed 5178.98 
samples/sec Loss 3.4404 LearningRate 0.0497 Epoch: 5 Global Step: 98530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:46,042-Speed 5181.13 samples/sec Loss 3.5147 LearningRate 0.0497 Epoch: 5 Global Step: 98540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:51:48,032-Speed 5147.97 samples/sec Loss 3.4741 LearningRate 0.0497 Epoch: 5 Global Step: 98550 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:51:50,024-Speed 5142.14 samples/sec Loss 3.5206 LearningRate 0.0497 Epoch: 5 Global Step: 98560 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:51:52,014-Speed 5148.03 samples/sec Loss 3.5680 LearningRate 0.0497 Epoch: 5 Global Step: 98570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:51:53,993-Speed 5176.93 samples/sec Loss 3.5715 LearningRate 0.0497 Epoch: 5 Global Step: 98580 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:51:55,969-Speed 5182.66 samples/sec Loss 3.5702 LearningRate 0.0497 Epoch: 5 Global Step: 98590 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:51:57,971-Speed 5116.85 samples/sec Loss 3.4550 LearningRate 0.0497 Epoch: 5 Global Step: 98600 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:51:59,952-Speed 5170.15 samples/sec Loss 3.5478 LearningRate 0.0496 Epoch: 5 Global Step: 98610 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:52:01,980-Speed 5051.56 samples/sec Loss 3.4935 LearningRate 0.0496 Epoch: 5 Global Step: 98620 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:52:03,954-Speed 5190.31 samples/sec Loss 3.5654 LearningRate 0.0496 Epoch: 5 Global Step: 98630 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:52:05,959-Speed 5109.94 samples/sec Loss 3.4469 LearningRate 0.0496 Epoch: 5 Global Step: 98640 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:52:07,934-Speed 5185.62 samples/sec Loss 3.3384 LearningRate 0.0496 
Epoch: 5 Global Step: 98650 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:52:09,903-Speed 5202.26 samples/sec Loss 3.5080 LearningRate 0.0496 Epoch: 5 Global Step: 98660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:11,884-Speed 5171.68 samples/sec Loss 3.6018 LearningRate 0.0496 Epoch: 5 Global Step: 98670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:13,861-Speed 5181.63 samples/sec Loss 3.5788 LearningRate 0.0496 Epoch: 5 Global Step: 98680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:15,851-Speed 5146.58 samples/sec Loss 3.4499 LearningRate 0.0496 Epoch: 5 Global Step: 98690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:17,828-Speed 5180.48 samples/sec Loss 3.5461 LearningRate 0.0496 Epoch: 5 Global Step: 98700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:19,805-Speed 5182.55 samples/sec Loss 3.3771 LearningRate 0.0496 Epoch: 5 Global Step: 98710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:21,779-Speed 5187.33 samples/sec Loss 3.5514 LearningRate 0.0496 Epoch: 5 Global Step: 98720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:23,761-Speed 5169.96 samples/sec Loss 3.5187 LearningRate 0.0496 Epoch: 5 Global Step: 98730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:25,743-Speed 5167.42 samples/sec Loss 3.4865 LearningRate 0.0496 Epoch: 5 Global Step: 98740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:27,736-Speed 5139.93 samples/sec Loss 3.5042 LearningRate 0.0496 Epoch: 5 Global Step: 98750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:29,712-Speed 5185.00 samples/sec Loss 3.5507 LearningRate 0.0496 Epoch: 5 Global Step: 98760 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:52:31,676-Speed 5215.87 samples/sec Loss 3.4921 LearningRate 0.0496 Epoch: 5 Global Step: 98770 Fp16 Grad Scale: 65536 
Required: 16 hours Training: 2022-04-11 05:52:33,652-Speed 5183.42 samples/sec Loss 3.5207 LearningRate 0.0496 Epoch: 5 Global Step: 98780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:35,621-Speed 5201.36 samples/sec Loss 3.4259 LearningRate 0.0496 Epoch: 5 Global Step: 98790 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:37,591-Speed 5200.30 samples/sec Loss 3.5217 LearningRate 0.0496 Epoch: 5 Global Step: 98800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:39,575-Speed 5162.54 samples/sec Loss 3.4861 LearningRate 0.0496 Epoch: 5 Global Step: 98810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:41,557-Speed 5168.07 samples/sec Loss 3.5571 LearningRate 0.0496 Epoch: 5 Global Step: 98820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:43,535-Speed 5177.83 samples/sec Loss 3.4861 LearningRate 0.0496 Epoch: 5 Global Step: 98830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:45,523-Speed 5154.30 samples/sec Loss 3.5372 LearningRate 0.0495 Epoch: 5 Global Step: 98840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:47,518-Speed 5135.02 samples/sec Loss 3.5213 LearningRate 0.0495 Epoch: 5 Global Step: 98850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:49,503-Speed 5160.18 samples/sec Loss 3.4777 LearningRate 0.0495 Epoch: 5 Global Step: 98860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:52:51,477-Speed 5188.77 samples/sec Loss 3.5125 LearningRate 0.0495 Epoch: 5 Global Step: 98870 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:52:53,461-Speed 5163.21 samples/sec Loss 3.5638 LearningRate 0.0495 Epoch: 5 Global Step: 98880 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:52:55,441-Speed 5173.62 samples/sec Loss 3.5186 LearningRate 0.0495 Epoch: 5 Global Step: 98890 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 
05:52:57,414-Speed 5191.96 samples/sec Loss 3.5002 LearningRate 0.0495 Epoch: 5 Global Step: 98900 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:52:59,389-Speed 5185.94 samples/sec Loss 3.4715 LearningRate 0.0495 Epoch: 5 Global Step: 98910 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:53:01,374-Speed 5159.68 samples/sec Loss 3.5314 LearningRate 0.0495 Epoch: 5 Global Step: 98920 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:53:03,358-Speed 5163.41 samples/sec Loss 3.4900 LearningRate 0.0495 Epoch: 5 Global Step: 98930 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:53:05,343-Speed 5159.24 samples/sec Loss 3.5019 LearningRate 0.0495 Epoch: 5 Global Step: 98940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:07,321-Speed 5179.51 samples/sec Loss 3.5613 LearningRate 0.0495 Epoch: 5 Global Step: 98950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:09,349-Speed 5050.45 samples/sec Loss 3.5030 LearningRate 0.0495 Epoch: 5 Global Step: 98960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:11,345-Speed 5134.34 samples/sec Loss 3.3959 LearningRate 0.0495 Epoch: 5 Global Step: 98970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:13,367-Speed 5065.61 samples/sec Loss 3.5707 LearningRate 0.0495 Epoch: 5 Global Step: 98980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:15,345-Speed 5180.09 samples/sec Loss 3.6090 LearningRate 0.0495 Epoch: 5 Global Step: 98990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:17,332-Speed 5154.34 samples/sec Loss 3.4344 LearningRate 0.0495 Epoch: 5 Global Step: 99000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:19,308-Speed 5184.69 samples/sec Loss 3.5470 LearningRate 0.0495 Epoch: 5 Global Step: 99010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:21,294-Speed 5156.25 samples/sec Loss 3.5476 
LearningRate 0.0495 Epoch: 5 Global Step: 99020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:23,274-Speed 5173.54 samples/sec Loss 3.4255 LearningRate 0.0495 Epoch: 5 Global Step: 99030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:25,261-Speed 5155.06 samples/sec Loss 3.4586 LearningRate 0.0495 Epoch: 5 Global Step: 99040 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:53:27,256-Speed 5135.70 samples/sec Loss 3.5534 LearningRate 0.0495 Epoch: 5 Global Step: 99050 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:53:29,258-Speed 5118.15 samples/sec Loss 3.5083 LearningRate 0.0495 Epoch: 5 Global Step: 99060 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:53:31,233-Speed 5185.99 samples/sec Loss 3.5625 LearningRate 0.0495 Epoch: 5 Global Step: 99070 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:53:33,206-Speed 5192.11 samples/sec Loss 3.4939 LearningRate 0.0494 Epoch: 5 Global Step: 99080 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:53:35,174-Speed 5203.88 samples/sec Loss 3.5021 LearningRate 0.0494 Epoch: 5 Global Step: 99090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:37,160-Speed 5158.12 samples/sec Loss 3.5186 LearningRate 0.0494 Epoch: 5 Global Step: 99100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:39,135-Speed 5187.43 samples/sec Loss 3.4780 LearningRate 0.0494 Epoch: 5 Global Step: 99110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:41,134-Speed 5123.46 samples/sec Loss 3.4143 LearningRate 0.0494 Epoch: 5 Global Step: 99120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:43,111-Speed 5180.31 samples/sec Loss 3.4756 LearningRate 0.0494 Epoch: 5 Global Step: 99130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:45,109-Speed 5128.42 samples/sec Loss 3.5362 LearningRate 0.0494 Epoch: 5 Global Step: 99140 
Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:47,106-Speed 5127.40 samples/sec Loss 3.5245 LearningRate 0.0494 Epoch: 5 Global Step: 99150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:49,097-Speed 5147.36 samples/sec Loss 3.3836 LearningRate 0.0494 Epoch: 5 Global Step: 99160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:51,077-Speed 5172.99 samples/sec Loss 3.4870 LearningRate 0.0494 Epoch: 5 Global Step: 99170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:53,053-Speed 5183.07 samples/sec Loss 3.5573 LearningRate 0.0494 Epoch: 5 Global Step: 99180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:53:55,025-Speed 5193.95 samples/sec Loss 3.4711 LearningRate 0.0494 Epoch: 5 Global Step: 99190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:53:57,010-Speed 5163.97 samples/sec Loss 3.5289 LearningRate 0.0494 Epoch: 5 Global Step: 99200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:53:59,005-Speed 5132.75 samples/sec Loss 3.5137 LearningRate 0.0494 Epoch: 5 Global Step: 99210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:54:01,006-Speed 5118.65 samples/sec Loss 3.5154 LearningRate 0.0494 Epoch: 5 Global Step: 99220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:54:02,998-Speed 5142.21 samples/sec Loss 3.5056 LearningRate 0.0494 Epoch: 5 Global Step: 99230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:54:04,999-Speed 5119.78 samples/sec Loss 3.4309 LearningRate 0.0494 Epoch: 5 Global Step: 99240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:54:06,979-Speed 5173.56 samples/sec Loss 3.4182 LearningRate 0.0494 Epoch: 5 Global Step: 99250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:54:08,965-Speed 5156.79 samples/sec Loss 3.5534 LearningRate 0.0494 Epoch: 5 Global Step: 99260 Fp16 Grad Scale: 131072 Required: 16 hours 
Training: 2022-04-11 05:54:10,935-Speed 5201.75 samples/sec Loss 3.4517 LearningRate 0.0494 Epoch: 5 Global Step: 99270 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:54:12,936-Speed 5119.46 samples/sec Loss 3.4311 LearningRate 0.0494 Epoch: 5 Global Step: 99280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:54:14,937-Speed 5118.62 samples/sec Loss 3.5495 LearningRate 0.0494 Epoch: 5 Global Step: 99290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:54:16,913-Speed 5183.01 samples/sec Loss 3.4636 LearningRate 0.0494 Epoch: 5 Global Step: 99300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:54:18,886-Speed 5192.22 samples/sec Loss 3.3864 LearningRate 0.0494 Epoch: 5 Global Step: 99310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:54:20,862-Speed 5184.81 samples/sec Loss 3.4528 LearningRate 0.0493 Epoch: 5 Global Step: 99320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:54:22,873-Speed 5092.31 samples/sec Loss 3.5151 LearningRate 0.0493 Epoch: 5 Global Step: 99330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:54:24,849-Speed 5182.98 samples/sec Loss 3.4699 LearningRate 0.0493 Epoch: 5 Global Step: 99340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:54:26,871-Speed 5065.58 samples/sec Loss 3.4707 LearningRate 0.0493 Epoch: 5 Global Step: 99350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:54:28,875-Speed 5111.85 samples/sec Loss 3.5014 LearningRate 0.0493 Epoch: 5 Global Step: 99360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:54:30,868-Speed 5141.58 samples/sec Loss 3.4835 LearningRate 0.0493 Epoch: 5 Global Step: 99370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:54:32,848-Speed 5174.52 samples/sec Loss 3.5604 LearningRate 0.0493 Epoch: 5 Global Step: 99380 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:54:34,822-Speed 5188.23 
samples/sec Loss 3.4478 LearningRate 0.0493 Epoch: 5 Global Step: 99390 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:54:36,817-Speed 5133.12 samples/sec Loss 3.5085 LearningRate 0.0493 Epoch: 5 Global Step: 99400 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:54:38,804-Speed 5154.59 samples/sec Loss 3.4693 LearningRate 0.0493 Epoch: 5 Global Step: 99410 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:54:40,790-Speed 5159.04 samples/sec Loss 3.4537 LearningRate 0.0493 Epoch: 5 Global Step: 99420 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:54:42,768-Speed 5179.36 samples/sec Loss 3.4583 LearningRate 0.0493 Epoch: 5 Global Step: 99430 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:54:44,749-Speed 5170.41 samples/sec Loss 3.4887 LearningRate 0.0493 Epoch: 5 Global Step: 99440 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:54:46,744-Speed 5134.22 samples/sec Loss 3.5855 LearningRate 0.0493 Epoch: 5 Global Step: 99450 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:54:48,731-Speed 5155.15 samples/sec Loss 3.5256 LearningRate 0.0493 Epoch: 5 Global Step: 99460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:54:50,715-Speed 5161.96 samples/sec Loss 3.4750 LearningRate 0.0493 Epoch: 5 Global Step: 99470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:54:52,704-Speed 5150.83 samples/sec Loss 3.5565 LearningRate 0.0493 Epoch: 5 Global Step: 99480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:54:54,678-Speed 5190.55 samples/sec Loss 3.4005 LearningRate 0.0493 Epoch: 5 Global Step: 99490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:54:56,658-Speed 5171.93 samples/sec Loss 3.4879 LearningRate 0.0493 Epoch: 5 Global Step: 99500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:54:58,636-Speed 5180.19 samples/sec Loss 3.5828 LearningRate 0.0493 
Epoch: 5 Global Step: 99510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:00,652-Speed 5079.89 samples/sec Loss 3.5304 LearningRate 0.0493 Epoch: 5 Global Step: 99520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:02,633-Speed 5171.40 samples/sec Loss 3.5534 LearningRate 0.0493 Epoch: 5 Global Step: 99530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:04,614-Speed 5171.06 samples/sec Loss 3.5309 LearningRate 0.0493 Epoch: 5 Global Step: 99540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:06,596-Speed 5167.19 samples/sec Loss 3.4704 LearningRate 0.0493 Epoch: 5 Global Step: 99550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:08,577-Speed 5170.61 samples/sec Loss 3.4463 LearningRate 0.0492 Epoch: 5 Global Step: 99560 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:55:10,557-Speed 5173.82 samples/sec Loss 3.5173 LearningRate 0.0492 Epoch: 5 Global Step: 99570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:55:12,543-Speed 5158.04 samples/sec Loss 3.4139 LearningRate 0.0492 Epoch: 5 Global Step: 99580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:14,532-Speed 5149.12 samples/sec Loss 3.4891 LearningRate 0.0492 Epoch: 5 Global Step: 99590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:16,535-Speed 5115.93 samples/sec Loss 3.4003 LearningRate 0.0492 Epoch: 5 Global Step: 99600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:18,530-Speed 5134.02 samples/sec Loss 3.4445 LearningRate 0.0492 Epoch: 5 Global Step: 99610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:20,508-Speed 5178.05 samples/sec Loss 3.3834 LearningRate 0.0492 Epoch: 5 Global Step: 99620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:22,487-Speed 5177.13 samples/sec Loss 3.5399 LearningRate 0.0492 Epoch: 5 Global Step: 99630 Fp16 Grad Scale: 65536 
Required: 16 hours Training: 2022-04-11 05:55:24,471-Speed 5162.47 samples/sec Loss 3.5339 LearningRate 0.0492 Epoch: 5 Global Step: 99640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:26,465-Speed 5135.12 samples/sec Loss 3.4439 LearningRate 0.0492 Epoch: 5 Global Step: 99650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:28,441-Speed 5185.09 samples/sec Loss 3.4301 LearningRate 0.0492 Epoch: 5 Global Step: 99660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:30,416-Speed 5185.64 samples/sec Loss 3.5270 LearningRate 0.0492 Epoch: 5 Global Step: 99670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:32,392-Speed 5186.07 samples/sec Loss 3.4279 LearningRate 0.0492 Epoch: 5 Global Step: 99680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:55:34,371-Speed 5174.13 samples/sec Loss 3.5126 LearningRate 0.0492 Epoch: 5 Global Step: 99690 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:55:36,351-Speed 5175.35 samples/sec Loss 3.5345 LearningRate 0.0492 Epoch: 5 Global Step: 99700 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:55:38,342-Speed 5145.12 samples/sec Loss 3.5737 LearningRate 0.0492 Epoch: 5 Global Step: 99710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:40,328-Speed 5156.88 samples/sec Loss 3.5324 LearningRate 0.0492 Epoch: 5 Global Step: 99720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:42,308-Speed 5176.64 samples/sec Loss 3.4740 LearningRate 0.0492 Epoch: 5 Global Step: 99730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:44,287-Speed 5176.20 samples/sec Loss 3.4642 LearningRate 0.0492 Epoch: 5 Global Step: 99740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:46,273-Speed 5156.21 samples/sec Loss 3.4670 LearningRate 0.0492 Epoch: 5 Global Step: 99750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 
05:55:48,253-Speed 5173.06 samples/sec Loss 3.4482 LearningRate 0.0492 Epoch: 5 Global Step: 99760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:50,234-Speed 5171.46 samples/sec Loss 3.5044 LearningRate 0.0492 Epoch: 5 Global Step: 99770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:52,224-Speed 5148.98 samples/sec Loss 3.4906 LearningRate 0.0492 Epoch: 5 Global Step: 99780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:54,204-Speed 5173.55 samples/sec Loss 3.4298 LearningRate 0.0491 Epoch: 5 Global Step: 99790 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:56,190-Speed 5156.86 samples/sec Loss 3.5039 LearningRate 0.0491 Epoch: 5 Global Step: 99800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:55:58,171-Speed 5172.55 samples/sec Loss 3.5163 LearningRate 0.0491 Epoch: 5 Global Step: 99810 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:56:00,142-Speed 5196.35 samples/sec Loss 3.4672 LearningRate 0.0491 Epoch: 5 Global Step: 99820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:56:02,128-Speed 5156.77 samples/sec Loss 3.4959 LearningRate 0.0491 Epoch: 5 Global Step: 99830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:56:04,103-Speed 5185.69 samples/sec Loss 3.5616 LearningRate 0.0491 Epoch: 5 Global Step: 99840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:56:06,078-Speed 5188.20 samples/sec Loss 3.5266 LearningRate 0.0491 Epoch: 5 Global Step: 99850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:56:08,058-Speed 5171.87 samples/sec Loss 3.5372 LearningRate 0.0491 Epoch: 5 Global Step: 99860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:56:10,043-Speed 5160.14 samples/sec Loss 3.5387 LearningRate 0.0491 Epoch: 5 Global Step: 99870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:56:12,018-Speed 5186.75 samples/sec Loss 3.5500 
LearningRate 0.0491 Epoch: 5 Global Step: 99880 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:56:14,009-Speed 5145.95 samples/sec Loss 3.5229 LearningRate 0.0491 Epoch: 5 Global Step: 99890 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:56:15,989-Speed 5173.84 samples/sec Loss 3.4594 LearningRate 0.0491 Epoch: 5 Global Step: 99900 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:56:17,979-Speed 5147.52 samples/sec Loss 3.5344 LearningRate 0.0491 Epoch: 5 Global Step: 99910 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:56:19,956-Speed 5182.30 samples/sec Loss 3.4780 LearningRate 0.0491 Epoch: 5 Global Step: 99920 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:56:21,933-Speed 5179.89 samples/sec Loss 3.4088 LearningRate 0.0491 Epoch: 5 Global Step: 99930 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:56:23,943-Speed 5096.16 samples/sec Loss 3.4147 LearningRate 0.0491 Epoch: 5 Global Step: 99940 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:56:25,920-Speed 5182.59 samples/sec Loss 3.4630 LearningRate 0.0491 Epoch: 5 Global Step: 99950 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:56:27,906-Speed 5158.41 samples/sec Loss 3.4482 LearningRate 0.0491 Epoch: 5 Global Step: 99960 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:56:29,893-Speed 5152.53 samples/sec Loss 3.4216 LearningRate 0.0491 Epoch: 5 Global Step: 99970 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 05:56:31,867-Speed 5191.07 samples/sec Loss 3.5170 LearningRate 0.0491 Epoch: 5 Global Step: 99980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:56:33,842-Speed 5186.21 samples/sec Loss 3.5081 LearningRate 0.0491 Epoch: 5 Global Step: 99990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:56:35,837-Speed 5134.18 samples/sec Loss 3.5281 LearningRate 0.0491 Epoch: 5 Global Step: 100000 Fp16 
Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:57:02,453-[lfw][100000]XNorm: 21.801242 Training: 2022-04-11 05:57:02,454-[lfw][100000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-04-11 05:57:02,454-[lfw][100000]Accuracy-Highest: 0.99817 Training: 2022-04-11 05:57:33,193-[cfp_fp][100000]XNorm: 20.150812 Training: 2022-04-11 05:57:33,193-[cfp_fp][100000]Accuracy-Flip: 0.97786+-0.00590 Training: 2022-04-11 05:57:33,194-[cfp_fp][100000]Accuracy-Highest: 0.98086 Training: 2022-04-11 05:57:59,745-[agedb_30][100000]XNorm: 21.737294 Training: 2022-04-11 05:57:59,746-[agedb_30][100000]Accuracy-Flip: 0.97883+-0.00869 Training: 2022-04-11 05:57:59,746-[agedb_30][100000]Accuracy-Highest: 0.97950 Training: 2022-04-11 05:58:01,760-Speed 119.18 samples/sec Loss 3.4507 LearningRate 0.0491 Epoch: 5 Global Step: 100010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:58:03,745-Speed 5159.44 samples/sec Loss 3.4566 LearningRate 0.0491 Epoch: 5 Global Step: 100020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:58:05,707-Speed 5222.81 samples/sec Loss 3.4262 LearningRate 0.0490 Epoch: 5 Global Step: 100030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:58:07,675-Speed 5202.80 samples/sec Loss 3.5710 LearningRate 0.0490 Epoch: 5 Global Step: 100040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:58:09,641-Speed 5211.52 samples/sec Loss 3.3964 LearningRate 0.0490 Epoch: 5 Global Step: 100050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:58:11,608-Speed 5206.29 samples/sec Loss 3.5843 LearningRate 0.0490 Epoch: 5 Global Step: 100060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:58:13,586-Speed 5180.24 samples/sec Loss 3.4766 LearningRate 0.0490 Epoch: 5 Global Step: 100070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:58:15,556-Speed 5198.35 samples/sec Loss 3.4644 LearningRate 0.0490 Epoch: 5 Global Step: 100080 Fp16 Grad Scale: 131072 
Required: 16 hours Training: 2022-04-11 05:58:17,528-Speed 5194.86 samples/sec Loss 3.5108 LearningRate 0.0490 Epoch: 5 Global Step: 100090 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:58:19,512-Speed 5163.30 samples/sec Loss 3.4477 LearningRate 0.0490 Epoch: 5 Global Step: 100100 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:58:21,487-Speed 5185.98 samples/sec Loss 3.5005 LearningRate 0.0490 Epoch: 5 Global Step: 100110 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:58:23,453-Speed 5210.33 samples/sec Loss 3.4833 LearningRate 0.0490 Epoch: 5 Global Step: 100120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:58:25,434-Speed 5170.40 samples/sec Loss 3.4861 LearningRate 0.0490 Epoch: 5 Global Step: 100130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:58:27,648-Speed 4626.51 samples/sec Loss 3.4495 LearningRate 0.0490 Epoch: 5 Global Step: 100140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:58:57,052-Speed 348.31 samples/sec Loss 3.2335 LearningRate 0.0490 Epoch: 6 Global Step: 100150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:58:59,033-Speed 5171.38 samples/sec Loss 2.7945 LearningRate 0.0490 Epoch: 6 Global Step: 100160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:59:01,012-Speed 5176.81 samples/sec Loss 2.8522 LearningRate 0.0490 Epoch: 6 Global Step: 100170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:59:02,986-Speed 5187.38 samples/sec Loss 2.8907 LearningRate 0.0490 Epoch: 6 Global Step: 100180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:59:04,992-Speed 5106.57 samples/sec Loss 2.8223 LearningRate 0.0490 Epoch: 6 Global Step: 100190 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:59:08,183-Speed 3209.99 samples/sec Loss 2.8590 LearningRate 0.0490 Epoch: 6 Global Step: 100200 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 
05:59:10,181-Speed 5125.22 samples/sec Loss 2.7861 LearningRate 0.0490 Epoch: 6 Global Step: 100210 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:59:12,186-Speed 5109.26 samples/sec Loss 2.8070 LearningRate 0.0490 Epoch: 6 Global Step: 100220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:59:14,178-Speed 5142.46 samples/sec Loss 2.8035 LearningRate 0.0490 Epoch: 6 Global Step: 100230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:59:16,177-Speed 5125.77 samples/sec Loss 2.8161 LearningRate 0.0490 Epoch: 6 Global Step: 100240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:59:18,175-Speed 5126.14 samples/sec Loss 2.9044 LearningRate 0.0490 Epoch: 6 Global Step: 100250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:59:20,157-Speed 5168.53 samples/sec Loss 2.8418 LearningRate 0.0490 Epoch: 6 Global Step: 100260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:59:22,132-Speed 5186.23 samples/sec Loss 2.8135 LearningRate 0.0489 Epoch: 6 Global Step: 100270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:59:24,132-Speed 5122.88 samples/sec Loss 2.8742 LearningRate 0.0489 Epoch: 6 Global Step: 100280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:59:26,130-Speed 5126.04 samples/sec Loss 2.9084 LearningRate 0.0489 Epoch: 6 Global Step: 100290 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:59:28,105-Speed 5185.41 samples/sec Loss 2.8281 LearningRate 0.0489 Epoch: 6 Global Step: 100300 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:59:30,086-Speed 5170.86 samples/sec Loss 2.9131 LearningRate 0.0489 Epoch: 6 Global Step: 100310 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:59:32,206-Speed 4833.43 samples/sec Loss 2.8454 LearningRate 0.0489 Epoch: 6 Global Step: 100320 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:59:34,184-Speed 5175.98 
samples/sec Loss 2.8061 LearningRate 0.0489 Epoch: 6 Global Step: 100330 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:59:36,164-Speed 5174.57 samples/sec Loss 2.8464 LearningRate 0.0489 Epoch: 6 Global Step: 100340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:59:38,156-Speed 5141.73 samples/sec Loss 2.8878 LearningRate 0.0489 Epoch: 6 Global Step: 100350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:59:40,136-Speed 5174.54 samples/sec Loss 2.9354 LearningRate 0.0489 Epoch: 6 Global Step: 100360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:59:42,106-Speed 5199.27 samples/sec Loss 2.8629 LearningRate 0.0489 Epoch: 6 Global Step: 100370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:59:44,089-Speed 5166.66 samples/sec Loss 2.7668 LearningRate 0.0489 Epoch: 6 Global Step: 100380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:59:46,072-Speed 5164.86 samples/sec Loss 2.8499 LearningRate 0.0489 Epoch: 6 Global Step: 100390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:59:48,052-Speed 5172.67 samples/sec Loss 2.8510 LearningRate 0.0489 Epoch: 6 Global Step: 100400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:59:50,041-Speed 5151.40 samples/sec Loss 2.8023 LearningRate 0.0489 Epoch: 6 Global Step: 100410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:59:52,014-Speed 5190.02 samples/sec Loss 2.8758 LearningRate 0.0489 Epoch: 6 Global Step: 100420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:59:54,006-Speed 5142.18 samples/sec Loss 2.8353 LearningRate 0.0489 Epoch: 6 Global Step: 100430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 05:59:55,995-Speed 5151.57 samples/sec Loss 2.8901 LearningRate 0.0489 Epoch: 6 Global Step: 100440 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:59:57,981-Speed 5155.82 samples/sec Loss 2.8400 LearningRate 
0.0489 Epoch: 6 Global Step: 100450 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 05:59:59,974-Speed 5140.45 samples/sec Loss 2.8666 LearningRate 0.0489 Epoch: 6 Global Step: 100460 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:01,956-Speed 5171.12 samples/sec Loss 2.8618 LearningRate 0.0489 Epoch: 6 Global Step: 100470 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:04,234-Speed 4494.77 samples/sec Loss 2.8835 LearningRate 0.0489 Epoch: 6 Global Step: 100480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:00:06,218-Speed 5162.38 samples/sec Loss 2.8815 LearningRate 0.0489 Epoch: 6 Global Step: 100490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:00:08,199-Speed 5172.38 samples/sec Loss 2.8396 LearningRate 0.0489 Epoch: 6 Global Step: 100500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:00:10,183-Speed 5161.85 samples/sec Loss 2.8841 LearningRate 0.0488 Epoch: 6 Global Step: 100510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:00:12,163-Speed 5175.58 samples/sec Loss 2.9178 LearningRate 0.0488 Epoch: 6 Global Step: 100520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:00:14,141-Speed 5176.50 samples/sec Loss 2.9042 LearningRate 0.0488 Epoch: 6 Global Step: 100530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:00:16,163-Speed 5067.95 samples/sec Loss 2.8639 LearningRate 0.0488 Epoch: 6 Global Step: 100540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:00:18,147-Speed 5162.43 samples/sec Loss 2.8932 LearningRate 0.0488 Epoch: 6 Global Step: 100550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:00:20,134-Speed 5154.31 samples/sec Loss 2.8605 LearningRate 0.0488 Epoch: 6 Global Step: 100560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:00:22,127-Speed 5141.12 samples/sec Loss 2.9048 LearningRate 0.0488 Epoch: 6 Global Step: 100570 Fp16 
Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:00:24,122-Speed 5135.83 samples/sec Loss 2.9143 LearningRate 0.0488 Epoch: 6 Global Step: 100580 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:26,147-Speed 5056.25 samples/sec Loss 2.8974 LearningRate 0.0488 Epoch: 6 Global Step: 100590 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:28,156-Speed 5100.41 samples/sec Loss 2.9157 LearningRate 0.0488 Epoch: 6 Global Step: 100600 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:30,148-Speed 5140.12 samples/sec Loss 2.9672 LearningRate 0.0488 Epoch: 6 Global Step: 100610 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:32,133-Speed 5162.27 samples/sec Loss 2.9070 LearningRate 0.0488 Epoch: 6 Global Step: 100620 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:34,119-Speed 5156.95 samples/sec Loss 2.9155 LearningRate 0.0488 Epoch: 6 Global Step: 100630 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:36,108-Speed 5149.42 samples/sec Loss 2.9220 LearningRate 0.0488 Epoch: 6 Global Step: 100640 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:38,106-Speed 5126.73 samples/sec Loss 2.9278 LearningRate 0.0488 Epoch: 6 Global Step: 100650 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:40,095-Speed 5150.42 samples/sec Loss 2.9673 LearningRate 0.0488 Epoch: 6 Global Step: 100660 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:42,075-Speed 5174.81 samples/sec Loss 2.8532 LearningRate 0.0488 Epoch: 6 Global Step: 100670 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:44,042-Speed 5206.05 samples/sec Loss 2.9187 LearningRate 0.0488 Epoch: 6 Global Step: 100680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:46,022-Speed 5173.63 samples/sec Loss 2.9644 LearningRate 0.0488 Epoch: 6 Global Step: 100690 Fp16 Grad Scale: 131072 Required: 16 
hours Training: 2022-04-11 06:00:48,013-Speed 5146.75 samples/sec Loss 2.9320 LearningRate 0.0488 Epoch: 6 Global Step: 100700 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:49,987-Speed 5186.41 samples/sec Loss 2.9735 LearningRate 0.0488 Epoch: 6 Global Step: 100710 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:51,963-Speed 5184.47 samples/sec Loss 2.8865 LearningRate 0.0488 Epoch: 6 Global Step: 100720 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:53,954-Speed 5146.40 samples/sec Loss 2.9207 LearningRate 0.0488 Epoch: 6 Global Step: 100730 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:55,926-Speed 5193.74 samples/sec Loss 2.8922 LearningRate 0.0488 Epoch: 6 Global Step: 100740 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:57,919-Speed 5138.16 samples/sec Loss 2.8826 LearningRate 0.0487 Epoch: 6 Global Step: 100750 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:00:59,921-Speed 5118.99 samples/sec Loss 2.9724 LearningRate 0.0487 Epoch: 6 Global Step: 100760 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:01,897-Speed 5183.77 samples/sec Loss 2.9465 LearningRate 0.0487 Epoch: 6 Global Step: 100770 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:03,876-Speed 5175.75 samples/sec Loss 2.9261 LearningRate 0.0487 Epoch: 6 Global Step: 100780 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:05,871-Speed 5134.39 samples/sec Loss 2.9353 LearningRate 0.0487 Epoch: 6 Global Step: 100790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:07,850-Speed 5174.86 samples/sec Loss 2.9169 LearningRate 0.0487 Epoch: 6 Global Step: 100800 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:09,833-Speed 5166.19 samples/sec Loss 2.9611 LearningRate 0.0487 Epoch: 6 Global Step: 100810 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 
06:01:11,810-Speed 5181.06 samples/sec Loss 2.9849 LearningRate 0.0487 Epoch: 6 Global Step: 100820 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:13,799-Speed 5151.16 samples/sec Loss 2.9392 LearningRate 0.0487 Epoch: 6 Global Step: 100830 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:15,788-Speed 5149.94 samples/sec Loss 2.9207 LearningRate 0.0487 Epoch: 6 Global Step: 100840 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:17,767-Speed 5175.70 samples/sec Loss 2.9210 LearningRate 0.0487 Epoch: 6 Global Step: 100850 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:19,744-Speed 5181.89 samples/sec Loss 3.0040 LearningRate 0.0487 Epoch: 6 Global Step: 100860 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:21,726-Speed 5168.84 samples/sec Loss 2.8828 LearningRate 0.0487 Epoch: 6 Global Step: 100870 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:23,699-Speed 5193.33 samples/sec Loss 3.0061 LearningRate 0.0487 Epoch: 6 Global Step: 100880 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:25,685-Speed 5155.96 samples/sec Loss 2.9383 LearningRate 0.0487 Epoch: 6 Global Step: 100890 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:27,665-Speed 5173.90 samples/sec Loss 2.9646 LearningRate 0.0487 Epoch: 6 Global Step: 100900 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:29,649-Speed 5163.91 samples/sec Loss 2.9605 LearningRate 0.0487 Epoch: 6 Global Step: 100910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:01:31,633-Speed 5160.71 samples/sec Loss 2.9424 LearningRate 0.0487 Epoch: 6 Global Step: 100920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:01:33,627-Speed 5137.28 samples/sec Loss 2.9155 LearningRate 0.0487 Epoch: 6 Global Step: 100930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:01:35,613-Speed 5158.17 
samples/sec Loss 2.9496 LearningRate 0.0487 Epoch: 6 Global Step: 100940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:01:37,602-Speed 5149.23 samples/sec Loss 2.9676 LearningRate 0.0487 Epoch: 6 Global Step: 100950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:01:39,586-Speed 5165.58 samples/sec Loss 2.9525 LearningRate 0.0487 Epoch: 6 Global Step: 100960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:01:41,578-Speed 5141.66 samples/sec Loss 2.9845 LearningRate 0.0487 Epoch: 6 Global Step: 100970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:01:43,556-Speed 5178.30 samples/sec Loss 3.0387 LearningRate 0.0487 Epoch: 6 Global Step: 100980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:01:45,541-Speed 5160.61 samples/sec Loss 2.8768 LearningRate 0.0486 Epoch: 6 Global Step: 100990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:01:47,515-Speed 5189.81 samples/sec Loss 2.8870 LearningRate 0.0486 Epoch: 6 Global Step: 101000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:01:49,492-Speed 5181.49 samples/sec Loss 3.0234 LearningRate 0.0486 Epoch: 6 Global Step: 101010 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:51,478-Speed 5155.73 samples/sec Loss 2.9536 LearningRate 0.0486 Epoch: 6 Global Step: 101020 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:53,473-Speed 5135.34 samples/sec Loss 2.8641 LearningRate 0.0486 Epoch: 6 Global Step: 101030 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:55,469-Speed 5130.64 samples/sec Loss 2.9214 LearningRate 0.0486 Epoch: 6 Global Step: 101040 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:57,469-Speed 5124.02 samples/sec Loss 2.9141 LearningRate 0.0486 Epoch: 6 Global Step: 101050 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:01:59,445-Speed 5182.77 samples/sec Loss 2.9718 LearningRate 
0.0486 Epoch: 6 Global Step: 101060 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:02:01,451-Speed 5107.56 samples/sec Loss 3.0181 LearningRate 0.0486 Epoch: 6 Global Step: 101070 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:02:03,431-Speed 5172.68 samples/sec Loss 2.8649 LearningRate 0.0486 Epoch: 6 Global Step: 101080 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:02:05,409-Speed 5179.77 samples/sec Loss 2.9859 LearningRate 0.0486 Epoch: 6 Global Step: 101090 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:02:07,386-Speed 5180.44 samples/sec Loss 2.9345 LearningRate 0.0486 Epoch: 6 Global Step: 101100 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:02:09,381-Speed 5134.95 samples/sec Loss 2.9171 LearningRate 0.0486 Epoch: 6 Global Step: 101110 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:02:11,389-Speed 5101.39 samples/sec Loss 3.0035 LearningRate 0.0486 Epoch: 6 Global Step: 101120 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:02:13,376-Speed 5154.92 samples/sec Loss 2.9857 LearningRate 0.0486 Epoch: 6 Global Step: 101130 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:02:15,353-Speed 5182.10 samples/sec Loss 2.9467 LearningRate 0.0486 Epoch: 6 Global Step: 101140 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:02:17,340-Speed 5154.39 samples/sec Loss 2.9529 LearningRate 0.0486 Epoch: 6 Global Step: 101150 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:02:19,312-Speed 5195.43 samples/sec Loss 3.0554 LearningRate 0.0486 Epoch: 6 Global Step: 101160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:02:21,288-Speed 5182.44 samples/sec Loss 2.9115 LearningRate 0.0486 Epoch: 6 Global Step: 101170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:02:23,281-Speed 5140.92 samples/sec Loss 3.0061 LearningRate 0.0486 Epoch: 6 Global Step: 
101180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:02:25,285-Speed 5112.53 samples/sec Loss 2.8974 LearningRate 0.0486 Epoch: 6 Global Step: 101190 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:02:27,279-Speed 5135.71 samples/sec Loss 2.9345 LearningRate 0.0486 Epoch: 6 Global Step: 101200 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:02:29,273-Speed 5137.65 samples/sec Loss 2.9303 LearningRate 0.0486 Epoch: 6 Global Step: 101210 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:02:31,260-Speed 5154.83 samples/sec Loss 3.0133 LearningRate 0.0486 Epoch: 6 Global Step: 101220 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:02:33,243-Speed 5164.46 samples/sec Loss 2.9745 LearningRate 0.0485 Epoch: 6 Global Step: 101230 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:02:35,306-Speed 4966.74 samples/sec Loss 2.9791 LearningRate 0.0485 Epoch: 6 Global Step: 101240 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:02:37,308-Speed 5116.58 samples/sec Loss 3.0152 LearningRate 0.0485 Epoch: 6 Global Step: 101250 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:02:39,285-Speed 5181.31 samples/sec Loss 2.9837 LearningRate 0.0485 Epoch: 6 Global Step: 101260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:02:41,277-Speed 5143.05 samples/sec Loss 2.9480 LearningRate 0.0485 Epoch: 6 Global Step: 101270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:02:43,246-Speed 5200.61 samples/sec Loss 2.9856 LearningRate 0.0485 Epoch: 6 Global Step: 101280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:02:45,229-Speed 5166.18 samples/sec Loss 2.9869 LearningRate 0.0485 Epoch: 6 Global Step: 101290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:02:47,219-Speed 5147.72 samples/sec Loss 2.9981 LearningRate 0.0485 Epoch: 6 Global Step: 101300 Fp16 Grad Scale: 65536 Required: 
16 hours Training: 2022-04-11 06:02:49,212-Speed 5140.22 samples/sec Loss 2.9223 LearningRate 0.0485 Epoch: 6 Global Step: 101310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:02:51,236-Speed 5060.62 samples/sec Loss 2.9612 LearningRate 0.0485 Epoch: 6 Global Step: 101320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:02:53,218-Speed 5168.01 samples/sec Loss 2.9807 LearningRate 0.0485 Epoch: 6 Global Step: 101330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:02:55,210-Speed 5140.97 samples/sec Loss 2.9940 LearningRate 0.0485 Epoch: 6 Global Step: 101340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:02:57,199-Speed 5150.59 samples/sec Loss 3.0000 LearningRate 0.0485 Epoch: 6 Global Step: 101350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:02:59,205-Speed 5108.48 samples/sec Loss 3.0914 LearningRate 0.0485 Epoch: 6 Global Step: 101360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:01,194-Speed 5148.86 samples/sec Loss 2.9940 LearningRate 0.0485 Epoch: 6 Global Step: 101370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:03,177-Speed 5166.45 samples/sec Loss 2.9255 LearningRate 0.0485 Epoch: 6 Global Step: 101380 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:03:05,165-Speed 5150.75 samples/sec Loss 3.0183 LearningRate 0.0485 Epoch: 6 Global Step: 101390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:07,141-Speed 5183.88 samples/sec Loss 3.0410 LearningRate 0.0485 Epoch: 6 Global Step: 101400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:09,117-Speed 5184.19 samples/sec Loss 3.0270 LearningRate 0.0485 Epoch: 6 Global Step: 101410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:11,100-Speed 5165.24 samples/sec Loss 3.0290 LearningRate 0.0485 Epoch: 6 Global Step: 101420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 
06:03:13,083-Speed 5165.96 samples/sec Loss 2.9273 LearningRate 0.0485 Epoch: 6 Global Step: 101430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:15,075-Speed 5143.19 samples/sec Loss 2.9678 LearningRate 0.0485 Epoch: 6 Global Step: 101440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:17,058-Speed 5164.95 samples/sec Loss 3.0117 LearningRate 0.0485 Epoch: 6 Global Step: 101450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:19,033-Speed 5189.24 samples/sec Loss 3.0165 LearningRate 0.0485 Epoch: 6 Global Step: 101460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:21,013-Speed 5171.28 samples/sec Loss 3.0552 LearningRate 0.0484 Epoch: 6 Global Step: 101470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:22,993-Speed 5174.45 samples/sec Loss 2.9690 LearningRate 0.0484 Epoch: 6 Global Step: 101480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:24,976-Speed 5166.42 samples/sec Loss 3.0086 LearningRate 0.0484 Epoch: 6 Global Step: 101490 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:03:26,955-Speed 5174.59 samples/sec Loss 2.9587 LearningRate 0.0484 Epoch: 6 Global Step: 101500 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:03:28,945-Speed 5150.74 samples/sec Loss 2.9965 LearningRate 0.0484 Epoch: 6 Global Step: 101510 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:03:30,908-Speed 5219.65 samples/sec Loss 3.0154 LearningRate 0.0484 Epoch: 6 Global Step: 101520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:32,883-Speed 5186.50 samples/sec Loss 2.9710 LearningRate 0.0484 Epoch: 6 Global Step: 101530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:34,869-Speed 5157.18 samples/sec Loss 2.9446 LearningRate 0.0484 Epoch: 6 Global Step: 101540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:36,866-Speed 5128.57 samples/sec 
Loss 3.0402 LearningRate 0.0484 Epoch: 6 Global Step: 101550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:38,862-Speed 5134.35 samples/sec Loss 3.0165 LearningRate 0.0484 Epoch: 6 Global Step: 101560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:40,846-Speed 5161.37 samples/sec Loss 3.0479 LearningRate 0.0484 Epoch: 6 Global Step: 101570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:42,836-Speed 5146.95 samples/sec Loss 3.0186 LearningRate 0.0484 Epoch: 6 Global Step: 101580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:44,819-Speed 5166.07 samples/sec Loss 3.0268 LearningRate 0.0484 Epoch: 6 Global Step: 101590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:46,794-Speed 5186.84 samples/sec Loss 3.0308 LearningRate 0.0484 Epoch: 6 Global Step: 101600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:48,774-Speed 5173.20 samples/sec Loss 3.0540 LearningRate 0.0484 Epoch: 6 Global Step: 101610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:03:50,753-Speed 5177.58 samples/sec Loss 2.9367 LearningRate 0.0484 Epoch: 6 Global Step: 101620 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:03:52,742-Speed 5149.37 samples/sec Loss 3.0034 LearningRate 0.0484 Epoch: 6 Global Step: 101630 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:03:54,730-Speed 5151.02 samples/sec Loss 3.0085 LearningRate 0.0484 Epoch: 6 Global Step: 101640 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:03:56,716-Speed 5161.06 samples/sec Loss 3.0418 LearningRate 0.0484 Epoch: 6 Global Step: 101650 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:03:58,708-Speed 5142.46 samples/sec Loss 3.0004 LearningRate 0.0484 Epoch: 6 Global Step: 101660 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:04:00,685-Speed 5180.41 samples/sec Loss 3.0079 LearningRate 0.0484 Epoch: 
6 Global Step: 101670 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:04:02,655-Speed 5198.33 samples/sec Loss 3.1009 LearningRate 0.0484 Epoch: 6 Global Step: 101680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:04,646-Speed 5146.46 samples/sec Loss 2.9742 LearningRate 0.0484 Epoch: 6 Global Step: 101690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:06,625-Speed 5173.68 samples/sec Loss 2.9897 LearningRate 0.0484 Epoch: 6 Global Step: 101700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:08,600-Speed 5189.00 samples/sec Loss 3.0196 LearningRate 0.0483 Epoch: 6 Global Step: 101710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:10,609-Speed 5097.12 samples/sec Loss 2.9907 LearningRate 0.0483 Epoch: 6 Global Step: 101720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:12,603-Speed 5138.51 samples/sec Loss 2.9497 LearningRate 0.0483 Epoch: 6 Global Step: 101730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:14,603-Speed 5121.35 samples/sec Loss 3.1121 LearningRate 0.0483 Epoch: 6 Global Step: 101740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:16,591-Speed 5153.22 samples/sec Loss 3.0081 LearningRate 0.0483 Epoch: 6 Global Step: 101750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:18,566-Speed 5185.75 samples/sec Loss 3.0639 LearningRate 0.0483 Epoch: 6 Global Step: 101760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:20,544-Speed 5180.25 samples/sec Loss 3.0306 LearningRate 0.0483 Epoch: 6 Global Step: 101770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:22,525-Speed 5168.14 samples/sec Loss 3.0572 LearningRate 0.0483 Epoch: 6 Global Step: 101780 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:04:24,525-Speed 5123.41 samples/sec Loss 3.0307 LearningRate 0.0483 Epoch: 6 Global Step: 101790 Fp16 Grad Scale: 
131072 Required: 16 hours Training: 2022-04-11 06:04:26,506-Speed 5169.51 samples/sec Loss 3.0546 LearningRate 0.0483 Epoch: 6 Global Step: 101800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:28,498-Speed 5142.46 samples/sec Loss 3.0398 LearningRate 0.0483 Epoch: 6 Global Step: 101810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:30,476-Speed 5180.05 samples/sec Loss 3.0697 LearningRate 0.0483 Epoch: 6 Global Step: 101820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:32,448-Speed 5192.67 samples/sec Loss 3.0070 LearningRate 0.0483 Epoch: 6 Global Step: 101830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:34,425-Speed 5180.65 samples/sec Loss 3.0265 LearningRate 0.0483 Epoch: 6 Global Step: 101840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:36,400-Speed 5186.87 samples/sec Loss 3.0026 LearningRate 0.0483 Epoch: 6 Global Step: 101850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:38,382-Speed 5171.02 samples/sec Loss 3.0213 LearningRate 0.0483 Epoch: 6 Global Step: 101860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:40,357-Speed 5186.56 samples/sec Loss 3.0255 LearningRate 0.0483 Epoch: 6 Global Step: 101870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:42,366-Speed 5097.35 samples/sec Loss 3.0771 LearningRate 0.0483 Epoch: 6 Global Step: 101880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:44,363-Speed 5129.07 samples/sec Loss 3.1463 LearningRate 0.0483 Epoch: 6 Global Step: 101890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:46,340-Speed 5182.32 samples/sec Loss 3.0242 LearningRate 0.0483 Epoch: 6 Global Step: 101900 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:04:48,318-Speed 5178.59 samples/sec Loss 3.1114 LearningRate 0.0483 Epoch: 6 Global Step: 101910 Fp16 Grad Scale: 131072 Required: 16 hours Training: 
2022-04-11 06:04:50,298-Speed 5172.18 samples/sec Loss 3.0952 LearningRate 0.0483 Epoch: 6 Global Step: 101920 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:04:52,279-Speed 5171.64 samples/sec Loss 3.1052 LearningRate 0.0483 Epoch: 6 Global Step: 101930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:54,286-Speed 5104.02 samples/sec Loss 3.0169 LearningRate 0.0483 Epoch: 6 Global Step: 101940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:56,261-Speed 5186.23 samples/sec Loss 3.0499 LearningRate 0.0482 Epoch: 6 Global Step: 101950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:04:58,257-Speed 5131.74 samples/sec Loss 3.0850 LearningRate 0.0482 Epoch: 6 Global Step: 101960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:05:00,252-Speed 5136.71 samples/sec Loss 2.9881 LearningRate 0.0482 Epoch: 6 Global Step: 101970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:05:02,252-Speed 5119.19 samples/sec Loss 3.0097 LearningRate 0.0482 Epoch: 6 Global Step: 101980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:05:04,225-Speed 5192.19 samples/sec Loss 3.0682 LearningRate 0.0482 Epoch: 6 Global Step: 101990 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 06:05:06,219-Speed 5137.88 samples/sec Loss 3.0429 LearningRate 0.0482 Epoch: 6 Global Step: 102000 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 06:05:32,817-[lfw][102000]XNorm: 23.112938 Training: 2022-04-11 06:05:32,818-[lfw][102000]Accuracy-Flip: 0.99683+-0.00263 Training: 2022-04-11 06:05:32,818-[lfw][102000]Accuracy-Highest: 0.99817 Training: 2022-04-11 06:06:03,571-[cfp_fp][102000]XNorm: 21.362245 Training: 2022-04-11 06:06:03,571-[cfp_fp][102000]Accuracy-Flip: 0.98014+-0.00614 Training: 2022-04-11 06:06:03,572-[cfp_fp][102000]Accuracy-Highest: 0.98086 Training: 2022-04-11 06:06:30,062-[agedb_30][102000]XNorm: 23.183152 Training: 2022-04-11 
06:06:30,062-[agedb_30][102000]Accuracy-Flip: 0.97817+-0.00705 Training: 2022-04-11 06:06:30,063-[agedb_30][102000]Accuracy-Highest: 0.97950 Training: 2022-04-11 06:06:32,044-Speed 119.31 samples/sec Loss 3.0900 LearningRate 0.0482 Epoch: 6 Global Step: 102010 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 06:06:34,011-Speed 5205.13 samples/sec Loss 3.0561 LearningRate 0.0482 Epoch: 6 Global Step: 102020 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 06:06:35,980-Speed 5204.27 samples/sec Loss 2.9921 LearningRate 0.0482 Epoch: 6 Global Step: 102030 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 06:06:37,944-Speed 5214.64 samples/sec Loss 3.0946 LearningRate 0.0482 Epoch: 6 Global Step: 102040 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 06:06:39,912-Speed 5204.04 samples/sec Loss 3.0869 LearningRate 0.0482 Epoch: 6 Global Step: 102050 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 06:06:41,898-Speed 5159.44 samples/sec Loss 3.0985 LearningRate 0.0482 Epoch: 6 Global Step: 102060 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 06:06:43,863-Speed 5213.59 samples/sec Loss 3.0725 LearningRate 0.0482 Epoch: 6 Global Step: 102070 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 06:06:45,850-Speed 5155.64 samples/sec Loss 3.0338 LearningRate 0.0482 Epoch: 6 Global Step: 102080 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 06:06:47,832-Speed 5167.53 samples/sec Loss 3.1149 LearningRate 0.0482 Epoch: 6 Global Step: 102090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:06:49,834-Speed 5115.05 samples/sec Loss 3.0667 LearningRate 0.0482 Epoch: 6 Global Step: 102100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:06:51,830-Speed 5134.11 samples/sec Loss 2.9878 LearningRate 0.0482 Epoch: 6 Global Step: 102110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:06:53,833-Speed 5114.17 samples/sec Loss 
3.1136 LearningRate 0.0482 Epoch: 6 Global Step: 102120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:06:55,808-Speed 5185.96 samples/sec Loss 3.0370 LearningRate 0.0482 Epoch: 6 Global Step: 102130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:06:57,793-Speed 5159.87 samples/sec Loss 3.0433 LearningRate 0.0482 Epoch: 6 Global Step: 102140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:06:59,774-Speed 5169.45 samples/sec Loss 3.0812 LearningRate 0.0482 Epoch: 6 Global Step: 102150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:07:01,765-Speed 5145.67 samples/sec Loss 3.1009 LearningRate 0.0482 Epoch: 6 Global Step: 102160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:07:03,759-Speed 5136.98 samples/sec Loss 3.0410 LearningRate 0.0482 Epoch: 6 Global Step: 102170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:07:05,745-Speed 5158.55 samples/sec Loss 3.0953 LearningRate 0.0482 Epoch: 6 Global Step: 102180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:07:07,725-Speed 5175.25 samples/sec Loss 3.0085 LearningRate 0.0481 Epoch: 6 Global Step: 102190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:07:09,730-Speed 5107.29 samples/sec Loss 3.1415 LearningRate 0.0481 Epoch: 6 Global Step: 102200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:07:11,721-Speed 5144.76 samples/sec Loss 3.0632 LearningRate 0.0481 Epoch: 6 Global Step: 102210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:07:13,700-Speed 5177.25 samples/sec Loss 2.9839 LearningRate 0.0481 Epoch: 6 Global Step: 102220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:07:15,694-Speed 5137.07 samples/sec Loss 3.0610 LearningRate 0.0481 Epoch: 6 Global Step: 102230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:07:17,691-Speed 5128.96 samples/sec Loss 3.0442 LearningRate 0.0481 Epoch: 6 
Global Step: 102240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:07:19,679-Speed 5150.39 samples/sec Loss 3.0489 LearningRate 0.0481 Epoch: 6 Global Step: 102250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:07:21,669-Speed 5148.86 samples/sec Loss 2.9688 LearningRate 0.0481 Epoch: 6 Global Step: 102260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:07:23,658-Speed 5150.13 samples/sec Loss 3.0375 LearningRate 0.0481 Epoch: 6 Global Step: 102270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:07:25,665-Speed 5105.40 samples/sec Loss 3.1340 LearningRate 0.0481 Epoch: 6 Global Step: 102280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:07:27,646-Speed 5170.61 samples/sec Loss 3.0394 LearningRate 0.0481 Epoch: 6 Global Step: 102290 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:07:29,637-Speed 5144.13 samples/sec Loss 3.0492 LearningRate 0.0481 Epoch: 6 Global Step: 102300 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:07:31,615-Speed 5179.09 samples/sec Loss 2.9852 LearningRate 0.0481 Epoch: 6 Global Step: 102310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:07:33,597-Speed 5166.07 samples/sec Loss 3.1394 LearningRate 0.0481 Epoch: 6 Global Step: 102320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:07:35,606-Speed 5101.34 samples/sec Loss 3.1288 LearningRate 0.0481 Epoch: 6 Global Step: 102330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:07:37,598-Speed 5140.93 samples/sec Loss 3.0692 LearningRate 0.0481 Epoch: 6 Global Step: 102340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:07:39,613-Speed 5083.62 samples/sec Loss 3.0712 LearningRate 0.0481 Epoch: 6 Global Step: 102350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:07:41,602-Speed 5149.91 samples/sec Loss 3.0141 LearningRate 0.0481 Epoch: 6 Global Step: 102360 Fp16 Grad Scale: 
65536 Required: 16 hours Training: 2022-04-11 06:07:43,582-Speed 5175.53 samples/sec Loss 3.0578 LearningRate 0.0481 Epoch: 6 Global Step: 102370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:07:45,564-Speed 5166.97 samples/sec Loss 3.1281 LearningRate 0.0481 Epoch: 6 Global Step: 102380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:07:47,553-Speed 5148.92 samples/sec Loss 3.0591 LearningRate 0.0481 Epoch: 6 Global Step: 102390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:07:49,548-Speed 5135.46 samples/sec Loss 3.0428 LearningRate 0.0481 Epoch: 6 Global Step: 102400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:07:51,528-Speed 5172.90 samples/sec Loss 3.0543 LearningRate 0.0481 Epoch: 6 Global Step: 102410 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:07:53,501-Speed 5190.54 samples/sec Loss 3.0057 LearningRate 0.0481 Epoch: 6 Global Step: 102420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:07:55,505-Speed 5113.49 samples/sec Loss 3.0624 LearningRate 0.0480 Epoch: 6 Global Step: 102430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:07:57,485-Speed 5172.43 samples/sec Loss 3.1405 LearningRate 0.0480 Epoch: 6 Global Step: 102440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:07:59,475-Speed 5147.73 samples/sec Loss 3.0479 LearningRate 0.0480 Epoch: 6 Global Step: 102450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:08:01,462-Speed 5155.16 samples/sec Loss 3.1041 LearningRate 0.0480 Epoch: 6 Global Step: 102460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:08:03,438-Speed 5183.97 samples/sec Loss 3.0694 LearningRate 0.0480 Epoch: 6 Global Step: 102470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:08:05,411-Speed 5193.31 samples/sec Loss 3.0776 LearningRate 0.0480 Epoch: 6 Global Step: 102480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 
2022-04-11 06:08:07,386-Speed 5184.30 samples/sec Loss 3.0537 LearningRate 0.0480 Epoch: 6 Global Step: 102490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:08:09,378-Speed 5143.63 samples/sec Loss 3.0635 LearningRate 0.0480 Epoch: 6 Global Step: 102500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:08:11,378-Speed 5121.27 samples/sec Loss 3.1663 LearningRate 0.0480 Epoch: 6 Global Step: 102510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:08:13,368-Speed 5148.81 samples/sec Loss 2.9817 LearningRate 0.0480 Epoch: 6 Global Step: 102520 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:08:15,349-Speed 5170.28 samples/sec Loss 3.0851 LearningRate 0.0480 Epoch: 6 Global Step: 102530 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:08:17,333-Speed 5160.98 samples/sec Loss 3.1317 LearningRate 0.0480 Epoch: 6 Global Step: 102540 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:08:19,322-Speed 5150.72 samples/sec Loss 3.0967 LearningRate 0.0480 Epoch: 6 Global Step: 102550 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:08:21,328-Speed 5107.63 samples/sec Loss 3.0681 LearningRate 0.0480 Epoch: 6 Global Step: 102560 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:08:23,301-Speed 5190.75 samples/sec Loss 3.0324 LearningRate 0.0480 Epoch: 6 Global Step: 102570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:08:25,283-Speed 5167.79 samples/sec Loss 3.0759 LearningRate 0.0480 Epoch: 6 Global Step: 102580 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:08:27,271-Speed 5152.93 samples/sec Loss 3.0673 LearningRate 0.0480 Epoch: 6 Global Step: 102590 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:08:29,271-Speed 5121.59 samples/sec Loss 3.1318 LearningRate 0.0480 Epoch: 6 Global Step: 102600 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:08:31,238-Speed 
5207.34 samples/sec Loss 3.0589 LearningRate 0.0480 Epoch: 6 Global Step: 102610 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:08:33,230-Speed 5143.58 samples/sec Loss 3.1034 LearningRate 0.0480 Epoch: 6 Global Step: 102620 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:08:35,228-Speed 5125.93 samples/sec Loss 3.1051 LearningRate 0.0480 Epoch: 6 Global Step: 102630 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:08:37,209-Speed 5171.52 samples/sec Loss 3.0664 LearningRate 0.0480 Epoch: 6 Global Step: 102640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:08:39,199-Speed 5147.28 samples/sec Loss 3.0560 LearningRate 0.0480 Epoch: 6 Global Step: 102650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:08:41,193-Speed 5137.61 samples/sec Loss 3.1070 LearningRate 0.0480 Epoch: 6 Global Step: 102660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:08:43,165-Speed 5195.71 samples/sec Loss 3.0191 LearningRate 0.0479 Epoch: 6 Global Step: 102670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:08:45,159-Speed 5136.38 samples/sec Loss 3.1102 LearningRate 0.0479 Epoch: 6 Global Step: 102680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:08:47,141-Speed 5167.61 samples/sec Loss 3.0907 LearningRate 0.0479 Epoch: 6 Global Step: 102690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:08:49,137-Speed 5131.24 samples/sec Loss 3.1384 LearningRate 0.0479 Epoch: 6 Global Step: 102700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:08:51,169-Speed 5041.62 samples/sec Loss 3.1438 LearningRate 0.0479 Epoch: 6 Global Step: 102710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:08:53,151-Speed 5167.49 samples/sec Loss 3.0252 LearningRate 0.0479 Epoch: 6 Global Step: 102720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:08:55,126-Speed 5186.40 samples/sec Loss 3.0223 
LearningRate 0.0479 Epoch: 6 Global Step: 102730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:08:57,116-Speed 5148.08 samples/sec Loss 3.1171 LearningRate 0.0479 Epoch: 6 Global Step: 102740 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:08:59,102-Speed 5157.53 samples/sec Loss 3.1509 LearningRate 0.0479 Epoch: 6 Global Step: 102750 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:09:01,112-Speed 5095.74 samples/sec Loss 3.0788 LearningRate 0.0479 Epoch: 6 Global Step: 102760 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:09:03,113-Speed 5120.17 samples/sec Loss 3.2048 LearningRate 0.0479 Epoch: 6 Global Step: 102770 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:09:05,091-Speed 5178.87 samples/sec Loss 3.1093 LearningRate 0.0479 Epoch: 6 Global Step: 102780 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:09:07,062-Speed 5196.59 samples/sec Loss 3.1301 LearningRate 0.0479 Epoch: 6 Global Step: 102790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:09:09,039-Speed 5180.72 samples/sec Loss 3.0142 LearningRate 0.0479 Epoch: 6 Global Step: 102800 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:09:11,027-Speed 5153.71 samples/sec Loss 3.1124 LearningRate 0.0479 Epoch: 6 Global Step: 102810 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:09:13,024-Speed 5129.76 samples/sec Loss 3.1617 LearningRate 0.0479 Epoch: 6 Global Step: 102820 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:09:14,996-Speed 5194.02 samples/sec Loss 3.0604 LearningRate 0.0479 Epoch: 6 Global Step: 102830 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-11 06:09:16,977-Speed 5171.82 samples/sec Loss 3.1415 LearningRate 0.0479 Epoch: 6 Global Step: 102840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:09:18,953-Speed 5182.59 samples/sec Loss 3.1035 LearningRate 0.0479 Epoch: 6 
Global Step: 102850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-11 06:09:20,923-Speed 5198.25 samples/sec Loss 3.0580 LearningRate 0.0479 Epoch: 6 Global Step: 102860 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 06:09:22,918-Speed 5135.06 samples/sec Loss 3.0834 LearningRate 0.0479 Epoch: 6 Global Step: 102870 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 06:09:24,886-Speed 5205.05 samples/sec Loss 3.1548 LearningRate 0.0479 Epoch: 6 Global Step: 102880 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-11 06:09:26,871-Speed 5162.08 samples/sec Loss 3.0856 LearningRate 0.0479 Epoch: 6 Global Step: 102890 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:09:28,853-Speed 5169.03 samples/sec Loss 3.1335 LearningRate 0.0479 Epoch: 6 Global Step: 102900 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:09:30,835-Speed 5166.24 samples/sec Loss 3.1148 LearningRate 0.0478 Epoch: 6 Global Step: 102910 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:09:32,825-Speed 5148.58 samples/sec Loss 3.0721 LearningRate 0.0478 Epoch: 6 Global Step: 102920 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:09:34,808-Speed 5164.15 samples/sec Loss 3.0110 LearningRate 0.0478 Epoch: 6 Global Step: 102930 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:09:36,807-Speed 5125.90 samples/sec Loss 3.1291 LearningRate 0.0478 Epoch: 6 Global Step: 102940 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:09:38,801-Speed 5135.70 samples/sec Loss 3.1223 LearningRate 0.0478 Epoch: 6 Global Step: 102950 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:09:40,813-Speed 5091.65 samples/sec Loss 3.1411 LearningRate 0.0478 Epoch: 6 Global Step: 102960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:09:42,796-Speed 5165.54 samples/sec Loss 3.1379 LearningRate 0.0478 Epoch: 6 Global Step: 102970 Fp16 Grad Scale: 65536 
Required: 15 hours Training: 2022-04-11 06:09:44,768-Speed 5197.14 samples/sec Loss 3.1096 LearningRate 0.0478 Epoch: 6 Global Step: 102980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:09:46,739-Speed 5197.72 samples/sec Loss 3.0807 LearningRate 0.0478 Epoch: 6 Global Step: 102990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:09:48,717-Speed 5178.87 samples/sec Loss 3.0283 LearningRate 0.0478 Epoch: 6 Global Step: 103000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:09:50,723-Speed 5107.00 samples/sec Loss 3.1715 LearningRate 0.0478 Epoch: 6 Global Step: 103010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:09:52,703-Speed 5173.17 samples/sec Loss 3.1019 LearningRate 0.0478 Epoch: 6 Global Step: 103020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:09:54,682-Speed 5174.74 samples/sec Loss 3.1488 LearningRate 0.0478 Epoch: 6 Global Step: 103030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:09:56,658-Speed 5184.89 samples/sec Loss 3.1248 LearningRate 0.0478 Epoch: 6 Global Step: 103040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:09:58,660-Speed 5117.11 samples/sec Loss 3.1413 LearningRate 0.0478 Epoch: 6 Global Step: 103050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:00,663-Speed 5111.67 samples/sec Loss 3.0827 LearningRate 0.0478 Epoch: 6 Global Step: 103060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:10:02,652-Speed 5151.49 samples/sec Loss 3.1202 LearningRate 0.0478 Epoch: 6 Global Step: 103070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:10:04,650-Speed 5125.89 samples/sec Loss 3.1182 LearningRate 0.0478 Epoch: 6 Global Step: 103080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:10:06,642-Speed 5144.07 samples/sec Loss 3.0900 LearningRate 0.0478 Epoch: 6 Global Step: 103090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 
06:10:08,622-Speed 5173.85 samples/sec Loss 3.0812 LearningRate 0.0478 Epoch: 6 Global Step: 103100 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:10:10,597-Speed 5185.23 samples/sec Loss 3.1247 LearningRate 0.0478 Epoch: 6 Global Step: 103110 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:10:12,593-Speed 5133.44 samples/sec Loss 3.1633 LearningRate 0.0478 Epoch: 6 Global Step: 103120 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:10:14,582-Speed 5149.69 samples/sec Loss 3.1172 LearningRate 0.0478 Epoch: 6 Global Step: 103130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:10:16,559-Speed 5179.41 samples/sec Loss 3.1065 LearningRate 0.0478 Epoch: 6 Global Step: 103140 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:10:18,534-Speed 5187.06 samples/sec Loss 3.0979 LearningRate 0.0477 Epoch: 6 Global Step: 103150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:20,504-Speed 5198.97 samples/sec Loss 3.0523 LearningRate 0.0477 Epoch: 6 Global Step: 103160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:22,471-Speed 5208.66 samples/sec Loss 3.1984 LearningRate 0.0477 Epoch: 6 Global Step: 103170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:24,475-Speed 5109.69 samples/sec Loss 3.2212 LearningRate 0.0477 Epoch: 6 Global Step: 103180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:26,451-Speed 5186.27 samples/sec Loss 3.1027 LearningRate 0.0477 Epoch: 6 Global Step: 103190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:28,440-Speed 5151.09 samples/sec Loss 3.1788 LearningRate 0.0477 Epoch: 6 Global Step: 103200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:30,421-Speed 5168.70 samples/sec Loss 3.1156 LearningRate 0.0477 Epoch: 6 Global Step: 103210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:32,420-Speed 5124.30 samples/sec 
Loss 3.1571 LearningRate 0.0477 Epoch: 6 Global Step: 103220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:34,388-Speed 5204.68 samples/sec Loss 3.1401 LearningRate 0.0477 Epoch: 6 Global Step: 103230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:36,373-Speed 5161.06 samples/sec Loss 3.1451 LearningRate 0.0477 Epoch: 6 Global Step: 103240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:38,358-Speed 5161.88 samples/sec Loss 3.1336 LearningRate 0.0477 Epoch: 6 Global Step: 103250 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:10:40,348-Speed 5146.34 samples/sec Loss 3.1201 LearningRate 0.0477 Epoch: 6 Global Step: 103260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:42,326-Speed 5178.37 samples/sec Loss 3.1792 LearningRate 0.0477 Epoch: 6 Global Step: 103270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:44,318-Speed 5141.81 samples/sec Loss 3.1306 LearningRate 0.0477 Epoch: 6 Global Step: 103280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:46,318-Speed 5121.68 samples/sec Loss 3.1688 LearningRate 0.0477 Epoch: 6 Global Step: 103290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:48,310-Speed 5142.77 samples/sec Loss 3.1438 LearningRate 0.0477 Epoch: 6 Global Step: 103300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:50,301-Speed 5146.30 samples/sec Loss 3.1794 LearningRate 0.0477 Epoch: 6 Global Step: 103310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:52,286-Speed 5159.26 samples/sec Loss 3.1623 LearningRate 0.0477 Epoch: 6 Global Step: 103320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:54,265-Speed 5176.59 samples/sec Loss 3.0797 LearningRate 0.0477 Epoch: 6 Global Step: 103330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:56,246-Speed 5169.22 samples/sec Loss 3.0601 LearningRate 0.0477 Epoch: 6 
Global Step: 103340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:10:58,232-Speed 5159.66 samples/sec Loss 3.1402 LearningRate 0.0477 Epoch: 6 Global Step: 103350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:00,214-Speed 5165.74 samples/sec Loss 3.1358 LearningRate 0.0477 Epoch: 6 Global Step: 103360 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:11:02,184-Speed 5202.03 samples/sec Loss 3.0669 LearningRate 0.0477 Epoch: 6 Global Step: 103370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:11:04,158-Speed 5188.56 samples/sec Loss 3.1277 LearningRate 0.0477 Epoch: 6 Global Step: 103380 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:11:06,134-Speed 5184.51 samples/sec Loss 3.1356 LearningRate 0.0476 Epoch: 6 Global Step: 103390 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:11:08,102-Speed 5203.76 samples/sec Loss 3.1076 LearningRate 0.0476 Epoch: 6 Global Step: 103400 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:11:10,100-Speed 5129.18 samples/sec Loss 3.1377 LearningRate 0.0476 Epoch: 6 Global Step: 103410 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:11:12,083-Speed 5165.37 samples/sec Loss 3.1369 LearningRate 0.0476 Epoch: 6 Global Step: 103420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:14,080-Speed 5128.38 samples/sec Loss 3.0797 LearningRate 0.0476 Epoch: 6 Global Step: 103430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:16,079-Speed 5124.88 samples/sec Loss 3.1376 LearningRate 0.0476 Epoch: 6 Global Step: 103440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:18,056-Speed 5181.19 samples/sec Loss 3.1516 LearningRate 0.0476 Epoch: 6 Global Step: 103450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:20,033-Speed 5180.27 samples/sec Loss 3.1617 LearningRate 0.0476 Epoch: 6 Global Step: 103460 Fp16 Grad Scale: 
65536 Required: 15 hours Training: 2022-04-11 06:11:22,011-Speed 5178.66 samples/sec Loss 3.1202 LearningRate 0.0476 Epoch: 6 Global Step: 103470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:24,016-Speed 5109.12 samples/sec Loss 3.1134 LearningRate 0.0476 Epoch: 6 Global Step: 103480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:26,009-Speed 5140.37 samples/sec Loss 3.0484 LearningRate 0.0476 Epoch: 6 Global Step: 103490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:27,987-Speed 5177.72 samples/sec Loss 3.0714 LearningRate 0.0476 Epoch: 6 Global Step: 103500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:29,963-Speed 5183.91 samples/sec Loss 3.1250 LearningRate 0.0476 Epoch: 6 Global Step: 103510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:31,936-Speed 5192.31 samples/sec Loss 3.1369 LearningRate 0.0476 Epoch: 6 Global Step: 103520 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:11:33,917-Speed 5170.35 samples/sec Loss 3.1572 LearningRate 0.0476 Epoch: 6 Global Step: 103530 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:11:35,908-Speed 5146.74 samples/sec Loss 3.1317 LearningRate 0.0476 Epoch: 6 Global Step: 103540 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:11:37,907-Speed 5123.25 samples/sec Loss 3.1101 LearningRate 0.0476 Epoch: 6 Global Step: 103550 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:11:39,880-Speed 5191.16 samples/sec Loss 3.2058 LearningRate 0.0476 Epoch: 6 Global Step: 103560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:41,863-Speed 5166.72 samples/sec Loss 3.2165 LearningRate 0.0476 Epoch: 6 Global Step: 103570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:43,852-Speed 5149.18 samples/sec Loss 3.2448 LearningRate 0.0476 Epoch: 6 Global Step: 103580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 
2022-04-11 06:11:45,842-Speed 5148.52 samples/sec Loss 3.1223 LearningRate 0.0476 Epoch: 6 Global Step: 103590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:47,876-Speed 5033.95 samples/sec Loss 3.1492 LearningRate 0.0476 Epoch: 6 Global Step: 103600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:49,868-Speed 5144.51 samples/sec Loss 3.1551 LearningRate 0.0476 Epoch: 6 Global Step: 103610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:51,844-Speed 5183.70 samples/sec Loss 3.1198 LearningRate 0.0476 Epoch: 6 Global Step: 103620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:53,871-Speed 5052.75 samples/sec Loss 3.2053 LearningRate 0.0475 Epoch: 6 Global Step: 103630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:55,852-Speed 5172.43 samples/sec Loss 3.1772 LearningRate 0.0475 Epoch: 6 Global Step: 103640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:57,825-Speed 5189.67 samples/sec Loss 3.1949 LearningRate 0.0475 Epoch: 6 Global Step: 103650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:11:59,805-Speed 5174.79 samples/sec Loss 3.1638 LearningRate 0.0475 Epoch: 6 Global Step: 103660 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:12:01,793-Speed 5151.08 samples/sec Loss 3.2135 LearningRate 0.0475 Epoch: 6 Global Step: 103670 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:12:03,786-Speed 5140.89 samples/sec Loss 3.1281 LearningRate 0.0475 Epoch: 6 Global Step: 103680 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:12:05,764-Speed 5177.16 samples/sec Loss 3.1739 LearningRate 0.0475 Epoch: 6 Global Step: 103690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:07,735-Speed 5196.31 samples/sec Loss 3.1604 LearningRate 0.0475 Epoch: 6 Global Step: 103700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:09,753-Speed 5076.71 
samples/sec Loss 3.1535 LearningRate 0.0475 Epoch: 6 Global Step: 103710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:11,745-Speed 5144.84 samples/sec Loss 3.2318 LearningRate 0.0475 Epoch: 6 Global Step: 103720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:13,715-Speed 5199.80 samples/sec Loss 3.1896 LearningRate 0.0475 Epoch: 6 Global Step: 103730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:15,696-Speed 5170.75 samples/sec Loss 3.1965 LearningRate 0.0475 Epoch: 6 Global Step: 103740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:17,683-Speed 5153.36 samples/sec Loss 3.2782 LearningRate 0.0475 Epoch: 6 Global Step: 103750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:19,656-Speed 5191.62 samples/sec Loss 3.1918 LearningRate 0.0475 Epoch: 6 Global Step: 103760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:21,637-Speed 5171.21 samples/sec Loss 3.2253 LearningRate 0.0475 Epoch: 6 Global Step: 103770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:23,638-Speed 5119.83 samples/sec Loss 3.1878 LearningRate 0.0475 Epoch: 6 Global Step: 103780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:25,630-Speed 5142.18 samples/sec Loss 3.1497 LearningRate 0.0475 Epoch: 6 Global Step: 103790 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:12:27,629-Speed 5124.09 samples/sec Loss 3.1838 LearningRate 0.0475 Epoch: 6 Global Step: 103800 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:12:29,602-Speed 5190.90 samples/sec Loss 3.1904 LearningRate 0.0475 Epoch: 6 Global Step: 103810 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:12:31,575-Speed 5191.11 samples/sec Loss 3.2599 LearningRate 0.0475 Epoch: 6 Global Step: 103820 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:12:33,549-Speed 5190.18 samples/sec Loss 3.1263 LearningRate 
0.0475 Epoch: 6 Global Step: 103830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:35,537-Speed 5153.36 samples/sec Loss 3.1656 LearningRate 0.0475 Epoch: 6 Global Step: 103840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:37,508-Speed 5196.98 samples/sec Loss 3.1250 LearningRate 0.0475 Epoch: 6 Global Step: 103850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:39,505-Speed 5128.72 samples/sec Loss 3.1662 LearningRate 0.0475 Epoch: 6 Global Step: 103860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:41,526-Speed 5068.91 samples/sec Loss 3.1828 LearningRate 0.0475 Epoch: 6 Global Step: 103870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:43,499-Speed 5191.22 samples/sec Loss 3.2248 LearningRate 0.0474 Epoch: 6 Global Step: 103880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:45,501-Speed 5116.35 samples/sec Loss 3.1572 LearningRate 0.0474 Epoch: 6 Global Step: 103890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:47,526-Speed 5058.26 samples/sec Loss 3.2129 LearningRate 0.0474 Epoch: 6 Global Step: 103900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:49,517-Speed 5144.91 samples/sec Loss 3.1740 LearningRate 0.0474 Epoch: 6 Global Step: 103910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:51,494-Speed 5182.52 samples/sec Loss 3.1104 LearningRate 0.0474 Epoch: 6 Global Step: 103920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:12:53,481-Speed 5156.13 samples/sec Loss 3.1280 LearningRate 0.0474 Epoch: 6 Global Step: 103930 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:12:55,456-Speed 5186.10 samples/sec Loss 3.1731 LearningRate 0.0474 Epoch: 6 Global Step: 103940 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:12:57,459-Speed 5113.52 samples/sec Loss 3.2382 LearningRate 0.0474 Epoch: 6 Global Step: 103950 Fp16 
Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:12:59,449-Speed 5147.03 samples/sec Loss 3.2170 LearningRate 0.0474 Epoch: 6 Global Step: 103960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:13:01,420-Speed 5196.38 samples/sec Loss 3.2212 LearningRate 0.0474 Epoch: 6 Global Step: 103970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:13:03,393-Speed 5194.19 samples/sec Loss 3.1700 LearningRate 0.0474 Epoch: 6 Global Step: 103980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:13:05,366-Speed 5191.73 samples/sec Loss 3.2112 LearningRate 0.0474 Epoch: 6 Global Step: 103990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:13:07,341-Speed 5185.68 samples/sec Loss 3.1678 LearningRate 0.0474 Epoch: 6 Global Step: 104000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:13:34,009-[lfw][104000]XNorm: 22.175999 Training: 2022-04-11 06:13:34,009-[lfw][104000]Accuracy-Flip: 0.99750+-0.00271 Training: 2022-04-11 06:13:34,010-[lfw][104000]Accuracy-Highest: 0.99817 Training: 2022-04-11 06:14:04,791-[cfp_fp][104000]XNorm: 20.243633 Training: 2022-04-11 06:14:04,791-[cfp_fp][104000]Accuracy-Flip: 0.97829+-0.00641 Training: 2022-04-11 06:14:04,792-[cfp_fp][104000]Accuracy-Highest: 0.98086 Training: 2022-04-11 06:14:31,260-[agedb_30][104000]XNorm: 21.962710 Training: 2022-04-11 06:14:31,261-[agedb_30][104000]Accuracy-Flip: 0.97917+-0.00664 Training: 2022-04-11 06:14:31,261-[agedb_30][104000]Accuracy-Highest: 0.97950 Training: 2022-04-11 06:14:33,242-Speed 119.21 samples/sec Loss 3.2095 LearningRate 0.0474 Epoch: 6 Global Step: 104010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:14:35,213-Speed 5197.72 samples/sec Loss 3.1002 LearningRate 0.0474 Epoch: 6 Global Step: 104020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:14:37,224-Speed 5092.91 samples/sec Loss 3.1482 LearningRate 0.0474 Epoch: 6 Global Step: 104030 Fp16 Grad Scale: 65536 
Required: 15 hours Training: 2022-04-11 06:14:39,209-Speed 5161.23 samples/sec Loss 3.1348 LearningRate 0.0474 Epoch: 6 Global Step: 104040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:14:41,185-Speed 5187.77 samples/sec Loss 3.1770 LearningRate 0.0474 Epoch: 6 Global Step: 104050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:14:43,164-Speed 5174.76 samples/sec Loss 3.1588 LearningRate 0.0474 Epoch: 6 Global Step: 104060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:14:45,158-Speed 5138.55 samples/sec Loss 3.1591 LearningRate 0.0474 Epoch: 6 Global Step: 104070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:14:47,139-Speed 5168.49 samples/sec Loss 3.1464 LearningRate 0.0474 Epoch: 6 Global Step: 104080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:14:49,138-Speed 5124.04 samples/sec Loss 3.1250 LearningRate 0.0474 Epoch: 6 Global Step: 104090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:14:51,122-Speed 5163.89 samples/sec Loss 3.1315 LearningRate 0.0474 Epoch: 6 Global Step: 104100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:14:53,106-Speed 5161.96 samples/sec Loss 3.1743 LearningRate 0.0474 Epoch: 6 Global Step: 104110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:14:55,082-Speed 5184.55 samples/sec Loss 3.2024 LearningRate 0.0473 Epoch: 6 Global Step: 104120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:14:57,076-Speed 5137.55 samples/sec Loss 3.2153 LearningRate 0.0473 Epoch: 6 Global Step: 104130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:14:59,064-Speed 5151.54 samples/sec Loss 3.1989 LearningRate 0.0473 Epoch: 6 Global Step: 104140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:15:01,049-Speed 5160.36 samples/sec Loss 3.2202 LearningRate 0.0473 Epoch: 6 Global Step: 104150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 
06:15:03,036-Speed 5156.82 samples/sec Loss 3.1521 LearningRate 0.0473 Epoch: 6 Global Step: 104160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:15:05,036-Speed 5122.48 samples/sec Loss 3.0973 LearningRate 0.0473 Epoch: 6 Global Step: 104170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:15:07,023-Speed 5155.64 samples/sec Loss 3.1702 LearningRate 0.0473 Epoch: 6 Global Step: 104180 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:15:09,010-Speed 5153.91 samples/sec Loss 3.2247 LearningRate 0.0473 Epoch: 6 Global Step: 104190 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:15:11,002-Speed 5141.87 samples/sec Loss 3.1848 LearningRate 0.0473 Epoch: 6 Global Step: 104200 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:15:13,010-Speed 5100.98 samples/sec Loss 3.2498 LearningRate 0.0473 Epoch: 6 Global Step: 104210 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:15:15,006-Speed 5132.00 samples/sec Loss 3.1934 LearningRate 0.0473 Epoch: 6 Global Step: 104220 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:15:17,015-Speed 5099.12 samples/sec Loss 3.2291 LearningRate 0.0473 Epoch: 6 Global Step: 104230 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:15:18,992-Speed 5180.74 samples/sec Loss 3.2019 LearningRate 0.0473 Epoch: 6 Global Step: 104240 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:15:20,977-Speed 5160.93 samples/sec Loss 3.1726 LearningRate 0.0473 Epoch: 6 Global Step: 104250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:15:22,959-Speed 5169.36 samples/sec Loss 3.1366 LearningRate 0.0473 Epoch: 6 Global Step: 104260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:15:24,950-Speed 5144.33 samples/sec Loss 3.1695 LearningRate 0.0473 Epoch: 6 Global Step: 104270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:15:26,934-Speed 5163.54 
samples/sec Loss 3.1871 LearningRate 0.0473 Epoch: 6 Global Step: 104280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:15:28,964-Speed 5045.19 samples/sec Loss 3.1641 LearningRate 0.0473 Epoch: 6 Global Step: 104290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:15:30,940-Speed 5184.42 samples/sec Loss 3.1686 LearningRate 0.0473 Epoch: 6 Global Step: 104300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:15:32,918-Speed 5178.17 samples/sec Loss 3.1846 LearningRate 0.0473 Epoch: 6 Global Step: 104310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:15:34,950-Speed 5041.75 samples/sec Loss 3.1886 LearningRate 0.0473 Epoch: 6 Global Step: 104320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:15:36,948-Speed 5125.55 samples/sec Loss 3.2452 LearningRate 0.0473 Epoch: 6 Global Step: 104330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:15:38,972-Speed 5061.66 samples/sec Loss 3.1859 LearningRate 0.0473 Epoch: 6 Global Step: 104340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:15:40,955-Speed 5165.30 samples/sec Loss 3.2191 LearningRate 0.0473 Epoch: 6 Global Step: 104350 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:15:42,939-Speed 5164.05 samples/sec Loss 3.2580 LearningRate 0.0472 Epoch: 6 Global Step: 104360 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:15:44,935-Speed 5133.11 samples/sec Loss 3.2105 LearningRate 0.0472 Epoch: 6 Global Step: 104370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:15:46,914-Speed 5175.90 samples/sec Loss 3.2383 LearningRate 0.0472 Epoch: 6 Global Step: 104380 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:15:48,909-Speed 5134.71 samples/sec Loss 3.2471 LearningRate 0.0472 Epoch: 6 Global Step: 104390 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:15:50,902-Speed 5138.13 samples/sec Loss 3.2627 LearningRate 
0.0472 Epoch: 6 Global Step: 104400 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:15:52,889-Speed 5154.63 samples/sec Loss 3.2132 LearningRate 0.0472 Epoch: 6 Global Step: 104410 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:15:54,882-Speed 5139.03 samples/sec Loss 3.2468 LearningRate 0.0472 Epoch: 6 Global Step: 104420 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:15:56,885-Speed 5115.72 samples/sec Loss 3.1597 LearningRate 0.0472 Epoch: 6 Global Step: 104430 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:15:58,886-Speed 5118.66 samples/sec Loss 3.2975 LearningRate 0.0472 Epoch: 6 Global Step: 104440 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:16:00,874-Speed 5153.87 samples/sec Loss 3.1853 LearningRate 0.0472 Epoch: 6 Global Step: 104450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:02,895-Speed 5069.38 samples/sec Loss 3.1494 LearningRate 0.0472 Epoch: 6 Global Step: 104460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:04,891-Speed 5130.25 samples/sec Loss 3.2086 LearningRate 0.0472 Epoch: 6 Global Step: 104470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:06,892-Speed 5123.58 samples/sec Loss 3.2532 LearningRate 0.0472 Epoch: 6 Global Step: 104480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:08,875-Speed 5163.92 samples/sec Loss 3.1407 LearningRate 0.0472 Epoch: 6 Global Step: 104490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:10,871-Speed 5131.64 samples/sec Loss 3.0576 LearningRate 0.0472 Epoch: 6 Global Step: 104500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:12,844-Speed 5193.00 samples/sec Loss 3.2197 LearningRate 0.0472 Epoch: 6 Global Step: 104510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:14,833-Speed 5150.88 samples/sec Loss 3.1412 LearningRate 0.0472 Epoch: 6 Global Step: 104520 
Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:16,830-Speed 5128.99 samples/sec Loss 3.2486 LearningRate 0.0472 Epoch: 6 Global Step: 104530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:18,815-Speed 5160.10 samples/sec Loss 3.2117 LearningRate 0.0472 Epoch: 6 Global Step: 104540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:20,775-Speed 5226.40 samples/sec Loss 3.1780 LearningRate 0.0472 Epoch: 6 Global Step: 104550 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:16:22,783-Speed 5102.30 samples/sec Loss 3.1875 LearningRate 0.0472 Epoch: 6 Global Step: 104560 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:16:24,788-Speed 5107.18 samples/sec Loss 3.2688 LearningRate 0.0472 Epoch: 6 Global Step: 104570 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:16:26,766-Speed 5178.26 samples/sec Loss 3.1855 LearningRate 0.0472 Epoch: 6 Global Step: 104580 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:16:28,754-Speed 5153.82 samples/sec Loss 3.1455 LearningRate 0.0472 Epoch: 6 Global Step: 104590 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:16:30,723-Speed 5202.95 samples/sec Loss 3.2476 LearningRate 0.0471 Epoch: 6 Global Step: 104600 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:16:32,701-Speed 5177.75 samples/sec Loss 3.2498 LearningRate 0.0471 Epoch: 6 Global Step: 104610 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:16:34,697-Speed 5132.88 samples/sec Loss 3.2085 LearningRate 0.0471 Epoch: 6 Global Step: 104620 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:16:36,696-Speed 5124.70 samples/sec Loss 3.2059 LearningRate 0.0471 Epoch: 6 Global Step: 104630 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:16:38,684-Speed 5151.15 samples/sec Loss 3.1216 LearningRate 0.0471 Epoch: 6 Global Step: 104640 Fp16 Grad Scale: 32768 Required: 15 hours 
Training: 2022-04-11 06:16:40,673-Speed 5149.17 samples/sec Loss 3.1911 LearningRate 0.0471 Epoch: 6 Global Step: 104650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:42,642-Speed 5202.81 samples/sec Loss 3.2101 LearningRate 0.0471 Epoch: 6 Global Step: 104660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:44,619-Speed 5183.55 samples/sec Loss 3.1849 LearningRate 0.0471 Epoch: 6 Global Step: 104670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:46,604-Speed 5159.76 samples/sec Loss 3.2977 LearningRate 0.0471 Epoch: 6 Global Step: 104680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:48,584-Speed 5172.25 samples/sec Loss 3.1649 LearningRate 0.0471 Epoch: 6 Global Step: 104690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:50,582-Speed 5126.93 samples/sec Loss 3.1762 LearningRate 0.0471 Epoch: 6 Global Step: 104700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:52,562-Speed 5175.22 samples/sec Loss 3.2010 LearningRate 0.0471 Epoch: 6 Global Step: 104710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:54,536-Speed 5187.62 samples/sec Loss 3.1978 LearningRate 0.0471 Epoch: 6 Global Step: 104720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:56,510-Speed 5190.27 samples/sec Loss 3.2334 LearningRate 0.0471 Epoch: 6 Global Step: 104730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:16:58,498-Speed 5152.70 samples/sec Loss 3.2358 LearningRate 0.0471 Epoch: 6 Global Step: 104740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:17:00,487-Speed 5147.94 samples/sec Loss 3.1298 LearningRate 0.0471 Epoch: 6 Global Step: 104750 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:17:02,481-Speed 5139.03 samples/sec Loss 3.2258 LearningRate 0.0471 Epoch: 6 Global Step: 104760 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:17:04,459-Speed 
5178.41 samples/sec Loss 3.2004 LearningRate 0.0471 Epoch: 6 Global Step: 104770 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:17:06,430-Speed 5196.62 samples/sec Loss 3.2381 LearningRate 0.0471 Epoch: 6 Global Step: 104780 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:17:08,397-Speed 5208.73 samples/sec Loss 3.2814 LearningRate 0.0471 Epoch: 6 Global Step: 104790 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:17:10,370-Speed 5192.37 samples/sec Loss 3.1240 LearningRate 0.0471 Epoch: 6 Global Step: 104800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:17:12,374-Speed 5110.80 samples/sec Loss 3.1623 LearningRate 0.0471 Epoch: 6 Global Step: 104810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:17:14,353-Speed 5175.29 samples/sec Loss 3.1896 LearningRate 0.0471 Epoch: 6 Global Step: 104820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:17:16,343-Speed 5147.71 samples/sec Loss 3.2100 LearningRate 0.0471 Epoch: 6 Global Step: 104830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:17:18,322-Speed 5176.75 samples/sec Loss 3.2598 LearningRate 0.0471 Epoch: 6 Global Step: 104840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:17:20,301-Speed 5175.17 samples/sec Loss 3.1956 LearningRate 0.0470 Epoch: 6 Global Step: 104850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:17:22,284-Speed 5165.97 samples/sec Loss 3.1404 LearningRate 0.0470 Epoch: 6 Global Step: 104860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:17:24,265-Speed 5171.12 samples/sec Loss 3.1757 LearningRate 0.0470 Epoch: 6 Global Step: 104870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:17:26,248-Speed 5165.03 samples/sec Loss 3.2002 LearningRate 0.0470 Epoch: 6 Global Step: 104880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:17:28,233-Speed 5160.12 samples/sec Loss 3.1394 
LearningRate 0.0470 Epoch: 6 Global Step: 104890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:17:30,214-Speed 5172.99 samples/sec Loss 3.2504 LearningRate 0.0470 Epoch: 6 Global Step: 104900 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:17:32,186-Speed 5192.43 samples/sec Loss 3.1755 LearningRate 0.0470 Epoch: 6 Global Step: 104910 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:17:34,187-Speed 5119.02 samples/sec Loss 3.2606 LearningRate 0.0470 Epoch: 6 Global Step: 104920 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:17:36,172-Speed 5160.43 samples/sec Loss 3.2948 LearningRate 0.0470 Epoch: 6 Global Step: 104930 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:17:38,145-Speed 5193.27 samples/sec Loss 3.1761 LearningRate 0.0470 Epoch: 6 Global Step: 104940 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:17:40,111-Speed 5209.46 samples/sec Loss 3.2402 LearningRate 0.0470 Epoch: 6 Global Step: 104950 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:17:42,095-Speed 5162.09 samples/sec Loss 3.1870 LearningRate 0.0470 Epoch: 6 Global Step: 104960 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:17:44,066-Speed 5197.00 samples/sec Loss 3.2685 LearningRate 0.0470 Epoch: 6 Global Step: 104970 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:17:46,038-Speed 5194.89 samples/sec Loss 3.1457 LearningRate 0.0470 Epoch: 6 Global Step: 104980 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:17:48,021-Speed 5165.79 samples/sec Loss 3.2834 LearningRate 0.0470 Epoch: 6 Global Step: 104990 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:17:49,998-Speed 5180.32 samples/sec Loss 3.1706 LearningRate 0.0470 Epoch: 6 Global Step: 105000 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-04-11 06:17:51,980-Speed 5169.27 samples/sec Loss 3.2359 LearningRate 0.0470 Epoch: 6 
Global Step: 105010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:17:53,953-Speed 5192.68 samples/sec Loss 3.2699 LearningRate 0.0470 Epoch: 6 Global Step: 105020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:17:55,937-Speed 5160.52 samples/sec Loss 3.1810 LearningRate 0.0470 Epoch: 6 Global Step: 105030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:17:57,912-Speed 5186.90 samples/sec Loss 3.2375 LearningRate 0.0470 Epoch: 6 Global Step: 105040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:17:59,884-Speed 5194.11 samples/sec Loss 3.3069 LearningRate 0.0470 Epoch: 6 Global Step: 105050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:01,871-Speed 5155.35 samples/sec Loss 3.2091 LearningRate 0.0470 Epoch: 6 Global Step: 105060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:03,863-Speed 5144.84 samples/sec Loss 3.2328 LearningRate 0.0470 Epoch: 6 Global Step: 105070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:05,845-Speed 5166.04 samples/sec Loss 3.2649 LearningRate 0.0470 Epoch: 6 Global Step: 105080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:07,832-Speed 5158.13 samples/sec Loss 3.2598 LearningRate 0.0469 Epoch: 6 Global Step: 105090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:09,810-Speed 5177.15 samples/sec Loss 3.2424 LearningRate 0.0469 Epoch: 6 Global Step: 105100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:11,808-Speed 5126.83 samples/sec Loss 3.2993 LearningRate 0.0469 Epoch: 6 Global Step: 105110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:13,790-Speed 5168.36 samples/sec Loss 3.2697 LearningRate 0.0469 Epoch: 6 Global Step: 105120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:15,760-Speed 5199.87 samples/sec Loss 3.1922 LearningRate 0.0469 Epoch: 6 Global Step: 105130 Fp16 Grad Scale: 
65536 Required: 15 hours Training: 2022-04-11 06:18:17,813-Speed 4989.62 samples/sec Loss 3.2878 LearningRate 0.0469 Epoch: 6 Global Step: 105140 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:18:19,786-Speed 5189.45 samples/sec Loss 3.1743 LearningRate 0.0469 Epoch: 6 Global Step: 105150 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:18:21,774-Speed 5153.57 samples/sec Loss 3.2344 LearningRate 0.0469 Epoch: 6 Global Step: 105160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:23,748-Speed 5188.12 samples/sec Loss 3.1984 LearningRate 0.0469 Epoch: 6 Global Step: 105170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:25,732-Speed 5165.21 samples/sec Loss 3.2891 LearningRate 0.0469 Epoch: 6 Global Step: 105180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:27,750-Speed 5075.34 samples/sec Loss 3.2699 LearningRate 0.0469 Epoch: 6 Global Step: 105190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:29,750-Speed 5122.28 samples/sec Loss 3.2703 LearningRate 0.0469 Epoch: 6 Global Step: 105200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:31,721-Speed 5198.18 samples/sec Loss 3.2088 LearningRate 0.0469 Epoch: 6 Global Step: 105210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:33,699-Speed 5176.79 samples/sec Loss 3.1887 LearningRate 0.0469 Epoch: 6 Global Step: 105220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:35,701-Speed 5117.40 samples/sec Loss 3.2686 LearningRate 0.0469 Epoch: 6 Global Step: 105230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:37,709-Speed 5099.91 samples/sec Loss 3.2080 LearningRate 0.0469 Epoch: 6 Global Step: 105240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:39,691-Speed 5170.20 samples/sec Loss 3.1807 LearningRate 0.0469 Epoch: 6 Global Step: 105250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 
2022-04-11 06:18:41,678-Speed 5155.03 samples/sec Loss 3.1916 LearningRate 0.0469 Epoch: 6 Global Step: 105260 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:18:43,675-Speed 5128.26 samples/sec Loss 3.2246 LearningRate 0.0469 Epoch: 6 Global Step: 105270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:45,662-Speed 5156.78 samples/sec Loss 3.2791 LearningRate 0.0469 Epoch: 6 Global Step: 105280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:47,654-Speed 5143.37 samples/sec Loss 3.2121 LearningRate 0.0469 Epoch: 6 Global Step: 105290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:18:49,627-Speed 5192.00 samples/sec Loss 3.2675 LearningRate 0.0469 Epoch: 6 Global Step: 105300 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:18:51,598-Speed 5194.93 samples/sec Loss 3.2035 LearningRate 0.0469 Epoch: 6 Global Step: 105310 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:18:53,573-Speed 5188.64 samples/sec Loss 3.1743 LearningRate 0.0469 Epoch: 6 Global Step: 105320 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:18:55,570-Speed 5126.86 samples/sec Loss 3.1878 LearningRate 0.0469 Epoch: 6 Global Step: 105330 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:18:57,550-Speed 5173.45 samples/sec Loss 3.1852 LearningRate 0.0468 Epoch: 6 Global Step: 105340 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:18:59,545-Speed 5135.02 samples/sec Loss 3.1850 LearningRate 0.0468 Epoch: 6 Global Step: 105350 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:19:01,553-Speed 5101.99 samples/sec Loss 3.1957 LearningRate 0.0468 Epoch: 6 Global Step: 105360 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:19:03,535-Speed 5169.36 samples/sec Loss 3.2430 LearningRate 0.0468 Epoch: 6 Global Step: 105370 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:19:05,533-Speed 5126.43 
samples/sec Loss 3.2130 LearningRate 0.0468 Epoch: 6 Global Step: 105380 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:19:07,511-Speed 5179.73 samples/sec Loss 3.2631 LearningRate 0.0468 Epoch: 6 Global Step: 105390 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:19:09,507-Speed 5131.60 samples/sec Loss 3.2187 LearningRate 0.0468 Epoch: 6 Global Step: 105400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:19:11,520-Speed 5088.30 samples/sec Loss 3.2219 LearningRate 0.0468 Epoch: 6 Global Step: 105410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:19:13,527-Speed 5103.80 samples/sec Loss 3.1574 LearningRate 0.0468 Epoch: 6 Global Step: 105420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:19:15,505-Speed 5178.93 samples/sec Loss 3.1979 LearningRate 0.0468 Epoch: 6 Global Step: 105430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:19:17,479-Speed 5186.76 samples/sec Loss 3.1905 LearningRate 0.0468 Epoch: 6 Global Step: 105440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:19:19,462-Speed 5167.45 samples/sec Loss 3.2305 LearningRate 0.0468 Epoch: 6 Global Step: 105450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:19:21,446-Speed 5162.65 samples/sec Loss 3.2203 LearningRate 0.0468 Epoch: 6 Global Step: 105460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:19:23,445-Speed 5124.92 samples/sec Loss 3.2494 LearningRate 0.0468 Epoch: 6 Global Step: 105470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:19:25,460-Speed 5083.01 samples/sec Loss 3.1749 LearningRate 0.0468 Epoch: 6 Global Step: 105480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:19:27,449-Speed 5148.73 samples/sec Loss 3.2845 LearningRate 0.0468 Epoch: 6 Global Step: 105490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:19:29,442-Speed 5140.23 samples/sec Loss 3.2059 LearningRate 0.0468 
Epoch: 6 Global Step: 105500 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:19:31,437-Speed 5134.77 samples/sec Loss 3.3112 LearningRate 0.0468 Epoch: 6 Global Step: 105510 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:19:33,426-Speed 5150.70 samples/sec Loss 3.2344 LearningRate 0.0468 Epoch: 6 Global Step: 105520 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:19:35,425-Speed 5123.49 samples/sec Loss 3.1928 LearningRate 0.0468 Epoch: 6 Global Step: 105530 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:19:37,402-Speed 5182.88 samples/sec Loss 3.2533 LearningRate 0.0468 Epoch: 6 Global Step: 105540 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:19:39,423-Speed 5067.63 samples/sec Loss 3.1979 LearningRate 0.0468 Epoch: 6 Global Step: 105550 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:19:41,416-Speed 5138.81 samples/sec Loss 3.2559 LearningRate 0.0468 Epoch: 6 Global Step: 105560 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:19:43,392-Speed 5185.54 samples/sec Loss 3.1968 LearningRate 0.0468 Epoch: 6 Global Step: 105570 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:19:45,365-Speed 5190.79 samples/sec Loss 3.1465 LearningRate 0.0467 Epoch: 6 Global Step: 105580 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:19:47,360-Speed 5133.88 samples/sec Loss 3.2378 LearningRate 0.0467 Epoch: 6 Global Step: 105590 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:19:49,338-Speed 5178.36 samples/sec Loss 3.1885 LearningRate 0.0467 Epoch: 6 Global Step: 105600 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:19:51,325-Speed 5156.09 samples/sec Loss 3.2412 LearningRate 0.0467 Epoch: 6 Global Step: 105610 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:19:53,301-Speed 5183.04 samples/sec Loss 3.2432 LearningRate 0.0467 Epoch: 6 Global Step: 105620 Fp16 Grad 
Scale: 32768 Required: 15 hours Training: 2022-04-11 06:19:55,317-Speed 5082.49 samples/sec Loss 3.1929 LearningRate 0.0467 Epoch: 6 Global Step: 105630 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:19:57,290-Speed 5191.37 samples/sec Loss 3.1788 LearningRate 0.0467 Epoch: 6 Global Step: 105640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:19:59,297-Speed 5104.45 samples/sec Loss 3.1573 LearningRate 0.0467 Epoch: 6 Global Step: 105650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:01,277-Speed 5172.59 samples/sec Loss 3.2791 LearningRate 0.0467 Epoch: 6 Global Step: 105660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:03,285-Speed 5102.42 samples/sec Loss 3.2311 LearningRate 0.0467 Epoch: 6 Global Step: 105670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:05,286-Speed 5119.67 samples/sec Loss 3.2257 LearningRate 0.0467 Epoch: 6 Global Step: 105680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:07,275-Speed 5149.25 samples/sec Loss 3.2505 LearningRate 0.0467 Epoch: 6 Global Step: 105690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:09,275-Speed 5121.88 samples/sec Loss 3.1606 LearningRate 0.0467 Epoch: 6 Global Step: 105700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:11,256-Speed 5170.38 samples/sec Loss 3.1556 LearningRate 0.0467 Epoch: 6 Global Step: 105710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:13,245-Speed 5148.86 samples/sec Loss 3.1456 LearningRate 0.0467 Epoch: 6 Global Step: 105720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:15,234-Speed 5151.37 samples/sec Loss 3.2666 LearningRate 0.0467 Epoch: 6 Global Step: 105730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:17,248-Speed 5086.08 samples/sec Loss 3.2012 LearningRate 0.0467 Epoch: 6 Global Step: 105740 Fp16 Grad Scale: 131072 Required: 15 hours Training: 
2022-04-11 06:20:19,232-Speed 5162.90 samples/sec Loss 3.2186 LearningRate 0.0467 Epoch: 6 Global Step: 105750 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:20:21,210-Speed 5180.26 samples/sec Loss 3.1850 LearningRate 0.0467 Epoch: 6 Global Step: 105760 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:20:23,195-Speed 5158.90 samples/sec Loss 3.3759 LearningRate 0.0467 Epoch: 6 Global Step: 105770 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:20:25,171-Speed 5184.20 samples/sec Loss 3.2137 LearningRate 0.0467 Epoch: 6 Global Step: 105780 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:20:27,158-Speed 5155.52 samples/sec Loss 3.3020 LearningRate 0.0467 Epoch: 6 Global Step: 105790 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:20:29,136-Speed 5178.46 samples/sec Loss 3.2008 LearningRate 0.0467 Epoch: 6 Global Step: 105800 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:20:31,125-Speed 5148.43 samples/sec Loss 3.2185 LearningRate 0.0467 Epoch: 6 Global Step: 105810 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:20:33,104-Speed 5176.42 samples/sec Loss 3.3000 LearningRate 0.0466 Epoch: 6 Global Step: 105820 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:20:35,087-Speed 5166.03 samples/sec Loss 3.2201 LearningRate 0.0466 Epoch: 6 Global Step: 105830 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:20:37,068-Speed 5170.59 samples/sec Loss 3.2911 LearningRate 0.0466 Epoch: 6 Global Step: 105840 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:20:39,048-Speed 5172.78 samples/sec Loss 3.2255 LearningRate 0.0466 Epoch: 6 Global Step: 105850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:41,059-Speed 5094.74 samples/sec Loss 3.2534 LearningRate 0.0466 Epoch: 6 Global Step: 105860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:43,041-Speed 
5168.30 samples/sec Loss 3.2726 LearningRate 0.0466 Epoch: 6 Global Step: 105870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:45,020-Speed 5176.15 samples/sec Loss 3.2562 LearningRate 0.0466 Epoch: 6 Global Step: 105880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:47,017-Speed 5130.99 samples/sec Loss 3.2297 LearningRate 0.0466 Epoch: 6 Global Step: 105890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:48,999-Speed 5166.90 samples/sec Loss 3.3628 LearningRate 0.0466 Epoch: 6 Global Step: 105900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:50,990-Speed 5146.80 samples/sec Loss 3.2975 LearningRate 0.0466 Epoch: 6 Global Step: 105910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:52,965-Speed 5184.34 samples/sec Loss 3.1894 LearningRate 0.0466 Epoch: 6 Global Step: 105920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:54,941-Speed 5185.18 samples/sec Loss 3.2517 LearningRate 0.0466 Epoch: 6 Global Step: 105930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:56,916-Speed 5186.72 samples/sec Loss 3.2066 LearningRate 0.0466 Epoch: 6 Global Step: 105940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:20:58,897-Speed 5169.56 samples/sec Loss 3.2198 LearningRate 0.0466 Epoch: 6 Global Step: 105950 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:21:00,886-Speed 5150.06 samples/sec Loss 3.3815 LearningRate 0.0466 Epoch: 6 Global Step: 105960 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:21:02,871-Speed 5163.60 samples/sec Loss 3.3384 LearningRate 0.0466 Epoch: 6 Global Step: 105970 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:21:04,853-Speed 5167.77 samples/sec Loss 3.1649 LearningRate 0.0466 Epoch: 6 Global Step: 105980 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:21:06,838-Speed 5161.02 samples/sec Loss 3.2641 
LearningRate 0.0466 Epoch: 6 Global Step: 105990 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:21:08,852-Speed 5084.74 samples/sec Loss 3.2657 LearningRate 0.0466 Epoch: 6 Global Step: 106000 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:21:36,963-[lfw][106000]XNorm: 21.632347 Training: 2022-04-11 06:21:36,964-[lfw][106000]Accuracy-Flip: 0.99800+-0.00277 Training: 2022-04-11 06:21:36,964-[lfw][106000]Accuracy-Highest: 0.99817 Training: 2022-04-11 06:22:07,875-[cfp_fp][106000]XNorm: 19.942224 Training: 2022-04-11 06:22:07,876-[cfp_fp][106000]Accuracy-Flip: 0.98129+-0.00335 Training: 2022-04-11 06:22:07,876-[cfp_fp][106000]Accuracy-Highest: 0.98129 Training: 2022-04-11 06:22:34,529-[agedb_30][106000]XNorm: 21.624712 Training: 2022-04-11 06:22:34,530-[agedb_30][106000]Accuracy-Flip: 0.97667+-0.00615 Training: 2022-04-11 06:22:34,530-[agedb_30][106000]Accuracy-Highest: 0.97950 Training: 2022-04-11 06:22:36,556-Speed 116.76 samples/sec Loss 3.2317 LearningRate 0.0466 Epoch: 6 Global Step: 106010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:22:38,532-Speed 5185.20 samples/sec Loss 3.2083 LearningRate 0.0466 Epoch: 6 Global Step: 106020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:22:40,514-Speed 5168.22 samples/sec Loss 3.2028 LearningRate 0.0466 Epoch: 6 Global Step: 106030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:22:42,482-Speed 5204.93 samples/sec Loss 3.2997 LearningRate 0.0466 Epoch: 6 Global Step: 106040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:22:44,468-Speed 5158.27 samples/sec Loss 3.1387 LearningRate 0.0466 Epoch: 6 Global Step: 106050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:22:46,496-Speed 5052.76 samples/sec Loss 3.1248 LearningRate 0.0466 Epoch: 6 Global Step: 106060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:22:48,490-Speed 5136.46 samples/sec Loss 3.2559 LearningRate 0.0465 
Epoch: 6 Global Step: 106070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:22:50,482-Speed 5142.00 samples/sec Loss 3.1881 LearningRate 0.0465 Epoch: 6 Global Step: 106080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:22:52,458-Speed 5184.34 samples/sec Loss 3.2123 LearningRate 0.0465 Epoch: 6 Global Step: 106090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:22:54,455-Speed 5131.03 samples/sec Loss 3.2462 LearningRate 0.0465 Epoch: 6 Global Step: 106100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:22:56,430-Speed 5185.49 samples/sec Loss 3.2545 LearningRate 0.0465 Epoch: 6 Global Step: 106110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:22:58,415-Speed 5159.09 samples/sec Loss 3.1774 LearningRate 0.0465 Epoch: 6 Global Step: 106120 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:23:00,412-Speed 5131.59 samples/sec Loss 3.2085 LearningRate 0.0465 Epoch: 6 Global Step: 106130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:23:02,391-Speed 5173.68 samples/sec Loss 3.1969 LearningRate 0.0465 Epoch: 6 Global Step: 106140 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:23:04,375-Speed 5164.73 samples/sec Loss 3.2582 LearningRate 0.0465 Epoch: 6 Global Step: 106150 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:23:06,363-Speed 5151.86 samples/sec Loss 3.2663 LearningRate 0.0465 Epoch: 6 Global Step: 106160 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:23:08,377-Speed 5087.53 samples/sec Loss 3.2148 LearningRate 0.0465 Epoch: 6 Global Step: 106170 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:23:10,362-Speed 5160.50 samples/sec Loss 3.2927 LearningRate 0.0465 Epoch: 6 Global Step: 106180 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:23:12,347-Speed 5159.35 samples/sec Loss 3.2867 LearningRate 0.0465 Epoch: 6 Global Step: 106190 Fp16 
Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:23:14,347-Speed 5121.97 samples/sec Loss 3.3222 LearningRate 0.0465 Epoch: 6 Global Step: 106200 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:23:16,342-Speed 5134.90 samples/sec Loss 3.2965 LearningRate 0.0465 Epoch: 6 Global Step: 106210 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:23:18,322-Speed 5172.95 samples/sec Loss 3.3153 LearningRate 0.0465 Epoch: 6 Global Step: 106220 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-04-11 06:23:20,293-Speed 5196.13 samples/sec Loss 3.2812 LearningRate 0.0465 Epoch: 6 Global Step: 106230 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:23:22,274-Speed 5171.82 samples/sec Loss 3.2349 LearningRate 0.0465 Epoch: 6 Global Step: 106240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:23:24,257-Speed 5164.72 samples/sec Loss 3.2580 LearningRate 0.0465 Epoch: 6 Global Step: 106250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:23:26,254-Speed 5130.38 samples/sec Loss 3.2499 LearningRate 0.0465 Epoch: 6 Global Step: 106260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:23:28,273-Speed 5074.98 samples/sec Loss 3.2567 LearningRate 0.0465 Epoch: 6 Global Step: 106270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:23:30,259-Speed 5156.96 samples/sec Loss 3.3320 LearningRate 0.0465 Epoch: 6 Global Step: 106280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:23:32,239-Speed 5173.03 samples/sec Loss 3.2734 LearningRate 0.0465 Epoch: 6 Global Step: 106290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:23:34,252-Speed 5089.30 samples/sec Loss 3.2725 LearningRate 0.0465 Epoch: 6 Global Step: 106300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:23:36,265-Speed 5087.72 samples/sec Loss 3.2395 LearningRate 0.0464 Epoch: 6 Global Step: 106310 Fp16 Grad Scale: 65536 Required: 15 hours 
Training: 2022-04-11 06:23:38,255-Speed 5146.47 samples/sec Loss 3.1718 LearningRate 0.0464 Epoch: 6 Global Step: 106320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:23:40,265-Speed 5095.85 samples/sec Loss 3.2414 LearningRate 0.0464 Epoch: 6 Global Step: 106330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:23:42,269-Speed 5112.24 samples/sec Loss 3.1960 LearningRate 0.0464 Epoch: 6 Global Step: 106340 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:23:44,253-Speed 5163.76 samples/sec Loss 3.3245 LearningRate 0.0464 Epoch: 6 Global Step: 106350 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:23:46,234-Speed 5168.79 samples/sec Loss 3.2583 LearningRate 0.0464 Epoch: 6 Global Step: 106360 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:23:48,236-Speed 5117.36 samples/sec Loss 3.2297 LearningRate 0.0464 Epoch: 6 Global Step: 106370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:23:50,231-Speed 5136.96 samples/sec Loss 3.2904 LearningRate 0.0464 Epoch: 6 Global Step: 106380 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:23:52,229-Speed 5127.17 samples/sec Loss 3.2518 LearningRate 0.0464 Epoch: 6 Global Step: 106390 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:23:54,201-Speed 5193.30 samples/sec Loss 3.1752 LearningRate 0.0464 Epoch: 6 Global Step: 106400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:23:56,180-Speed 5175.77 samples/sec Loss 3.2324 LearningRate 0.0464 Epoch: 6 Global Step: 106410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:23:58,197-Speed 5078.01 samples/sec Loss 3.2652 LearningRate 0.0464 Epoch: 6 Global Step: 106420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:00,202-Speed 5108.83 samples/sec Loss 3.2155 LearningRate 0.0464 Epoch: 6 Global Step: 106430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 
06:24:02,194-Speed 5142.43 samples/sec Loss 3.2637 LearningRate 0.0464 Epoch: 6 Global Step: 106440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:04,183-Speed 5150.83 samples/sec Loss 3.3375 LearningRate 0.0464 Epoch: 6 Global Step: 106450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:06,171-Speed 5150.96 samples/sec Loss 3.2255 LearningRate 0.0464 Epoch: 6 Global Step: 106460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:08,152-Speed 5171.57 samples/sec Loss 3.2409 LearningRate 0.0464 Epoch: 6 Global Step: 106470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:10,140-Speed 5152.03 samples/sec Loss 3.2420 LearningRate 0.0464 Epoch: 6 Global Step: 106480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:12,122-Speed 5169.10 samples/sec Loss 3.2936 LearningRate 0.0464 Epoch: 6 Global Step: 106490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:14,106-Speed 5162.24 samples/sec Loss 3.2631 LearningRate 0.0464 Epoch: 6 Global Step: 106500 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:24:16,088-Speed 5170.27 samples/sec Loss 3.2508 LearningRate 0.0464 Epoch: 6 Global Step: 106510 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:24:18,088-Speed 5119.91 samples/sec Loss 3.1770 LearningRate 0.0464 Epoch: 6 Global Step: 106520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:20,069-Speed 5171.78 samples/sec Loss 3.2867 LearningRate 0.0464 Epoch: 6 Global Step: 106530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:22,049-Speed 5172.40 samples/sec Loss 3.3462 LearningRate 0.0464 Epoch: 6 Global Step: 106540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:24,031-Speed 5168.00 samples/sec Loss 3.2681 LearningRate 0.0464 Epoch: 6 Global Step: 106550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:26,057-Speed 5057.54 samples/sec 
Loss 3.2271 LearningRate 0.0463 Epoch: 6 Global Step: 106560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:28,070-Speed 5086.69 samples/sec Loss 3.1848 LearningRate 0.0463 Epoch: 6 Global Step: 106570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:30,044-Speed 5190.65 samples/sec Loss 3.2106 LearningRate 0.0463 Epoch: 6 Global Step: 106580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:32,020-Speed 5183.13 samples/sec Loss 3.1486 LearningRate 0.0463 Epoch: 6 Global Step: 106590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:34,032-Speed 5093.19 samples/sec Loss 3.2273 LearningRate 0.0463 Epoch: 6 Global Step: 106600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:36,009-Speed 5179.58 samples/sec Loss 3.2619 LearningRate 0.0463 Epoch: 6 Global Step: 106610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:37,984-Speed 5187.63 samples/sec Loss 3.3063 LearningRate 0.0463 Epoch: 6 Global Step: 106620 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:24:39,956-Speed 5193.69 samples/sec Loss 3.2992 LearningRate 0.0463 Epoch: 6 Global Step: 106630 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:24:41,925-Speed 5203.18 samples/sec Loss 3.2959 LearningRate 0.0463 Epoch: 6 Global Step: 106640 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:24:43,905-Speed 5171.52 samples/sec Loss 3.3182 LearningRate 0.0463 Epoch: 6 Global Step: 106650 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:24:45,913-Speed 5102.16 samples/sec Loss 3.2090 LearningRate 0.0463 Epoch: 6 Global Step: 106660 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:24:47,894-Speed 5169.76 samples/sec Loss 3.2662 LearningRate 0.0463 Epoch: 6 Global Step: 106670 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:24:49,874-Speed 5175.38 samples/sec Loss 3.2094 LearningRate 0.0463 
Epoch: 6 Global Step: 106680 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:24:51,883-Speed 5098.09 samples/sec Loss 3.2467 LearningRate 0.0463 Epoch: 6 Global Step: 106690 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:24:53,873-Speed 5147.71 samples/sec Loss 3.3023 LearningRate 0.0463 Epoch: 6 Global Step: 106700 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:24:55,832-Speed 5228.84 samples/sec Loss 3.2203 LearningRate 0.0463 Epoch: 6 Global Step: 106710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:57,831-Speed 5126.00 samples/sec Loss 3.2315 LearningRate 0.0463 Epoch: 6 Global Step: 106720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:24:59,816-Speed 5159.19 samples/sec Loss 3.2663 LearningRate 0.0463 Epoch: 6 Global Step: 106730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:25:01,800-Speed 5162.01 samples/sec Loss 3.3014 LearningRate 0.0463 Epoch: 6 Global Step: 106740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:25:03,785-Speed 5161.43 samples/sec Loss 3.3389 LearningRate 0.0463 Epoch: 6 Global Step: 106750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:25:05,767-Speed 5167.10 samples/sec Loss 3.2234 LearningRate 0.0463 Epoch: 6 Global Step: 106760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:25:07,743-Speed 5183.86 samples/sec Loss 3.2840 LearningRate 0.0463 Epoch: 6 Global Step: 106770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:25:09,731-Speed 5152.83 samples/sec Loss 3.1748 LearningRate 0.0463 Epoch: 6 Global Step: 106780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:25:11,711-Speed 5173.33 samples/sec Loss 3.2634 LearningRate 0.0463 Epoch: 6 Global Step: 106790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:25:13,694-Speed 5166.20 samples/sec Loss 3.2854 LearningRate 0.0462 Epoch: 6 Global Step: 106800 Fp16 Grad 
Scale: 65536 Required: 15 hours Training: 2022-04-11 06:25:15,677-Speed 5164.64 samples/sec Loss 3.2894 LearningRate 0.0462 Epoch: 6 Global Step: 106810 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:25:17,656-Speed 5177.01 samples/sec Loss 3.1487 LearningRate 0.0462 Epoch: 6 Global Step: 106820 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:25:19,644-Speed 5153.79 samples/sec Loss 3.2720 LearningRate 0.0462 Epoch: 6 Global Step: 106830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:25:21,623-Speed 5174.13 samples/sec Loss 3.2310 LearningRate 0.0462 Epoch: 6 Global Step: 106840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:25:23,606-Speed 5166.62 samples/sec Loss 3.2336 LearningRate 0.0462 Epoch: 6 Global Step: 106850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:25:25,589-Speed 5164.61 samples/sec Loss 3.1650 LearningRate 0.0462 Epoch: 6 Global Step: 106860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:25:27,586-Speed 5147.95 samples/sec Loss 3.2140 LearningRate 0.0462 Epoch: 6 Global Step: 106870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:25:29,560-Speed 5188.49 samples/sec Loss 3.2165 LearningRate 0.0462 Epoch: 6 Global Step: 106880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:25:31,532-Speed 5194.48 samples/sec Loss 3.2465 LearningRate 0.0462 Epoch: 6 Global Step: 106890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:25:33,511-Speed 5175.20 samples/sec Loss 3.2067 LearningRate 0.0462 Epoch: 6 Global Step: 106900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:25:35,500-Speed 5152.42 samples/sec Loss 3.2604 LearningRate 0.0462 Epoch: 6 Global Step: 106910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:25:37,487-Speed 5154.39 samples/sec Loss 3.2622 LearningRate 0.0462 Epoch: 6 Global Step: 106920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 
2022-04-11 06:25:39,465-Speed 5179.68 samples/sec Loss 3.2801 LearningRate 0.0462 Epoch: 6 Global Step: 106930 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:25:41,458-Speed 5138.06 samples/sec Loss 3.1885 LearningRate 0.0462 Epoch: 6 Global Step: 106940 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:25:43,430-Speed 5193.75 samples/sec Loss 3.1874 LearningRate 0.0462 Epoch: 6 Global Step: 106950 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:25:45,413-Speed 5167.41 samples/sec Loss 3.2342 LearningRate 0.0462 Epoch: 6 Global Step: 106960 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:25:47,424-Speed 5092.61 samples/sec Loss 3.2212 LearningRate 0.0462 Epoch: 6 Global Step: 106970 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:25:49,407-Speed 5166.13 samples/sec Loss 3.2751 LearningRate 0.0462 Epoch: 6 Global Step: 106980 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:25:51,389-Speed 5167.63 samples/sec Loss 3.2313 LearningRate 0.0462 Epoch: 6 Global Step: 106990 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:25:53,377-Speed 5151.75 samples/sec Loss 3.2517 LearningRate 0.0462 Epoch: 6 Global Step: 107000 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:25:55,374-Speed 5131.35 samples/sec Loss 3.2672 LearningRate 0.0462 Epoch: 6 Global Step: 107010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:25:57,357-Speed 5165.61 samples/sec Loss 3.2458 LearningRate 0.0462 Epoch: 6 Global Step: 107020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:25:59,347-Speed 5146.61 samples/sec Loss 3.2404 LearningRate 0.0462 Epoch: 6 Global Step: 107030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:26:01,343-Speed 5131.61 samples/sec Loss 3.2010 LearningRate 0.0462 Epoch: 6 Global Step: 107040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:26:03,348-Speed 
5108.23 samples/sec Loss 3.3440 LearningRate 0.0461 Epoch: 6 Global Step: 107050 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:26:05,323-Speed 5187.02 samples/sec Loss 3.2202 LearningRate 0.0461 Epoch: 6 Global Step: 107060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:26:07,293-Speed 5200.06 samples/sec Loss 3.2094 LearningRate 0.0461 Epoch: 6 Global Step: 107070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:26:09,267-Speed 5190.04 samples/sec Loss 3.2755 LearningRate 0.0461 Epoch: 6 Global Step: 107080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:26:11,256-Speed 5150.30 samples/sec Loss 3.2605 LearningRate 0.0461 Epoch: 6 Global Step: 107090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:26:13,237-Speed 5168.73 samples/sec Loss 3.2373 LearningRate 0.0461 Epoch: 6 Global Step: 107100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:26:15,215-Speed 5178.53 samples/sec Loss 3.2475 LearningRate 0.0461 Epoch: 6 Global Step: 107110 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:26:17,211-Speed 5133.64 samples/sec Loss 3.2697 LearningRate 0.0461 Epoch: 6 Global Step: 107120 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:26:19,186-Speed 5187.49 samples/sec Loss 3.2904 LearningRate 0.0461 Epoch: 6 Global Step: 107130 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:26:21,159-Speed 5191.31 samples/sec Loss 3.2463 LearningRate 0.0461 Epoch: 6 Global Step: 107140 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:26:23,182-Speed 5061.98 samples/sec Loss 3.1933 LearningRate 0.0461 Epoch: 6 Global Step: 107150 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:26:25,162-Speed 5173.43 samples/sec Loss 3.2658 LearningRate 0.0461 Epoch: 6 Global Step: 107160 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:26:27,164-Speed 5117.65 samples/sec Loss 3.2497 
LearningRate 0.0461 Epoch: 6 Global Step: 107170 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:26:29,145-Speed 5171.55 samples/sec Loss 3.2766 LearningRate 0.0461 Epoch: 6 Global Step: 107180 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:26:31,118-Speed 5190.56 samples/sec Loss 3.1860 LearningRate 0.0461 Epoch: 6 Global Step: 107190 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:26:33,097-Speed 5177.04 samples/sec Loss 3.3183 LearningRate 0.0461 Epoch: 6 Global Step: 107200 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:26:35,082-Speed 5158.46 samples/sec Loss 3.1658 LearningRate 0.0461 Epoch: 6 Global Step: 107210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:26:37,088-Speed 5108.66 samples/sec Loss 3.2711 LearningRate 0.0461 Epoch: 6 Global Step: 107220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:26:39,077-Speed 5150.47 samples/sec Loss 3.2659 LearningRate 0.0461 Epoch: 6 Global Step: 107230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:26:41,066-Speed 5149.11 samples/sec Loss 3.2740 LearningRate 0.0461 Epoch: 6 Global Step: 107240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:26:43,071-Speed 5108.30 samples/sec Loss 3.2835 LearningRate 0.0461 Epoch: 6 Global Step: 107250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:26:45,057-Speed 5157.37 samples/sec Loss 3.2513 LearningRate 0.0461 Epoch: 6 Global Step: 107260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:26:47,058-Speed 5118.46 samples/sec Loss 3.2295 LearningRate 0.0461 Epoch: 6 Global Step: 107270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:26:49,044-Speed 5159.83 samples/sec Loss 3.2433 LearningRate 0.0461 Epoch: 6 Global Step: 107280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:26:51,016-Speed 5192.54 samples/sec Loss 3.2642 LearningRate 0.0460 Epoch: 6 Global Step: 
107290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:26:53,013-Speed 5130.01 samples/sec Loss 3.2618 LearningRate 0.0460 Epoch: 6 Global Step: 107300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:26:54,989-Speed 5184.42 samples/sec Loss 3.2341 LearningRate 0.0460 Epoch: 6 Global Step: 107310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:26:56,972-Speed 5166.24 samples/sec Loss 3.3005 LearningRate 0.0460 Epoch: 6 Global Step: 107320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:26:58,943-Speed 5196.47 samples/sec Loss 3.2664 LearningRate 0.0460 Epoch: 6 Global Step: 107330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:00,948-Speed 5108.54 samples/sec Loss 3.3546 LearningRate 0.0460 Epoch: 6 Global Step: 107340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:02,920-Speed 5196.02 samples/sec Loss 3.2280 LearningRate 0.0460 Epoch: 6 Global Step: 107350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:04,923-Speed 5112.58 samples/sec Loss 3.2859 LearningRate 0.0460 Epoch: 6 Global Step: 107360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:06,902-Speed 5177.48 samples/sec Loss 3.3301 LearningRate 0.0460 Epoch: 6 Global Step: 107370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:08,878-Speed 5182.14 samples/sec Loss 3.2515 LearningRate 0.0460 Epoch: 6 Global Step: 107380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:10,875-Speed 5128.88 samples/sec Loss 3.2367 LearningRate 0.0460 Epoch: 6 Global Step: 107390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:12,864-Speed 5149.88 samples/sec Loss 3.2135 LearningRate 0.0460 Epoch: 6 Global Step: 107400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:14,886-Speed 5068.09 samples/sec Loss 3.2582 LearningRate 0.0460 Epoch: 6 Global Step: 107410 Fp16 Grad Scale: 65536 Required: 
15 hours Training: 2022-04-11 06:27:16,866-Speed 5172.64 samples/sec Loss 3.2005 LearningRate 0.0460 Epoch: 6 Global Step: 107420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:18,858-Speed 5143.12 samples/sec Loss 3.2156 LearningRate 0.0460 Epoch: 6 Global Step: 107430 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:27:20,838-Speed 5173.27 samples/sec Loss 3.2517 LearningRate 0.0460 Epoch: 6 Global Step: 107440 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:27:22,805-Speed 5207.18 samples/sec Loss 3.2239 LearningRate 0.0460 Epoch: 6 Global Step: 107450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:24,789-Speed 5162.48 samples/sec Loss 3.2866 LearningRate 0.0460 Epoch: 6 Global Step: 107460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:26,764-Speed 5187.57 samples/sec Loss 3.3041 LearningRate 0.0460 Epoch: 6 Global Step: 107470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:28,752-Speed 5151.61 samples/sec Loss 3.3048 LearningRate 0.0460 Epoch: 6 Global Step: 107480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:30,729-Speed 5183.63 samples/sec Loss 3.2450 LearningRate 0.0460 Epoch: 6 Global Step: 107490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:32,714-Speed 5157.78 samples/sec Loss 3.3044 LearningRate 0.0460 Epoch: 6 Global Step: 107500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:34,702-Speed 5154.68 samples/sec Loss 3.1988 LearningRate 0.0460 Epoch: 6 Global Step: 107510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:36,719-Speed 5078.58 samples/sec Loss 3.2477 LearningRate 0.0460 Epoch: 6 Global Step: 107520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:38,710-Speed 5143.81 samples/sec Loss 3.3005 LearningRate 0.0460 Epoch: 6 Global Step: 107530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 
06:27:40,686-Speed 5186.20 samples/sec Loss 3.2080 LearningRate 0.0459 Epoch: 6 Global Step: 107540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:42,659-Speed 5189.96 samples/sec Loss 3.2900 LearningRate 0.0459 Epoch: 6 Global Step: 107550 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:27:44,634-Speed 5186.34 samples/sec Loss 3.2319 LearningRate 0.0459 Epoch: 6 Global Step: 107560 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:27:46,641-Speed 5103.32 samples/sec Loss 3.2114 LearningRate 0.0459 Epoch: 6 Global Step: 107570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:48,644-Speed 5115.62 samples/sec Loss 3.3259 LearningRate 0.0459 Epoch: 6 Global Step: 107580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:50,641-Speed 5129.14 samples/sec Loss 3.3288 LearningRate 0.0459 Epoch: 6 Global Step: 107590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:52,617-Speed 5181.91 samples/sec Loss 3.3350 LearningRate 0.0459 Epoch: 6 Global Step: 107600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:54,595-Speed 5179.16 samples/sec Loss 3.2437 LearningRate 0.0459 Epoch: 6 Global Step: 107610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:56,573-Speed 5180.24 samples/sec Loss 3.2410 LearningRate 0.0459 Epoch: 6 Global Step: 107620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:27:58,566-Speed 5138.97 samples/sec Loss 3.2185 LearningRate 0.0459 Epoch: 6 Global Step: 107630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:28:00,559-Speed 5140.60 samples/sec Loss 3.2601 LearningRate 0.0459 Epoch: 6 Global Step: 107640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:28:02,536-Speed 5180.87 samples/sec Loss 3.2492 LearningRate 0.0459 Epoch: 6 Global Step: 107650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:28:04,521-Speed 5159.48 samples/sec 
Loss 3.2680 LearningRate 0.0459 Epoch: 6 Global Step: 107660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:28:06,506-Speed 5161.05 samples/sec Loss 3.2924 LearningRate 0.0459 Epoch: 6 Global Step: 107670 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:28:08,499-Speed 5139.71 samples/sec Loss 3.2288 LearningRate 0.0459 Epoch: 6 Global Step: 107680 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:28:10,488-Speed 5150.57 samples/sec Loss 3.1950 LearningRate 0.0459 Epoch: 6 Global Step: 107690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:28:12,503-Speed 5083.11 samples/sec Loss 3.2662 LearningRate 0.0459 Epoch: 6 Global Step: 107700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:28:14,527-Speed 5059.92 samples/sec Loss 3.2229 LearningRate 0.0459 Epoch: 6 Global Step: 107710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:28:16,506-Speed 5176.60 samples/sec Loss 3.2494 LearningRate 0.0459 Epoch: 6 Global Step: 107720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:28:18,475-Speed 5202.78 samples/sec Loss 3.3164 LearningRate 0.0459 Epoch: 6 Global Step: 107730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:28:20,448-Speed 5193.31 samples/sec Loss 3.3072 LearningRate 0.0459 Epoch: 6 Global Step: 107740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:28:22,446-Speed 5125.23 samples/sec Loss 3.1454 LearningRate 0.0459 Epoch: 6 Global Step: 107750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:28:24,416-Speed 5199.95 samples/sec Loss 3.2095 LearningRate 0.0459 Epoch: 6 Global Step: 107760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:28:26,386-Speed 5199.65 samples/sec Loss 3.2276 LearningRate 0.0459 Epoch: 6 Global Step: 107770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:28:28,359-Speed 5192.07 samples/sec Loss 3.2233 LearningRate 0.0459 Epoch: 6 
Global Step: 107780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:28:30,345-Speed 5158.83 samples/sec Loss 3.3141 LearningRate 0.0458 Epoch: 6 Global Step: 107790 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:28:32,324-Speed 5174.02 samples/sec Loss 3.2589 LearningRate 0.0458 Epoch: 6 Global Step: 107800 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:28:34,295-Speed 5198.27 samples/sec Loss 3.2075 LearningRate 0.0458 Epoch: 6 Global Step: 107810 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:28:36,281-Speed 5156.39 samples/sec Loss 3.2949 LearningRate 0.0458 Epoch: 6 Global Step: 107820 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:28:38,280-Speed 5124.32 samples/sec Loss 3.2003 LearningRate 0.0458 Epoch: 6 Global Step: 107830 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:28:40,277-Speed 5128.97 samples/sec Loss 3.2615 LearningRate 0.0458 Epoch: 6 Global Step: 107840 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:28:42,267-Speed 5149.38 samples/sec Loss 3.2188 LearningRate 0.0458 Epoch: 6 Global Step: 107850 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:28:44,256-Speed 5151.04 samples/sec Loss 3.1607 LearningRate 0.0458 Epoch: 6 Global Step: 107860 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:28:46,257-Speed 5118.71 samples/sec Loss 3.2390 LearningRate 0.0458 Epoch: 6 Global Step: 107870 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:28:48,264-Speed 5102.33 samples/sec Loss 3.3637 LearningRate 0.0458 Epoch: 6 Global Step: 107880 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:28:50,231-Speed 5208.39 samples/sec Loss 3.2049 LearningRate 0.0458 Epoch: 6 Global Step: 107890 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:28:52,201-Speed 5198.83 samples/sec Loss 3.2956 LearningRate 0.0458 Epoch: 6 Global Step: 107900 Fp16 Grad 
Scale: 131072 Required: 15 hours Training: 2022-04-11 06:28:54,183-Speed 5167.50 samples/sec Loss 3.1970 LearningRate 0.0458 Epoch: 6 Global Step: 107910 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:28:56,160-Speed 5181.29 samples/sec Loss 3.2842 LearningRate 0.0458 Epoch: 6 Global Step: 107920 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:28:58,136-Speed 5186.30 samples/sec Loss 3.3422 LearningRate 0.0458 Epoch: 6 Global Step: 107930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:29:00,121-Speed 5160.47 samples/sec Loss 3.3418 LearningRate 0.0458 Epoch: 6 Global Step: 107940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:29:02,105-Speed 5160.47 samples/sec Loss 3.2711 LearningRate 0.0458 Epoch: 6 Global Step: 107950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:29:04,079-Speed 5191.79 samples/sec Loss 3.1879 LearningRate 0.0458 Epoch: 6 Global Step: 107960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:29:06,072-Speed 5138.72 samples/sec Loss 3.2814 LearningRate 0.0458 Epoch: 6 Global Step: 107970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:29:08,044-Speed 5194.68 samples/sec Loss 3.1658 LearningRate 0.0458 Epoch: 6 Global Step: 107980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:29:10,023-Speed 5175.66 samples/sec Loss 3.1636 LearningRate 0.0458 Epoch: 6 Global Step: 107990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:29:12,015-Speed 5143.36 samples/sec Loss 3.2194 LearningRate 0.0458 Epoch: 6 Global Step: 108000 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:29:38,462-[lfw][108000]XNorm: 21.555559 Training: 2022-04-11 06:29:38,463-[lfw][108000]Accuracy-Flip: 0.99733+-0.00281 Training: 2022-04-11 06:29:38,463-[lfw][108000]Accuracy-Highest: 0.99817 Training: 2022-04-11 06:30:09,330-[cfp_fp][108000]XNorm: 19.923181 Training: 2022-04-11 
06:30:09,331-[cfp_fp][108000]Accuracy-Flip: 0.97886+-0.00578 Training: 2022-04-11 06:30:09,331-[cfp_fp][108000]Accuracy-Highest: 0.98129 Training: 2022-04-11 06:30:35,850-[agedb_30][108000]XNorm: 21.652081 Training: 2022-04-11 06:30:35,851-[agedb_30][108000]Accuracy-Flip: 0.97933+-0.00847 Training: 2022-04-11 06:30:35,851-[agedb_30][108000]Accuracy-Highest: 0.97950 Training: 2022-04-11 06:30:37,845-Speed 119.31 samples/sec Loss 3.2723 LearningRate 0.0458 Epoch: 6 Global Step: 108010 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:30:39,809-Speed 5214.80 samples/sec Loss 3.2385 LearningRate 0.0458 Epoch: 6 Global Step: 108020 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:30:41,780-Speed 5196.63 samples/sec Loss 3.2907 LearningRate 0.0457 Epoch: 6 Global Step: 108030 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:30:43,747-Speed 5207.04 samples/sec Loss 3.2134 LearningRate 0.0457 Epoch: 6 Global Step: 108040 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:30:45,719-Speed 5195.54 samples/sec Loss 3.2381 LearningRate 0.0457 Epoch: 6 Global Step: 108050 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:30:47,684-Speed 5211.91 samples/sec Loss 3.2693 LearningRate 0.0457 Epoch: 6 Global Step: 108060 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:30:49,661-Speed 5182.45 samples/sec Loss 3.2988 LearningRate 0.0457 Epoch: 6 Global Step: 108070 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:30:51,647-Speed 5158.64 samples/sec Loss 3.2158 LearningRate 0.0457 Epoch: 6 Global Step: 108080 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:30:53,611-Speed 5214.78 samples/sec Loss 3.2671 LearningRate 0.0457 Epoch: 6 Global Step: 108090 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:30:55,599-Speed 5153.51 samples/sec Loss 3.2034 LearningRate 0.0457 Epoch: 6 Global Step: 108100 Fp16 Grad Scale: 65536 Required: 15 hours 
Training: 2022-04-11 06:30:57,584-Speed 5160.27 samples/sec Loss 3.2912 LearningRate 0.0457 Epoch: 6 Global Step: 108110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:30:59,554-Speed 5197.63 samples/sec Loss 3.2340 LearningRate 0.0457 Epoch: 6 Global Step: 108120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:01,523-Speed 5202.48 samples/sec Loss 3.3150 LearningRate 0.0457 Epoch: 6 Global Step: 108130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:03,499-Speed 5183.60 samples/sec Loss 3.2440 LearningRate 0.0457 Epoch: 6 Global Step: 108140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:05,473-Speed 5190.93 samples/sec Loss 3.3103 LearningRate 0.0457 Epoch: 6 Global Step: 108150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:07,444-Speed 5196.83 samples/sec Loss 3.2625 LearningRate 0.0457 Epoch: 6 Global Step: 108160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:09,438-Speed 5136.82 samples/sec Loss 3.2786 LearningRate 0.0457 Epoch: 6 Global Step: 108170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:11,420-Speed 5168.73 samples/sec Loss 3.2975 LearningRate 0.0457 Epoch: 6 Global Step: 108180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:13,394-Speed 5189.52 samples/sec Loss 3.3120 LearningRate 0.0457 Epoch: 6 Global Step: 108190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:15,382-Speed 5151.16 samples/sec Loss 3.1345 LearningRate 0.0457 Epoch: 6 Global Step: 108200 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:31:17,350-Speed 5205.76 samples/sec Loss 3.2858 LearningRate 0.0457 Epoch: 6 Global Step: 108210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:19,340-Speed 5147.61 samples/sec Loss 3.3141 LearningRate 0.0457 Epoch: 6 Global Step: 108220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:21,324-Speed 
5162.48 samples/sec Loss 3.3036 LearningRate 0.0457 Epoch: 6 Global Step: 108230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:23,303-Speed 5176.70 samples/sec Loss 3.2097 LearningRate 0.0457 Epoch: 6 Global Step: 108240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:25,293-Speed 5147.65 samples/sec Loss 3.2634 LearningRate 0.0457 Epoch: 6 Global Step: 108250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:27,277-Speed 5162.99 samples/sec Loss 3.2279 LearningRate 0.0457 Epoch: 6 Global Step: 108260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:29,261-Speed 5161.65 samples/sec Loss 3.2557 LearningRate 0.0457 Epoch: 6 Global Step: 108270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:31,241-Speed 5174.65 samples/sec Loss 3.2088 LearningRate 0.0456 Epoch: 6 Global Step: 108280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:33,248-Speed 5103.99 samples/sec Loss 3.2774 LearningRate 0.0456 Epoch: 6 Global Step: 108290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:35,255-Speed 5103.42 samples/sec Loss 3.3031 LearningRate 0.0456 Epoch: 6 Global Step: 108300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:37,290-Speed 5034.12 samples/sec Loss 3.2033 LearningRate 0.0456 Epoch: 6 Global Step: 108310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:31:39,281-Speed 5143.49 samples/sec Loss 3.2471 LearningRate 0.0456 Epoch: 6 Global Step: 108320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:41,263-Speed 5169.44 samples/sec Loss 3.2991 LearningRate 0.0456 Epoch: 6 Global Step: 108330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:43,236-Speed 5192.34 samples/sec Loss 3.2682 LearningRate 0.0456 Epoch: 6 Global Step: 108340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:45,226-Speed 5147.98 samples/sec Loss 3.3065 
LearningRate 0.0456 Epoch: 6 Global Step: 108350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:31:47,205-Speed 5176.01 samples/sec Loss 3.2964 LearningRate 0.0456 Epoch: 6 Global Step: 108360 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:31:49,213-Speed 5099.75 samples/sec Loss 3.2762 LearningRate 0.0456 Epoch: 6 Global Step: 108370 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:31:51,200-Speed 5155.20 samples/sec Loss 3.2894 LearningRate 0.0456 Epoch: 6 Global Step: 108380 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:31:53,193-Speed 5140.99 samples/sec Loss 3.2675 LearningRate 0.0456 Epoch: 6 Global Step: 108390 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:31:55,175-Speed 5167.75 samples/sec Loss 3.2794 LearningRate 0.0456 Epoch: 6 Global Step: 108400 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:31:57,169-Speed 5136.12 samples/sec Loss 3.2414 LearningRate 0.0456 Epoch: 6 Global Step: 108410 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:31:59,157-Speed 5152.55 samples/sec Loss 3.3007 LearningRate 0.0456 Epoch: 6 Global Step: 108420 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:32:01,147-Speed 5147.41 samples/sec Loss 3.2732 LearningRate 0.0456 Epoch: 6 Global Step: 108430 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:32:03,144-Speed 5131.00 samples/sec Loss 3.2564 LearningRate 0.0456 Epoch: 6 Global Step: 108440 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:32:05,150-Speed 5104.36 samples/sec Loss 3.3266 LearningRate 0.0456 Epoch: 6 Global Step: 108450 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:32:07,142-Speed 5142.29 samples/sec Loss 3.2819 LearningRate 0.0456 Epoch: 6 Global Step: 108460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:32:09,123-Speed 5172.64 samples/sec Loss 3.3174 LearningRate 0.0456 Epoch: 6 Global Step: 
108470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:32:11,114-Speed 5143.86 samples/sec Loss 3.1923 LearningRate 0.0456 Epoch: 6 Global Step: 108480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:32:13,138-Speed 5061.02 samples/sec Loss 3.2010 LearningRate 0.0456 Epoch: 6 Global Step: 108490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:32:15,136-Speed 5128.68 samples/sec Loss 3.3164 LearningRate 0.0456 Epoch: 6 Global Step: 108500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:32:17,118-Speed 5166.64 samples/sec Loss 3.3301 LearningRate 0.0456 Epoch: 6 Global Step: 108510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:32:19,095-Speed 5181.33 samples/sec Loss 3.2308 LearningRate 0.0456 Epoch: 6 Global Step: 108520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:32:21,081-Speed 5157.81 samples/sec Loss 3.2246 LearningRate 0.0455 Epoch: 6 Global Step: 108530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:32:23,083-Speed 5116.05 samples/sec Loss 3.2415 LearningRate 0.0455 Epoch: 6 Global Step: 108540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:32:25,076-Speed 5141.67 samples/sec Loss 3.2889 LearningRate 0.0455 Epoch: 6 Global Step: 108550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:32:27,064-Speed 5151.95 samples/sec Loss 3.2899 LearningRate 0.0455 Epoch: 6 Global Step: 108560 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:32:29,042-Speed 5178.81 samples/sec Loss 3.2369 LearningRate 0.0455 Epoch: 6 Global Step: 108570 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:32:31,047-Speed 5107.21 samples/sec Loss 3.2752 LearningRate 0.0455 Epoch: 6 Global Step: 108580 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:32:33,026-Speed 5176.77 samples/sec Loss 3.2330 LearningRate 0.0455 Epoch: 6 Global Step: 108590 Fp16 Grad Scale: 131072 Required: 
15 hours Training: 2022-04-11 06:32:35,000-Speed 5191.58 samples/sec Loss 3.2418 LearningRate 0.0455 Epoch: 6 Global Step: 108600 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:32:36,975-Speed 5184.76 samples/sec Loss 3.1858 LearningRate 0.0455 Epoch: 6 Global Step: 108610 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:32:38,961-Speed 5158.53 samples/sec Loss 3.2613 LearningRate 0.0455 Epoch: 6 Global Step: 108620 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:32:40,941-Speed 5172.47 samples/sec Loss 3.2288 LearningRate 0.0455 Epoch: 6 Global Step: 108630 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:32:42,913-Speed 5194.25 samples/sec Loss 3.2485 LearningRate 0.0455 Epoch: 6 Global Step: 108640 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:32:44,894-Speed 5171.21 samples/sec Loss 3.2688 LearningRate 0.0455 Epoch: 6 Global Step: 108650 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:32:46,880-Speed 5157.57 samples/sec Loss 3.2476 LearningRate 0.0455 Epoch: 6 Global Step: 108660 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:32:48,858-Speed 5178.71 samples/sec Loss 3.1751 LearningRate 0.0455 Epoch: 6 Global Step: 108670 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-11 06:32:50,855-Speed 5130.74 samples/sec Loss 3.1493 LearningRate 0.0455 Epoch: 6 Global Step: 108680 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-11 06:32:52,833-Speed 5177.81 samples/sec Loss 3.2728 LearningRate 0.0455 Epoch: 6 Global Step: 108690 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-11 06:32:54,816-Speed 5165.95 samples/sec Loss 3.1809 LearningRate 0.0455 Epoch: 6 Global Step: 108700 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-11 06:32:56,800-Speed 5164.25 samples/sec Loss 3.2044 LearningRate 0.0455 Epoch: 6 Global Step: 108710 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-11 
06:32:58,787-Speed 5155.48 samples/sec Loss 3.2937 LearningRate 0.0455 Epoch: 6 Global Step: 108720 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-11 06:33:00,788-Speed 5118.99 samples/sec Loss 3.2478 LearningRate 0.0455 Epoch: 6 Global Step: 108730 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-11 06:33:02,809-Speed 5067.67 samples/sec Loss 3.2224 LearningRate 0.0455 Epoch: 6 Global Step: 108740 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-11 06:33:04,793-Speed 5163.74 samples/sec Loss 3.2430 LearningRate 0.0455 Epoch: 6 Global Step: 108750 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-11 06:33:06,766-Speed 5190.60 samples/sec Loss 3.2088 LearningRate 0.0455 Epoch: 6 Global Step: 108760 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-04-11 06:33:08,740-Speed 5189.04 samples/sec Loss 3.2423 LearningRate 0.0454 Epoch: 6 Global Step: 108770 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:10,739-Speed 5124.74 samples/sec Loss 3.3093 LearningRate 0.0454 Epoch: 6 Global Step: 108780 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:12,741-Speed 5116.93 samples/sec Loss 3.2991 LearningRate 0.0454 Epoch: 6 Global Step: 108790 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:14,729-Speed 5151.75 samples/sec Loss 3.2920 LearningRate 0.0454 Epoch: 6 Global Step: 108800 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:16,773-Speed 5012.31 samples/sec Loss 3.3261 LearningRate 0.0454 Epoch: 6 Global Step: 108810 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:18,744-Speed 5196.27 samples/sec Loss 3.2114 LearningRate 0.0454 Epoch: 6 Global Step: 108820 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:20,752-Speed 5102.35 samples/sec Loss 3.1969 LearningRate 0.0454 Epoch: 6 Global Step: 108830 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:22,737-Speed 5161.37 samples/sec Loss 
3.2248 LearningRate 0.0454 Epoch: 6 Global Step: 108840 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:24,741-Speed 5110.16 samples/sec Loss 3.2089 LearningRate 0.0454 Epoch: 6 Global Step: 108850 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:26,719-Speed 5179.06 samples/sec Loss 3.2746 LearningRate 0.0454 Epoch: 6 Global Step: 108860 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:28,698-Speed 5174.97 samples/sec Loss 3.3026 LearningRate 0.0454 Epoch: 6 Global Step: 108870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:33:30,669-Speed 5195.80 samples/sec Loss 3.2505 LearningRate 0.0454 Epoch: 6 Global Step: 108880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:33:32,642-Speed 5191.98 samples/sec Loss 3.2419 LearningRate 0.0454 Epoch: 6 Global Step: 108890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:33:34,630-Speed 5154.29 samples/sec Loss 3.1854 LearningRate 0.0454 Epoch: 6 Global Step: 108900 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:36,612-Speed 5168.38 samples/sec Loss 3.2589 LearningRate 0.0454 Epoch: 6 Global Step: 108910 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:38,591-Speed 5175.04 samples/sec Loss 3.3009 LearningRate 0.0454 Epoch: 6 Global Step: 108920 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:40,565-Speed 5189.00 samples/sec Loss 3.2545 LearningRate 0.0454 Epoch: 6 Global Step: 108930 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:42,539-Speed 5189.34 samples/sec Loss 3.3015 LearningRate 0.0454 Epoch: 6 Global Step: 108940 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:44,523-Speed 5163.25 samples/sec Loss 3.2818 LearningRate 0.0454 Epoch: 6 Global Step: 108950 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:46,514-Speed 5145.33 samples/sec Loss 3.2368 LearningRate 0.0454 Epoch: 6 Global 
Step: 108960 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:48,499-Speed 5160.06 samples/sec Loss 3.2619 LearningRate 0.0454 Epoch: 6 Global Step: 108970 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:50,522-Speed 5063.46 samples/sec Loss 3.2085 LearningRate 0.0454 Epoch: 6 Global Step: 108980 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:52,500-Speed 5178.40 samples/sec Loss 3.2577 LearningRate 0.0454 Epoch: 6 Global Step: 108990 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:33:54,491-Speed 5145.61 samples/sec Loss 3.2478 LearningRate 0.0454 Epoch: 6 Global Step: 109000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:33:56,484-Speed 5140.82 samples/sec Loss 3.2236 LearningRate 0.0454 Epoch: 6 Global Step: 109010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:33:58,467-Speed 5165.08 samples/sec Loss 3.2167 LearningRate 0.0453 Epoch: 6 Global Step: 109020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:34:00,448-Speed 5170.79 samples/sec Loss 3.2647 LearningRate 0.0453 Epoch: 6 Global Step: 109030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:34:02,449-Speed 5120.11 samples/sec Loss 3.3443 LearningRate 0.0453 Epoch: 6 Global Step: 109040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:34:04,469-Speed 5070.63 samples/sec Loss 3.3323 LearningRate 0.0453 Epoch: 6 Global Step: 109050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:34:06,446-Speed 5179.87 samples/sec Loss 3.2703 LearningRate 0.0453 Epoch: 6 Global Step: 109060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:34:08,422-Speed 5185.90 samples/sec Loss 3.2230 LearningRate 0.0453 Epoch: 6 Global Step: 109070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:34:10,403-Speed 5168.98 samples/sec Loss 3.3313 LearningRate 0.0453 Epoch: 6 Global Step: 109080 Fp16 Grad Scale: 65536 
Required: 15 hours Training: 2022-04-11 06:34:12,381-Speed 5179.30 samples/sec Loss 3.2343 LearningRate 0.0453 Epoch: 6 Global Step: 109090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:34:14,385-Speed 5109.64 samples/sec Loss 3.2131 LearningRate 0.0453 Epoch: 6 Global Step: 109100 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:34:16,362-Speed 5182.05 samples/sec Loss 3.2484 LearningRate 0.0453 Epoch: 6 Global Step: 109110 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:34:18,342-Speed 5175.36 samples/sec Loss 3.2817 LearningRate 0.0453 Epoch: 6 Global Step: 109120 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:34:20,317-Speed 5186.17 samples/sec Loss 3.2758 LearningRate 0.0453 Epoch: 6 Global Step: 109130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:34:22,324-Speed 5103.86 samples/sec Loss 3.1534 LearningRate 0.0453 Epoch: 6 Global Step: 109140 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:34:24,348-Speed 5061.08 samples/sec Loss 3.2016 LearningRate 0.0453 Epoch: 6 Global Step: 109150 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:34:26,343-Speed 5134.19 samples/sec Loss 3.2462 LearningRate 0.0453 Epoch: 6 Global Step: 109160 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:34:28,346-Speed 5113.91 samples/sec Loss 3.2167 LearningRate 0.0453 Epoch: 6 Global Step: 109170 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:34:30,328-Speed 5168.27 samples/sec Loss 3.3008 LearningRate 0.0453 Epoch: 6 Global Step: 109180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:34:32,301-Speed 5189.90 samples/sec Loss 3.3007 LearningRate 0.0453 Epoch: 6 Global Step: 109190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:34:34,276-Speed 5186.54 samples/sec Loss 3.2618 LearningRate 0.0453 Epoch: 6 Global Step: 109200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 
2022-04-11 06:34:36,264-Speed 5152.36 samples/sec Loss 3.2099 LearningRate 0.0453 Epoch: 6 Global Step: 109210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:34:38,242-Speed 5181.72 samples/sec Loss 3.2239 LearningRate 0.0453 Epoch: 6 Global Step: 109220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:34:40,223-Speed 5170.82 samples/sec Loss 3.2875 LearningRate 0.0453 Epoch: 6 Global Step: 109230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:34:42,201-Speed 5178.68 samples/sec Loss 3.3062 LearningRate 0.0453 Epoch: 6 Global Step: 109240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:34:44,179-Speed 5177.31 samples/sec Loss 3.3225 LearningRate 0.0453 Epoch: 6 Global Step: 109250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:34:46,188-Speed 5099.50 samples/sec Loss 3.3280 LearningRate 0.0453 Epoch: 6 Global Step: 109260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:34:48,181-Speed 5139.21 samples/sec Loss 3.3377 LearningRate 0.0452 Epoch: 6 Global Step: 109270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:34:50,159-Speed 5177.63 samples/sec Loss 3.2414 LearningRate 0.0452 Epoch: 6 Global Step: 109280 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:34:52,142-Speed 5166.01 samples/sec Loss 3.3571 LearningRate 0.0452 Epoch: 6 Global Step: 109290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:34:54,111-Speed 5202.11 samples/sec Loss 3.2539 LearningRate 0.0452 Epoch: 6 Global Step: 109300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:34:56,088-Speed 5182.19 samples/sec Loss 3.2488 LearningRate 0.0452 Epoch: 6 Global Step: 109310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:34:58,075-Speed 5154.31 samples/sec Loss 3.2460 LearningRate 0.0452 Epoch: 6 Global Step: 109320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:35:00,103-Speed 5052.55 
samples/sec Loss 3.2701 LearningRate 0.0452 Epoch: 6 Global Step: 109330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:35:02,089-Speed 5158.10 samples/sec Loss 3.2538 LearningRate 0.0452 Epoch: 6 Global Step: 109340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:35:04,077-Speed 5153.06 samples/sec Loss 3.3036 LearningRate 0.0452 Epoch: 6 Global Step: 109350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:35:06,070-Speed 5138.86 samples/sec Loss 3.2815 LearningRate 0.0452 Epoch: 6 Global Step: 109360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:35:08,051-Speed 5169.70 samples/sec Loss 3.3136 LearningRate 0.0452 Epoch: 6 Global Step: 109370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:35:10,032-Speed 5172.22 samples/sec Loss 3.2289 LearningRate 0.0452 Epoch: 6 Global Step: 109380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:35:12,022-Speed 5146.66 samples/sec Loss 3.3225 LearningRate 0.0452 Epoch: 6 Global Step: 109390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:35:14,021-Speed 5124.58 samples/sec Loss 3.2469 LearningRate 0.0452 Epoch: 6 Global Step: 109400 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:35:15,999-Speed 5179.72 samples/sec Loss 3.2181 LearningRate 0.0452 Epoch: 6 Global Step: 109410 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:35:17,977-Speed 5177.02 samples/sec Loss 3.1606 LearningRate 0.0452 Epoch: 6 Global Step: 109420 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:35:19,952-Speed 5185.47 samples/sec Loss 3.2482 LearningRate 0.0452 Epoch: 6 Global Step: 109430 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:35:21,934-Speed 5169.71 samples/sec Loss 3.2188 LearningRate 0.0452 Epoch: 6 Global Step: 109440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:35:23,937-Speed 5114.28 samples/sec Loss 3.2976 LearningRate 
0.0452 Epoch: 6 Global Step: 109450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:35:25,946-Speed 5098.53 samples/sec Loss 3.2341 LearningRate 0.0452 Epoch: 6 Global Step: 109460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:35:27,942-Speed 5132.14 samples/sec Loss 3.1937 LearningRate 0.0452 Epoch: 6 Global Step: 109470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:35:29,923-Speed 5171.51 samples/sec Loss 3.3153 LearningRate 0.0452 Epoch: 6 Global Step: 109480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:35:31,899-Speed 5182.84 samples/sec Loss 3.2654 LearningRate 0.0452 Epoch: 6 Global Step: 109490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:35:33,875-Speed 5185.11 samples/sec Loss 3.1882 LearningRate 0.0452 Epoch: 6 Global Step: 109500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:35:35,859-Speed 5162.77 samples/sec Loss 3.3207 LearningRate 0.0452 Epoch: 6 Global Step: 109510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:35:37,872-Speed 5087.55 samples/sec Loss 3.2544 LearningRate 0.0451 Epoch: 6 Global Step: 109520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:35:39,857-Speed 5160.81 samples/sec Loss 3.2666 LearningRate 0.0451 Epoch: 6 Global Step: 109530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:35:41,867-Speed 5096.14 samples/sec Loss 3.3347 LearningRate 0.0451 Epoch: 6 Global Step: 109540 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:35:43,851-Speed 5162.27 samples/sec Loss 3.2786 LearningRate 0.0451 Epoch: 6 Global Step: 109550 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:35:45,854-Speed 5116.72 samples/sec Loss 3.2777 LearningRate 0.0451 Epoch: 6 Global Step: 109560 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:35:47,833-Speed 5175.90 samples/sec Loss 3.1708 LearningRate 0.0451 Epoch: 6 Global Step: 109570 Fp16 
Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:35:49,812-Speed 5175.36 samples/sec Loss 3.1495 LearningRate 0.0451 Epoch: 6 Global Step: 109580 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:35:51,788-Speed 5182.66 samples/sec Loss 3.2689 LearningRate 0.0451 Epoch: 6 Global Step: 109590 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:35:53,766-Speed 5180.08 samples/sec Loss 3.3440 LearningRate 0.0451 Epoch: 6 Global Step: 109600 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:35:55,750-Speed 5163.90 samples/sec Loss 3.3391 LearningRate 0.0451 Epoch: 6 Global Step: 109610 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:35:57,734-Speed 5161.60 samples/sec Loss 3.2257 LearningRate 0.0451 Epoch: 6 Global Step: 109620 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:35:59,722-Speed 5151.63 samples/sec Loss 3.2066 LearningRate 0.0451 Epoch: 6 Global Step: 109630 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:36:01,698-Speed 5185.49 samples/sec Loss 3.3408 LearningRate 0.0451 Epoch: 6 Global Step: 109640 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:36:03,687-Speed 5148.89 samples/sec Loss 3.2879 LearningRate 0.0451 Epoch: 6 Global Step: 109650 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:36:05,693-Speed 5107.23 samples/sec Loss 3.3174 LearningRate 0.0451 Epoch: 6 Global Step: 109660 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:36:07,673-Speed 5174.66 samples/sec Loss 3.2785 LearningRate 0.0451 Epoch: 6 Global Step: 109670 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:36:09,650-Speed 5180.93 samples/sec Loss 3.2100 LearningRate 0.0451 Epoch: 6 Global Step: 109680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:11,646-Speed 5132.15 samples/sec Loss 3.2990 LearningRate 0.0451 Epoch: 6 Global Step: 109690 Fp16 Grad Scale: 65536 Required: 15 
hours Training: 2022-04-11 06:36:13,627-Speed 5170.11 samples/sec Loss 3.1861 LearningRate 0.0451 Epoch: 6 Global Step: 109700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:15,607-Speed 5173.55 samples/sec Loss 3.2056 LearningRate 0.0451 Epoch: 6 Global Step: 109710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:17,597-Speed 5147.82 samples/sec Loss 3.3229 LearningRate 0.0451 Epoch: 6 Global Step: 109720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:19,589-Speed 5143.07 samples/sec Loss 3.2016 LearningRate 0.0451 Epoch: 6 Global Step: 109730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:21,563-Speed 5188.08 samples/sec Loss 3.2866 LearningRate 0.0451 Epoch: 6 Global Step: 109740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:23,537-Speed 5189.11 samples/sec Loss 3.2654 LearningRate 0.0451 Epoch: 6 Global Step: 109750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:25,517-Speed 5175.47 samples/sec Loss 3.2736 LearningRate 0.0451 Epoch: 6 Global Step: 109760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:27,504-Speed 5154.52 samples/sec Loss 3.2360 LearningRate 0.0450 Epoch: 6 Global Step: 109770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:29,497-Speed 5138.26 samples/sec Loss 3.2255 LearningRate 0.0450 Epoch: 6 Global Step: 109780 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:36:31,480-Speed 5166.26 samples/sec Loss 3.2853 LearningRate 0.0450 Epoch: 6 Global Step: 109790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:33,464-Speed 5163.23 samples/sec Loss 3.2615 LearningRate 0.0450 Epoch: 6 Global Step: 109800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:35,441-Speed 5180.22 samples/sec Loss 3.2937 LearningRate 0.0450 Epoch: 6 Global Step: 109810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 
06:36:37,430-Speed 5151.00 samples/sec Loss 3.1971 LearningRate 0.0450 Epoch: 6 Global Step: 109820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:39,424-Speed 5137.60 samples/sec Loss 3.2202 LearningRate 0.0450 Epoch: 6 Global Step: 109830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:41,399-Speed 5187.49 samples/sec Loss 3.1956 LearningRate 0.0450 Epoch: 6 Global Step: 109840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:43,369-Speed 5197.46 samples/sec Loss 3.2687 LearningRate 0.0450 Epoch: 6 Global Step: 109850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:45,362-Speed 5140.00 samples/sec Loss 3.1941 LearningRate 0.0450 Epoch: 6 Global Step: 109860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:47,344-Speed 5168.67 samples/sec Loss 3.2816 LearningRate 0.0450 Epoch: 6 Global Step: 109870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:49,334-Speed 5146.48 samples/sec Loss 3.2923 LearningRate 0.0450 Epoch: 6 Global Step: 109880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:51,322-Speed 5152.51 samples/sec Loss 3.1201 LearningRate 0.0450 Epoch: 6 Global Step: 109890 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:36:53,305-Speed 5166.36 samples/sec Loss 3.2054 LearningRate 0.0450 Epoch: 6 Global Step: 109900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:55,303-Speed 5129.10 samples/sec Loss 3.2899 LearningRate 0.0450 Epoch: 6 Global Step: 109910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:57,283-Speed 5173.71 samples/sec Loss 3.2179 LearningRate 0.0450 Epoch: 6 Global Step: 109920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:36:59,266-Speed 5165.66 samples/sec Loss 3.2316 LearningRate 0.0450 Epoch: 6 Global Step: 109930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:37:01,245-Speed 5174.40 samples/sec 
Loss 3.2354 LearningRate 0.0450 Epoch: 6 Global Step: 109940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:37:03,222-Speed 5181.16 samples/sec Loss 3.2427 LearningRate 0.0450 Epoch: 6 Global Step: 109950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:37:05,201-Speed 5177.10 samples/sec Loss 3.2173 LearningRate 0.0450 Epoch: 6 Global Step: 109960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:37:07,181-Speed 5172.35 samples/sec Loss 3.2060 LearningRate 0.0450 Epoch: 6 Global Step: 109970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:37:09,163-Speed 5167.73 samples/sec Loss 3.2900 LearningRate 0.0450 Epoch: 6 Global Step: 109980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:37:11,180-Speed 5078.67 samples/sec Loss 3.3113 LearningRate 0.0450 Epoch: 6 Global Step: 109990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:37:13,188-Speed 5100.67 samples/sec Loss 3.3068 LearningRate 0.0450 Epoch: 6 Global Step: 110000 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:37:39,805-[lfw][110000]XNorm: 22.888233 Training: 2022-04-11 06:37:39,805-[lfw][110000]Accuracy-Flip: 0.99783+-0.00308 Training: 2022-04-11 06:37:39,806-[lfw][110000]Accuracy-Highest: 0.99817 Training: 2022-04-11 06:38:10,442-[cfp_fp][110000]XNorm: 20.759283 Training: 2022-04-11 06:38:10,442-[cfp_fp][110000]Accuracy-Flip: 0.98443+-0.00525 Training: 2022-04-11 06:38:10,443-[cfp_fp][110000]Accuracy-Highest: 0.98443 Training: 2022-04-11 06:38:37,090-[agedb_30][110000]XNorm: 22.797848 Training: 2022-04-11 06:38:37,091-[agedb_30][110000]Accuracy-Flip: 0.97833+-0.00753 Training: 2022-04-11 06:38:37,091-[agedb_30][110000]Accuracy-Highest: 0.97950 Training: 2022-04-11 06:38:39,076-Speed 119.23 samples/sec Loss 3.2498 LearningRate 0.0450 Epoch: 6 Global Step: 110010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:38:41,062-Speed 5156.25 samples/sec Loss 3.2625 
LearningRate 0.0449 Epoch: 6 Global Step: 110020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:38:43,031-Speed 5203.36 samples/sec Loss 3.2839 LearningRate 0.0449 Epoch: 6 Global Step: 110030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:38:45,028-Speed 5131.20 samples/sec Loss 3.2836 LearningRate 0.0449 Epoch: 6 Global Step: 110040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:38:47,004-Speed 5184.28 samples/sec Loss 3.2393 LearningRate 0.0449 Epoch: 6 Global Step: 110050 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:38:48,984-Speed 5171.87 samples/sec Loss 3.2981 LearningRate 0.0449 Epoch: 6 Global Step: 110060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:38:51,012-Speed 5050.78 samples/sec Loss 3.2261 LearningRate 0.0449 Epoch: 6 Global Step: 110070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:38:53,073-Speed 4970.54 samples/sec Loss 3.2499 LearningRate 0.0449 Epoch: 6 Global Step: 110080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:38:55,069-Speed 5131.05 samples/sec Loss 3.2842 LearningRate 0.0449 Epoch: 6 Global Step: 110090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:38:57,036-Speed 5209.18 samples/sec Loss 3.3187 LearningRate 0.0449 Epoch: 6 Global Step: 110100 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:38:59,015-Speed 5175.07 samples/sec Loss 3.2394 LearningRate 0.0449 Epoch: 6 Global Step: 110110 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:39:00,994-Speed 5176.60 samples/sec Loss 3.2548 LearningRate 0.0449 Epoch: 6 Global Step: 110120 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:39:02,969-Speed 5184.52 samples/sec Loss 3.3017 LearningRate 0.0449 Epoch: 6 Global Step: 110130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:39:04,934-Speed 5213.38 samples/sec Loss 3.2744 LearningRate 0.0449 Epoch: 6 
Global Step: 110140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:39:06,922-Speed 5151.45 samples/sec Loss 3.2041 LearningRate 0.0449 Epoch: 6 Global Step: 110150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:39:08,895-Speed 5194.40 samples/sec Loss 3.2294 LearningRate 0.0449 Epoch: 6 Global Step: 110160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:39:10,868-Speed 5189.79 samples/sec Loss 3.1865 LearningRate 0.0449 Epoch: 6 Global Step: 110170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:39:12,870-Speed 5117.03 samples/sec Loss 3.2041 LearningRate 0.0449 Epoch: 6 Global Step: 110180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:39:14,863-Speed 5139.46 samples/sec Loss 3.2602 LearningRate 0.0449 Epoch: 6 Global Step: 110190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:39:16,837-Speed 5192.00 samples/sec Loss 3.2085 LearningRate 0.0449 Epoch: 6 Global Step: 110200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:39:18,821-Speed 5160.41 samples/sec Loss 3.3314 LearningRate 0.0449 Epoch: 6 Global Step: 110210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:39:20,804-Speed 5165.27 samples/sec Loss 3.2238 LearningRate 0.0449 Epoch: 6 Global Step: 110220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:39:22,799-Speed 5135.63 samples/sec Loss 3.2671 LearningRate 0.0449 Epoch: 6 Global Step: 110230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:39:24,792-Speed 5140.34 samples/sec Loss 3.2254 LearningRate 0.0449 Epoch: 6 Global Step: 110240 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:39:26,779-Speed 5154.75 samples/sec Loss 3.2936 LearningRate 0.0449 Epoch: 6 Global Step: 110250 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:39:28,768-Speed 5148.87 samples/sec Loss 3.2597 LearningRate 0.0449 Epoch: 6 Global Step: 110260 Fp16 Grad Scale: 
131072 Required: 15 hours Training: 2022-04-11 06:39:30,743-Speed 5186.75 samples/sec Loss 3.2394 LearningRate 0.0448 Epoch: 6 Global Step: 110270 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:39:32,721-Speed 5178.74 samples/sec Loss 3.2242 LearningRate 0.0448 Epoch: 6 Global Step: 110280 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:39:34,703-Speed 5167.75 samples/sec Loss 3.3120 LearningRate 0.0448 Epoch: 6 Global Step: 110290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:39:36,689-Speed 5160.11 samples/sec Loss 3.2488 LearningRate 0.0448 Epoch: 6 Global Step: 110300 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:39:38,686-Speed 5127.82 samples/sec Loss 3.3039 LearningRate 0.0448 Epoch: 6 Global Step: 110310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:39:40,675-Speed 5151.86 samples/sec Loss 3.2303 LearningRate 0.0448 Epoch: 6 Global Step: 110320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:39:42,643-Speed 5202.88 samples/sec Loss 3.2895 LearningRate 0.0448 Epoch: 6 Global Step: 110330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:39:44,618-Speed 5185.95 samples/sec Loss 3.2420 LearningRate 0.0448 Epoch: 6 Global Step: 110340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:39:46,604-Speed 5160.10 samples/sec Loss 3.2333 LearningRate 0.0448 Epoch: 6 Global Step: 110350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:39:48,579-Speed 5184.89 samples/sec Loss 3.2163 LearningRate 0.0448 Epoch: 6 Global Step: 110360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:39:50,563-Speed 5162.43 samples/sec Loss 3.2135 LearningRate 0.0448 Epoch: 6 Global Step: 110370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:39:52,552-Speed 5152.90 samples/sec Loss 3.3494 LearningRate 0.0448 Epoch: 6 Global Step: 110380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 
2022-04-11 06:39:54,537-Speed 5160.09 samples/sec Loss 3.2633 LearningRate 0.0448 Epoch: 6 Global Step: 110390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:39:56,514-Speed 5181.90 samples/sec Loss 3.2778 LearningRate 0.0448 Epoch: 6 Global Step: 110400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:39:58,516-Speed 5115.67 samples/sec Loss 3.2652 LearningRate 0.0448 Epoch: 6 Global Step: 110410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:40:00,522-Speed 5106.88 samples/sec Loss 3.2841 LearningRate 0.0448 Epoch: 6 Global Step: 110420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:40:02,503-Speed 5170.94 samples/sec Loss 3.3104 LearningRate 0.0448 Epoch: 6 Global Step: 110430 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:40:04,491-Speed 5152.66 samples/sec Loss 3.2988 LearningRate 0.0448 Epoch: 6 Global Step: 110440 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:40:06,490-Speed 5122.66 samples/sec Loss 3.3663 LearningRate 0.0448 Epoch: 6 Global Step: 110450 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:40:08,470-Speed 5175.01 samples/sec Loss 3.2931 LearningRate 0.0448 Epoch: 6 Global Step: 110460 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:40:10,448-Speed 5178.22 samples/sec Loss 3.1507 LearningRate 0.0448 Epoch: 6 Global Step: 110470 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:40:12,443-Speed 5133.06 samples/sec Loss 3.0597 LearningRate 0.0448 Epoch: 6 Global Step: 110480 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:40:14,447-Speed 5114.08 samples/sec Loss 3.3230 LearningRate 0.0448 Epoch: 6 Global Step: 110490 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:40:16,428-Speed 5169.53 samples/sec Loss 3.3115 LearningRate 0.0448 Epoch: 6 Global Step: 110500 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:40:18,417-Speed 
5151.50 samples/sec Loss 3.3014 LearningRate 0.0447 Epoch: 6 Global Step: 110510 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:40:20,391-Speed 5186.94 samples/sec Loss 3.2533 LearningRate 0.0447 Epoch: 6 Global Step: 110520 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:40:22,366-Speed 5186.42 samples/sec Loss 3.1994 LearningRate 0.0447 Epoch: 6 Global Step: 110530 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:40:24,352-Speed 5160.81 samples/sec Loss 3.2550 LearningRate 0.0447 Epoch: 6 Global Step: 110540 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:40:26,337-Speed 5158.72 samples/sec Loss 3.2593 LearningRate 0.0447 Epoch: 6 Global Step: 110550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:40:28,316-Speed 5175.71 samples/sec Loss 3.2695 LearningRate 0.0447 Epoch: 6 Global Step: 110560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:40:30,303-Speed 5154.33 samples/sec Loss 3.2530 LearningRate 0.0447 Epoch: 6 Global Step: 110570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:40:32,288-Speed 5160.10 samples/sec Loss 3.2509 LearningRate 0.0447 Epoch: 6 Global Step: 110580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:40:34,277-Speed 5150.93 samples/sec Loss 3.1936 LearningRate 0.0447 Epoch: 6 Global Step: 110590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:40:36,253-Speed 5183.53 samples/sec Loss 3.2665 LearningRate 0.0447 Epoch: 6 Global Step: 110600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:40:38,228-Speed 5188.79 samples/sec Loss 3.1980 LearningRate 0.0447 Epoch: 6 Global Step: 110610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:40:40,210-Speed 5167.69 samples/sec Loss 3.2725 LearningRate 0.0447 Epoch: 6 Global Step: 110620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:40:42,195-Speed 5159.91 samples/sec Loss 3.3085 
LearningRate 0.0447 Epoch: 6 Global Step: 110630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:40:44,177-Speed 5169.40 samples/sec Loss 3.2918 LearningRate 0.0447 Epoch: 6 Global Step: 110640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:40:46,165-Speed 5150.57 samples/sec Loss 3.2474 LearningRate 0.0447 Epoch: 6 Global Step: 110650 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:40:48,184-Speed 5075.17 samples/sec Loss 3.2284 LearningRate 0.0447 Epoch: 6 Global Step: 110660 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:40:50,195-Speed 5093.77 samples/sec Loss 3.2418 LearningRate 0.0447 Epoch: 6 Global Step: 110670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:40:52,211-Speed 5080.18 samples/sec Loss 3.1853 LearningRate 0.0447 Epoch: 6 Global Step: 110680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:40:54,209-Speed 5126.54 samples/sec Loss 3.2292 LearningRate 0.0447 Epoch: 6 Global Step: 110690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:40:56,183-Speed 5189.54 samples/sec Loss 3.2444 LearningRate 0.0447 Epoch: 6 Global Step: 110700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:40:58,157-Speed 5189.20 samples/sec Loss 3.1281 LearningRate 0.0447 Epoch: 6 Global Step: 110710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:00,130-Speed 5191.94 samples/sec Loss 3.1615 LearningRate 0.0447 Epoch: 6 Global Step: 110720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:02,115-Speed 5161.67 samples/sec Loss 3.2556 LearningRate 0.0447 Epoch: 6 Global Step: 110730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:04,094-Speed 5175.77 samples/sec Loss 3.3398 LearningRate 0.0447 Epoch: 6 Global Step: 110740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:06,079-Speed 5160.80 samples/sec Loss 3.2327 LearningRate 0.0447 Epoch: 6 Global Step: 
110750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:08,058-Speed 5173.68 samples/sec Loss 3.2672 LearningRate 0.0446 Epoch: 6 Global Step: 110760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:10,054-Speed 5133.82 samples/sec Loss 3.2761 LearningRate 0.0446 Epoch: 6 Global Step: 110770 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:41:12,025-Speed 5195.88 samples/sec Loss 3.2089 LearningRate 0.0446 Epoch: 6 Global Step: 110780 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:41:14,010-Speed 5159.39 samples/sec Loss 3.3070 LearningRate 0.0446 Epoch: 6 Global Step: 110790 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:41:16,003-Speed 5140.74 samples/sec Loss 3.1672 LearningRate 0.0446 Epoch: 6 Global Step: 110800 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:41:17,995-Speed 5144.36 samples/sec Loss 3.2834 LearningRate 0.0446 Epoch: 6 Global Step: 110810 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:41:19,971-Speed 5184.11 samples/sec Loss 3.2562 LearningRate 0.0446 Epoch: 6 Global Step: 110820 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:41:21,949-Speed 5176.97 samples/sec Loss 3.1980 LearningRate 0.0446 Epoch: 6 Global Step: 110830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:23,921-Speed 5194.69 samples/sec Loss 3.2729 LearningRate 0.0446 Epoch: 6 Global Step: 110840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:25,916-Speed 5135.29 samples/sec Loss 3.1643 LearningRate 0.0446 Epoch: 6 Global Step: 110850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:27,890-Speed 5189.01 samples/sec Loss 3.2727 LearningRate 0.0446 Epoch: 6 Global Step: 110860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:29,864-Speed 5187.43 samples/sec Loss 3.2351 LearningRate 0.0446 Epoch: 6 Global Step: 110870 Fp16 Grad Scale: 65536 
Required: 15 hours Training: 2022-04-11 06:41:31,843-Speed 5176.35 samples/sec Loss 3.2354 LearningRate 0.0446 Epoch: 6 Global Step: 110880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:33,833-Speed 5148.55 samples/sec Loss 3.3172 LearningRate 0.0446 Epoch: 6 Global Step: 110890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:35,832-Speed 5124.55 samples/sec Loss 3.2683 LearningRate 0.0446 Epoch: 6 Global Step: 110900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:37,843-Speed 5094.03 samples/sec Loss 3.3369 LearningRate 0.0446 Epoch: 6 Global Step: 110910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:39,858-Speed 5083.70 samples/sec Loss 3.2245 LearningRate 0.0446 Epoch: 6 Global Step: 110920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:41,832-Speed 5188.00 samples/sec Loss 3.2381 LearningRate 0.0446 Epoch: 6 Global Step: 110930 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:41:43,798-Speed 5211.59 samples/sec Loss 3.2534 LearningRate 0.0446 Epoch: 6 Global Step: 110940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:45,779-Speed 5169.52 samples/sec Loss 3.1790 LearningRate 0.0446 Epoch: 6 Global Step: 110950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:47,779-Speed 5122.16 samples/sec Loss 3.2100 LearningRate 0.0446 Epoch: 6 Global Step: 110960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:49,772-Speed 5139.01 samples/sec Loss 3.2241 LearningRate 0.0446 Epoch: 6 Global Step: 110970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:51,770-Speed 5127.98 samples/sec Loss 3.2933 LearningRate 0.0446 Epoch: 6 Global Step: 110980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:53,781-Speed 5094.34 samples/sec Loss 3.1909 LearningRate 0.0446 Epoch: 6 Global Step: 110990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 
06:41:55,758-Speed 5181.10 samples/sec Loss 3.2715 LearningRate 0.0446 Epoch: 6 Global Step: 111000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:57,737-Speed 5174.98 samples/sec Loss 3.2505 LearningRate 0.0445 Epoch: 6 Global Step: 111010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:41:59,730-Speed 5140.32 samples/sec Loss 3.2107 LearningRate 0.0445 Epoch: 6 Global Step: 111020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:01,728-Speed 5128.08 samples/sec Loss 3.3147 LearningRate 0.0445 Epoch: 6 Global Step: 111030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:03,726-Speed 5125.27 samples/sec Loss 3.3212 LearningRate 0.0445 Epoch: 6 Global Step: 111040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:42:05,697-Speed 5195.98 samples/sec Loss 3.2146 LearningRate 0.0445 Epoch: 6 Global Step: 111050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:07,686-Speed 5150.13 samples/sec Loss 3.2089 LearningRate 0.0445 Epoch: 6 Global Step: 111060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:09,660-Speed 5189.09 samples/sec Loss 3.2572 LearningRate 0.0445 Epoch: 6 Global Step: 111070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:11,638-Speed 5178.95 samples/sec Loss 3.2537 LearningRate 0.0445 Epoch: 6 Global Step: 111080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:13,617-Speed 5175.71 samples/sec Loss 3.2756 LearningRate 0.0445 Epoch: 6 Global Step: 111090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:15,607-Speed 5146.65 samples/sec Loss 3.3245 LearningRate 0.0445 Epoch: 6 Global Step: 111100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:17,589-Speed 5168.74 samples/sec Loss 3.2581 LearningRate 0.0445 Epoch: 6 Global Step: 111110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:19,562-Speed 5193.28 samples/sec 
Loss 3.3418 LearningRate 0.0445 Epoch: 6 Global Step: 111120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:21,587-Speed 5059.84 samples/sec Loss 3.3030 LearningRate 0.0445 Epoch: 6 Global Step: 111130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:23,564-Speed 5180.23 samples/sec Loss 3.2059 LearningRate 0.0445 Epoch: 6 Global Step: 111140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:25,555-Speed 5143.98 samples/sec Loss 3.1973 LearningRate 0.0445 Epoch: 6 Global Step: 111150 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:42:27,543-Speed 5153.81 samples/sec Loss 3.2774 LearningRate 0.0445 Epoch: 6 Global Step: 111160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:29,518-Speed 5187.86 samples/sec Loss 3.2058 LearningRate 0.0445 Epoch: 6 Global Step: 111170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:31,495-Speed 5180.34 samples/sec Loss 3.1834 LearningRate 0.0445 Epoch: 6 Global Step: 111180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:33,484-Speed 5149.55 samples/sec Loss 3.2657 LearningRate 0.0445 Epoch: 6 Global Step: 111190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:35,471-Speed 5155.36 samples/sec Loss 3.2588 LearningRate 0.0445 Epoch: 6 Global Step: 111200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:37,465-Speed 5135.92 samples/sec Loss 3.1808 LearningRate 0.0445 Epoch: 6 Global Step: 111210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:39,461-Speed 5133.05 samples/sec Loss 3.2594 LearningRate 0.0445 Epoch: 6 Global Step: 111220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:41,440-Speed 5175.40 samples/sec Loss 3.3196 LearningRate 0.0445 Epoch: 6 Global Step: 111230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:43,422-Speed 5170.04 samples/sec Loss 3.1521 LearningRate 0.0445 Epoch: 6 
Global Step: 111240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:45,404-Speed 5167.22 samples/sec Loss 3.2654 LearningRate 0.0445 Epoch: 6 Global Step: 111250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:42:47,379-Speed 5185.61 samples/sec Loss 3.2501 LearningRate 0.0444 Epoch: 6 Global Step: 111260 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:42:49,369-Speed 5147.17 samples/sec Loss 3.2602 LearningRate 0.0444 Epoch: 6 Global Step: 111270 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:42:51,347-Speed 5180.24 samples/sec Loss 3.2041 LearningRate 0.0444 Epoch: 6 Global Step: 111280 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:42:53,327-Speed 5172.70 samples/sec Loss 3.2863 LearningRate 0.0444 Epoch: 6 Global Step: 111290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:42:55,311-Speed 5163.33 samples/sec Loss 3.2120 LearningRate 0.0444 Epoch: 6 Global Step: 111300 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:42:57,310-Speed 5124.95 samples/sec Loss 3.3687 LearningRate 0.0444 Epoch: 6 Global Step: 111310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:42:59,292-Speed 5166.40 samples/sec Loss 3.2985 LearningRate 0.0444 Epoch: 6 Global Step: 111320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:43:01,275-Speed 5167.38 samples/sec Loss 3.2755 LearningRate 0.0444 Epoch: 6 Global Step: 111330 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:43:03,261-Speed 5158.15 samples/sec Loss 3.2567 LearningRate 0.0444 Epoch: 6 Global Step: 111340 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:43:05,254-Speed 5139.68 samples/sec Loss 3.1986 LearningRate 0.0444 Epoch: 6 Global Step: 111350 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:43:07,238-Speed 5163.00 samples/sec Loss 3.2648 LearningRate 0.0444 Epoch: 6 Global Step: 111360 Fp16 Grad 
Scale: 131072 Required: 15 hours Training: 2022-04-11 06:43:09,221-Speed 5164.86 samples/sec Loss 3.2718 LearningRate 0.0444 Epoch: 6 Global Step: 111370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:43:11,239-Speed 5076.76 samples/sec Loss 3.2573 LearningRate 0.0444 Epoch: 6 Global Step: 111380 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:43:13,254-Speed 5084.38 samples/sec Loss 3.2670 LearningRate 0.0444 Epoch: 6 Global Step: 111390 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:43:15,241-Speed 5152.95 samples/sec Loss 3.2685 LearningRate 0.0444 Epoch: 6 Global Step: 111400 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:43:17,216-Speed 5187.36 samples/sec Loss 3.1564 LearningRate 0.0444 Epoch: 6 Global Step: 111410 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:43:19,217-Speed 5119.97 samples/sec Loss 3.2334 LearningRate 0.0444 Epoch: 6 Global Step: 111420 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:43:21,189-Speed 5194.10 samples/sec Loss 3.2484 LearningRate 0.0444 Epoch: 6 Global Step: 111430 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:43:23,161-Speed 5194.81 samples/sec Loss 3.2367 LearningRate 0.0444 Epoch: 6 Global Step: 111440 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:43:25,141-Speed 5173.22 samples/sec Loss 3.2086 LearningRate 0.0444 Epoch: 6 Global Step: 111450 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:43:27,106-Speed 5213.93 samples/sec Loss 3.2697 LearningRate 0.0444 Epoch: 6 Global Step: 111460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:43:29,091-Speed 5159.20 samples/sec Loss 3.2900 LearningRate 0.0444 Epoch: 6 Global Step: 111470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:43:31,065-Speed 5190.88 samples/sec Loss 3.2407 LearningRate 0.0444 Epoch: 6 Global Step: 111480 Fp16 Grad Scale: 65536 Required: 15 hours 
Training: 2022-04-11 06:43:33,062-Speed 5127.07 samples/sec Loss 3.2930 LearningRate 0.0444 Epoch: 6 Global Step: 111490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:43:35,034-Speed 5195.47 samples/sec Loss 3.2542 LearningRate 0.0444 Epoch: 6 Global Step: 111500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:43:37,032-Speed 5125.90 samples/sec Loss 3.2695 LearningRate 0.0443 Epoch: 6 Global Step: 111510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:43:39,012-Speed 5172.82 samples/sec Loss 3.1700 LearningRate 0.0443 Epoch: 6 Global Step: 111520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:43:40,990-Speed 5177.99 samples/sec Loss 3.2840 LearningRate 0.0443 Epoch: 6 Global Step: 111530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:43:42,973-Speed 5168.31 samples/sec Loss 3.3042 LearningRate 0.0443 Epoch: 6 Global Step: 111540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:43:44,959-Speed 5157.01 samples/sec Loss 3.1702 LearningRate 0.0443 Epoch: 6 Global Step: 111550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:43:46,950-Speed 5144.85 samples/sec Loss 3.2127 LearningRate 0.0443 Epoch: 6 Global Step: 111560 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:43:48,924-Speed 5190.01 samples/sec Loss 3.2350 LearningRate 0.0443 Epoch: 6 Global Step: 111570 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:43:50,913-Speed 5150.01 samples/sec Loss 3.2524 LearningRate 0.0443 Epoch: 6 Global Step: 111580 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:43:52,880-Speed 5206.42 samples/sec Loss 3.1990 LearningRate 0.0443 Epoch: 6 Global Step: 111590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:43:54,851-Speed 5197.49 samples/sec Loss 3.2722 LearningRate 0.0443 Epoch: 6 Global Step: 111600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:43:56,826-Speed 
5186.97 samples/sec Loss 3.2049 LearningRate 0.0443 Epoch: 6 Global Step: 111610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:43:58,797-Speed 5195.19 samples/sec Loss 3.2663 LearningRate 0.0443 Epoch: 6 Global Step: 111620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:00,771-Speed 5190.80 samples/sec Loss 3.2200 LearningRate 0.0443 Epoch: 6 Global Step: 111630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:02,786-Speed 5084.26 samples/sec Loss 3.2159 LearningRate 0.0443 Epoch: 6 Global Step: 111640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:04,779-Speed 5140.11 samples/sec Loss 3.1436 LearningRate 0.0443 Epoch: 6 Global Step: 111650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:06,752-Speed 5190.43 samples/sec Loss 3.3088 LearningRate 0.0443 Epoch: 6 Global Step: 111660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:08,739-Speed 5154.26 samples/sec Loss 3.1108 LearningRate 0.0443 Epoch: 6 Global Step: 111670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:10,715-Speed 5184.85 samples/sec Loss 3.1831 LearningRate 0.0443 Epoch: 6 Global Step: 111680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:12,697-Speed 5167.70 samples/sec Loss 3.1747 LearningRate 0.0443 Epoch: 6 Global Step: 111690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:14,681-Speed 5163.12 samples/sec Loss 3.2771 LearningRate 0.0443 Epoch: 6 Global Step: 111700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:16,666-Speed 5159.89 samples/sec Loss 3.2052 LearningRate 0.0443 Epoch: 6 Global Step: 111710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:18,640-Speed 5189.38 samples/sec Loss 3.1806 LearningRate 0.0443 Epoch: 6 Global Step: 111720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:20,615-Speed 5186.24 samples/sec Loss 3.2834 
LearningRate 0.0443 Epoch: 6 Global Step: 111730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:22,610-Speed 5136.82 samples/sec Loss 3.2517 LearningRate 0.0443 Epoch: 6 Global Step: 111740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:24,583-Speed 5190.58 samples/sec Loss 3.1785 LearningRate 0.0443 Epoch: 6 Global Step: 111750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:26,569-Speed 5157.39 samples/sec Loss 3.2430 LearningRate 0.0443 Epoch: 6 Global Step: 111760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:28,540-Speed 5199.40 samples/sec Loss 3.2687 LearningRate 0.0442 Epoch: 6 Global Step: 111770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:30,528-Speed 5151.80 samples/sec Loss 3.1963 LearningRate 0.0442 Epoch: 6 Global Step: 111780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:32,503-Speed 5186.02 samples/sec Loss 3.2026 LearningRate 0.0442 Epoch: 6 Global Step: 111790 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:44:34,506-Speed 5115.62 samples/sec Loss 3.2328 LearningRate 0.0442 Epoch: 6 Global Step: 111800 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:44:36,498-Speed 5141.96 samples/sec Loss 3.3083 LearningRate 0.0442 Epoch: 6 Global Step: 111810 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:44:38,481-Speed 5163.16 samples/sec Loss 3.2735 LearningRate 0.0442 Epoch: 6 Global Step: 111820 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:44:40,459-Speed 5178.66 samples/sec Loss 3.1957 LearningRate 0.0442 Epoch: 6 Global Step: 111830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:42,460-Speed 5121.03 samples/sec Loss 3.2492 LearningRate 0.0442 Epoch: 6 Global Step: 111840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:44,442-Speed 5168.03 samples/sec Loss 3.2770 LearningRate 0.0442 Epoch: 6 Global 
Step: 111850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:46,424-Speed 5166.92 samples/sec Loss 3.2602 LearningRate 0.0442 Epoch: 6 Global Step: 111860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:48,416-Speed 5143.74 samples/sec Loss 3.2994 LearningRate 0.0442 Epoch: 6 Global Step: 111870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:44:50,417-Speed 5119.13 samples/sec Loss 3.2500 LearningRate 0.0442 Epoch: 6 Global Step: 111880 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:44:52,404-Speed 5156.37 samples/sec Loss 3.2485 LearningRate 0.0442 Epoch: 6 Global Step: 111890 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:44:54,379-Speed 5185.23 samples/sec Loss 3.2433 LearningRate 0.0442 Epoch: 6 Global Step: 111900 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:44:56,359-Speed 5173.34 samples/sec Loss 3.1817 LearningRate 0.0442 Epoch: 6 Global Step: 111910 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:44:58,361-Speed 5115.40 samples/sec Loss 3.2444 LearningRate 0.0442 Epoch: 6 Global Step: 111920 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:45:00,339-Speed 5181.55 samples/sec Loss 3.2590 LearningRate 0.0442 Epoch: 6 Global Step: 111930 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:45:02,323-Speed 5162.84 samples/sec Loss 3.2524 LearningRate 0.0442 Epoch: 6 Global Step: 111940 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:45:04,323-Speed 5121.54 samples/sec Loss 3.2992 LearningRate 0.0442 Epoch: 6 Global Step: 111950 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:45:06,301-Speed 5177.77 samples/sec Loss 3.2963 LearningRate 0.0442 Epoch: 6 Global Step: 111960 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:45:08,279-Speed 5179.19 samples/sec Loss 3.2037 LearningRate 0.0442 Epoch: 6 Global Step: 111970 Fp16 Grad Scale: 32768 
Required: 15 hours Training: 2022-04-11 06:45:10,281-Speed 5116.89 samples/sec Loss 3.2615 LearningRate 0.0442 Epoch: 6 Global Step: 111980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:45:12,277-Speed 5130.00 samples/sec Loss 3.2441 LearningRate 0.0442 Epoch: 6 Global Step: 111990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:45:14,268-Speed 5146.63 samples/sec Loss 3.2753 LearningRate 0.0442 Epoch: 6 Global Step: 112000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:45:40,859-[lfw][112000]XNorm: 22.707259 Training: 2022-04-11 06:45:40,860-[lfw][112000]Accuracy-Flip: 0.99733+-0.00238 Training: 2022-04-11 06:45:40,860-[lfw][112000]Accuracy-Highest: 0.99817 Training: 2022-04-11 06:46:11,602-[cfp_fp][112000]XNorm: 20.961463 Training: 2022-04-11 06:46:11,603-[cfp_fp][112000]Accuracy-Flip: 0.98129+-0.00480 Training: 2022-04-11 06:46:11,604-[cfp_fp][112000]Accuracy-Highest: 0.98443 Training: 2022-04-11 06:46:38,026-[agedb_30][112000]XNorm: 22.778134 Training: 2022-04-11 06:46:38,027-[agedb_30][112000]Accuracy-Flip: 0.98050+-0.00671 Training: 2022-04-11 06:46:38,027-[agedb_30][112000]Accuracy-Highest: 0.98050 Training: 2022-04-11 06:46:40,019-Speed 119.42 samples/sec Loss 3.2085 LearningRate 0.0442 Epoch: 6 Global Step: 112010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:46:41,989-Speed 5200.26 samples/sec Loss 3.2309 LearningRate 0.0441 Epoch: 6 Global Step: 112020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:46:43,954-Speed 5211.93 samples/sec Loss 3.2410 LearningRate 0.0441 Epoch: 6 Global Step: 112030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:46:45,922-Speed 5206.08 samples/sec Loss 3.2996 LearningRate 0.0441 Epoch: 6 Global Step: 112040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:46:47,896-Speed 5188.52 samples/sec Loss 3.2907 LearningRate 0.0441 Epoch: 6 Global Step: 112050 Fp16 Grad Scale: 65536 Required: 15 hours 
Training: 2022-04-11 06:46:49,887-Speed 5146.03 samples/sec Loss 3.2383 LearningRate 0.0441 Epoch: 6 Global Step: 112060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:46:51,855-Speed 5205.06 samples/sec Loss 3.2283 LearningRate 0.0441 Epoch: 6 Global Step: 112070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:46:53,820-Speed 5210.83 samples/sec Loss 3.2864 LearningRate 0.0441 Epoch: 6 Global Step: 112080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:46:55,789-Speed 5202.17 samples/sec Loss 3.3252 LearningRate 0.0441 Epoch: 6 Global Step: 112090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:46:57,759-Speed 5200.01 samples/sec Loss 3.2251 LearningRate 0.0441 Epoch: 6 Global Step: 112100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:46:59,729-Speed 5200.99 samples/sec Loss 3.2230 LearningRate 0.0441 Epoch: 6 Global Step: 112110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:01,706-Speed 5181.27 samples/sec Loss 3.1327 LearningRate 0.0441 Epoch: 6 Global Step: 112120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:03,682-Speed 5184.24 samples/sec Loss 3.1759 LearningRate 0.0441 Epoch: 6 Global Step: 112130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:05,654-Speed 5194.78 samples/sec Loss 3.2470 LearningRate 0.0441 Epoch: 6 Global Step: 112140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:07,635-Speed 5170.18 samples/sec Loss 3.3311 LearningRate 0.0441 Epoch: 6 Global Step: 112150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:09,623-Speed 5152.72 samples/sec Loss 3.2917 LearningRate 0.0441 Epoch: 6 Global Step: 112160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:11,606-Speed 5165.36 samples/sec Loss 3.2747 LearningRate 0.0441 Epoch: 6 Global Step: 112170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:13,585-Speed 
5175.78 samples/sec Loss 3.1871 LearningRate 0.0441 Epoch: 6 Global Step: 112180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:15,572-Speed 5155.54 samples/sec Loss 3.1569 LearningRate 0.0441 Epoch: 6 Global Step: 112190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:17,550-Speed 5177.86 samples/sec Loss 3.1652 LearningRate 0.0441 Epoch: 6 Global Step: 112200 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:47:19,513-Speed 5219.36 samples/sec Loss 3.2772 LearningRate 0.0441 Epoch: 6 Global Step: 112210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:21,482-Speed 5204.22 samples/sec Loss 3.1910 LearningRate 0.0441 Epoch: 6 Global Step: 112220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:23,452-Speed 5197.89 samples/sec Loss 3.2507 LearningRate 0.0441 Epoch: 6 Global Step: 112230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:25,430-Speed 5180.07 samples/sec Loss 3.2232 LearningRate 0.0441 Epoch: 6 Global Step: 112240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:27,408-Speed 5177.40 samples/sec Loss 3.2952 LearningRate 0.0441 Epoch: 6 Global Step: 112250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:29,387-Speed 5175.44 samples/sec Loss 3.2559 LearningRate 0.0441 Epoch: 6 Global Step: 112260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:31,356-Speed 5203.37 samples/sec Loss 3.2778 LearningRate 0.0440 Epoch: 6 Global Step: 112270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:33,340-Speed 5162.56 samples/sec Loss 3.3157 LearningRate 0.0440 Epoch: 6 Global Step: 112280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:35,320-Speed 5173.35 samples/sec Loss 3.2323 LearningRate 0.0440 Epoch: 6 Global Step: 112290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:37,304-Speed 5163.24 samples/sec Loss 3.2817 
LearningRate 0.0440 Epoch: 6 Global Step: 112300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:39,285-Speed 5170.19 samples/sec Loss 3.1886 LearningRate 0.0440 Epoch: 6 Global Step: 112310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:47:41,265-Speed 5174.64 samples/sec Loss 3.2564 LearningRate 0.0440 Epoch: 6 Global Step: 112320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:43,256-Speed 5145.68 samples/sec Loss 3.2345 LearningRate 0.0440 Epoch: 6 Global Step: 112330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:45,243-Speed 5155.01 samples/sec Loss 3.2426 LearningRate 0.0440 Epoch: 6 Global Step: 112340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:47,234-Speed 5146.45 samples/sec Loss 3.3020 LearningRate 0.0440 Epoch: 6 Global Step: 112350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:49,218-Speed 5163.77 samples/sec Loss 3.1998 LearningRate 0.0440 Epoch: 6 Global Step: 112360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:51,201-Speed 5164.59 samples/sec Loss 3.1748 LearningRate 0.0440 Epoch: 6 Global Step: 112370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:53,176-Speed 5184.89 samples/sec Loss 3.2185 LearningRate 0.0440 Epoch: 6 Global Step: 112380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:55,160-Speed 5165.47 samples/sec Loss 3.2427 LearningRate 0.0440 Epoch: 6 Global Step: 112390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:57,142-Speed 5166.57 samples/sec Loss 3.2118 LearningRate 0.0440 Epoch: 6 Global Step: 112400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:47:59,153-Speed 5094.59 samples/sec Loss 3.2544 LearningRate 0.0440 Epoch: 6 Global Step: 112410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:01,153-Speed 5121.22 samples/sec Loss 3.2431 LearningRate 0.0440 Epoch: 6 Global Step: 
112420 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:48:03,133-Speed 5174.69 samples/sec Loss 3.2294 LearningRate 0.0440 Epoch: 6 Global Step: 112430 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:48:05,138-Speed 5109.18 samples/sec Loss 3.1912 LearningRate 0.0440 Epoch: 6 Global Step: 112440 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:48:07,114-Speed 5184.01 samples/sec Loss 3.2389 LearningRate 0.0440 Epoch: 6 Global Step: 112450 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:48:09,093-Speed 5175.59 samples/sec Loss 3.2168 LearningRate 0.0440 Epoch: 6 Global Step: 112460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:11,077-Speed 5161.66 samples/sec Loss 3.3117 LearningRate 0.0440 Epoch: 6 Global Step: 112470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:13,063-Speed 5159.38 samples/sec Loss 3.2908 LearningRate 0.0440 Epoch: 6 Global Step: 112480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:15,052-Speed 5147.86 samples/sec Loss 3.2573 LearningRate 0.0440 Epoch: 6 Global Step: 112490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:17,050-Speed 5128.52 samples/sec Loss 3.2937 LearningRate 0.0440 Epoch: 6 Global Step: 112500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:19,024-Speed 5188.60 samples/sec Loss 3.2394 LearningRate 0.0440 Epoch: 6 Global Step: 112510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:21,000-Speed 5184.26 samples/sec Loss 3.2302 LearningRate 0.0439 Epoch: 6 Global Step: 112520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:22,988-Speed 5152.22 samples/sec Loss 3.1768 LearningRate 0.0439 Epoch: 6 Global Step: 112530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:24,984-Speed 5132.50 samples/sec Loss 3.2311 LearningRate 0.0439 Epoch: 6 Global Step: 112540 Fp16 Grad Scale: 65536 Required: 
15 hours Training: 2022-04-11 06:48:26,977-Speed 5139.72 samples/sec Loss 3.2028 LearningRate 0.0439 Epoch: 6 Global Step: 112550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:28,960-Speed 5165.05 samples/sec Loss 3.1744 LearningRate 0.0439 Epoch: 6 Global Step: 112560 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:48:30,933-Speed 5191.51 samples/sec Loss 3.2490 LearningRate 0.0439 Epoch: 6 Global Step: 112570 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:48:32,914-Speed 5172.02 samples/sec Loss 3.2636 LearningRate 0.0439 Epoch: 6 Global Step: 112580 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:48:34,913-Speed 5123.76 samples/sec Loss 3.2106 LearningRate 0.0439 Epoch: 6 Global Step: 112590 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:48:36,897-Speed 5164.15 samples/sec Loss 3.2356 LearningRate 0.0439 Epoch: 6 Global Step: 112600 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:48:38,887-Speed 5146.63 samples/sec Loss 3.2417 LearningRate 0.0439 Epoch: 6 Global Step: 112610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:40,875-Speed 5153.23 samples/sec Loss 3.1779 LearningRate 0.0439 Epoch: 6 Global Step: 112620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:42,847-Speed 5193.76 samples/sec Loss 3.2058 LearningRate 0.0439 Epoch: 6 Global Step: 112630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:44,821-Speed 5189.78 samples/sec Loss 3.2589 LearningRate 0.0439 Epoch: 6 Global Step: 112640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:46,814-Speed 5139.27 samples/sec Loss 3.2962 LearningRate 0.0439 Epoch: 6 Global Step: 112650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:48,813-Speed 5124.32 samples/sec Loss 3.2595 LearningRate 0.0439 Epoch: 6 Global Step: 112660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 
06:48:50,804-Speed 5144.44 samples/sec Loss 3.2445 LearningRate 0.0439 Epoch: 6 Global Step: 112670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:52,837-Speed 5039.37 samples/sec Loss 3.2443 LearningRate 0.0439 Epoch: 6 Global Step: 112680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:54,819-Speed 5169.96 samples/sec Loss 3.2126 LearningRate 0.0439 Epoch: 6 Global Step: 112690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:56,839-Speed 5070.91 samples/sec Loss 3.1830 LearningRate 0.0439 Epoch: 6 Global Step: 112700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:48:58,903-Speed 4961.71 samples/sec Loss 3.2316 LearningRate 0.0439 Epoch: 6 Global Step: 112710 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:00,934-Speed 5043.99 samples/sec Loss 3.2291 LearningRate 0.0439 Epoch: 6 Global Step: 112720 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:02,929-Speed 5134.47 samples/sec Loss 3.2383 LearningRate 0.0439 Epoch: 6 Global Step: 112730 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:04,918-Speed 5150.97 samples/sec Loss 3.3171 LearningRate 0.0439 Epoch: 6 Global Step: 112740 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:06,891-Speed 5192.52 samples/sec Loss 3.2554 LearningRate 0.0439 Epoch: 6 Global Step: 112750 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:08,875-Speed 5162.40 samples/sec Loss 3.2448 LearningRate 0.0439 Epoch: 6 Global Step: 112760 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:10,868-Speed 5138.08 samples/sec Loss 3.3370 LearningRate 0.0438 Epoch: 6 Global Step: 112770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:49:12,843-Speed 5186.95 samples/sec Loss 3.1854 LearningRate 0.0438 Epoch: 6 Global Step: 112780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:49:14,863-Speed 5071.46 
samples/sec Loss 3.2387 LearningRate 0.0438 Epoch: 6 Global Step: 112790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:49:16,839-Speed 5184.60 samples/sec Loss 3.2114 LearningRate 0.0438 Epoch: 6 Global Step: 112800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:49:18,814-Speed 5186.65 samples/sec Loss 3.2866 LearningRate 0.0438 Epoch: 6 Global Step: 112810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:49:20,807-Speed 5138.80 samples/sec Loss 3.3016 LearningRate 0.0438 Epoch: 6 Global Step: 112820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:49:22,782-Speed 5186.58 samples/sec Loss 3.2450 LearningRate 0.0438 Epoch: 6 Global Step: 112830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:49:24,763-Speed 5171.08 samples/sec Loss 3.2734 LearningRate 0.0438 Epoch: 6 Global Step: 112840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:49:26,761-Speed 5126.72 samples/sec Loss 3.1538 LearningRate 0.0438 Epoch: 6 Global Step: 112850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:49:28,734-Speed 5192.87 samples/sec Loss 3.2218 LearningRate 0.0438 Epoch: 6 Global Step: 112860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:49:30,716-Speed 5168.02 samples/sec Loss 3.2289 LearningRate 0.0438 Epoch: 6 Global Step: 112870 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:32,689-Speed 5192.13 samples/sec Loss 3.2243 LearningRate 0.0438 Epoch: 6 Global Step: 112880 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:34,665-Speed 5181.74 samples/sec Loss 3.1761 LearningRate 0.0438 Epoch: 6 Global Step: 112890 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:36,654-Speed 5149.89 samples/sec Loss 3.2350 LearningRate 0.0438 Epoch: 6 Global Step: 112900 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:38,633-Speed 5177.37 samples/sec Loss 3.2605 LearningRate 
0.0438 Epoch: 6 Global Step: 112910 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:40,627-Speed 5137.20 samples/sec Loss 3.2309 LearningRate 0.0438 Epoch: 6 Global Step: 112920 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:42,603-Speed 5184.05 samples/sec Loss 3.1141 LearningRate 0.0438 Epoch: 6 Global Step: 112930 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:44,587-Speed 5162.64 samples/sec Loss 3.3836 LearningRate 0.0438 Epoch: 6 Global Step: 112940 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:46,570-Speed 5165.13 samples/sec Loss 3.1846 LearningRate 0.0438 Epoch: 6 Global Step: 112950 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:48,550-Speed 5176.40 samples/sec Loss 3.2027 LearningRate 0.0438 Epoch: 6 Global Step: 112960 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:50,518-Speed 5204.52 samples/sec Loss 3.2596 LearningRate 0.0438 Epoch: 6 Global Step: 112970 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:52,510-Speed 5141.23 samples/sec Loss 3.1927 LearningRate 0.0438 Epoch: 6 Global Step: 112980 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:54,492-Speed 5168.71 samples/sec Loss 3.1888 LearningRate 0.0438 Epoch: 6 Global Step: 112990 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:56,475-Speed 5164.86 samples/sec Loss 3.2075 LearningRate 0.0438 Epoch: 6 Global Step: 113000 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:49:58,485-Speed 5096.61 samples/sec Loss 3.2886 LearningRate 0.0438 Epoch: 6 Global Step: 113010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:50:00,485-Speed 5122.78 samples/sec Loss 3.2520 LearningRate 0.0437 Epoch: 6 Global Step: 113020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:50:02,479-Speed 5138.47 samples/sec Loss 3.2315 LearningRate 0.0437 Epoch: 6 Global Step: 
113030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:50:04,465-Speed 5155.79 samples/sec Loss 3.2730 LearningRate 0.0437 Epoch: 6 Global Step: 113040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:50:06,441-Speed 5183.68 samples/sec Loss 3.2632 LearningRate 0.0437 Epoch: 6 Global Step: 113050 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:50:08,408-Speed 5208.26 samples/sec Loss 3.2781 LearningRate 0.0437 Epoch: 6 Global Step: 113060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:50:10,391-Speed 5165.75 samples/sec Loss 3.2958 LearningRate 0.0437 Epoch: 6 Global Step: 113070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:50:12,365-Speed 5190.00 samples/sec Loss 3.2812 LearningRate 0.0437 Epoch: 6 Global Step: 113080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:50:14,358-Speed 5139.53 samples/sec Loss 3.1859 LearningRate 0.0437 Epoch: 6 Global Step: 113090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:50:16,350-Speed 5142.72 samples/sec Loss 3.2122 LearningRate 0.0437 Epoch: 6 Global Step: 113100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:50:18,331-Speed 5169.77 samples/sec Loss 3.2641 LearningRate 0.0437 Epoch: 6 Global Step: 113110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:50:20,334-Speed 5113.96 samples/sec Loss 3.2936 LearningRate 0.0437 Epoch: 6 Global Step: 113120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:50:22,336-Speed 5117.47 samples/sec Loss 3.2448 LearningRate 0.0437 Epoch: 6 Global Step: 113130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:50:24,319-Speed 5165.21 samples/sec Loss 3.2705 LearningRate 0.0437 Epoch: 6 Global Step: 113140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:50:26,304-Speed 5159.84 samples/sec Loss 3.3208 LearningRate 0.0437 Epoch: 6 Global Step: 113150 Fp16 Grad Scale: 65536 Required: 
15 hours Training: 2022-04-11 06:50:28,281-Speed 5183.36 samples/sec Loss 3.1950 LearningRate 0.0437 Epoch: 6 Global Step: 113160 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:50:30,257-Speed 5181.79 samples/sec Loss 3.1781 LearningRate 0.0437 Epoch: 6 Global Step: 113170 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:50:32,218-Speed 5224.42 samples/sec Loss 3.2448 LearningRate 0.0437 Epoch: 6 Global Step: 113180 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:50:34,196-Speed 5177.02 samples/sec Loss 3.2442 LearningRate 0.0437 Epoch: 6 Global Step: 113190 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:50:36,208-Speed 5091.39 samples/sec Loss 3.1970 LearningRate 0.0437 Epoch: 6 Global Step: 113200 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:50:38,187-Speed 5176.92 samples/sec Loss 3.2534 LearningRate 0.0437 Epoch: 6 Global Step: 113210 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:50:40,176-Speed 5151.18 samples/sec Loss 3.3391 LearningRate 0.0437 Epoch: 6 Global Step: 113220 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:50:42,179-Speed 5111.83 samples/sec Loss 3.2235 LearningRate 0.0437 Epoch: 6 Global Step: 113230 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:50:44,156-Speed 5183.38 samples/sec Loss 3.2811 LearningRate 0.0437 Epoch: 6 Global Step: 113240 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:50:46,138-Speed 5168.85 samples/sec Loss 3.3050 LearningRate 0.0437 Epoch: 6 Global Step: 113250 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:50:48,138-Speed 5120.65 samples/sec Loss 3.2864 LearningRate 0.0437 Epoch: 6 Global Step: 113260 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:50:50,114-Speed 5184.21 samples/sec Loss 3.2587 LearningRate 0.0437 Epoch: 6 Global Step: 113270 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 
06:50:52,106-Speed 5143.03 samples/sec Loss 3.2939 LearningRate 0.0436 Epoch: 6 Global Step: 113280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:50:54,088-Speed 5166.07 samples/sec Loss 3.2258 LearningRate 0.0436 Epoch: 6 Global Step: 113290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:50:56,064-Speed 5184.28 samples/sec Loss 3.2659 LearningRate 0.0436 Epoch: 6 Global Step: 113300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:50:58,046-Speed 5169.62 samples/sec Loss 3.2136 LearningRate 0.0436 Epoch: 6 Global Step: 113310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:00,030-Speed 5161.27 samples/sec Loss 3.1635 LearningRate 0.0436 Epoch: 6 Global Step: 113320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:02,008-Speed 5180.27 samples/sec Loss 3.3162 LearningRate 0.0436 Epoch: 6 Global Step: 113330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:03,990-Speed 5168.83 samples/sec Loss 3.1939 LearningRate 0.0436 Epoch: 6 Global Step: 113340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:05,984-Speed 5137.20 samples/sec Loss 3.2126 LearningRate 0.0436 Epoch: 6 Global Step: 113350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:07,971-Speed 5154.50 samples/sec Loss 3.1587 LearningRate 0.0436 Epoch: 6 Global Step: 113360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:09,948-Speed 5180.17 samples/sec Loss 3.2311 LearningRate 0.0436 Epoch: 6 Global Step: 113370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:11,944-Speed 5134.16 samples/sec Loss 3.2808 LearningRate 0.0436 Epoch: 6 Global Step: 113380 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:51:13,916-Speed 5193.30 samples/sec Loss 3.1809 LearningRate 0.0436 Epoch: 6 Global Step: 113390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:15,901-Speed 5159.93 samples/sec 
Loss 3.2162 LearningRate 0.0436 Epoch: 6 Global Step: 113400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:17,874-Speed 5192.40 samples/sec Loss 3.1618 LearningRate 0.0436 Epoch: 6 Global Step: 113410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:19,848-Speed 5189.46 samples/sec Loss 3.2306 LearningRate 0.0436 Epoch: 6 Global Step: 113420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:21,826-Speed 5178.21 samples/sec Loss 3.2289 LearningRate 0.0436 Epoch: 6 Global Step: 113430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:23,892-Speed 4957.26 samples/sec Loss 3.2989 LearningRate 0.0436 Epoch: 6 Global Step: 113440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:25,880-Speed 5153.78 samples/sec Loss 3.2902 LearningRate 0.0436 Epoch: 6 Global Step: 113450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:27,858-Speed 5178.39 samples/sec Loss 3.1519 LearningRate 0.0436 Epoch: 6 Global Step: 113460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:29,832-Speed 5190.30 samples/sec Loss 3.1930 LearningRate 0.0436 Epoch: 6 Global Step: 113470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:31,805-Speed 5191.12 samples/sec Loss 3.2529 LearningRate 0.0436 Epoch: 6 Global Step: 113480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:33,778-Speed 5192.37 samples/sec Loss 3.1497 LearningRate 0.0436 Epoch: 6 Global Step: 113490 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:51:35,759-Speed 5170.53 samples/sec Loss 3.2288 LearningRate 0.0436 Epoch: 6 Global Step: 113500 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:51:37,740-Speed 5170.60 samples/sec Loss 3.1806 LearningRate 0.0436 Epoch: 6 Global Step: 113510 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:51:39,748-Speed 5102.36 samples/sec Loss 3.2081 LearningRate 0.0436 Epoch: 6 
Global Step: 113520 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:51:41,742-Speed 5136.25 samples/sec Loss 3.1981 LearningRate 0.0435 Epoch: 6 Global Step: 113530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:43,737-Speed 5132.67 samples/sec Loss 3.1489 LearningRate 0.0435 Epoch: 6 Global Step: 113540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:45,726-Speed 5151.83 samples/sec Loss 3.2829 LearningRate 0.0435 Epoch: 6 Global Step: 113550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:47,710-Speed 5164.61 samples/sec Loss 3.2144 LearningRate 0.0435 Epoch: 6 Global Step: 113560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:49,687-Speed 5181.45 samples/sec Loss 3.1824 LearningRate 0.0435 Epoch: 6 Global Step: 113570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:51,663-Speed 5183.60 samples/sec Loss 3.2562 LearningRate 0.0435 Epoch: 6 Global Step: 113580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:53,638-Speed 5184.85 samples/sec Loss 3.1832 LearningRate 0.0435 Epoch: 6 Global Step: 113590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:55,614-Speed 5183.50 samples/sec Loss 3.2133 LearningRate 0.0435 Epoch: 6 Global Step: 113600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:57,608-Speed 5139.06 samples/sec Loss 3.2850 LearningRate 0.0435 Epoch: 6 Global Step: 113610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:51:59,585-Speed 5180.83 samples/sec Loss 3.2385 LearningRate 0.0435 Epoch: 6 Global Step: 113620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:52:01,603-Speed 5074.63 samples/sec Loss 3.2297 LearningRate 0.0435 Epoch: 6 Global Step: 113630 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:52:03,616-Speed 5087.89 samples/sec Loss 3.2474 LearningRate 0.0435 Epoch: 6 Global Step: 113640 Fp16 Grad Scale: 
131072 Required: 15 hours Training: 2022-04-11 06:52:05,593-Speed 5183.05 samples/sec Loss 3.2338 LearningRate 0.0435 Epoch: 6 Global Step: 113650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:52:07,588-Speed 5135.44 samples/sec Loss 3.1975 LearningRate 0.0435 Epoch: 6 Global Step: 113660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:52:09,571-Speed 5164.73 samples/sec Loss 3.1933 LearningRate 0.0435 Epoch: 6 Global Step: 113670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:52:11,565-Speed 5137.93 samples/sec Loss 3.2064 LearningRate 0.0435 Epoch: 6 Global Step: 113680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:52:13,541-Speed 5184.59 samples/sec Loss 3.2058 LearningRate 0.0435 Epoch: 6 Global Step: 113690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:52:15,536-Speed 5133.37 samples/sec Loss 3.1307 LearningRate 0.0435 Epoch: 6 Global Step: 113700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:52:17,510-Speed 5189.44 samples/sec Loss 3.2429 LearningRate 0.0435 Epoch: 6 Global Step: 113710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:52:19,486-Speed 5182.45 samples/sec Loss 3.2608 LearningRate 0.0435 Epoch: 6 Global Step: 113720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:52:21,469-Speed 5165.80 samples/sec Loss 3.1766 LearningRate 0.0435 Epoch: 6 Global Step: 113730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:52:23,466-Speed 5129.10 samples/sec Loss 3.1969 LearningRate 0.0435 Epoch: 6 Global Step: 113740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:52:25,464-Speed 5128.24 samples/sec Loss 3.2486 LearningRate 0.0435 Epoch: 6 Global Step: 113750 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:52:27,449-Speed 5159.43 samples/sec Loss 3.1836 LearningRate 0.0435 Epoch: 6 Global Step: 113760 Fp16 Grad Scale: 131072 Required: 15 hours Training: 
2022-04-11 06:52:29,426-Speed 5183.36 samples/sec Loss 3.2583 LearningRate 0.0435 Epoch: 6 Global Step: 113770 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:52:31,404-Speed 5178.79 samples/sec Loss 3.2338 LearningRate 0.0434 Epoch: 6 Global Step: 113780 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:52:33,387-Speed 5165.33 samples/sec Loss 3.2008 LearningRate 0.0434 Epoch: 6 Global Step: 113790 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:52:35,366-Speed 5176.00 samples/sec Loss 3.2433 LearningRate 0.0434 Epoch: 6 Global Step: 113800 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:52:37,355-Speed 5149.77 samples/sec Loss 3.2711 LearningRate 0.0434 Epoch: 6 Global Step: 113810 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:52:39,352-Speed 5130.14 samples/sec Loss 3.2637 LearningRate 0.0434 Epoch: 6 Global Step: 113820 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:52:41,336-Speed 5162.24 samples/sec Loss 3.2923 LearningRate 0.0434 Epoch: 6 Global Step: 113830 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:52:43,325-Speed 5150.09 samples/sec Loss 3.2102 LearningRate 0.0434 Epoch: 6 Global Step: 113840 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:52:45,294-Speed 5200.09 samples/sec Loss 3.2963 LearningRate 0.0434 Epoch: 6 Global Step: 113850 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:52:47,276-Speed 5170.46 samples/sec Loss 3.2799 LearningRate 0.0434 Epoch: 6 Global Step: 113860 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:52:49,261-Speed 5159.78 samples/sec Loss 3.1692 LearningRate 0.0434 Epoch: 6 Global Step: 113870 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:52:51,240-Speed 5177.41 samples/sec Loss 3.2786 LearningRate 0.0434 Epoch: 6 Global Step: 113880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:52:53,221-Speed 
5170.35 samples/sec Loss 3.2791 LearningRate 0.0434 Epoch: 6 Global Step: 113890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:52:55,198-Speed 5181.72 samples/sec Loss 3.2198 LearningRate 0.0434 Epoch: 6 Global Step: 113900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:52:57,173-Speed 5186.77 samples/sec Loss 3.2846 LearningRate 0.0434 Epoch: 6 Global Step: 113910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:52:59,149-Speed 5181.64 samples/sec Loss 3.2226 LearningRate 0.0434 Epoch: 6 Global Step: 113920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:53:01,140-Speed 5144.64 samples/sec Loss 3.2389 LearningRate 0.0434 Epoch: 6 Global Step: 113930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:53:03,121-Speed 5171.05 samples/sec Loss 3.2640 LearningRate 0.0434 Epoch: 6 Global Step: 113940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:53:05,127-Speed 5107.42 samples/sec Loss 3.1184 LearningRate 0.0434 Epoch: 6 Global Step: 113950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:53:07,110-Speed 5164.36 samples/sec Loss 3.1755 LearningRate 0.0434 Epoch: 6 Global Step: 113960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:53:09,103-Speed 5140.06 samples/sec Loss 3.1419 LearningRate 0.0434 Epoch: 6 Global Step: 113970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:53:11,103-Speed 5123.43 samples/sec Loss 3.1539 LearningRate 0.0434 Epoch: 6 Global Step: 113980 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:53:13,092-Speed 5149.66 samples/sec Loss 3.1947 LearningRate 0.0434 Epoch: 6 Global Step: 113990 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:53:15,086-Speed 5135.47 samples/sec Loss 3.2204 LearningRate 0.0434 Epoch: 6 Global Step: 114000 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:53:41,697-[lfw][114000]XNorm: 21.908469 Training: 
2022-04-11 06:53:41,698-[lfw][114000]Accuracy-Flip: 0.99767+-0.00260 Training: 2022-04-11 06:53:41,698-[lfw][114000]Accuracy-Highest: 0.99817 Training: 2022-04-11 06:54:12,608-[cfp_fp][114000]XNorm: 20.189753 Training: 2022-04-11 06:54:12,609-[cfp_fp][114000]Accuracy-Flip: 0.98300+-0.00401 Training: 2022-04-11 06:54:12,609-[cfp_fp][114000]Accuracy-Highest: 0.98443 Training: 2022-04-11 06:54:39,299-[agedb_30][114000]XNorm: 22.531565 Training: 2022-04-11 06:54:39,300-[agedb_30][114000]Accuracy-Flip: 0.98050+-0.00650 Training: 2022-04-11 06:54:39,300-[agedb_30][114000]Accuracy-Highest: 0.98050 Training: 2022-04-11 06:54:41,295-Speed 118.78 samples/sec Loss 3.2234 LearningRate 0.0434 Epoch: 6 Global Step: 114010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:54:43,253-Speed 5231.27 samples/sec Loss 3.2806 LearningRate 0.0434 Epoch: 6 Global Step: 114020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:54:45,223-Speed 5199.92 samples/sec Loss 3.2622 LearningRate 0.0434 Epoch: 6 Global Step: 114030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:54:47,210-Speed 5155.06 samples/sec Loss 3.2204 LearningRate 0.0433 Epoch: 6 Global Step: 114040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:54:49,186-Speed 5183.23 samples/sec Loss 3.2804 LearningRate 0.0433 Epoch: 6 Global Step: 114050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:54:51,154-Speed 5204.99 samples/sec Loss 3.2536 LearningRate 0.0433 Epoch: 6 Global Step: 114060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:54:53,115-Speed 5224.86 samples/sec Loss 3.2172 LearningRate 0.0433 Epoch: 6 Global Step: 114070 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:54:55,081-Speed 5209.63 samples/sec Loss 3.2254 LearningRate 0.0433 Epoch: 6 Global Step: 114080 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:54:57,050-Speed 5202.62 samples/sec Loss 3.3762 LearningRate 0.0433 
Epoch: 6 Global Step: 114090 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:54:59,022-Speed 5194.85 samples/sec Loss 3.2901 LearningRate 0.0433 Epoch: 6 Global Step: 114100 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:55:00,990-Speed 5205.92 samples/sec Loss 3.1880 LearningRate 0.0433 Epoch: 6 Global Step: 114110 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:55:02,971-Speed 5171.13 samples/sec Loss 3.1819 LearningRate 0.0433 Epoch: 6 Global Step: 114120 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:55:04,969-Speed 5125.47 samples/sec Loss 3.1440 LearningRate 0.0433 Epoch: 6 Global Step: 114130 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:55:06,937-Speed 5205.52 samples/sec Loss 3.1770 LearningRate 0.0433 Epoch: 6 Global Step: 114140 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:55:08,921-Speed 5161.79 samples/sec Loss 3.1769 LearningRate 0.0433 Epoch: 6 Global Step: 114150 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:55:10,927-Speed 5108.22 samples/sec Loss 3.2150 LearningRate 0.0433 Epoch: 6 Global Step: 114160 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:55:12,897-Speed 5198.87 samples/sec Loss 3.1999 LearningRate 0.0433 Epoch: 6 Global Step: 114170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:55:14,885-Speed 5151.45 samples/sec Loss 3.1758 LearningRate 0.0433 Epoch: 6 Global Step: 114180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:55:16,893-Speed 5101.35 samples/sec Loss 3.1848 LearningRate 0.0433 Epoch: 6 Global Step: 114190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:55:18,891-Speed 5127.10 samples/sec Loss 3.2148 LearningRate 0.0433 Epoch: 6 Global Step: 114200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:55:20,860-Speed 5201.52 samples/sec Loss 3.2129 LearningRate 0.0433 Epoch: 6 Global Step: 114210 Fp16 Grad 
Scale: 65536 Required: 15 hours Training: 2022-04-11 06:55:22,848-Speed 5152.85 samples/sec Loss 3.2118 LearningRate 0.0433 Epoch: 6 Global Step: 114220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:55:24,835-Speed 5155.32 samples/sec Loss 3.1790 LearningRate 0.0433 Epoch: 6 Global Step: 114230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:55:26,816-Speed 5171.11 samples/sec Loss 3.1943 LearningRate 0.0433 Epoch: 6 Global Step: 114240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:55:28,794-Speed 5179.07 samples/sec Loss 3.2520 LearningRate 0.0433 Epoch: 6 Global Step: 114250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:55:30,773-Speed 5175.82 samples/sec Loss 3.2265 LearningRate 0.0433 Epoch: 6 Global Step: 114260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:55:32,746-Speed 5193.36 samples/sec Loss 3.2446 LearningRate 0.0433 Epoch: 6 Global Step: 114270 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:55:34,764-Speed 5074.45 samples/sec Loss 3.1555 LearningRate 0.0433 Epoch: 6 Global Step: 114280 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:55:36,773-Speed 5099.70 samples/sec Loss 3.1666 LearningRate 0.0432 Epoch: 6 Global Step: 114290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:55:38,759-Speed 5156.32 samples/sec Loss 3.1966 LearningRate 0.0432 Epoch: 6 Global Step: 114300 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:55:40,735-Speed 5183.42 samples/sec Loss 3.2999 LearningRate 0.0432 Epoch: 6 Global Step: 114310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:55:42,719-Speed 5163.60 samples/sec Loss 3.1473 LearningRate 0.0432 Epoch: 6 Global Step: 114320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:55:44,690-Speed 5197.23 samples/sec Loss 3.1904 LearningRate 0.0432 Epoch: 6 Global Step: 114330 Fp16 Grad Scale: 131072 Required: 15 hours 
Training: 2022-04-11 06:55:46,681-Speed 5144.49 samples/sec Loss 3.2011 LearningRate 0.0432 Epoch: 6 Global Step: 114340 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:55:48,661-Speed 5175.36 samples/sec Loss 3.1748 LearningRate 0.0432 Epoch: 6 Global Step: 114350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:55:50,631-Speed 5200.29 samples/sec Loss 3.1884 LearningRate 0.0432 Epoch: 6 Global Step: 114360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:55:52,601-Speed 5199.35 samples/sec Loss 3.2481 LearningRate 0.0432 Epoch: 6 Global Step: 114370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:55:54,613-Speed 5090.29 samples/sec Loss 3.2307 LearningRate 0.0432 Epoch: 6 Global Step: 114380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:55:56,609-Speed 5131.01 samples/sec Loss 3.2449 LearningRate 0.0432 Epoch: 6 Global Step: 114390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:55:58,607-Speed 5128.35 samples/sec Loss 3.1816 LearningRate 0.0432 Epoch: 6 Global Step: 114400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:00,584-Speed 5180.94 samples/sec Loss 3.1914 LearningRate 0.0432 Epoch: 6 Global Step: 114410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:02,583-Speed 5124.16 samples/sec Loss 3.1646 LearningRate 0.0432 Epoch: 6 Global Step: 114420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:04,559-Speed 5182.82 samples/sec Loss 3.2569 LearningRate 0.0432 Epoch: 6 Global Step: 114430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:06,533-Speed 5190.54 samples/sec Loss 3.2231 LearningRate 0.0432 Epoch: 6 Global Step: 114440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:08,505-Speed 5193.46 samples/sec Loss 3.2392 LearningRate 0.0432 Epoch: 6 Global Step: 114450 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:56:10,488-Speed 
5166.21 samples/sec Loss 3.1149 LearningRate 0.0432 Epoch: 6 Global Step: 114460 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:56:12,462-Speed 5189.06 samples/sec Loss 3.2340 LearningRate 0.0432 Epoch: 6 Global Step: 114470 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:56:14,434-Speed 5194.33 samples/sec Loss 3.1401 LearningRate 0.0432 Epoch: 6 Global Step: 114480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:16,407-Speed 5191.81 samples/sec Loss 3.2122 LearningRate 0.0432 Epoch: 6 Global Step: 114490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:18,396-Speed 5149.51 samples/sec Loss 3.3122 LearningRate 0.0432 Epoch: 6 Global Step: 114500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:20,373-Speed 5180.79 samples/sec Loss 3.2585 LearningRate 0.0432 Epoch: 6 Global Step: 114510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:22,351-Speed 5179.12 samples/sec Loss 3.1435 LearningRate 0.0432 Epoch: 6 Global Step: 114520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:24,343-Speed 5142.72 samples/sec Loss 3.1952 LearningRate 0.0432 Epoch: 6 Global Step: 114530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:26,323-Speed 5174.04 samples/sec Loss 3.1465 LearningRate 0.0431 Epoch: 6 Global Step: 114540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:28,309-Speed 5159.09 samples/sec Loss 3.1390 LearningRate 0.0431 Epoch: 6 Global Step: 114550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:30,278-Speed 5201.84 samples/sec Loss 3.1215 LearningRate 0.0431 Epoch: 6 Global Step: 114560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:32,253-Speed 5186.14 samples/sec Loss 3.1460 LearningRate 0.0431 Epoch: 6 Global Step: 114570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:34,232-Speed 5176.37 samples/sec Loss 3.1840 
LearningRate 0.0431 Epoch: 6 Global Step: 114580 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:56:36,239-Speed 5102.86 samples/sec Loss 3.1629 LearningRate 0.0431 Epoch: 6 Global Step: 114590 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:56:38,227-Speed 5151.29 samples/sec Loss 3.1561 LearningRate 0.0431 Epoch: 6 Global Step: 114600 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:56:40,202-Speed 5187.47 samples/sec Loss 3.1980 LearningRate 0.0431 Epoch: 6 Global Step: 114610 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:56:42,179-Speed 5181.14 samples/sec Loss 3.1662 LearningRate 0.0431 Epoch: 6 Global Step: 114620 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:56:44,164-Speed 5160.58 samples/sec Loss 3.2818 LearningRate 0.0431 Epoch: 6 Global Step: 114630 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:56:46,169-Speed 5110.61 samples/sec Loss 3.2404 LearningRate 0.0431 Epoch: 6 Global Step: 114640 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:56:48,170-Speed 5118.34 samples/sec Loss 3.2808 LearningRate 0.0431 Epoch: 6 Global Step: 114650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:50,153-Speed 5164.90 samples/sec Loss 3.2257 LearningRate 0.0431 Epoch: 6 Global Step: 114660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:52,122-Speed 5202.48 samples/sec Loss 3.2244 LearningRate 0.0431 Epoch: 6 Global Step: 114670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:54,096-Speed 5189.36 samples/sec Loss 3.1978 LearningRate 0.0431 Epoch: 6 Global Step: 114680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:56,065-Speed 5202.87 samples/sec Loss 3.1140 LearningRate 0.0431 Epoch: 6 Global Step: 114690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:56:58,038-Speed 5190.69 samples/sec Loss 3.2646 LearningRate 0.0431 Epoch: 6 Global 
Step: 114700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:00,020-Speed 5170.35 samples/sec Loss 3.2185 LearningRate 0.0431 Epoch: 6 Global Step: 114710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:01,992-Speed 5194.52 samples/sec Loss 3.2209 LearningRate 0.0431 Epoch: 6 Global Step: 114720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:03,969-Speed 5181.19 samples/sec Loss 3.1908 LearningRate 0.0431 Epoch: 6 Global Step: 114730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:05,939-Speed 5199.74 samples/sec Loss 3.1960 LearningRate 0.0431 Epoch: 6 Global Step: 114740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:07,916-Speed 5181.38 samples/sec Loss 3.3127 LearningRate 0.0431 Epoch: 6 Global Step: 114750 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:57:09,886-Speed 5200.02 samples/sec Loss 3.1905 LearningRate 0.0431 Epoch: 6 Global Step: 114760 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:57:11,853-Speed 5205.65 samples/sec Loss 3.1867 LearningRate 0.0431 Epoch: 6 Global Step: 114770 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:57:13,827-Speed 5189.27 samples/sec Loss 3.1887 LearningRate 0.0431 Epoch: 6 Global Step: 114780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:15,813-Speed 5159.11 samples/sec Loss 3.1910 LearningRate 0.0431 Epoch: 6 Global Step: 114790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:17,783-Speed 5199.07 samples/sec Loss 3.2246 LearningRate 0.0430 Epoch: 6 Global Step: 114800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:19,760-Speed 5181.39 samples/sec Loss 3.2440 LearningRate 0.0430 Epoch: 6 Global Step: 114810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:21,731-Speed 5198.93 samples/sec Loss 3.2712 LearningRate 0.0430 Epoch: 6 Global Step: 114820 Fp16 Grad Scale: 65536 
Required: 15 hours Training: 2022-04-11 06:57:23,745-Speed 5084.86 samples/sec Loss 3.1640 LearningRate 0.0430 Epoch: 6 Global Step: 114830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:25,735-Speed 5147.34 samples/sec Loss 3.1861 LearningRate 0.0430 Epoch: 6 Global Step: 114840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:27,715-Speed 5173.78 samples/sec Loss 3.2290 LearningRate 0.0430 Epoch: 6 Global Step: 114850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:29,690-Speed 5185.27 samples/sec Loss 3.2108 LearningRate 0.0430 Epoch: 6 Global Step: 114860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:31,667-Speed 5181.39 samples/sec Loss 3.1125 LearningRate 0.0430 Epoch: 6 Global Step: 114870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:33,647-Speed 5175.35 samples/sec Loss 3.2760 LearningRate 0.0430 Epoch: 6 Global Step: 114880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:35,622-Speed 5185.07 samples/sec Loss 3.2551 LearningRate 0.0430 Epoch: 6 Global Step: 114890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:37,602-Speed 5174.03 samples/sec Loss 3.1989 LearningRate 0.0430 Epoch: 6 Global Step: 114900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:39,604-Speed 5115.29 samples/sec Loss 3.1980 LearningRate 0.0430 Epoch: 6 Global Step: 114910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:41,581-Speed 5183.88 samples/sec Loss 3.2708 LearningRate 0.0430 Epoch: 6 Global Step: 114920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:43,556-Speed 5185.86 samples/sec Loss 3.1571 LearningRate 0.0430 Epoch: 6 Global Step: 114930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:45,533-Speed 5181.30 samples/sec Loss 3.2144 LearningRate 0.0430 Epoch: 6 Global Step: 114940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 
06:57:47,512-Speed 5174.36 samples/sec Loss 3.1692 LearningRate 0.0430 Epoch: 6 Global Step: 114950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:49,499-Speed 5155.99 samples/sec Loss 3.1914 LearningRate 0.0430 Epoch: 6 Global Step: 114960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:51,484-Speed 5159.96 samples/sec Loss 3.2644 LearningRate 0.0430 Epoch: 6 Global Step: 114970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:57:53,456-Speed 5195.18 samples/sec Loss 3.2373 LearningRate 0.0430 Epoch: 6 Global Step: 114980 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:57:55,428-Speed 5194.45 samples/sec Loss 3.1484 LearningRate 0.0430 Epoch: 6 Global Step: 114990 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:57:57,420-Speed 5143.68 samples/sec Loss 3.2654 LearningRate 0.0430 Epoch: 6 Global Step: 115000 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:57:59,410-Speed 5145.62 samples/sec Loss 3.1977 LearningRate 0.0430 Epoch: 6 Global Step: 115010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:58:01,398-Speed 5153.66 samples/sec Loss 3.2040 LearningRate 0.0430 Epoch: 6 Global Step: 115020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:58:03,391-Speed 5139.18 samples/sec Loss 3.2325 LearningRate 0.0430 Epoch: 6 Global Step: 115030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:58:05,399-Speed 5101.13 samples/sec Loss 3.2410 LearningRate 0.0430 Epoch: 6 Global Step: 115040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:58:07,369-Speed 5199.20 samples/sec Loss 3.1081 LearningRate 0.0429 Epoch: 6 Global Step: 115050 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:58:09,342-Speed 5192.37 samples/sec Loss 3.2224 LearningRate 0.0429 Epoch: 6 Global Step: 115060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:58:11,323-Speed 5170.31 
samples/sec Loss 3.1714 LearningRate 0.0429 Epoch: 6 Global Step: 115070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:58:13,278-Speed 5241.62 samples/sec Loss 3.2298 LearningRate 0.0429 Epoch: 6 Global Step: 115080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:58:15,266-Speed 5152.45 samples/sec Loss 3.1306 LearningRate 0.0429 Epoch: 6 Global Step: 115090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:58:17,255-Speed 5147.97 samples/sec Loss 3.2234 LearningRate 0.0429 Epoch: 6 Global Step: 115100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:58:19,238-Speed 5165.66 samples/sec Loss 3.2140 LearningRate 0.0429 Epoch: 6 Global Step: 115110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:58:21,230-Speed 5142.42 samples/sec Loss 3.2585 LearningRate 0.0429 Epoch: 6 Global Step: 115120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:58:23,206-Speed 5187.27 samples/sec Loss 3.2770 LearningRate 0.0429 Epoch: 6 Global Step: 115130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:58:25,201-Speed 5133.85 samples/sec Loss 3.1957 LearningRate 0.0429 Epoch: 6 Global Step: 115140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:58:27,178-Speed 5180.90 samples/sec Loss 3.2300 LearningRate 0.0429 Epoch: 6 Global Step: 115150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:58:29,151-Speed 5192.42 samples/sec Loss 3.1865 LearningRate 0.0429 Epoch: 6 Global Step: 115160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:58:31,119-Speed 5204.70 samples/sec Loss 3.2111 LearningRate 0.0429 Epoch: 6 Global Step: 115170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:58:33,097-Speed 5178.82 samples/sec Loss 3.2011 LearningRate 0.0429 Epoch: 6 Global Step: 115180 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:58:35,076-Speed 5175.50 samples/sec Loss 3.2476 LearningRate 
0.0429 Epoch: 6 Global Step: 115190 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:58:37,056-Speed 5173.21 samples/sec Loss 3.1938 LearningRate 0.0429 Epoch: 6 Global Step: 115200 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:58:39,041-Speed 5162.50 samples/sec Loss 3.2714 LearningRate 0.0429 Epoch: 6 Global Step: 115210 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:58:41,044-Speed 5112.97 samples/sec Loss 3.1667 LearningRate 0.0429 Epoch: 6 Global Step: 115220 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:58:43,019-Speed 5187.85 samples/sec Loss 3.2421 LearningRate 0.0429 Epoch: 6 Global Step: 115230 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:58:45,016-Speed 5128.29 samples/sec Loss 3.2219 LearningRate 0.0429 Epoch: 6 Global Step: 115240 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:58:47,048-Speed 5040.50 samples/sec Loss 3.2149 LearningRate 0.0429 Epoch: 6 Global Step: 115250 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:58:49,068-Speed 5071.89 samples/sec Loss 3.1991 LearningRate 0.0429 Epoch: 6 Global Step: 115260 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:58:51,060-Speed 5140.92 samples/sec Loss 3.1457 LearningRate 0.0429 Epoch: 6 Global Step: 115270 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:58:53,044-Speed 5163.63 samples/sec Loss 3.2884 LearningRate 0.0429 Epoch: 6 Global Step: 115280 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:58:55,024-Speed 5174.25 samples/sec Loss 3.2171 LearningRate 0.0429 Epoch: 6 Global Step: 115290 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:58:56,998-Speed 5187.93 samples/sec Loss 3.2010 LearningRate 0.0429 Epoch: 6 Global Step: 115300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:58:58,986-Speed 5153.62 samples/sec Loss 3.1739 LearningRate 0.0428 Epoch: 6 Global Step: 115310 Fp16 
Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:00,968-Speed 5168.64 samples/sec Loss 3.1422 LearningRate 0.0428 Epoch: 6 Global Step: 115320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:02,980-Speed 5092.70 samples/sec Loss 3.2419 LearningRate 0.0428 Epoch: 6 Global Step: 115330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:04,988-Speed 5100.74 samples/sec Loss 3.1251 LearningRate 0.0428 Epoch: 6 Global Step: 115340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:06,969-Speed 5171.50 samples/sec Loss 3.1675 LearningRate 0.0428 Epoch: 6 Global Step: 115350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:08,958-Speed 5149.49 samples/sec Loss 3.1108 LearningRate 0.0428 Epoch: 6 Global Step: 115360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:10,953-Speed 5133.59 samples/sec Loss 3.1106 LearningRate 0.0428 Epoch: 6 Global Step: 115370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:12,956-Speed 5114.75 samples/sec Loss 3.1623 LearningRate 0.0428 Epoch: 6 Global Step: 115380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:14,936-Speed 5171.51 samples/sec Loss 3.1781 LearningRate 0.0428 Epoch: 6 Global Step: 115390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:16,920-Speed 5164.34 samples/sec Loss 3.1411 LearningRate 0.0428 Epoch: 6 Global Step: 115400 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:59:18,897-Speed 5179.69 samples/sec Loss 3.2158 LearningRate 0.0428 Epoch: 6 Global Step: 115410 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:59:20,875-Speed 5178.94 samples/sec Loss 3.1774 LearningRate 0.0428 Epoch: 6 Global Step: 115420 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:59:22,860-Speed 5160.34 samples/sec Loss 3.1871 LearningRate 0.0428 Epoch: 6 Global Step: 115430 Fp16 Grad Scale: 131072 Required: 15 hours 
Training: 2022-04-11 06:59:24,846-Speed 5163.91 samples/sec Loss 3.2849 LearningRate 0.0428 Epoch: 6 Global Step: 115440 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:59:26,855-Speed 5099.30 samples/sec Loss 3.2108 LearningRate 0.0428 Epoch: 6 Global Step: 115450 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:59:28,831-Speed 5183.30 samples/sec Loss 3.2683 LearningRate 0.0428 Epoch: 6 Global Step: 115460 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 06:59:30,796-Speed 5213.26 samples/sec Loss 3.1791 LearningRate 0.0428 Epoch: 6 Global Step: 115470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:32,766-Speed 5197.39 samples/sec Loss 3.2047 LearningRate 0.0428 Epoch: 6 Global Step: 115480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:34,751-Speed 5161.80 samples/sec Loss 3.1046 LearningRate 0.0428 Epoch: 6 Global Step: 115490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:36,734-Speed 5165.09 samples/sec Loss 3.1726 LearningRate 0.0428 Epoch: 6 Global Step: 115500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:38,743-Speed 5099.20 samples/sec Loss 3.1559 LearningRate 0.0428 Epoch: 6 Global Step: 115510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:40,745-Speed 5114.43 samples/sec Loss 3.2595 LearningRate 0.0428 Epoch: 6 Global Step: 115520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:42,724-Speed 5177.19 samples/sec Loss 3.2238 LearningRate 0.0428 Epoch: 6 Global Step: 115530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:44,709-Speed 5159.85 samples/sec Loss 3.1708 LearningRate 0.0428 Epoch: 6 Global Step: 115540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:46,695-Speed 5158.75 samples/sec Loss 3.1714 LearningRate 0.0428 Epoch: 6 Global Step: 115550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:48,677-Speed 
5167.80 samples/sec Loss 3.1588 LearningRate 0.0427 Epoch: 6 Global Step: 115560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 06:59:50,648-Speed 5196.76 samples/sec Loss 3.1547 LearningRate 0.0427 Epoch: 6 Global Step: 115570 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:59:52,629-Speed 5171.44 samples/sec Loss 3.1583 LearningRate 0.0427 Epoch: 6 Global Step: 115580 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:59:54,610-Speed 5172.54 samples/sec Loss 3.0800 LearningRate 0.0427 Epoch: 6 Global Step: 115590 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:59:56,582-Speed 5193.00 samples/sec Loss 3.0856 LearningRate 0.0427 Epoch: 6 Global Step: 115600 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 06:59:58,555-Speed 5191.58 samples/sec Loss 3.2126 LearningRate 0.0427 Epoch: 6 Global Step: 115610 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:00:00,531-Speed 5184.55 samples/sec Loss 3.2603 LearningRate 0.0427 Epoch: 6 Global Step: 115620 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:00:02,505-Speed 5189.34 samples/sec Loss 3.1648 LearningRate 0.0427 Epoch: 6 Global Step: 115630 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:00:04,488-Speed 5166.09 samples/sec Loss 3.2175 LearningRate 0.0427 Epoch: 6 Global Step: 115640 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:00:06,466-Speed 5178.25 samples/sec Loss 3.2277 LearningRate 0.0427 Epoch: 6 Global Step: 115650 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:00:08,445-Speed 5177.28 samples/sec Loss 3.2208 LearningRate 0.0427 Epoch: 6 Global Step: 115660 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:00:10,442-Speed 5128.31 samples/sec Loss 3.0953 LearningRate 0.0427 Epoch: 6 Global Step: 115670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:00:12,418-Speed 5184.62 samples/sec Loss 3.1518 
LearningRate 0.0427 Epoch: 6 Global Step: 115680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:00:14,404-Speed 5155.94 samples/sec Loss 3.2000 LearningRate 0.0427 Epoch: 6 Global Step: 115690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:00:16,396-Speed 5144.18 samples/sec Loss 3.2297 LearningRate 0.0427 Epoch: 6 Global Step: 115700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:00:18,370-Speed 5187.37 samples/sec Loss 3.1978 LearningRate 0.0427 Epoch: 6 Global Step: 115710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:00:20,355-Speed 5160.87 samples/sec Loss 3.2627 LearningRate 0.0427 Epoch: 6 Global Step: 115720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:00:22,340-Speed 5160.44 samples/sec Loss 3.1517 LearningRate 0.0427 Epoch: 6 Global Step: 115730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:00:24,318-Speed 5177.94 samples/sec Loss 3.1482 LearningRate 0.0427 Epoch: 6 Global Step: 115740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:00:26,315-Speed 5129.54 samples/sec Loss 3.2143 LearningRate 0.0427 Epoch: 6 Global Step: 115750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:00:28,289-Speed 5190.60 samples/sec Loss 3.2450 LearningRate 0.0427 Epoch: 6 Global Step: 115760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:00:30,280-Speed 5145.72 samples/sec Loss 3.2203 LearningRate 0.0427 Epoch: 6 Global Step: 115770 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:00:32,251-Speed 5195.40 samples/sec Loss 3.2040 LearningRate 0.0427 Epoch: 6 Global Step: 115780 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:00:34,233-Speed 5169.95 samples/sec Loss 3.1970 LearningRate 0.0427 Epoch: 6 Global Step: 115790 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:00:36,226-Speed 5138.26 samples/sec Loss 3.2677 LearningRate 0.0427 Epoch: 6 Global 
Step: 115800 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:00:38,203-Speed 5180.69 samples/sec Loss 3.2678 LearningRate 0.0427 Epoch: 6 Global Step: 115810 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:00:40,212-Speed 5099.30 samples/sec Loss 3.1657 LearningRate 0.0426 Epoch: 6 Global Step: 115820 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:00:42,238-Speed 5055.76 samples/sec Loss 3.1363 LearningRate 0.0426 Epoch: 6 Global Step: 115830 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:00:44,213-Speed 5186.99 samples/sec Loss 3.1600 LearningRate 0.0426 Epoch: 6 Global Step: 115840 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:00:46,196-Speed 5163.87 samples/sec Loss 3.1120 LearningRate 0.0426 Epoch: 6 Global Step: 115850 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:00:48,236-Speed 5022.92 samples/sec Loss 3.1642 LearningRate 0.0426 Epoch: 6 Global Step: 115860 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:00:50,232-Speed 5133.13 samples/sec Loss 3.1998 LearningRate 0.0426 Epoch: 6 Global Step: 115870 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:00:52,269-Speed 5026.14 samples/sec Loss 3.1672 LearningRate 0.0426 Epoch: 6 Global Step: 115880 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:00:54,288-Speed 5076.03 samples/sec Loss 3.1913 LearningRate 0.0426 Epoch: 6 Global Step: 115890 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:00:56,276-Speed 5151.18 samples/sec Loss 3.1679 LearningRate 0.0426 Epoch: 6 Global Step: 115900 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:00:58,240-Speed 5216.62 samples/sec Loss 3.1344 LearningRate 0.0426 Epoch: 6 Global Step: 115910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:01:00,212-Speed 5193.36 samples/sec Loss 3.2156 LearningRate 0.0426 Epoch: 6 Global Step: 115920 Fp16 Grad Scale: 
65536 Required: 15 hours Training: 2022-04-11 07:01:02,187-Speed 5187.12 samples/sec Loss 3.2553 LearningRate 0.0426 Epoch: 6 Global Step: 115930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:01:04,173-Speed 5157.52 samples/sec Loss 3.2382 LearningRate 0.0426 Epoch: 6 Global Step: 115940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:01:06,155-Speed 5167.30 samples/sec Loss 3.1091 LearningRate 0.0426 Epoch: 6 Global Step: 115950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:01:08,152-Speed 5131.10 samples/sec Loss 3.1345 LearningRate 0.0426 Epoch: 6 Global Step: 115960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:01:10,146-Speed 5135.45 samples/sec Loss 3.1997 LearningRate 0.0426 Epoch: 6 Global Step: 115970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:01:12,148-Speed 5117.90 samples/sec Loss 3.1887 LearningRate 0.0426 Epoch: 6 Global Step: 115980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:01:14,143-Speed 5134.05 samples/sec Loss 3.2826 LearningRate 0.0426 Epoch: 6 Global Step: 115990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:01:16,144-Speed 5118.99 samples/sec Loss 3.1260 LearningRate 0.0426 Epoch: 6 Global Step: 116000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:01:42,821-[lfw][116000]XNorm: 22.992443 Training: 2022-04-11 07:01:42,822-[lfw][116000]Accuracy-Flip: 0.99783+-0.00308 Training: 2022-04-11 07:01:42,822-[lfw][116000]Accuracy-Highest: 0.99817 Training: 2022-04-11 07:02:13,667-[cfp_fp][116000]XNorm: 21.105977 Training: 2022-04-11 07:02:13,667-[cfp_fp][116000]Accuracy-Flip: 0.98086+-0.00590 Training: 2022-04-11 07:02:13,668-[cfp_fp][116000]Accuracy-Highest: 0.98443 Training: 2022-04-11 07:02:40,266-[agedb_30][116000]XNorm: 22.969298 Training: 2022-04-11 07:02:40,266-[agedb_30][116000]Accuracy-Flip: 0.97950+-0.00675 Training: 2022-04-11 07:02:40,267-[agedb_30][116000]Accuracy-Highest: 
0.98050 Training: 2022-04-11 07:02:42,249-Speed 118.93 samples/sec Loss 3.2317 LearningRate 0.0426 Epoch: 6 Global Step: 116010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:02:44,210-Speed 5224.11 samples/sec Loss 3.2090 LearningRate 0.0426 Epoch: 6 Global Step: 116020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:02:46,171-Speed 5224.27 samples/sec Loss 3.1842 LearningRate 0.0426 Epoch: 6 Global Step: 116030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:02:48,149-Speed 5177.54 samples/sec Loss 3.2121 LearningRate 0.0426 Epoch: 6 Global Step: 116040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:02:50,132-Speed 5166.22 samples/sec Loss 3.2458 LearningRate 0.0426 Epoch: 6 Global Step: 116050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:02:52,121-Speed 5150.61 samples/sec Loss 3.2000 LearningRate 0.0426 Epoch: 6 Global Step: 116060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:02:54,098-Speed 5182.06 samples/sec Loss 3.2055 LearningRate 0.0425 Epoch: 6 Global Step: 116070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:02:56,081-Speed 5166.01 samples/sec Loss 3.2097 LearningRate 0.0425 Epoch: 6 Global Step: 116080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:02:58,065-Speed 5162.85 samples/sec Loss 3.2087 LearningRate 0.0425 Epoch: 6 Global Step: 116090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:00,039-Speed 5188.45 samples/sec Loss 3.1951 LearningRate 0.0425 Epoch: 6 Global Step: 116100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:02,013-Speed 5190.69 samples/sec Loss 3.1759 LearningRate 0.0425 Epoch: 6 Global Step: 116110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:03,989-Speed 5184.29 samples/sec Loss 3.1862 LearningRate 0.0425 Epoch: 6 Global Step: 116120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 
07:03:05,969-Speed 5173.40 samples/sec Loss 3.1464 LearningRate 0.0425 Epoch: 6 Global Step: 116130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:03:07,941-Speed 5195.63 samples/sec Loss 3.2086 LearningRate 0.0425 Epoch: 6 Global Step: 116140 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:03:09,924-Speed 5163.38 samples/sec Loss 3.2102 LearningRate 0.0425 Epoch: 6 Global Step: 116150 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:03:11,919-Speed 5136.12 samples/sec Loss 3.0825 LearningRate 0.0425 Epoch: 6 Global Step: 116160 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:03:13,918-Speed 5124.10 samples/sec Loss 3.2121 LearningRate 0.0425 Epoch: 6 Global Step: 116170 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:03:15,891-Speed 5191.64 samples/sec Loss 3.1997 LearningRate 0.0425 Epoch: 6 Global Step: 116180 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:03:17,871-Speed 5174.08 samples/sec Loss 3.1667 LearningRate 0.0425 Epoch: 6 Global Step: 116190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:19,861-Speed 5146.00 samples/sec Loss 3.2342 LearningRate 0.0425 Epoch: 6 Global Step: 116200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:21,844-Speed 5167.26 samples/sec Loss 3.2150 LearningRate 0.0425 Epoch: 6 Global Step: 116210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:23,827-Speed 5164.19 samples/sec Loss 3.1742 LearningRate 0.0425 Epoch: 6 Global Step: 116220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:25,809-Speed 5168.65 samples/sec Loss 3.1409 LearningRate 0.0425 Epoch: 6 Global Step: 116230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:27,801-Speed 5142.78 samples/sec Loss 3.1183 LearningRate 0.0425 Epoch: 6 Global Step: 116240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:29,794-Speed 5138.79 
samples/sec Loss 3.1186 LearningRate 0.0425 Epoch: 6 Global Step: 116250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:31,777-Speed 5167.47 samples/sec Loss 3.2011 LearningRate 0.0425 Epoch: 6 Global Step: 116260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:33,766-Speed 5148.25 samples/sec Loss 3.1658 LearningRate 0.0425 Epoch: 6 Global Step: 116270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:35,787-Speed 5068.97 samples/sec Loss 3.2439 LearningRate 0.0425 Epoch: 6 Global Step: 116280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:37,764-Speed 5181.23 samples/sec Loss 3.1184 LearningRate 0.0425 Epoch: 6 Global Step: 116290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:03:39,750-Speed 5155.86 samples/sec Loss 3.1864 LearningRate 0.0425 Epoch: 6 Global Step: 116300 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:03:41,721-Speed 5199.11 samples/sec Loss 3.1574 LearningRate 0.0425 Epoch: 6 Global Step: 116310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:03:43,699-Speed 5179.44 samples/sec Loss 3.1404 LearningRate 0.0425 Epoch: 6 Global Step: 116320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:03:45,674-Speed 5184.94 samples/sec Loss 3.1444 LearningRate 0.0424 Epoch: 6 Global Step: 116330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:47,671-Speed 5130.38 samples/sec Loss 3.1699 LearningRate 0.0424 Epoch: 6 Global Step: 116340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:49,662-Speed 5143.89 samples/sec Loss 3.1385 LearningRate 0.0424 Epoch: 6 Global Step: 116350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:51,655-Speed 5138.87 samples/sec Loss 3.2328 LearningRate 0.0424 Epoch: 6 Global Step: 116360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:53,637-Speed 5170.13 samples/sec Loss 3.2478 LearningRate 
0.0424 Epoch: 6 Global Step: 116370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:55,609-Speed 5192.78 samples/sec Loss 3.2510 LearningRate 0.0424 Epoch: 6 Global Step: 116380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:57,598-Speed 5150.95 samples/sec Loss 3.1461 LearningRate 0.0424 Epoch: 6 Global Step: 116390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:03:59,584-Speed 5158.22 samples/sec Loss 3.2352 LearningRate 0.0424 Epoch: 6 Global Step: 116400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:04:01,585-Speed 5119.31 samples/sec Loss 3.1983 LearningRate 0.0424 Epoch: 6 Global Step: 116410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:04:03,571-Speed 5158.64 samples/sec Loss 3.2090 LearningRate 0.0424 Epoch: 6 Global Step: 116420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:04:05,555-Speed 5163.43 samples/sec Loss 3.1638 LearningRate 0.0424 Epoch: 6 Global Step: 116430 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:04:07,525-Speed 5199.04 samples/sec Loss 3.1779 LearningRate 0.0424 Epoch: 6 Global Step: 116440 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:04:09,500-Speed 5184.70 samples/sec Loss 3.1811 LearningRate 0.0424 Epoch: 6 Global Step: 116450 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:04:11,492-Speed 5143.48 samples/sec Loss 3.0987 LearningRate 0.0424 Epoch: 6 Global Step: 116460 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:04:13,475-Speed 5165.81 samples/sec Loss 3.1901 LearningRate 0.0424 Epoch: 6 Global Step: 116470 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:04:15,446-Speed 5197.69 samples/sec Loss 3.2039 LearningRate 0.0424 Epoch: 6 Global Step: 116480 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:04:17,441-Speed 5134.52 samples/sec Loss 3.1406 LearningRate 0.0424 Epoch: 6 Global Step: 116490 
Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:04:19,413-Speed 5192.94 samples/sec Loss 3.1484 LearningRate 0.0424 Epoch: 6 Global Step: 116500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:04:21,410-Speed 5129.51 samples/sec Loss 3.2330 LearningRate 0.0424 Epoch: 6 Global Step: 116510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:04:23,400-Speed 5146.04 samples/sec Loss 3.1816 LearningRate 0.0424 Epoch: 6 Global Step: 116520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:04:25,386-Speed 5160.28 samples/sec Loss 3.1574 LearningRate 0.0424 Epoch: 6 Global Step: 116530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:04:27,362-Speed 5185.04 samples/sec Loss 3.2339 LearningRate 0.0424 Epoch: 6 Global Step: 116540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:04:29,333-Speed 5196.06 samples/sec Loss 3.2232 LearningRate 0.0424 Epoch: 6 Global Step: 116550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:04:31,307-Speed 5189.65 samples/sec Loss 3.1872 LearningRate 0.0424 Epoch: 6 Global Step: 116560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:04:33,277-Speed 5198.61 samples/sec Loss 3.1327 LearningRate 0.0424 Epoch: 6 Global Step: 116570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:04:35,249-Speed 5193.62 samples/sec Loss 3.1236 LearningRate 0.0424 Epoch: 6 Global Step: 116580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:04:37,222-Speed 5192.46 samples/sec Loss 3.2471 LearningRate 0.0423 Epoch: 6 Global Step: 116590 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:04:39,190-Speed 5204.31 samples/sec Loss 3.2405 LearningRate 0.0423 Epoch: 6 Global Step: 116600 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:04:41,181-Speed 5144.78 samples/sec Loss 3.1880 LearningRate 0.0423 Epoch: 6 Global Step: 116610 Fp16 Grad Scale: 131072 Required: 15 
hours Training: 2022-04-11 07:04:43,155-Speed 5189.03 samples/sec Loss 3.1718 LearningRate 0.0423 Epoch: 6 Global Step: 116620 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:04:45,149-Speed 5139.38 samples/sec Loss 3.2019 LearningRate 0.0423 Epoch: 6 Global Step: 116630 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:04:47,153-Speed 5110.91 samples/sec Loss 3.1528 LearningRate 0.0423 Epoch: 6 Global Step: 116640 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:04:49,168-Speed 5083.33 samples/sec Loss 3.2415 LearningRate 0.0423 Epoch: 6 Global Step: 116650 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:04:51,152-Speed 5163.16 samples/sec Loss 3.1999 LearningRate 0.0423 Epoch: 6 Global Step: 116660 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:04:53,124-Speed 5195.10 samples/sec Loss 3.1251 LearningRate 0.0423 Epoch: 6 Global Step: 116670 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:04:55,093-Speed 5202.64 samples/sec Loss 3.2452 LearningRate 0.0423 Epoch: 6 Global Step: 116680 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:04:57,060-Speed 5206.86 samples/sec Loss 3.2196 LearningRate 0.0423 Epoch: 6 Global Step: 116690 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:04:59,044-Speed 5163.04 samples/sec Loss 3.2151 LearningRate 0.0423 Epoch: 6 Global Step: 116700 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:05:01,055-Speed 5092.95 samples/sec Loss 3.1594 LearningRate 0.0423 Epoch: 6 Global Step: 116710 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:05:03,036-Speed 5171.61 samples/sec Loss 3.1206 LearningRate 0.0423 Epoch: 6 Global Step: 116720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:05:05,014-Speed 5177.77 samples/sec Loss 3.1081 LearningRate 0.0423 Epoch: 6 Global Step: 116730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 
07:05:06,988-Speed 5189.06 samples/sec Loss 3.1612 LearningRate 0.0423 Epoch: 6 Global Step: 116740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:05:08,977-Speed 5151.13 samples/sec Loss 3.1802 LearningRate 0.0423 Epoch: 6 Global Step: 116750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:05:10,972-Speed 5134.39 samples/sec Loss 3.2620 LearningRate 0.0423 Epoch: 6 Global Step: 116760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:05:12,961-Speed 5149.46 samples/sec Loss 3.1775 LearningRate 0.0423 Epoch: 6 Global Step: 116770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:05:14,966-Speed 5108.08 samples/sec Loss 3.2032 LearningRate 0.0423 Epoch: 6 Global Step: 116780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:05:17,729-Speed 3707.90 samples/sec Loss 3.2296 LearningRate 0.0423 Epoch: 6 Global Step: 116790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:05:19,705-Speed 5182.81 samples/sec Loss 3.1927 LearningRate 0.0423 Epoch: 6 Global Step: 116800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:05:21,678-Speed 5191.35 samples/sec Loss 3.1885 LearningRate 0.0423 Epoch: 6 Global Step: 116810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:05:23,661-Speed 5165.16 samples/sec Loss 3.1237 LearningRate 0.0423 Epoch: 6 Global Step: 116820 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:05:25,886-Speed 4604.62 samples/sec Loss 3.1719 LearningRate 0.0423 Epoch: 6 Global Step: 116830 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:05:56,809-Speed 331.17 samples/sec Loss 2.9489 LearningRate 0.0422 Epoch: 7 Global Step: 116840 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:05:58,799-Speed 5148.91 samples/sec Loss 2.6094 LearningRate 0.0422 Epoch: 7 Global Step: 116850 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:06:00,811-Speed 5089.48 samples/sec 
Loss 2.6156 LearningRate 0.0422 Epoch: 7 Global Step: 116860 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:06:02,784-Speed 5192.03 samples/sec Loss 2.5687 LearningRate 0.0422 Epoch: 7 Global Step: 116870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:06:04,773-Speed 5149.76 samples/sec Loss 2.5966 LearningRate 0.0422 Epoch: 7 Global Step: 116880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:06:06,745-Speed 5194.30 samples/sec Loss 2.5787 LearningRate 0.0422 Epoch: 7 Global Step: 116890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:06:09,036-Speed 4471.39 samples/sec Loss 2.5662 LearningRate 0.0422 Epoch: 7 Global Step: 116900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:06:11,007-Speed 5197.35 samples/sec Loss 2.5905 LearningRate 0.0422 Epoch: 7 Global Step: 116910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:06:12,983-Speed 5183.24 samples/sec Loss 2.5415 LearningRate 0.0422 Epoch: 7 Global Step: 116920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:06:14,977-Speed 5136.64 samples/sec Loss 2.5347 LearningRate 0.0422 Epoch: 7 Global Step: 116930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:06:16,968-Speed 5146.08 samples/sec Loss 2.5466 LearningRate 0.0422 Epoch: 7 Global Step: 116940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:06:18,939-Speed 5195.75 samples/sec Loss 2.6004 LearningRate 0.0422 Epoch: 7 Global Step: 116950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:06:20,914-Speed 5188.49 samples/sec Loss 2.5666 LearningRate 0.0422 Epoch: 7 Global Step: 116960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:06:22,897-Speed 5166.40 samples/sec Loss 2.4919 LearningRate 0.0422 Epoch: 7 Global Step: 116970 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:06:24,875-Speed 5178.31 samples/sec Loss 2.5517 LearningRate 0.0422 Epoch: 7 
Global Step: 116980 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:06:26,854-Speed 5175.35 samples/sec Loss 2.5565 LearningRate 0.0422 Epoch: 7 Global Step: 116990 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:06:28,838-Speed 5163.71 samples/sec Loss 2.5369 LearningRate 0.0422 Epoch: 7 Global Step: 117000 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:06:30,812-Speed 5187.04 samples/sec Loss 2.5799 LearningRate 0.0422 Epoch: 7 Global Step: 117010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:06:32,804-Speed 5142.58 samples/sec Loss 2.6220 LearningRate 0.0422 Epoch: 7 Global Step: 117020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:06:34,785-Speed 5171.33 samples/sec Loss 2.5431 LearningRate 0.0422 Epoch: 7 Global Step: 117030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:06:36,754-Speed 5201.36 samples/sec Loss 2.5676 LearningRate 0.0422 Epoch: 7 Global Step: 117040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:06:38,740-Speed 5158.91 samples/sec Loss 2.6192 LearningRate 0.0422 Epoch: 7 Global Step: 117050 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:06:40,726-Speed 5157.75 samples/sec Loss 2.5784 LearningRate 0.0422 Epoch: 7 Global Step: 117060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:06:42,719-Speed 5141.70 samples/sec Loss 2.6023 LearningRate 0.0422 Epoch: 7 Global Step: 117070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:06:44,690-Speed 5195.79 samples/sec Loss 2.5347 LearningRate 0.0422 Epoch: 7 Global Step: 117080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:06:46,688-Speed 5126.06 samples/sec Loss 2.5251 LearningRate 0.0422 Epoch: 7 Global Step: 117090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:06:48,669-Speed 5170.55 samples/sec Loss 2.5862 LearningRate 0.0421 Epoch: 7 Global Step: 117100 Fp16 Grad 
Scale: 131072 Required: 15 hours Training: 2022-04-11 07:06:50,677-Speed 5101.01 samples/sec Loss 2.5893 LearningRate 0.0421 Epoch: 7 Global Step: 117110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:06:52,650-Speed 5193.46 samples/sec Loss 2.5792 LearningRate 0.0421 Epoch: 7 Global Step: 117120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:06:54,644-Speed 5136.49 samples/sec Loss 2.5625 LearningRate 0.0421 Epoch: 7 Global Step: 117130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:06:56,622-Speed 5179.00 samples/sec Loss 2.5556 LearningRate 0.0421 Epoch: 7 Global Step: 117140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:06:58,611-Speed 5149.39 samples/sec Loss 2.5290 LearningRate 0.0421 Epoch: 7 Global Step: 117150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:07:00,611-Speed 5122.89 samples/sec Loss 2.5840 LearningRate 0.0421 Epoch: 7 Global Step: 117160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:07:02,747-Speed 4795.25 samples/sec Loss 2.5578 LearningRate 0.0421 Epoch: 7 Global Step: 117170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:07:04,739-Speed 5142.01 samples/sec Loss 2.5769 LearningRate 0.0421 Epoch: 7 Global Step: 117180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:07:06,718-Speed 5175.92 samples/sec Loss 2.5493 LearningRate 0.0421 Epoch: 7 Global Step: 117190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:07:08,705-Speed 5155.26 samples/sec Loss 2.5884 LearningRate 0.0421 Epoch: 7 Global Step: 117200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:07:10,676-Speed 5196.89 samples/sec Loss 2.5980 LearningRate 0.0421 Epoch: 7 Global Step: 117210 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:07:12,668-Speed 5142.41 samples/sec Loss 2.5247 LearningRate 0.0421 Epoch: 7 Global Step: 117220 Fp16 Grad Scale: 131072 Required: 15 hours 
Training: 2022-04-11 07:07:14,657-Speed 5149.35 samples/sec Loss 2.5938 LearningRate 0.0421 Epoch: 7 Global Step: 117230 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:07:16,640-Speed 5166.34 samples/sec Loss 2.6246 LearningRate 0.0421 Epoch: 7 Global Step: 117240 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:07:18,623-Speed 5165.57 samples/sec Loss 2.6068 LearningRate 0.0421 Epoch: 7 Global Step: 117250 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:07:20,616-Speed 5141.15 samples/sec Loss 2.6368 LearningRate 0.0421 Epoch: 7 Global Step: 117260 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:07:22,610-Speed 5137.80 samples/sec Loss 2.5781 LearningRate 0.0421 Epoch: 7 Global Step: 117270 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:07:24,599-Speed 5148.02 samples/sec Loss 2.6368 LearningRate 0.0421 Epoch: 7 Global Step: 117280 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:07:26,587-Speed 5152.62 samples/sec Loss 2.5184 LearningRate 0.0421 Epoch: 7 Global Step: 117290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:07:28,577-Speed 5148.66 samples/sec Loss 2.5841 LearningRate 0.0421 Epoch: 7 Global Step: 117300 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:07:30,546-Speed 5200.51 samples/sec Loss 2.6235 LearningRate 0.0421 Epoch: 7 Global Step: 117310 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:07:32,528-Speed 5168.90 samples/sec Loss 2.5962 LearningRate 0.0421 Epoch: 7 Global Step: 117320 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:07:34,521-Speed 5139.57 samples/sec Loss 2.5233 LearningRate 0.0421 Epoch: 7 Global Step: 117330 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:07:36,515-Speed 5137.91 samples/sec Loss 2.5786 LearningRate 0.0421 Epoch: 7 Global Step: 117340 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 
07:07:38,515-Speed 5120.88 samples/sec Loss 2.5712 LearningRate 0.0421 Epoch: 7 Global Step: 117350 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:07:40,500-Speed 5161.80 samples/sec Loss 2.6081 LearningRate 0.0420 Epoch: 7 Global Step: 117360 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:07:42,483-Speed 5166.45 samples/sec Loss 2.7010 LearningRate 0.0420 Epoch: 7 Global Step: 117370 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:07:44,462-Speed 5175.60 samples/sec Loss 2.6824 LearningRate 0.0420 Epoch: 7 Global Step: 117380 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:07:46,438-Speed 5182.41 samples/sec Loss 2.5826 LearningRate 0.0420 Epoch: 7 Global Step: 117390 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:07:48,414-Speed 5184.58 samples/sec Loss 2.6812 LearningRate 0.0420 Epoch: 7 Global Step: 117400 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:07:50,389-Speed 5186.87 samples/sec Loss 2.6061 LearningRate 0.0420 Epoch: 7 Global Step: 117410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:07:52,363-Speed 5188.82 samples/sec Loss 2.6211 LearningRate 0.0420 Epoch: 7 Global Step: 117420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:07:54,352-Speed 5150.04 samples/sec Loss 2.5641 LearningRate 0.0420 Epoch: 7 Global Step: 117430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:07:56,325-Speed 5191.70 samples/sec Loss 2.6611 LearningRate 0.0420 Epoch: 7 Global Step: 117440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:07:58,297-Speed 5193.99 samples/sec Loss 2.5956 LearningRate 0.0420 Epoch: 7 Global Step: 117450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:08:00,322-Speed 5058.25 samples/sec Loss 2.5761 LearningRate 0.0420 Epoch: 7 Global Step: 117460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:08:02,322-Speed 5121.68 samples/sec Loss 
2.5880 LearningRate 0.0420 Epoch: 7 Global Step: 117470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:08:04,313-Speed 5145.60 samples/sec Loss 2.6364 LearningRate 0.0420 Epoch: 7 Global Step: 117480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:08:06,309-Speed 5131.06 samples/sec Loss 2.5955 LearningRate 0.0420 Epoch: 7 Global Step: 117490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:08:08,309-Speed 5123.67 samples/sec Loss 2.6051 LearningRate 0.0420 Epoch: 7 Global Step: 117500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:08:10,307-Speed 5127.28 samples/sec Loss 2.6842 LearningRate 0.0420 Epoch: 7 Global Step: 117510 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:08:12,281-Speed 5187.84 samples/sec Loss 2.6034 LearningRate 0.0420 Epoch: 7 Global Step: 117520 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:08:14,274-Speed 5138.20 samples/sec Loss 2.6430 LearningRate 0.0420 Epoch: 7 Global Step: 117530 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:08:16,255-Speed 5172.99 samples/sec Loss 2.6262 LearningRate 0.0420 Epoch: 7 Global Step: 117540 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:08:18,240-Speed 5160.37 samples/sec Loss 2.6491 LearningRate 0.0420 Epoch: 7 Global Step: 117550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:08:20,214-Speed 5188.99 samples/sec Loss 2.6282 LearningRate 0.0420 Epoch: 7 Global Step: 117560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:08:22,204-Speed 5146.42 samples/sec Loss 2.5765 LearningRate 0.0420 Epoch: 7 Global Step: 117570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:08:24,231-Speed 5054.24 samples/sec Loss 2.5863 LearningRate 0.0420 Epoch: 7 Global Step: 117580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:08:26,220-Speed 5150.65 samples/sec Loss 2.6240 LearningRate 0.0420 Epoch: 7 
Global Step: 117590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:08:28,199-Speed 5175.95 samples/sec Loss 2.7149 LearningRate 0.0420 Epoch: 7 Global Step: 117600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:08:30,173-Speed 5187.69 samples/sec Loss 2.6242 LearningRate 0.0419 Epoch: 7 Global Step: 117610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:08:32,148-Speed 5188.39 samples/sec Loss 2.5971 LearningRate 0.0419 Epoch: 7 Global Step: 117620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:08:34,132-Speed 5163.12 samples/sec Loss 2.6213 LearningRate 0.0419 Epoch: 7 Global Step: 117630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:08:36,108-Speed 5183.64 samples/sec Loss 2.6736 LearningRate 0.0419 Epoch: 7 Global Step: 117640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-11 07:08:38,087-Speed 5173.36 samples/sec Loss 2.6138 LearningRate 0.0419 Epoch: 7 Global Step: 117650 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:08:40,075-Speed 5154.37 samples/sec Loss 2.6798 LearningRate 0.0419 Epoch: 7 Global Step: 117660 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:08:42,069-Speed 5138.28 samples/sec Loss 2.6762 LearningRate 0.0419 Epoch: 7 Global Step: 117670 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:08:44,051-Speed 5166.38 samples/sec Loss 2.6504 LearningRate 0.0419 Epoch: 7 Global Step: 117680 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:08:46,053-Speed 5118.68 samples/sec Loss 2.5699 LearningRate 0.0419 Epoch: 7 Global Step: 117690 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:08:48,057-Speed 5110.37 samples/sec Loss 2.6415 LearningRate 0.0419 Epoch: 7 Global Step: 117700 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:08:50,045-Speed 5153.32 samples/sec Loss 2.6759 LearningRate 0.0419 Epoch: 7 Global Step: 117710 Fp16 Grad Scale: 
131072 Required: 14 hours Training: 2022-04-11 07:08:52,038-Speed 5138.63 samples/sec Loss 2.6054 LearningRate 0.0419 Epoch: 7 Global Step: 117720 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:08:54,015-Speed 5182.21 samples/sec Loss 2.7266 LearningRate 0.0419 Epoch: 7 Global Step: 117730 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:08:55,993-Speed 5178.38 samples/sec Loss 2.6273 LearningRate 0.0419 Epoch: 7 Global Step: 117740 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:08:57,977-Speed 5162.30 samples/sec Loss 2.7020 LearningRate 0.0419 Epoch: 7 Global Step: 117750 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:08:59,945-Speed 5204.23 samples/sec Loss 2.6931 LearningRate 0.0419 Epoch: 7 Global Step: 117760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:01,931-Speed 5158.86 samples/sec Loss 2.5983 LearningRate 0.0419 Epoch: 7 Global Step: 117770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:03,904-Speed 5192.08 samples/sec Loss 2.6327 LearningRate 0.0419 Epoch: 7 Global Step: 117780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:05,885-Speed 5170.98 samples/sec Loss 2.5661 LearningRate 0.0419 Epoch: 7 Global Step: 117790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:07,866-Speed 5170.66 samples/sec Loss 2.6958 LearningRate 0.0419 Epoch: 7 Global Step: 117800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:09,845-Speed 5175.75 samples/sec Loss 2.6964 LearningRate 0.0419 Epoch: 7 Global Step: 117810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:11,841-Speed 5132.86 samples/sec Loss 2.6659 LearningRate 0.0419 Epoch: 7 Global Step: 117820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:13,852-Speed 5092.40 samples/sec Loss 2.6873 LearningRate 0.0419 Epoch: 7 Global Step: 117830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-04-11 07:09:15,869-Speed 5079.67 samples/sec Loss 2.7161 LearningRate 0.0419 Epoch: 7 Global Step: 117840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:17,870-Speed 5119.36 samples/sec Loss 2.6976 LearningRate 0.0419 Epoch: 7 Global Step: 117850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:19,847-Speed 5182.52 samples/sec Loss 2.6870 LearningRate 0.0419 Epoch: 7 Global Step: 117860 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:09:21,824-Speed 5180.94 samples/sec Loss 2.7566 LearningRate 0.0418 Epoch: 7 Global Step: 117870 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:09:23,831-Speed 5103.19 samples/sec Loss 2.6587 LearningRate 0.0418 Epoch: 7 Global Step: 117880 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:09:25,816-Speed 5159.17 samples/sec Loss 2.6925 LearningRate 0.0418 Epoch: 7 Global Step: 117890 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:09:27,786-Speed 5200.76 samples/sec Loss 2.6254 LearningRate 0.0418 Epoch: 7 Global Step: 117900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:29,778-Speed 5143.01 samples/sec Loss 2.7265 LearningRate 0.0418 Epoch: 7 Global Step: 117910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:31,749-Speed 5194.94 samples/sec Loss 2.7075 LearningRate 0.0418 Epoch: 7 Global Step: 117920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:33,758-Speed 5099.29 samples/sec Loss 2.7107 LearningRate 0.0418 Epoch: 7 Global Step: 117930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:35,737-Speed 5177.63 samples/sec Loss 2.7463 LearningRate 0.0418 Epoch: 7 Global Step: 117940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:37,748-Speed 5091.75 samples/sec Loss 2.6932 LearningRate 0.0418 Epoch: 7 Global Step: 117950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:39,742-Speed 5138.55 
samples/sec Loss 2.6420 LearningRate 0.0418 Epoch: 7 Global Step: 117960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:41,724-Speed 5169.73 samples/sec Loss 2.6313 LearningRate 0.0418 Epoch: 7 Global Step: 117970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:43,695-Speed 5194.91 samples/sec Loss 2.6140 LearningRate 0.0418 Epoch: 7 Global Step: 117980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:45,695-Speed 5121.50 samples/sec Loss 2.6800 LearningRate 0.0418 Epoch: 7 Global Step: 117990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:09:47,677-Speed 5168.35 samples/sec Loss 2.6528 LearningRate 0.0418 Epoch: 7 Global Step: 118000 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:10:14,198-[lfw][118000]XNorm: 21.414950 Training: 2022-04-11 07:10:14,199-[lfw][118000]Accuracy-Flip: 0.99800+-0.00277 Training: 2022-04-11 07:10:14,199-[lfw][118000]Accuracy-Highest: 0.99817 Training: 2022-04-11 07:10:44,862-[cfp_fp][118000]XNorm: 19.660913 Training: 2022-04-11 07:10:44,862-[cfp_fp][118000]Accuracy-Flip: 0.97671+-0.00751 Training: 2022-04-11 07:10:44,863-[cfp_fp][118000]Accuracy-Highest: 0.98443 Training: 2022-04-11 07:11:11,344-[agedb_30][118000]XNorm: 21.589230 Training: 2022-04-11 07:11:11,345-[agedb_30][118000]Accuracy-Flip: 0.98150+-0.00747 Training: 2022-04-11 07:11:11,345-[agedb_30][118000]Accuracy-Highest: 0.98150 Training: 2022-04-11 07:11:13,325-Speed 119.56 samples/sec Loss 2.6486 LearningRate 0.0418 Epoch: 7 Global Step: 118010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:11:15,297-Speed 5194.92 samples/sec Loss 2.7021 LearningRate 0.0418 Epoch: 7 Global Step: 118020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:11:17,275-Speed 5179.83 samples/sec Loss 2.7014 LearningRate 0.0418 Epoch: 7 Global Step: 118030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-11 07:11:19,231-Speed 5234.88 samples/sec Loss 
2.5937 LearningRate 0.0418 Epoch: 7 Global Step: 118040 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:11:21,200-Speed 5203.20 samples/sec Loss 2.7028 LearningRate 0.0418 Epoch: 7 Global Step: 118050 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:11:23,165-Speed 5213.61 samples/sec Loss 2.7015 LearningRate 0.0418 Epoch: 7 Global Step: 118060 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:11:25,141-Speed 5185.43 samples/sec Loss 2.6778 LearningRate 0.0418 Epoch: 7 Global Step: 118070 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:11:27,125-Speed 5163.15 samples/sec Loss 2.7397 LearningRate 0.0418 Epoch: 7 Global Step: 118080 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:11:29,096-Speed 5195.49 samples/sec Loss 2.6910 LearningRate 0.0418 Epoch: 7 Global Step: 118090 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:11:31,066-Speed 5200.28 samples/sec Loss 2.6565 LearningRate 0.0418 Epoch: 7 Global Step: 118100 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:11:33,043-Speed 5182.22 samples/sec Loss 2.6554 LearningRate 0.0418 Epoch: 7 Global Step: 118110 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:11:35,019-Speed 5183.01 samples/sec Loss 2.6783 LearningRate 0.0418 Epoch: 7 Global Step: 118120 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:11:37,014-Speed 5134.83 samples/sec Loss 2.7713 LearningRate 0.0417 Epoch: 7 Global Step: 118130 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-11 07:11:38,997-Speed 5163.32 samples/sec Loss 2.6491 LearningRate 0.0417 Epoch: 7 Global Step: 118140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:11:40,980-Speed 5167.39 samples/sec Loss 2.7308 LearningRate 0.0417 Epoch: 7 Global Step: 118150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:11:42,953-Speed 5191.90 samples/sec Loss 2.6589 LearningRate 0.0417 Epoch: 7 Global 
Step: 118160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:11:44,927-Speed 5188.36 samples/sec Loss 2.6851 LearningRate 0.0417 Epoch: 7 Global Step: 118170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:11:46,908-Speed 5171.29 samples/sec Loss 2.7578 LearningRate 0.0417 Epoch: 7 Global Step: 118180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:11:48,885-Speed 5181.15 samples/sec Loss 2.6772 LearningRate 0.0417 Epoch: 7 Global Step: 118190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:11:50,857-Speed 5194.95 samples/sec Loss 2.7170 LearningRate 0.0417 Epoch: 7 Global Step: 118200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:11:52,838-Speed 5170.88 samples/sec Loss 2.6813 LearningRate 0.0417 Epoch: 7 Global Step: 118210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:11:54,819-Speed 5169.79 samples/sec Loss 2.7107 LearningRate 0.0417 Epoch: 7 Global Step: 118220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:11:56,829-Speed 5095.68 samples/sec Loss 2.7255 LearningRate 0.0417 Epoch: 7 Global Step: 118230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:11:58,821-Speed 5144.22 samples/sec Loss 2.6685 LearningRate 0.0417 Epoch: 7 Global Step: 118240 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:12:00,806-Speed 5158.69 samples/sec Loss 2.6530 LearningRate 0.0417 Epoch: 7 Global Step: 118250 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:12:02,783-Speed 5183.51 samples/sec Loss 2.6916 LearningRate 0.0417 Epoch: 7 Global Step: 118260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:12:04,772-Speed 5150.57 samples/sec Loss 2.6939 LearningRate 0.0417 Epoch: 7 Global Step: 118270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:12:06,755-Speed 5164.70 samples/sec Loss 2.7169 LearningRate 0.0417 Epoch: 7 Global Step: 118280 Fp16 Grad Scale: 65536 
Required: 14 hours Training: 2022-04-11 07:12:08,730-Speed 5185.35 samples/sec Loss 2.7108 LearningRate 0.0417 Epoch: 7 Global Step: 118290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:12:10,707-Speed 5182.43 samples/sec Loss 2.6953 LearningRate 0.0417 Epoch: 7 Global Step: 118300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:12:12,689-Speed 5167.86 samples/sec Loss 2.7413 LearningRate 0.0417 Epoch: 7 Global Step: 118310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:12:14,685-Speed 5132.67 samples/sec Loss 2.7127 LearningRate 0.0417 Epoch: 7 Global Step: 118320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:12:16,666-Speed 5171.49 samples/sec Loss 2.6861 LearningRate 0.0417 Epoch: 7 Global Step: 118330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:12:18,643-Speed 5180.83 samples/sec Loss 2.7422 LearningRate 0.0417 Epoch: 7 Global Step: 118340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:12:20,639-Speed 5130.34 samples/sec Loss 2.7647 LearningRate 0.0417 Epoch: 7 Global Step: 118350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:12:22,637-Speed 5127.11 samples/sec Loss 2.7711 LearningRate 0.0417 Epoch: 7 Global Step: 118360 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:12:24,632-Speed 5134.98 samples/sec Loss 2.6840 LearningRate 0.0417 Epoch: 7 Global Step: 118370 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:12:26,658-Speed 5055.92 samples/sec Loss 2.7515 LearningRate 0.0417 Epoch: 7 Global Step: 118380 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:12:28,642-Speed 5164.95 samples/sec Loss 2.6795 LearningRate 0.0416 Epoch: 7 Global Step: 118390 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:12:30,627-Speed 5158.57 samples/sec Loss 2.7283 LearningRate 0.0416 Epoch: 7 Global Step: 118400 Fp16 Grad Scale: 131072 Required: 14 hours Training: 
2022-04-11 07:12:32,607-Speed 5173.73 samples/sec Loss 2.7004 LearningRate 0.0416 Epoch: 7 Global Step: 118410 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:12:34,588-Speed 5169.80 samples/sec Loss 2.7031 LearningRate 0.0416 Epoch: 7 Global Step: 118420 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:12:36,571-Speed 5166.42 samples/sec Loss 2.6966 LearningRate 0.0416 Epoch: 7 Global Step: 118430 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:12:38,551-Speed 5172.87 samples/sec Loss 2.7835 LearningRate 0.0416 Epoch: 7 Global Step: 118440 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:12:40,535-Speed 5164.21 samples/sec Loss 2.6858 LearningRate 0.0416 Epoch: 7 Global Step: 118450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:12:42,519-Speed 5163.91 samples/sec Loss 2.6735 LearningRate 0.0416 Epoch: 7 Global Step: 118460 Fp16 Grad Scale: 262144 Required: 14 hours Training: 2022-04-11 07:12:44,489-Speed 5198.91 samples/sec Loss 2.7313 LearningRate 0.0416 Epoch: 7 Global Step: 118470 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:12:46,475-Speed 5158.74 samples/sec Loss 2.7264 LearningRate 0.0416 Epoch: 7 Global Step: 118480 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:12:48,474-Speed 5123.28 samples/sec Loss 2.7377 LearningRate 0.0416 Epoch: 7 Global Step: 118490 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:12:50,481-Speed 5103.80 samples/sec Loss 2.6549 LearningRate 0.0416 Epoch: 7 Global Step: 118500 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:12:52,462-Speed 5171.89 samples/sec Loss 2.7677 LearningRate 0.0416 Epoch: 7 Global Step: 118510 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:12:54,438-Speed 5183.07 samples/sec Loss 2.7087 LearningRate 0.0416 Epoch: 7 Global Step: 118520 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:12:56,411-Speed 
5192.59 samples/sec Loss 2.8197 LearningRate 0.0416 Epoch: 7 Global Step: 118530 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:12:58,381-Speed 5199.12 samples/sec Loss 2.6870 LearningRate 0.0416 Epoch: 7 Global Step: 118540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:00,370-Speed 5150.31 samples/sec Loss 2.6823 LearningRate 0.0416 Epoch: 7 Global Step: 118550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:02,374-Speed 5110.41 samples/sec Loss 2.7610 LearningRate 0.0416 Epoch: 7 Global Step: 118560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:04,358-Speed 5164.07 samples/sec Loss 2.7449 LearningRate 0.0416 Epoch: 7 Global Step: 118570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:06,336-Speed 5179.17 samples/sec Loss 2.7358 LearningRate 0.0416 Epoch: 7 Global Step: 118580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:08,321-Speed 5158.97 samples/sec Loss 2.7429 LearningRate 0.0416 Epoch: 7 Global Step: 118590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:10,303-Speed 5170.05 samples/sec Loss 2.7011 LearningRate 0.0416 Epoch: 7 Global Step: 118600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:12,295-Speed 5142.30 samples/sec Loss 2.7522 LearningRate 0.0416 Epoch: 7 Global Step: 118610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:14,282-Speed 5153.22 samples/sec Loss 2.7054 LearningRate 0.0416 Epoch: 7 Global Step: 118620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:16,284-Speed 5116.63 samples/sec Loss 2.7532 LearningRate 0.0416 Epoch: 7 Global Step: 118630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:18,255-Speed 5197.45 samples/sec Loss 2.6907 LearningRate 0.0416 Epoch: 7 Global Step: 118640 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:13:20,256-Speed 5118.69 samples/sec Loss 2.6545 
LearningRate 0.0415 Epoch: 7 Global Step: 118650 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:13:22,252-Speed 5133.06 samples/sec Loss 2.7919 LearningRate 0.0415 Epoch: 7 Global Step: 118660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:24,250-Speed 5127.18 samples/sec Loss 2.7745 LearningRate 0.0415 Epoch: 7 Global Step: 118670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:26,226-Speed 5184.24 samples/sec Loss 2.6931 LearningRate 0.0415 Epoch: 7 Global Step: 118680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:28,201-Speed 5186.76 samples/sec Loss 2.6510 LearningRate 0.0415 Epoch: 7 Global Step: 118690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:30,185-Speed 5163.52 samples/sec Loss 2.7307 LearningRate 0.0415 Epoch: 7 Global Step: 118700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:32,157-Speed 5193.65 samples/sec Loss 2.7032 LearningRate 0.0415 Epoch: 7 Global Step: 118710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:34,146-Speed 5148.20 samples/sec Loss 2.7650 LearningRate 0.0415 Epoch: 7 Global Step: 118720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:36,121-Speed 5187.12 samples/sec Loss 2.6886 LearningRate 0.0415 Epoch: 7 Global Step: 118730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:38,097-Speed 5183.56 samples/sec Loss 2.7131 LearningRate 0.0415 Epoch: 7 Global Step: 118740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:40,094-Speed 5128.70 samples/sec Loss 2.7233 LearningRate 0.0415 Epoch: 7 Global Step: 118750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:42,069-Speed 5189.10 samples/sec Loss 2.7820 LearningRate 0.0415 Epoch: 7 Global Step: 118760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:13:44,046-Speed 5181.20 samples/sec Loss 2.7519 LearningRate 0.0415 Epoch: 7 Global Step: 
118770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:13:46,050-Speed 5111.29 samples/sec Loss 2.7026 LearningRate 0.0415 Epoch: 7 Global Step: 118780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:13:48,032-Speed 5168.28 samples/sec Loss 2.6873 LearningRate 0.0415 Epoch: 7 Global Step: 118790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:13:50,045-Speed 5087.52 samples/sec Loss 2.7649 LearningRate 0.0415 Epoch: 7 Global Step: 118800 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:13:52,031-Speed 5157.48 samples/sec Loss 2.7495 LearningRate 0.0415 Epoch: 7 Global Step: 118810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:13:54,021-Speed 5148.57 samples/sec Loss 2.7942 LearningRate 0.0415 Epoch: 7 Global Step: 118820 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:13:56,012-Speed 5144.83 samples/sec Loss 2.6988 LearningRate 0.0415 Epoch: 7 Global Step: 118830 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:13:57,987-Speed 5186.03 samples/sec Loss 2.7520 LearningRate 0.0415 Epoch: 7 Global Step: 118840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:13:59,968-Speed 5171.50 samples/sec Loss 2.7042 LearningRate 0.0415 Epoch: 7 Global Step: 118850 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:14:01,961-Speed 5140.39 samples/sec Loss 2.7104 LearningRate 0.0415 Epoch: 7 Global Step: 118860 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:14:03,940-Speed 5176.34 samples/sec Loss 2.8115 LearningRate 0.0415 Epoch: 7 Global Step: 118870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:14:05,917-Speed 5181.00 samples/sec Loss 2.7461 LearningRate 0.0415 Epoch: 7 Global Step: 118880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:14:07,900-Speed 5163.84 samples/sec Loss 2.8091 LearningRate 0.0415 Epoch: 7 Global Step: 118890 Fp16 Grad Scale: 65536 Required: 14 
hours Training: 2022-04-11 07:14:09,877-Speed 5182.53 samples/sec Loss 2.7486 LearningRate 0.0415 Epoch: 7 Global Step: 118900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:14:11,884-Speed 5102.78 samples/sec Loss 2.7655 LearningRate 0.0414 Epoch: 7 Global Step: 118910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:14:13,856-Speed 5195.96 samples/sec Loss 2.7543 LearningRate 0.0414 Epoch: 7 Global Step: 118920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:14:15,827-Speed 5196.09 samples/sec Loss 2.7579 LearningRate 0.0414 Epoch: 7 Global Step: 118930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:14:17,799-Speed 5193.91 samples/sec Loss 2.7594 LearningRate 0.0414 Epoch: 7 Global Step: 118940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:14:19,788-Speed 5152.15 samples/sec Loss 2.7558 LearningRate 0.0414 Epoch: 7 Global Step: 118950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:14:21,781-Speed 5140.17 samples/sec Loss 2.6986 LearningRate 0.0414 Epoch: 7 Global Step: 118960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:14:23,754-Speed 5191.88 samples/sec Loss 2.8556 LearningRate 0.0414 Epoch: 7 Global Step: 118970 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:14:25,726-Speed 5193.10 samples/sec Loss 2.7341 LearningRate 0.0414 Epoch: 7 Global Step: 118980 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:14:27,706-Speed 5173.05 samples/sec Loss 2.7755 LearningRate 0.0414 Epoch: 7 Global Step: 118990 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:14:29,683-Speed 5183.15 samples/sec Loss 2.7299 LearningRate 0.0414 Epoch: 7 Global Step: 119000 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:14:31,648-Speed 5212.12 samples/sec Loss 2.8036 LearningRate 0.0414 Epoch: 7 Global Step: 119010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 
07:14:33,629-Speed 5169.79 samples/sec Loss 2.7798 LearningRate 0.0414 Epoch: 7 Global Step: 119020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:14:35,633-Speed 5111.86 samples/sec Loss 2.8105 LearningRate 0.0414 Epoch: 7 Global Step: 119030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:14:37,618-Speed 5159.80 samples/sec Loss 2.7308 LearningRate 0.0414 Epoch: 7 Global Step: 119040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:14:39,629-Speed 5094.97 samples/sec Loss 2.7716 LearningRate 0.0414 Epoch: 7 Global Step: 119050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:14:41,622-Speed 5139.87 samples/sec Loss 2.7753 LearningRate 0.0414 Epoch: 7 Global Step: 119060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:14:43,602-Speed 5172.30 samples/sec Loss 2.8031 LearningRate 0.0414 Epoch: 7 Global Step: 119070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:14:45,589-Speed 5154.82 samples/sec Loss 2.7618 LearningRate 0.0414 Epoch: 7 Global Step: 119080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:14:47,595-Speed 5106.35 samples/sec Loss 2.8210 LearningRate 0.0414 Epoch: 7 Global Step: 119090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:14:49,596-Speed 5120.89 samples/sec Loss 2.8145 LearningRate 0.0414 Epoch: 7 Global Step: 119100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:14:51,578-Speed 5166.70 samples/sec Loss 2.7467 LearningRate 0.0414 Epoch: 7 Global Step: 119110 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:14:53,562-Speed 5162.23 samples/sec Loss 2.7568 LearningRate 0.0414 Epoch: 7 Global Step: 119120 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:14:55,549-Speed 5155.06 samples/sec Loss 2.8397 LearningRate 0.0414 Epoch: 7 Global Step: 119130 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:14:57,536-Speed 5156.77 samples/sec 
Loss 2.7225 LearningRate 0.0414 Epoch: 7 Global Step: 119140 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:14:59,513-Speed 5182.02 samples/sec Loss 2.6709 LearningRate 0.0414 Epoch: 7 Global Step: 119150 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:15:01,494-Speed 5172.12 samples/sec Loss 2.7295 LearningRate 0.0414 Epoch: 7 Global Step: 119160 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:15:03,470-Speed 5181.49 samples/sec Loss 2.7882 LearningRate 0.0413 Epoch: 7 Global Step: 119170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:05,462-Speed 5143.51 samples/sec Loss 2.7879 LearningRate 0.0413 Epoch: 7 Global Step: 119180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:07,437-Speed 5186.43 samples/sec Loss 2.7687 LearningRate 0.0413 Epoch: 7 Global Step: 119190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:09,415-Speed 5178.40 samples/sec Loss 2.8107 LearningRate 0.0413 Epoch: 7 Global Step: 119200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:11,398-Speed 5165.85 samples/sec Loss 2.7990 LearningRate 0.0413 Epoch: 7 Global Step: 119210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:13,413-Speed 5084.25 samples/sec Loss 2.7793 LearningRate 0.0413 Epoch: 7 Global Step: 119220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:15,409-Speed 5129.63 samples/sec Loss 2.7850 LearningRate 0.0413 Epoch: 7 Global Step: 119230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:17,395-Speed 5157.99 samples/sec Loss 2.7896 LearningRate 0.0413 Epoch: 7 Global Step: 119240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:19,392-Speed 5131.83 samples/sec Loss 2.8781 LearningRate 0.0413 Epoch: 7 Global Step: 119250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:21,368-Speed 5182.38 samples/sec Loss 2.8641 LearningRate 0.0413 Epoch: 7 
Global Step: 119260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:23,369-Speed 5119.97 samples/sec Loss 2.7702 LearningRate 0.0413 Epoch: 7 Global Step: 119270 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:15:25,355-Speed 5157.24 samples/sec Loss 2.8041 LearningRate 0.0413 Epoch: 7 Global Step: 119280 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:15:27,339-Speed 5163.97 samples/sec Loss 2.7839 LearningRate 0.0413 Epoch: 7 Global Step: 119290 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:15:29,330-Speed 5147.29 samples/sec Loss 2.8663 LearningRate 0.0413 Epoch: 7 Global Step: 119300 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:15:31,299-Speed 5202.40 samples/sec Loss 2.8473 LearningRate 0.0413 Epoch: 7 Global Step: 119310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:33,280-Speed 5169.74 samples/sec Loss 2.7703 LearningRate 0.0413 Epoch: 7 Global Step: 119320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:35,263-Speed 5166.77 samples/sec Loss 2.8217 LearningRate 0.0413 Epoch: 7 Global Step: 119330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:37,259-Speed 5131.06 samples/sec Loss 2.7668 LearningRate 0.0413 Epoch: 7 Global Step: 119340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:39,252-Speed 5139.81 samples/sec Loss 2.8806 LearningRate 0.0413 Epoch: 7 Global Step: 119350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:41,241-Speed 5151.20 samples/sec Loss 2.8048 LearningRate 0.0413 Epoch: 7 Global Step: 119360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:43,214-Speed 5190.88 samples/sec Loss 2.8038 LearningRate 0.0413 Epoch: 7 Global Step: 119370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:45,189-Speed 5186.67 samples/sec Loss 2.7720 LearningRate 0.0413 Epoch: 7 Global Step: 119380 Fp16 Grad Scale: 
65536 Required: 14 hours Training: 2022-04-11 07:15:47,170-Speed 5169.71 samples/sec Loss 2.7923 LearningRate 0.0413 Epoch: 7 Global Step: 119390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:49,202-Speed 5042.09 samples/sec Loss 2.7534 LearningRate 0.0413 Epoch: 7 Global Step: 119400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:51,190-Speed 5152.35 samples/sec Loss 2.7935 LearningRate 0.0413 Epoch: 7 Global Step: 119410 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:15:53,168-Speed 5178.77 samples/sec Loss 2.8264 LearningRate 0.0413 Epoch: 7 Global Step: 119420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:55,153-Speed 5160.49 samples/sec Loss 2.8068 LearningRate 0.0412 Epoch: 7 Global Step: 119430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:57,134-Speed 5169.69 samples/sec Loss 2.8024 LearningRate 0.0412 Epoch: 7 Global Step: 119440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:15:59,123-Speed 5150.31 samples/sec Loss 2.8713 LearningRate 0.0412 Epoch: 7 Global Step: 119450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:16:01,107-Speed 5163.51 samples/sec Loss 2.8198 LearningRate 0.0412 Epoch: 7 Global Step: 119460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:16:03,085-Speed 5180.73 samples/sec Loss 2.8556 LearningRate 0.0412 Epoch: 7 Global Step: 119470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:16:05,059-Speed 5187.35 samples/sec Loss 2.7207 LearningRate 0.0412 Epoch: 7 Global Step: 119480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:16:07,036-Speed 5180.94 samples/sec Loss 2.7537 LearningRate 0.0412 Epoch: 7 Global Step: 119490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:16:09,017-Speed 5170.28 samples/sec Loss 2.7602 LearningRate 0.0412 Epoch: 7 Global Step: 119500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-04-11 07:16:10,997-Speed 5174.45 samples/sec Loss 2.8271 LearningRate 0.0412 Epoch: 7 Global Step: 119510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:16:12,983-Speed 5158.37 samples/sec Loss 2.8542 LearningRate 0.0412 Epoch: 7 Global Step: 119520 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:16:14,965-Speed 5166.49 samples/sec Loss 2.7858 LearningRate 0.0412 Epoch: 7 Global Step: 119530 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:16:16,954-Speed 5149.75 samples/sec Loss 2.8189 LearningRate 0.0412 Epoch: 7 Global Step: 119540 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:16:18,945-Speed 5144.46 samples/sec Loss 2.7778 LearningRate 0.0412 Epoch: 7 Global Step: 119550 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:16:20,926-Speed 5172.79 samples/sec Loss 2.8061 LearningRate 0.0412 Epoch: 7 Global Step: 119560 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:16:22,903-Speed 5180.82 samples/sec Loss 2.8613 LearningRate 0.0412 Epoch: 7 Global Step: 119570 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:16:24,894-Speed 5144.35 samples/sec Loss 2.7749 LearningRate 0.0412 Epoch: 7 Global Step: 119580 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:16:26,874-Speed 5174.49 samples/sec Loss 2.7289 LearningRate 0.0412 Epoch: 7 Global Step: 119590 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:16:28,854-Speed 5174.08 samples/sec Loss 2.7726 LearningRate 0.0412 Epoch: 7 Global Step: 119600 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:16:30,831-Speed 5181.23 samples/sec Loss 2.7806 LearningRate 0.0412 Epoch: 7 Global Step: 119610 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:16:32,807-Speed 5184.10 samples/sec Loss 2.7848 LearningRate 0.0412 Epoch: 7 Global Step: 119620 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:16:34,794-Speed 
5154.78 samples/sec Loss 2.7541 LearningRate 0.0412 Epoch: 7 Global Step: 119630 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:16:36,782-Speed 5150.65 samples/sec Loss 2.8775 LearningRate 0.0412 Epoch: 7 Global Step: 119640 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:16:38,754-Speed 5194.21 samples/sec Loss 2.8273 LearningRate 0.0412 Epoch: 7 Global Step: 119650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:16:40,744-Speed 5149.43 samples/sec Loss 2.8422 LearningRate 0.0412 Epoch: 7 Global Step: 119660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:16:42,726-Speed 5167.43 samples/sec Loss 2.8358 LearningRate 0.0412 Epoch: 7 Global Step: 119670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:16:44,722-Speed 5132.85 samples/sec Loss 2.8301 LearningRate 0.0412 Epoch: 7 Global Step: 119680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:16:46,754-Speed 5039.73 samples/sec Loss 2.7916 LearningRate 0.0411 Epoch: 7 Global Step: 119690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:16:48,791-Speed 5030.53 samples/sec Loss 2.8683 LearningRate 0.0411 Epoch: 7 Global Step: 119700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:16:50,794-Speed 5113.80 samples/sec Loss 2.8989 LearningRate 0.0411 Epoch: 7 Global Step: 119710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:16:52,782-Speed 5151.63 samples/sec Loss 2.7485 LearningRate 0.0411 Epoch: 7 Global Step: 119720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:16:54,760-Speed 5177.81 samples/sec Loss 2.7838 LearningRate 0.0411 Epoch: 7 Global Step: 119730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:16:56,738-Speed 5179.39 samples/sec Loss 2.7856 LearningRate 0.0411 Epoch: 7 Global Step: 119740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:16:58,728-Speed 5148.13 samples/sec Loss 2.7683 
LearningRate 0.0411 Epoch: 7 Global Step: 119750 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:17:00,729-Speed 5119.80 samples/sec Loss 2.8443 LearningRate 0.0411 Epoch: 7 Global Step: 119760 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:17:02,696-Speed 5207.49 samples/sec Loss 2.8705 LearningRate 0.0411 Epoch: 7 Global Step: 119770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:17:04,677-Speed 5169.21 samples/sec Loss 2.8470 LearningRate 0.0411 Epoch: 7 Global Step: 119780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:17:06,651-Speed 5189.71 samples/sec Loss 2.7793 LearningRate 0.0411 Epoch: 7 Global Step: 119790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:17:08,656-Speed 5109.32 samples/sec Loss 2.7973 LearningRate 0.0411 Epoch: 7 Global Step: 119800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:17:10,650-Speed 5137.26 samples/sec Loss 2.8527 LearningRate 0.0411 Epoch: 7 Global Step: 119810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:17:12,638-Speed 5152.89 samples/sec Loss 2.8448 LearningRate 0.0411 Epoch: 7 Global Step: 119820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:17:14,618-Speed 5173.47 samples/sec Loss 2.8639 LearningRate 0.0411 Epoch: 7 Global Step: 119830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:17:16,610-Speed 5142.46 samples/sec Loss 2.8548 LearningRate 0.0411 Epoch: 7 Global Step: 119840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:17:18,597-Speed 5153.62 samples/sec Loss 2.7612 LearningRate 0.0411 Epoch: 7 Global Step: 119850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:17:20,566-Speed 5202.13 samples/sec Loss 2.7739 LearningRate 0.0411 Epoch: 7 Global Step: 119860 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:17:22,546-Speed 5174.46 samples/sec Loss 2.8407 LearningRate 0.0411 Epoch: 7 Global Step: 
119870 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:17:24,527-Speed 5171.67 samples/sec Loss 2.8678 LearningRate 0.0411 Epoch: 7 Global Step: 119880 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:17:26,511-Speed 5161.97 samples/sec Loss 2.8140 LearningRate 0.0411 Epoch: 7 Global Step: 119890 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:17:28,498-Speed 5157.58 samples/sec Loss 2.8531 LearningRate 0.0411 Epoch: 7 Global Step: 119900 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:17:30,474-Speed 5182.76 samples/sec Loss 2.7745 LearningRate 0.0411 Epoch: 7 Global Step: 119910 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:17:32,466-Speed 5141.96 samples/sec Loss 2.8675 LearningRate 0.0411 Epoch: 7 Global Step: 119920 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:17:34,485-Speed 5073.24 samples/sec Loss 2.8715 LearningRate 0.0411 Epoch: 7 Global Step: 119930 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:17:36,464-Speed 5175.91 samples/sec Loss 2.7876 LearningRate 0.0411 Epoch: 7 Global Step: 119940 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:17:38,450-Speed 5158.16 samples/sec Loss 2.8902 LearningRate 0.0410 Epoch: 7 Global Step: 119950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:17:40,427-Speed 5182.20 samples/sec Loss 2.8347 LearningRate 0.0410 Epoch: 7 Global Step: 119960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:17:42,412-Speed 5158.54 samples/sec Loss 2.7859 LearningRate 0.0410 Epoch: 7 Global Step: 119970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:17:44,387-Speed 5188.42 samples/sec Loss 2.8411 LearningRate 0.0410 Epoch: 7 Global Step: 119980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:17:46,366-Speed 5176.76 samples/sec Loss 2.8206 LearningRate 0.0410 Epoch: 7 Global Step: 119990 Fp16 Grad Scale: 65536 Required: 14 
hours Training: 2022-04-11 07:17:48,360-Speed 5136.26 samples/sec Loss 2.7885 LearningRate 0.0410 Epoch: 7 Global Step: 120000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:18:14,897-[lfw][120000]XNorm: 22.805183 Training: 2022-04-11 07:18:14,897-[lfw][120000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 07:18:14,898-[lfw][120000]Accuracy-Highest: 0.99817 Training: 2022-04-11 07:18:45,717-[cfp_fp][120000]XNorm: 21.011003 Training: 2022-04-11 07:18:45,718-[cfp_fp][120000]Accuracy-Flip: 0.98157+-0.00621 Training: 2022-04-11 07:18:45,718-[cfp_fp][120000]Accuracy-Highest: 0.98443 Training: 2022-04-11 07:19:12,419-[agedb_30][120000]XNorm: 22.950901 Training: 2022-04-11 07:19:12,419-[agedb_30][120000]Accuracy-Flip: 0.97883+-0.00806 Training: 2022-04-11 07:19:12,420-[agedb_30][120000]Accuracy-Highest: 0.98150 Training: 2022-04-11 07:19:14,406-Speed 119.01 samples/sec Loss 2.8951 LearningRate 0.0410 Epoch: 7 Global Step: 120010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:19:16,387-Speed 5172.29 samples/sec Loss 2.8390 LearningRate 0.0410 Epoch: 7 Global Step: 120020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:19:18,376-Speed 5148.02 samples/sec Loss 2.8594 LearningRate 0.0410 Epoch: 7 Global Step: 120030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:19:20,359-Speed 5167.38 samples/sec Loss 2.7832 LearningRate 0.0410 Epoch: 7 Global Step: 120040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:19:22,321-Speed 5219.26 samples/sec Loss 2.8594 LearningRate 0.0410 Epoch: 7 Global Step: 120050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:19:24,298-Speed 5181.81 samples/sec Loss 2.8670 LearningRate 0.0410 Epoch: 7 Global Step: 120060 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:19:26,262-Speed 5216.76 samples/sec Loss 2.8440 LearningRate 0.0410 Epoch: 7 Global Step: 120070 Fp16 Grad Scale: 131072 Required: 14 hours Training: 
2022-04-11 07:19:28,230-Speed 5204.77 samples/sec Loss 2.7167 LearningRate 0.0410 Epoch: 7 Global Step: 120080 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:19:30,200-Speed 5198.70 samples/sec Loss 2.8349 LearningRate 0.0410 Epoch: 7 Global Step: 120090 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:19:32,168-Speed 5204.82 samples/sec Loss 2.8275 LearningRate 0.0410 Epoch: 7 Global Step: 120100 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:19:34,134-Speed 5211.04 samples/sec Loss 2.7702 LearningRate 0.0410 Epoch: 7 Global Step: 120110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:19:36,108-Speed 5189.31 samples/sec Loss 2.8647 LearningRate 0.0410 Epoch: 7 Global Step: 120120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:19:38,109-Speed 5119.32 samples/sec Loss 2.7865 LearningRate 0.0410 Epoch: 7 Global Step: 120130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:19:40,084-Speed 5186.54 samples/sec Loss 2.8279 LearningRate 0.0410 Epoch: 7 Global Step: 120140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:19:42,065-Speed 5170.93 samples/sec Loss 2.8585 LearningRate 0.0410 Epoch: 7 Global Step: 120150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:19:44,033-Speed 5204.85 samples/sec Loss 2.8652 LearningRate 0.0410 Epoch: 7 Global Step: 120160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:19:46,008-Speed 5184.94 samples/sec Loss 2.8571 LearningRate 0.0410 Epoch: 7 Global Step: 120170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:19:48,010-Speed 5116.85 samples/sec Loss 2.8272 LearningRate 0.0410 Epoch: 7 Global Step: 120180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:19:49,981-Speed 5196.57 samples/sec Loss 2.7935 LearningRate 0.0410 Epoch: 7 Global Step: 120190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:19:51,952-Speed 5198.20 
samples/sec Loss 2.8094 LearningRate 0.0410 Epoch: 7 Global Step: 120200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:19:53,922-Speed 5199.89 samples/sec Loss 2.8238 LearningRate 0.0409 Epoch: 7 Global Step: 120210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:19:55,905-Speed 5163.93 samples/sec Loss 2.8849 LearningRate 0.0409 Epoch: 7 Global Step: 120220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:19:57,906-Speed 5120.69 samples/sec Loss 2.7914 LearningRate 0.0409 Epoch: 7 Global Step: 120230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:19:59,878-Speed 5195.59 samples/sec Loss 2.8357 LearningRate 0.0409 Epoch: 7 Global Step: 120240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:20:01,871-Speed 5139.36 samples/sec Loss 2.8555 LearningRate 0.0409 Epoch: 7 Global Step: 120250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:20:03,855-Speed 5161.89 samples/sec Loss 2.7943 LearningRate 0.0409 Epoch: 7 Global Step: 120260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:20:05,828-Speed 5192.82 samples/sec Loss 2.8283 LearningRate 0.0409 Epoch: 7 Global Step: 120270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:20:07,818-Speed 5146.26 samples/sec Loss 2.8874 LearningRate 0.0409 Epoch: 7 Global Step: 120280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:20:09,790-Speed 5193.31 samples/sec Loss 2.8157 LearningRate 0.0409 Epoch: 7 Global Step: 120290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:20:11,765-Speed 5188.31 samples/sec Loss 2.8293 LearningRate 0.0409 Epoch: 7 Global Step: 120300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:20:13,739-Speed 5189.82 samples/sec Loss 2.8051 LearningRate 0.0409 Epoch: 7 Global Step: 120310 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:20:15,742-Speed 5114.16 samples/sec Loss 2.8782 LearningRate 
0.0409 Epoch: 7 Global Step: 120320 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:20:17,719-Speed 5179.30 samples/sec Loss 2.8418 LearningRate 0.0409 Epoch: 7 Global Step: 120330 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:20:19,699-Speed 5173.56 samples/sec Loss 2.8497 LearningRate 0.0409 Epoch: 7 Global Step: 120340 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:20:21,686-Speed 5157.58 samples/sec Loss 2.8904 LearningRate 0.0409 Epoch: 7 Global Step: 120350 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:20:23,687-Speed 5118.46 samples/sec Loss 2.9576 LearningRate 0.0409 Epoch: 7 Global Step: 120360 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:20:25,674-Speed 5155.36 samples/sec Loss 2.7924 LearningRate 0.0409 Epoch: 7 Global Step: 120370 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:20:27,649-Speed 5184.30 samples/sec Loss 2.8856 LearningRate 0.0409 Epoch: 7 Global Step: 120380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:20:29,622-Speed 5194.34 samples/sec Loss 2.9045 LearningRate 0.0409 Epoch: 7 Global Step: 120390 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:20:31,596-Speed 5188.73 samples/sec Loss 2.8360 LearningRate 0.0409 Epoch: 7 Global Step: 120400 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:20:33,570-Speed 5186.92 samples/sec Loss 2.8134 LearningRate 0.0409 Epoch: 7 Global Step: 120410 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:20:35,547-Speed 5183.60 samples/sec Loss 2.9261 LearningRate 0.0409 Epoch: 7 Global Step: 120420 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:20:37,535-Speed 5152.33 samples/sec Loss 2.7643 LearningRate 0.0409 Epoch: 7 Global Step: 120430 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:20:39,514-Speed 5176.27 samples/sec Loss 2.9309 LearningRate 0.0409 Epoch: 7 Global Step: 120440 
Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:20:41,497-Speed 5163.79 samples/sec Loss 2.8764 LearningRate 0.0409 Epoch: 7 Global Step: 120450 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:20:43,473-Speed 5185.78 samples/sec Loss 2.8385 LearningRate 0.0409 Epoch: 7 Global Step: 120460 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:20:45,450-Speed 5181.28 samples/sec Loss 2.8836 LearningRate 0.0408 Epoch: 7 Global Step: 120470 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:20:47,458-Speed 5099.62 samples/sec Loss 2.8478 LearningRate 0.0408 Epoch: 7 Global Step: 120480 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:20:49,456-Speed 5128.93 samples/sec Loss 2.8903 LearningRate 0.0408 Epoch: 7 Global Step: 120490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:20:51,440-Speed 5161.94 samples/sec Loss 2.9120 LearningRate 0.0408 Epoch: 7 Global Step: 120500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:20:53,432-Speed 5142.17 samples/sec Loss 2.8377 LearningRate 0.0408 Epoch: 7 Global Step: 120510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:20:55,405-Speed 5191.09 samples/sec Loss 2.8084 LearningRate 0.0408 Epoch: 7 Global Step: 120520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:20:57,394-Speed 5149.72 samples/sec Loss 2.8011 LearningRate 0.0408 Epoch: 7 Global Step: 120530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:20:59,378-Speed 5163.44 samples/sec Loss 2.8908 LearningRate 0.0408 Epoch: 7 Global Step: 120540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:21:01,350-Speed 5194.40 samples/sec Loss 2.9214 LearningRate 0.0408 Epoch: 7 Global Step: 120550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:21:03,337-Speed 5156.68 samples/sec Loss 2.9411 LearningRate 0.0408 Epoch: 7 Global Step: 120560 Fp16 Grad Scale: 65536 Required: 14 hours 
Training: 2022-04-11 07:21:05,316-Speed 5175.26 samples/sec Loss 2.9654 LearningRate 0.0408 Epoch: 7 Global Step: 120570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:21:07,286-Speed 5199.71 samples/sec Loss 2.8756 LearningRate 0.0408 Epoch: 7 Global Step: 120580 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:09,260-Speed 5188.95 samples/sec Loss 2.8122 LearningRate 0.0408 Epoch: 7 Global Step: 120590 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:11,248-Speed 5153.54 samples/sec Loss 2.8004 LearningRate 0.0408 Epoch: 7 Global Step: 120600 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:13,230-Speed 5168.36 samples/sec Loss 2.8506 LearningRate 0.0408 Epoch: 7 Global Step: 120610 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:15,210-Speed 5173.70 samples/sec Loss 2.8856 LearningRate 0.0408 Epoch: 7 Global Step: 120620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:17,190-Speed 5171.90 samples/sec Loss 2.9205 LearningRate 0.0408 Epoch: 7 Global Step: 120630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:19,170-Speed 5174.19 samples/sec Loss 2.9268 LearningRate 0.0408 Epoch: 7 Global Step: 120640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:21,149-Speed 5177.23 samples/sec Loss 2.8916 LearningRate 0.0408 Epoch: 7 Global Step: 120650 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:23,128-Speed 5174.11 samples/sec Loss 2.8406 LearningRate 0.0408 Epoch: 7 Global Step: 120660 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:25,126-Speed 5128.04 samples/sec Loss 2.8581 LearningRate 0.0408 Epoch: 7 Global Step: 120670 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:27,100-Speed 5190.19 samples/sec Loss 2.9172 LearningRate 0.0408 Epoch: 7 Global Step: 120680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:21:29,085-Speed 
5159.07 samples/sec Loss 2.8689 LearningRate 0.0408 Epoch: 7 Global Step: 120690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:21:31,060-Speed 5187.65 samples/sec Loss 2.8532 LearningRate 0.0408 Epoch: 7 Global Step: 120700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:21:33,047-Speed 5154.72 samples/sec Loss 2.7736 LearningRate 0.0408 Epoch: 7 Global Step: 120710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:21:35,019-Speed 5192.70 samples/sec Loss 2.8837 LearningRate 0.0408 Epoch: 7 Global Step: 120720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:21:36,988-Speed 5202.07 samples/sec Loss 2.8879 LearningRate 0.0407 Epoch: 7 Global Step: 120730 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:38,969-Speed 5171.53 samples/sec Loss 2.8242 LearningRate 0.0407 Epoch: 7 Global Step: 120740 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:40,945-Speed 5182.65 samples/sec Loss 2.8548 LearningRate 0.0407 Epoch: 7 Global Step: 120750 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:42,917-Speed 5197.16 samples/sec Loss 2.8644 LearningRate 0.0407 Epoch: 7 Global Step: 120760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:44,902-Speed 5160.13 samples/sec Loss 2.9293 LearningRate 0.0407 Epoch: 7 Global Step: 120770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:46,910-Speed 5101.57 samples/sec Loss 2.9572 LearningRate 0.0407 Epoch: 7 Global Step: 120780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:48,889-Speed 5174.63 samples/sec Loss 2.9061 LearningRate 0.0407 Epoch: 7 Global Step: 120790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:50,879-Speed 5148.36 samples/sec Loss 2.8380 LearningRate 0.0407 Epoch: 7 Global Step: 120800 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:52,851-Speed 5193.86 samples/sec Loss 2.8373 
LearningRate 0.0407 Epoch: 7 Global Step: 120810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:54,822-Speed 5196.36 samples/sec Loss 2.8807 LearningRate 0.0407 Epoch: 7 Global Step: 120820 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:21:56,795-Speed 5192.25 samples/sec Loss 2.8428 LearningRate 0.0407 Epoch: 7 Global Step: 120830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:21:58,791-Speed 5132.78 samples/sec Loss 2.8553 LearningRate 0.0407 Epoch: 7 Global Step: 120840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:22:00,784-Speed 5139.96 samples/sec Loss 2.8574 LearningRate 0.0407 Epoch: 7 Global Step: 120850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:22:02,777-Speed 5139.97 samples/sec Loss 2.9162 LearningRate 0.0407 Epoch: 7 Global Step: 120860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:22:04,759-Speed 5168.24 samples/sec Loss 2.8666 LearningRate 0.0407 Epoch: 7 Global Step: 120870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:22:06,736-Speed 5180.57 samples/sec Loss 2.8607 LearningRate 0.0407 Epoch: 7 Global Step: 120880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:22:08,713-Speed 5183.14 samples/sec Loss 2.8426 LearningRate 0.0407 Epoch: 7 Global Step: 120890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:22:10,700-Speed 5154.17 samples/sec Loss 2.9409 LearningRate 0.0407 Epoch: 7 Global Step: 120900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:22:12,670-Speed 5198.22 samples/sec Loss 2.8369 LearningRate 0.0407 Epoch: 7 Global Step: 120910 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:22:14,644-Speed 5191.44 samples/sec Loss 2.7973 LearningRate 0.0407 Epoch: 7 Global Step: 120920 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:22:16,627-Speed 5163.25 samples/sec Loss 2.8861 LearningRate 0.0407 Epoch: 7 Global Step: 
120930 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:22:18,608-Speed 5171.25 samples/sec Loss 2.8774 LearningRate 0.0407 Epoch: 7 Global Step: 120940 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:22:20,599-Speed 5144.73 samples/sec Loss 2.7896 LearningRate 0.0407 Epoch: 7 Global Step: 120950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:22:22,598-Speed 5124.43 samples/sec Loss 2.8304 LearningRate 0.0407 Epoch: 7 Global Step: 120960 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:22:24,584-Speed 5157.95 samples/sec Loss 2.8679 LearningRate 0.0407 Epoch: 7 Global Step: 120970 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:22:26,572-Speed 5153.93 samples/sec Loss 2.8668 LearningRate 0.0407 Epoch: 7 Global Step: 120980 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:22:28,550-Speed 5178.89 samples/sec Loss 2.8231 LearningRate 0.0406 Epoch: 7 Global Step: 120990 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:22:30,556-Speed 5106.71 samples/sec Loss 2.8274 LearningRate 0.0406 Epoch: 7 Global Step: 121000 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:22:32,526-Speed 5198.74 samples/sec Loss 2.8829 LearningRate 0.0406 Epoch: 7 Global Step: 121010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:22:34,515-Speed 5150.43 samples/sec Loss 2.8277 LearningRate 0.0406 Epoch: 7 Global Step: 121020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:22:36,492-Speed 5179.81 samples/sec Loss 2.8392 LearningRate 0.0406 Epoch: 7 Global Step: 121030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:22:38,466-Speed 5188.75 samples/sec Loss 2.8309 LearningRate 0.0406 Epoch: 7 Global Step: 121040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:22:40,451-Speed 5160.81 samples/sec Loss 2.9091 LearningRate 0.0406 Epoch: 7 Global Step: 121050 Fp16 Grad Scale: 65536 Required: 14 
hours Training: 2022-04-11 07:22:45,789-Speed 1918.65 samples/sec Loss 2.8943 LearningRate 0.0406 Epoch: 7 Global Step: 121060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:22:47,796-Speed 5105.16 samples/sec Loss 2.7965 LearningRate 0.0406 Epoch: 7 Global Step: 121070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:22:49,779-Speed 5166.33 samples/sec Loss 2.8749 LearningRate 0.0406 Epoch: 7 Global Step: 121080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:22:51,776-Speed 5129.30 samples/sec Loss 2.8948 LearningRate 0.0406 Epoch: 7 Global Step: 121090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:22:53,757-Speed 5168.65 samples/sec Loss 2.9440 LearningRate 0.0406 Epoch: 7 Global Step: 121100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:22:55,731-Speed 5190.86 samples/sec Loss 2.8944 LearningRate 0.0406 Epoch: 7 Global Step: 121110 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:22:57,707-Speed 5183.26 samples/sec Loss 2.9041 LearningRate 0.0406 Epoch: 7 Global Step: 121120 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:22:59,718-Speed 5092.84 samples/sec Loss 2.8120 LearningRate 0.0406 Epoch: 7 Global Step: 121130 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:23:01,740-Speed 5066.45 samples/sec Loss 2.8894 LearningRate 0.0406 Epoch: 7 Global Step: 121140 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:23:03,727-Speed 5156.63 samples/sec Loss 2.8871 LearningRate 0.0406 Epoch: 7 Global Step: 121150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:23:05,720-Speed 5137.99 samples/sec Loss 2.9292 LearningRate 0.0406 Epoch: 7 Global Step: 121160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:23:07,699-Speed 5177.90 samples/sec Loss 2.8654 LearningRate 0.0406 Epoch: 7 Global Step: 121170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 
07:23:09,686-Speed 5155.02 samples/sec Loss 2.8783 LearningRate 0.0406 Epoch: 7 Global Step: 121180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:23:11,666-Speed 5172.99 samples/sec Loss 2.9206 LearningRate 0.0406 Epoch: 7 Global Step: 121190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:23:13,653-Speed 5155.35 samples/sec Loss 2.9033 LearningRate 0.0406 Epoch: 7 Global Step: 121200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:23:15,648-Speed 5133.88 samples/sec Loss 2.8743 LearningRate 0.0406 Epoch: 7 Global Step: 121210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:23:17,635-Speed 5156.40 samples/sec Loss 2.8865 LearningRate 0.0406 Epoch: 7 Global Step: 121220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:23:19,607-Speed 5193.87 samples/sec Loss 2.9347 LearningRate 0.0406 Epoch: 7 Global Step: 121230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:23:21,583-Speed 5182.23 samples/sec Loss 2.8660 LearningRate 0.0406 Epoch: 7 Global Step: 121240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:23:23,556-Speed 5191.73 samples/sec Loss 2.9112 LearningRate 0.0405 Epoch: 7 Global Step: 121250 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:23:25,559-Speed 5114.11 samples/sec Loss 2.9043 LearningRate 0.0405 Epoch: 7 Global Step: 121260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:23:27,539-Speed 5174.02 samples/sec Loss 2.8862 LearningRate 0.0405 Epoch: 7 Global Step: 121270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:23:29,523-Speed 5163.37 samples/sec Loss 2.8597 LearningRate 0.0405 Epoch: 7 Global Step: 121280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:23:31,497-Speed 5190.46 samples/sec Loss 2.8059 LearningRate 0.0405 Epoch: 7 Global Step: 121290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:23:33,502-Speed 5108.95 samples/sec 
Loss 2.8364 LearningRate 0.0405 Epoch: 7 Global Step: 121300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:23:35,478-Speed 5185.60 samples/sec Loss 2.8536 LearningRate 0.0405 Epoch: 7 Global Step: 121310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:23:37,451-Speed 5190.22 samples/sec Loss 2.9596 LearningRate 0.0405 Epoch: 7 Global Step: 121320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:23:39,435-Speed 5164.20 samples/sec Loss 2.8509 LearningRate 0.0405 Epoch: 7 Global Step: 121330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:23:41,413-Speed 5176.30 samples/sec Loss 2.9400 LearningRate 0.0405 Epoch: 7 Global Step: 121340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:23:43,391-Speed 5180.99 samples/sec Loss 2.9430 LearningRate 0.0405 Epoch: 7 Global Step: 121350 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:23:45,366-Speed 5185.50 samples/sec Loss 2.9487 LearningRate 0.0405 Epoch: 7 Global Step: 121360 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:23:47,355-Speed 5149.02 samples/sec Loss 2.9151 LearningRate 0.0405 Epoch: 7 Global Step: 121370 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:23:49,337-Speed 5169.61 samples/sec Loss 2.9463 LearningRate 0.0405 Epoch: 7 Global Step: 121380 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:23:51,312-Speed 5187.47 samples/sec Loss 2.9170 LearningRate 0.0405 Epoch: 7 Global Step: 121390 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:23:53,287-Speed 5186.29 samples/sec Loss 2.9113 LearningRate 0.0405 Epoch: 7 Global Step: 121400 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:23:55,264-Speed 5180.41 samples/sec Loss 2.9532 LearningRate 0.0405 Epoch: 7 Global Step: 121410 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:23:57,243-Speed 5177.16 samples/sec Loss 2.8828 LearningRate 0.0405 Epoch: 7 
Global Step: 121420 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:23:59,244-Speed 5117.62 samples/sec Loss 2.8631 LearningRate 0.0405 Epoch: 7 Global Step: 121430 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:24:01,250-Speed 5105.73 samples/sec Loss 2.8384 LearningRate 0.0405 Epoch: 7 Global Step: 121440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:03,235-Speed 5162.25 samples/sec Loss 2.8974 LearningRate 0.0405 Epoch: 7 Global Step: 121450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:05,223-Speed 5151.55 samples/sec Loss 2.9355 LearningRate 0.0405 Epoch: 7 Global Step: 121460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:07,196-Speed 5190.62 samples/sec Loss 2.8634 LearningRate 0.0405 Epoch: 7 Global Step: 121470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:09,170-Speed 5188.94 samples/sec Loss 2.8514 LearningRate 0.0405 Epoch: 7 Global Step: 121480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:11,140-Speed 5201.24 samples/sec Loss 2.8644 LearningRate 0.0405 Epoch: 7 Global Step: 121490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:13,111-Speed 5197.63 samples/sec Loss 2.8421 LearningRate 0.0405 Epoch: 7 Global Step: 121500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:15,092-Speed 5169.48 samples/sec Loss 2.9653 LearningRate 0.0404 Epoch: 7 Global Step: 121510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:17,066-Speed 5190.62 samples/sec Loss 2.8906 LearningRate 0.0404 Epoch: 7 Global Step: 121520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:19,043-Speed 5181.74 samples/sec Loss 2.8593 LearningRate 0.0404 Epoch: 7 Global Step: 121530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:21,017-Speed 5186.75 samples/sec Loss 2.9058 LearningRate 0.0404 Epoch: 7 Global Step: 121540 Fp16 Grad Scale: 65536 
Required: 14 hours Training: 2022-04-11 07:24:23,005-Speed 5154.13 samples/sec Loss 2.9528 LearningRate 0.0404 Epoch: 7 Global Step: 121550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:24,989-Speed 5163.11 samples/sec Loss 2.9534 LearningRate 0.0404 Epoch: 7 Global Step: 121560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:26,991-Speed 5115.07 samples/sec Loss 2.8468 LearningRate 0.0404 Epoch: 7 Global Step: 121570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:28,975-Speed 5165.02 samples/sec Loss 2.9032 LearningRate 0.0404 Epoch: 7 Global Step: 121580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:30,959-Speed 5161.07 samples/sec Loss 2.8897 LearningRate 0.0404 Epoch: 7 Global Step: 121590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:32,954-Speed 5137.19 samples/sec Loss 2.9005 LearningRate 0.0404 Epoch: 7 Global Step: 121600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:34,926-Speed 5192.28 samples/sec Loss 2.9594 LearningRate 0.0404 Epoch: 7 Global Step: 121610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:36,928-Speed 5116.43 samples/sec Loss 2.8936 LearningRate 0.0404 Epoch: 7 Global Step: 121620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:38,913-Speed 5161.63 samples/sec Loss 2.8902 LearningRate 0.0404 Epoch: 7 Global Step: 121630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:40,900-Speed 5155.12 samples/sec Loss 2.8992 LearningRate 0.0404 Epoch: 7 Global Step: 121640 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:24:42,892-Speed 5141.49 samples/sec Loss 2.8389 LearningRate 0.0404 Epoch: 7 Global Step: 121650 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:24:44,857-Speed 5211.87 samples/sec Loss 2.8607 LearningRate 0.0404 Epoch: 7 Global Step: 121660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 
07:24:46,840-Speed 5166.80 samples/sec Loss 2.9259 LearningRate 0.0404 Epoch: 7 Global Step: 121670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:48,844-Speed 5110.00 samples/sec Loss 2.8852 LearningRate 0.0404 Epoch: 7 Global Step: 121680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:50,834-Speed 5149.56 samples/sec Loss 2.8873 LearningRate 0.0404 Epoch: 7 Global Step: 121690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:52,813-Speed 5175.13 samples/sec Loss 2.9139 LearningRate 0.0404 Epoch: 7 Global Step: 121700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:54,805-Speed 5142.75 samples/sec Loss 2.9104 LearningRate 0.0404 Epoch: 7 Global Step: 121710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:56,801-Speed 5131.41 samples/sec Loss 2.8399 LearningRate 0.0404 Epoch: 7 Global Step: 121720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:24:58,782-Speed 5172.23 samples/sec Loss 2.8112 LearningRate 0.0404 Epoch: 7 Global Step: 121730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:00,783-Speed 5118.49 samples/sec Loss 2.8626 LearningRate 0.0404 Epoch: 7 Global Step: 121740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:02,772-Speed 5150.45 samples/sec Loss 2.9176 LearningRate 0.0404 Epoch: 7 Global Step: 121750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:04,755-Speed 5167.06 samples/sec Loss 2.8578 LearningRate 0.0404 Epoch: 7 Global Step: 121760 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:25:06,733-Speed 5178.58 samples/sec Loss 2.9145 LearningRate 0.0404 Epoch: 7 Global Step: 121770 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:25:08,701-Speed 5206.46 samples/sec Loss 2.8779 LearningRate 0.0403 Epoch: 7 Global Step: 121780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:10,675-Speed 5189.57 samples/sec 
Loss 2.9322 LearningRate 0.0403 Epoch: 7 Global Step: 121790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:12,660-Speed 5158.82 samples/sec Loss 2.9060 LearningRate 0.0403 Epoch: 7 Global Step: 121800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:14,665-Speed 5110.26 samples/sec Loss 2.8978 LearningRate 0.0403 Epoch: 7 Global Step: 121810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:16,647-Speed 5166.76 samples/sec Loss 2.9429 LearningRate 0.0403 Epoch: 7 Global Step: 121820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:18,631-Speed 5162.84 samples/sec Loss 2.9000 LearningRate 0.0403 Epoch: 7 Global Step: 121830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:20,612-Speed 5171.64 samples/sec Loss 2.9346 LearningRate 0.0403 Epoch: 7 Global Step: 121840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:22,595-Speed 5166.47 samples/sec Loss 2.8762 LearningRate 0.0403 Epoch: 7 Global Step: 121850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:24,580-Speed 5158.84 samples/sec Loss 2.9002 LearningRate 0.0403 Epoch: 7 Global Step: 121860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:26,569-Speed 5149.80 samples/sec Loss 2.9256 LearningRate 0.0403 Epoch: 7 Global Step: 121870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:28,550-Speed 5172.43 samples/sec Loss 2.9718 LearningRate 0.0403 Epoch: 7 Global Step: 121880 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:25:30,535-Speed 5160.10 samples/sec Loss 2.9295 LearningRate 0.0403 Epoch: 7 Global Step: 121890 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:25:32,508-Speed 5192.26 samples/sec Loss 2.9788 LearningRate 0.0403 Epoch: 7 Global Step: 121900 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:25:34,485-Speed 5180.77 samples/sec Loss 2.9185 LearningRate 0.0403 Epoch: 7 
Global Step: 121910 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:25:36,465-Speed 5174.31 samples/sec Loss 2.9248 LearningRate 0.0403 Epoch: 7 Global Step: 121920 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:25:38,446-Speed 5170.32 samples/sec Loss 2.9116 LearningRate 0.0403 Epoch: 7 Global Step: 121930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:40,453-Speed 5104.09 samples/sec Loss 2.9259 LearningRate 0.0403 Epoch: 7 Global Step: 121940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:42,446-Speed 5139.64 samples/sec Loss 2.9549 LearningRate 0.0403 Epoch: 7 Global Step: 121950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:44,433-Speed 5155.45 samples/sec Loss 2.9587 LearningRate 0.0403 Epoch: 7 Global Step: 121960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:46,417-Speed 5161.44 samples/sec Loss 2.8689 LearningRate 0.0403 Epoch: 7 Global Step: 121970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:48,413-Speed 5133.32 samples/sec Loss 2.8581 LearningRate 0.0403 Epoch: 7 Global Step: 121980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:50,430-Speed 5077.78 samples/sec Loss 2.9464 LearningRate 0.0403 Epoch: 7 Global Step: 121990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:25:52,410-Speed 5174.45 samples/sec Loss 2.9106 LearningRate 0.0403 Epoch: 7 Global Step: 122000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:26:19,085-[lfw][122000]XNorm: 23.622218 Training: 2022-04-11 07:26:19,085-[lfw][122000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 07:26:19,086-[lfw][122000]Accuracy-Highest: 0.99817 Training: 2022-04-11 07:26:50,123-[cfp_fp][122000]XNorm: 21.543124 Training: 2022-04-11 07:26:50,124-[cfp_fp][122000]Accuracy-Flip: 0.97971+-0.00733 Training: 2022-04-11 07:26:50,124-[cfp_fp][122000]Accuracy-Highest: 0.98443 Training: 2022-04-11 
07:27:16,880-[agedb_30][122000]XNorm: 23.255424 Training: 2022-04-11 07:27:16,880-[agedb_30][122000]Accuracy-Flip: 0.97917+-0.00588 Training: 2022-04-11 07:27:16,881-[agedb_30][122000]Accuracy-Highest: 0.98150 Training: 2022-04-11 07:27:18,871-Speed 118.43 samples/sec Loss 2.9167 LearningRate 0.0403 Epoch: 7 Global Step: 122010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:27:20,847-Speed 5185.07 samples/sec Loss 2.9151 LearningRate 0.0403 Epoch: 7 Global Step: 122020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:27:22,820-Speed 5190.51 samples/sec Loss 2.8698 LearningRate 0.0403 Epoch: 7 Global Step: 122030 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:27:24,792-Speed 5196.47 samples/sec Loss 2.9647 LearningRate 0.0402 Epoch: 7 Global Step: 122040 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:27:26,760-Speed 5204.49 samples/sec Loss 2.9301 LearningRate 0.0402 Epoch: 7 Global Step: 122050 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:27:28,715-Speed 5238.73 samples/sec Loss 2.8441 LearningRate 0.0402 Epoch: 7 Global Step: 122060 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:27:30,681-Speed 5210.34 samples/sec Loss 2.9292 LearningRate 0.0402 Epoch: 7 Global Step: 122070 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:27:32,650-Speed 5202.73 samples/sec Loss 2.9496 LearningRate 0.0402 Epoch: 7 Global Step: 122080 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:27:34,617-Speed 5208.39 samples/sec Loss 2.9196 LearningRate 0.0402 Epoch: 7 Global Step: 122090 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:27:36,585-Speed 5204.10 samples/sec Loss 2.8879 LearningRate 0.0402 Epoch: 7 Global Step: 122100 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:27:38,572-Speed 5155.21 samples/sec Loss 2.9610 LearningRate 0.0402 Epoch: 7 Global Step: 122110 Fp16 Grad Scale: 32768 Required: 14 
hours Training: 2022-04-11 07:27:40,557-Speed 5160.83 samples/sec Loss 2.8703 LearningRate 0.0402 Epoch: 7 Global Step: 122120 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:27:42,534-Speed 5180.52 samples/sec Loss 2.9177 LearningRate 0.0402 Epoch: 7 Global Step: 122130 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:27:44,513-Speed 5176.97 samples/sec Loss 2.9736 LearningRate 0.0402 Epoch: 7 Global Step: 122140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:27:46,497-Speed 5162.85 samples/sec Loss 2.8551 LearningRate 0.0402 Epoch: 7 Global Step: 122150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:27:48,474-Speed 5181.66 samples/sec Loss 2.8923 LearningRate 0.0402 Epoch: 7 Global Step: 122160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:27:50,460-Speed 5157.83 samples/sec Loss 2.8911 LearningRate 0.0402 Epoch: 7 Global Step: 122170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:27:52,458-Speed 5128.54 samples/sec Loss 2.8649 LearningRate 0.0402 Epoch: 7 Global Step: 122180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:27:54,439-Speed 5170.62 samples/sec Loss 2.8966 LearningRate 0.0402 Epoch: 7 Global Step: 122190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:27:56,423-Speed 5162.26 samples/sec Loss 2.9460 LearningRate 0.0402 Epoch: 7 Global Step: 122200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:27:58,413-Speed 5149.61 samples/sec Loss 2.9285 LearningRate 0.0402 Epoch: 7 Global Step: 122210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:28:00,395-Speed 5167.43 samples/sec Loss 2.9351 LearningRate 0.0402 Epoch: 7 Global Step: 122220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:28:02,374-Speed 5174.88 samples/sec Loss 2.9297 LearningRate 0.0402 Epoch: 7 Global Step: 122230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 
07:28:04,348-Speed 5188.47 samples/sec Loss 2.9788 LearningRate 0.0402 Epoch: 7 Global Step: 122240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:28:06,317-Speed 5203.03 samples/sec Loss 2.9964 LearningRate 0.0402 Epoch: 7 Global Step: 122250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:28:08,296-Speed 5176.15 samples/sec Loss 2.9054 LearningRate 0.0402 Epoch: 7 Global Step: 122260 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:28:10,273-Speed 5182.18 samples/sec Loss 2.9325 LearningRate 0.0402 Epoch: 7 Global Step: 122270 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:28:12,252-Speed 5176.49 samples/sec Loss 2.8652 LearningRate 0.0402 Epoch: 7 Global Step: 122280 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:28:14,225-Speed 5191.61 samples/sec Loss 2.9204 LearningRate 0.0402 Epoch: 7 Global Step: 122290 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:28:16,229-Speed 5110.20 samples/sec Loss 2.9388 LearningRate 0.0401 Epoch: 7 Global Step: 122300 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:28:18,205-Speed 5182.91 samples/sec Loss 2.8753 LearningRate 0.0401 Epoch: 7 Global Step: 122310 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:28:20,171-Speed 5213.20 samples/sec Loss 2.9767 LearningRate 0.0401 Epoch: 7 Global Step: 122320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:28:22,132-Speed 5221.09 samples/sec Loss 2.9493 LearningRate 0.0401 Epoch: 7 Global Step: 122330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:28:24,118-Speed 5157.99 samples/sec Loss 2.9637 LearningRate 0.0401 Epoch: 7 Global Step: 122340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:28:26,099-Speed 5171.51 samples/sec Loss 2.9483 LearningRate 0.0401 Epoch: 7 Global Step: 122350 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:28:28,073-Speed 5190.29 
samples/sec Loss 2.9933 LearningRate 0.0401 Epoch: 7 Global Step: 122360 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:28:30,042-Speed 5201.83 samples/sec Loss 2.9825 LearningRate 0.0401 Epoch: 7 Global Step: 122370 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:28:32,018-Speed 5184.15 samples/sec Loss 2.8866 LearningRate 0.0401 Epoch: 7 Global Step: 122380 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:28:33,990-Speed 5194.98 samples/sec Loss 2.9280 LearningRate 0.0401 Epoch: 7 Global Step: 122390 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:28:35,960-Speed 5199.75 samples/sec Loss 2.9337 LearningRate 0.0401 Epoch: 7 Global Step: 122400 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:28:37,936-Speed 5182.88 samples/sec Loss 2.8974 LearningRate 0.0401 Epoch: 7 Global Step: 122410 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:28:39,909-Speed 5191.40 samples/sec Loss 2.9082 LearningRate 0.0401 Epoch: 7 Global Step: 122420 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:28:41,893-Speed 5163.43 samples/sec Loss 3.0220 LearningRate 0.0401 Epoch: 7 Global Step: 122430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:28:43,863-Speed 5200.29 samples/sec Loss 2.8686 LearningRate 0.0401 Epoch: 7 Global Step: 122440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:28:45,887-Speed 5059.66 samples/sec Loss 2.8775 LearningRate 0.0401 Epoch: 7 Global Step: 122450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:28:47,872-Speed 5161.91 samples/sec Loss 2.9151 LearningRate 0.0401 Epoch: 7 Global Step: 122460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:28:49,873-Speed 5119.16 samples/sec Loss 2.9298 LearningRate 0.0401 Epoch: 7 Global Step: 122470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:28:51,846-Speed 5190.96 samples/sec Loss 2.9116 LearningRate 0.0401 
Epoch: 7 Global Step: 122480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:28:53,845-Speed 5123.59 samples/sec Loss 2.9044 LearningRate 0.0401 Epoch: 7 Global Step: 122490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:28:55,814-Speed 5205.06 samples/sec Loss 2.8959 LearningRate 0.0401 Epoch: 7 Global Step: 122500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:28:57,787-Speed 5191.93 samples/sec Loss 2.9246 LearningRate 0.0401 Epoch: 7 Global Step: 122510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:28:59,773-Speed 5157.86 samples/sec Loss 2.8644 LearningRate 0.0401 Epoch: 7 Global Step: 122520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:01,760-Speed 5154.22 samples/sec Loss 2.9567 LearningRate 0.0401 Epoch: 7 Global Step: 122530 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:29:03,727-Speed 5205.73 samples/sec Loss 2.8975 LearningRate 0.0401 Epoch: 7 Global Step: 122540 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:29:05,697-Speed 5201.06 samples/sec Loss 2.9699 LearningRate 0.0401 Epoch: 7 Global Step: 122550 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:29:07,661-Speed 5216.43 samples/sec Loss 2.9023 LearningRate 0.0401 Epoch: 7 Global Step: 122560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:09,652-Speed 5144.13 samples/sec Loss 2.9309 LearningRate 0.0400 Epoch: 7 Global Step: 122570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:11,660-Speed 5100.01 samples/sec Loss 2.9110 LearningRate 0.0400 Epoch: 7 Global Step: 122580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:13,673-Speed 5088.51 samples/sec Loss 2.9453 LearningRate 0.0400 Epoch: 7 Global Step: 122590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:15,646-Speed 5193.11 samples/sec Loss 2.8830 LearningRate 0.0400 Epoch: 7 Global Step: 122600 Fp16 Grad 
Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:17,617-Speed 5195.76 samples/sec Loss 2.9692 LearningRate 0.0400 Epoch: 7 Global Step: 122610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:19,590-Speed 5192.92 samples/sec Loss 2.9825 LearningRate 0.0400 Epoch: 7 Global Step: 122620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:21,576-Speed 5156.75 samples/sec Loss 2.8820 LearningRate 0.0400 Epoch: 7 Global Step: 122630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:23,548-Speed 5195.03 samples/sec Loss 2.9398 LearningRate 0.0400 Epoch: 7 Global Step: 122640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:25,520-Speed 5194.02 samples/sec Loss 2.9923 LearningRate 0.0400 Epoch: 7 Global Step: 122650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:27,507-Speed 5156.36 samples/sec Loss 2.8540 LearningRate 0.0400 Epoch: 7 Global Step: 122660 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:29:29,477-Speed 5199.31 samples/sec Loss 2.9836 LearningRate 0.0400 Epoch: 7 Global Step: 122670 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:29:31,446-Speed 5202.37 samples/sec Loss 2.9341 LearningRate 0.0400 Epoch: 7 Global Step: 122680 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:29:33,431-Speed 5159.50 samples/sec Loss 2.9391 LearningRate 0.0400 Epoch: 7 Global Step: 122690 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:29:35,409-Speed 5178.78 samples/sec Loss 2.9286 LearningRate 0.0400 Epoch: 7 Global Step: 122700 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:29:37,399-Speed 5149.83 samples/sec Loss 2.9208 LearningRate 0.0400 Epoch: 7 Global Step: 122710 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:29:39,377-Speed 5177.95 samples/sec Loss 2.8774 LearningRate 0.0400 Epoch: 7 Global Step: 122720 Fp16 Grad Scale: 131072 Required: 14 hours 
Training: 2022-04-11 07:29:41,367-Speed 5146.59 samples/sec Loss 3.0116 LearningRate 0.0400 Epoch: 7 Global Step: 122730 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:29:43,331-Speed 5215.52 samples/sec Loss 2.9209 LearningRate 0.0400 Epoch: 7 Global Step: 122740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:45,306-Speed 5185.86 samples/sec Loss 2.9508 LearningRate 0.0400 Epoch: 7 Global Step: 122750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:47,276-Speed 5198.70 samples/sec Loss 2.8748 LearningRate 0.0400 Epoch: 7 Global Step: 122760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:49,274-Speed 5127.78 samples/sec Loss 2.8790 LearningRate 0.0400 Epoch: 7 Global Step: 122770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:51,272-Speed 5127.63 samples/sec Loss 2.8331 LearningRate 0.0400 Epoch: 7 Global Step: 122780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:53,244-Speed 5193.35 samples/sec Loss 2.9583 LearningRate 0.0400 Epoch: 7 Global Step: 122790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:55,239-Speed 5135.83 samples/sec Loss 2.9760 LearningRate 0.0400 Epoch: 7 Global Step: 122800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:57,209-Speed 5199.93 samples/sec Loss 2.8417 LearningRate 0.0400 Epoch: 7 Global Step: 122810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:29:59,194-Speed 5159.22 samples/sec Loss 2.9337 LearningRate 0.0400 Epoch: 7 Global Step: 122820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:30:01,181-Speed 5156.30 samples/sec Loss 2.9392 LearningRate 0.0399 Epoch: 7 Global Step: 122830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:30:03,158-Speed 5181.78 samples/sec Loss 2.8964 LearningRate 0.0399 Epoch: 7 Global Step: 122840 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:30:05,125-Speed 
5205.93 samples/sec Loss 2.9736 LearningRate 0.0399 Epoch: 7 Global Step: 122850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:30:07,101-Speed 5183.57 samples/sec Loss 2.9213 LearningRate 0.0399 Epoch: 7 Global Step: 122860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:30:09,064-Speed 5218.18 samples/sec Loss 2.9511 LearningRate 0.0399 Epoch: 7 Global Step: 122870 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:11,057-Speed 5139.79 samples/sec Loss 2.8959 LearningRate 0.0399 Epoch: 7 Global Step: 122880 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:13,036-Speed 5177.11 samples/sec Loss 2.8998 LearningRate 0.0399 Epoch: 7 Global Step: 122890 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:15,014-Speed 5176.80 samples/sec Loss 2.9191 LearningRate 0.0399 Epoch: 7 Global Step: 122900 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:16,989-Speed 5188.35 samples/sec Loss 2.9309 LearningRate 0.0399 Epoch: 7 Global Step: 122910 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:18,972-Speed 5166.62 samples/sec Loss 2.8661 LearningRate 0.0399 Epoch: 7 Global Step: 122920 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:20,972-Speed 5120.31 samples/sec Loss 2.9212 LearningRate 0.0399 Epoch: 7 Global Step: 122930 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:22,963-Speed 5146.01 samples/sec Loss 2.8967 LearningRate 0.0399 Epoch: 7 Global Step: 122940 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:24,949-Speed 5157.01 samples/sec Loss 2.9496 LearningRate 0.0399 Epoch: 7 Global Step: 122950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:26,924-Speed 5187.78 samples/sec Loss 2.8823 LearningRate 0.0399 Epoch: 7 Global Step: 122960 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:28,901-Speed 5180.87 samples/sec Loss 2.8904 
LearningRate 0.0399 Epoch: 7 Global Step: 122970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:30:30,911-Speed 5095.72 samples/sec Loss 2.9623 LearningRate 0.0399 Epoch: 7 Global Step: 122980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:30:32,887-Speed 5182.43 samples/sec Loss 2.8964 LearningRate 0.0399 Epoch: 7 Global Step: 122990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:30:34,871-Speed 5165.26 samples/sec Loss 3.0051 LearningRate 0.0399 Epoch: 7 Global Step: 123000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:30:36,864-Speed 5139.03 samples/sec Loss 2.9333 LearningRate 0.0399 Epoch: 7 Global Step: 123010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:30:38,839-Speed 5187.14 samples/sec Loss 2.9024 LearningRate 0.0399 Epoch: 7 Global Step: 123020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:30:40,807-Speed 5204.21 samples/sec Loss 2.9911 LearningRate 0.0399 Epoch: 7 Global Step: 123030 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:42,780-Speed 5191.25 samples/sec Loss 2.9741 LearningRate 0.0399 Epoch: 7 Global Step: 123040 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:44,783-Speed 5113.90 samples/sec Loss 2.9272 LearningRate 0.0399 Epoch: 7 Global Step: 123050 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:46,777-Speed 5136.25 samples/sec Loss 2.9110 LearningRate 0.0399 Epoch: 7 Global Step: 123060 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:48,762-Speed 5161.95 samples/sec Loss 2.9580 LearningRate 0.0399 Epoch: 7 Global Step: 123070 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:50,768-Speed 5107.21 samples/sec Loss 2.9409 LearningRate 0.0399 Epoch: 7 Global Step: 123080 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:52,754-Speed 5157.32 samples/sec Loss 2.9229 LearningRate 0.0398 Epoch: 7 Global Step: 
123090 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:54,738-Speed 5161.86 samples/sec Loss 2.9628 LearningRate 0.0398 Epoch: 7 Global Step: 123100 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:56,732-Speed 5136.14 samples/sec Loss 2.9807 LearningRate 0.0398 Epoch: 7 Global Step: 123110 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:30:58,710-Speed 5180.22 samples/sec Loss 2.9449 LearningRate 0.0398 Epoch: 7 Global Step: 123120 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:31:00,688-Speed 5180.21 samples/sec Loss 2.8925 LearningRate 0.0398 Epoch: 7 Global Step: 123130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:31:02,661-Speed 5189.50 samples/sec Loss 2.9259 LearningRate 0.0398 Epoch: 7 Global Step: 123140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:31:04,638-Speed 5183.06 samples/sec Loss 2.9616 LearningRate 0.0398 Epoch: 7 Global Step: 123150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:31:06,610-Speed 5193.39 samples/sec Loss 2.9327 LearningRate 0.0398 Epoch: 7 Global Step: 123160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:31:08,593-Speed 5166.77 samples/sec Loss 2.9264 LearningRate 0.0398 Epoch: 7 Global Step: 123170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:31:10,606-Speed 5088.63 samples/sec Loss 2.9569 LearningRate 0.0398 Epoch: 7 Global Step: 123180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:31:12,593-Speed 5153.29 samples/sec Loss 2.8731 LearningRate 0.0398 Epoch: 7 Global Step: 123190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:31:14,598-Speed 5110.06 samples/sec Loss 2.9420 LearningRate 0.0398 Epoch: 7 Global Step: 123200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:31:16,586-Speed 5151.20 samples/sec Loss 2.9500 LearningRate 0.0398 Epoch: 7 Global Step: 123210 Fp16 Grad Scale: 65536 Required: 14 
hours Training: 2022-04-11 07:31:18,568-Speed 5169.40 samples/sec Loss 2.9336 LearningRate 0.0398 Epoch: 7 Global Step: 123220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:31:20,540-Speed 5195.29 samples/sec Loss 2.9655 LearningRate 0.0398 Epoch: 7 Global Step: 123230 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:31:22,523-Speed 5164.87 samples/sec Loss 2.9332 LearningRate 0.0398 Epoch: 7 Global Step: 123240 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:31:24,500-Speed 5181.37 samples/sec Loss 2.9430 LearningRate 0.0398 Epoch: 7 Global Step: 123250 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:31:26,520-Speed 5069.57 samples/sec Loss 2.8706 LearningRate 0.0398 Epoch: 7 Global Step: 123260 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:31:28,498-Speed 5179.32 samples/sec Loss 2.9411 LearningRate 0.0398 Epoch: 7 Global Step: 123270 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:31:30,474-Speed 5183.88 samples/sec Loss 2.8956 LearningRate 0.0398 Epoch: 7 Global Step: 123280 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:31:32,469-Speed 5136.16 samples/sec Loss 2.9133 LearningRate 0.0398 Epoch: 7 Global Step: 123290 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:31:34,451-Speed 5166.43 samples/sec Loss 2.9018 LearningRate 0.0398 Epoch: 7 Global Step: 123300 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:31:36,458-Speed 5105.14 samples/sec Loss 3.0256 LearningRate 0.0398 Epoch: 7 Global Step: 123310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:31:38,446-Speed 5151.40 samples/sec Loss 2.9284 LearningRate 0.0398 Epoch: 7 Global Step: 123320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:31:40,456-Speed 5096.52 samples/sec Loss 2.9268 LearningRate 0.0398 Epoch: 7 Global Step: 123330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 
07:31:42,427-Speed 5197.22 samples/sec Loss 3.0056 LearningRate 0.0398 Epoch: 7 Global Step: 123340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:31:44,403-Speed 5183.77 samples/sec Loss 2.9078 LearningRate 0.0398 Epoch: 7 Global Step: 123350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:31:46,380-Speed 5181.79 samples/sec Loss 3.0028 LearningRate 0.0397 Epoch: 7 Global Step: 123360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:31:48,360-Speed 5174.34 samples/sec Loss 2.9521 LearningRate 0.0397 Epoch: 7 Global Step: 123370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:31:50,336-Speed 5181.86 samples/sec Loss 2.9697 LearningRate 0.0397 Epoch: 7 Global Step: 123380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:31:52,312-Speed 5184.63 samples/sec Loss 2.9046 LearningRate 0.0397 Epoch: 7 Global Step: 123390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:31:54,285-Speed 5193.06 samples/sec Loss 2.9498 LearningRate 0.0397 Epoch: 7 Global Step: 123400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:31:56,269-Speed 5162.07 samples/sec Loss 2.9522 LearningRate 0.0397 Epoch: 7 Global Step: 123410 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:31:58,260-Speed 5143.10 samples/sec Loss 2.9754 LearningRate 0.0397 Epoch: 7 Global Step: 123420 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:32:00,245-Speed 5161.90 samples/sec Loss 2.8824 LearningRate 0.0397 Epoch: 7 Global Step: 123430 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:32:02,225-Speed 5174.26 samples/sec Loss 2.9344 LearningRate 0.0397 Epoch: 7 Global Step: 123440 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:32:04,216-Speed 5146.70 samples/sec Loss 2.9712 LearningRate 0.0397 Epoch: 7 Global Step: 123450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:32:06,188-Speed 5193.40 samples/sec 
Loss 2.9455 LearningRate 0.0397 Epoch: 7 Global Step: 123460 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:32:08,161-Speed 5191.25 samples/sec Loss 2.9269 LearningRate 0.0397 Epoch: 7 Global Step: 123470 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:32:10,131-Speed 5200.46 samples/sec Loss 2.8861 LearningRate 0.0397 Epoch: 7 Global Step: 123480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:12,108-Speed 5181.90 samples/sec Loss 2.9630 LearningRate 0.0397 Epoch: 7 Global Step: 123490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:14,080-Speed 5192.89 samples/sec Loss 2.9781 LearningRate 0.0397 Epoch: 7 Global Step: 123500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:16,053-Speed 5191.35 samples/sec Loss 2.9655 LearningRate 0.0397 Epoch: 7 Global Step: 123510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:18,029-Speed 5183.16 samples/sec Loss 2.8883 LearningRate 0.0397 Epoch: 7 Global Step: 123520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:20,002-Speed 5193.57 samples/sec Loss 2.9908 LearningRate 0.0397 Epoch: 7 Global Step: 123530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:21,977-Speed 5186.01 samples/sec Loss 2.9205 LearningRate 0.0397 Epoch: 7 Global Step: 123540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:23,974-Speed 5129.32 samples/sec Loss 2.9001 LearningRate 0.0397 Epoch: 7 Global Step: 123550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:25,964-Speed 5147.77 samples/sec Loss 2.9600 LearningRate 0.0397 Epoch: 7 Global Step: 123560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:27,937-Speed 5191.17 samples/sec Loss 2.9257 LearningRate 0.0397 Epoch: 7 Global Step: 123570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:29,921-Speed 5165.13 samples/sec Loss 2.9658 LearningRate 0.0397 Epoch: 7 
Global Step: 123580 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:32:31,895-Speed 5186.44 samples/sec Loss 3.0426 LearningRate 0.0397 Epoch: 7 Global Step: 123590 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:32:33,876-Speed 5172.93 samples/sec Loss 2.9737 LearningRate 0.0397 Epoch: 7 Global Step: 123600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:35,848-Speed 5194.15 samples/sec Loss 2.9671 LearningRate 0.0397 Epoch: 7 Global Step: 123610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:37,820-Speed 5192.62 samples/sec Loss 2.9490 LearningRate 0.0396 Epoch: 7 Global Step: 123620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:39,793-Speed 5192.12 samples/sec Loss 2.9726 LearningRate 0.0396 Epoch: 7 Global Step: 123630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:41,779-Speed 5158.10 samples/sec Loss 2.9628 LearningRate 0.0396 Epoch: 7 Global Step: 123640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:43,755-Speed 5183.94 samples/sec Loss 2.9122 LearningRate 0.0396 Epoch: 7 Global Step: 123650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:45,737-Speed 5170.55 samples/sec Loss 2.9327 LearningRate 0.0396 Epoch: 7 Global Step: 123660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:47,724-Speed 5156.06 samples/sec Loss 3.0013 LearningRate 0.0396 Epoch: 7 Global Step: 123670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:49,712-Speed 5150.47 samples/sec Loss 2.9932 LearningRate 0.0396 Epoch: 7 Global Step: 123680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:51,709-Speed 5130.77 samples/sec Loss 2.9420 LearningRate 0.0396 Epoch: 7 Global Step: 123690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:53,680-Speed 5196.67 samples/sec Loss 2.9873 LearningRate 0.0396 Epoch: 7 Global Step: 123700 Fp16 Grad Scale: 
131072 Required: 14 hours Training: 2022-04-11 07:32:55,644-Speed 5215.97 samples/sec Loss 3.0133 LearningRate 0.0396 Epoch: 7 Global Step: 123710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:57,617-Speed 5190.31 samples/sec Loss 2.9157 LearningRate 0.0396 Epoch: 7 Global Step: 123720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:32:59,603-Speed 5159.87 samples/sec Loss 2.9693 LearningRate 0.0396 Epoch: 7 Global Step: 123730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:33:01,600-Speed 5128.45 samples/sec Loss 2.9588 LearningRate 0.0396 Epoch: 7 Global Step: 123740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:33:03,583-Speed 5164.69 samples/sec Loss 2.9837 LearningRate 0.0396 Epoch: 7 Global Step: 123750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:33:05,573-Speed 5150.19 samples/sec Loss 2.9363 LearningRate 0.0396 Epoch: 7 Global Step: 123760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:33:07,548-Speed 5187.13 samples/sec Loss 2.9629 LearningRate 0.0396 Epoch: 7 Global Step: 123770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:33:09,525-Speed 5178.82 samples/sec Loss 2.9053 LearningRate 0.0396 Epoch: 7 Global Step: 123780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:33:11,514-Speed 5152.46 samples/sec Loss 2.8997 LearningRate 0.0396 Epoch: 7 Global Step: 123790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:33:13,488-Speed 5188.05 samples/sec Loss 2.9861 LearningRate 0.0396 Epoch: 7 Global Step: 123800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:33:15,495-Speed 5104.70 samples/sec Loss 2.9381 LearningRate 0.0396 Epoch: 7 Global Step: 123810 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:33:17,474-Speed 5174.08 samples/sec Loss 2.9655 LearningRate 0.0396 Epoch: 7 Global Step: 123820 Fp16 Grad Scale: 131072 Required: 14 hours Training: 
2022-04-11 07:33:19,463-Speed 5151.44 samples/sec Loss 2.9685 LearningRate 0.0396 Epoch: 7 Global Step: 123830 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:33:21,459-Speed 5131.48 samples/sec Loss 3.0061 LearningRate 0.0396 Epoch: 7 Global Step: 123840 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:33:23,446-Speed 5155.35 samples/sec Loss 3.0017 LearningRate 0.0396 Epoch: 7 Global Step: 123850 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:33:25,438-Speed 5140.86 samples/sec Loss 2.9263 LearningRate 0.0396 Epoch: 7 Global Step: 123860 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:33:27,437-Speed 5124.54 samples/sec Loss 2.9454 LearningRate 0.0396 Epoch: 7 Global Step: 123870 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:33:29,430-Speed 5141.63 samples/sec Loss 2.8819 LearningRate 0.0396 Epoch: 7 Global Step: 123880 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:33:31,400-Speed 5199.16 samples/sec Loss 3.0239 LearningRate 0.0395 Epoch: 7 Global Step: 123890 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:33:33,381-Speed 5171.80 samples/sec Loss 2.9429 LearningRate 0.0395 Epoch: 7 Global Step: 123900 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:33:35,358-Speed 5181.22 samples/sec Loss 2.9336 LearningRate 0.0395 Epoch: 7 Global Step: 123910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:33:37,333-Speed 5184.49 samples/sec Loss 2.9721 LearningRate 0.0395 Epoch: 7 Global Step: 123920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:33:39,313-Speed 5174.98 samples/sec Loss 2.9027 LearningRate 0.0395 Epoch: 7 Global Step: 123930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:33:41,304-Speed 5143.45 samples/sec Loss 2.9494 LearningRate 0.0395 Epoch: 7 Global Step: 123940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:33:43,280-Speed 
5184.87 samples/sec Loss 2.9334 LearningRate 0.0395 Epoch: 7 Global Step: 123950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:33:45,270-Speed 5146.49 samples/sec Loss 2.9723 LearningRate 0.0395 Epoch: 7 Global Step: 123960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:33:47,250-Speed 5173.58 samples/sec Loss 3.0230 LearningRate 0.0395 Epoch: 7 Global Step: 123970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:33:49,230-Speed 5173.92 samples/sec Loss 2.9427 LearningRate 0.0395 Epoch: 7 Global Step: 123980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:33:51,203-Speed 5192.29 samples/sec Loss 3.0016 LearningRate 0.0395 Epoch: 7 Global Step: 123990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:33:53,186-Speed 5164.51 samples/sec Loss 2.9390 LearningRate 0.0395 Epoch: 7 Global Step: 124000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:34:19,947-[lfw][124000]XNorm: 22.833076 Training: 2022-04-11 07:34:19,947-[lfw][124000]Accuracy-Flip: 0.99750+-0.00261 Training: 2022-04-11 07:34:19,948-[lfw][124000]Accuracy-Highest: 0.99817 Training: 2022-04-11 07:34:50,629-[cfp_fp][124000]XNorm: 20.878636 Training: 2022-04-11 07:34:50,629-[cfp_fp][124000]Accuracy-Flip: 0.98286+-0.00478 Training: 2022-04-11 07:34:50,630-[cfp_fp][124000]Accuracy-Highest: 0.98443 Training: 2022-04-11 07:35:17,123-[agedb_30][124000]XNorm: 22.733203 Training: 2022-04-11 07:35:17,123-[agedb_30][124000]Accuracy-Flip: 0.97917+-0.00857 Training: 2022-04-11 07:35:17,124-[agedb_30][124000]Accuracy-Highest: 0.98150 Training: 2022-04-11 07:35:19,119-Speed 119.16 samples/sec Loss 2.9314 LearningRate 0.0395 Epoch: 7 Global Step: 124010 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:35:21,102-Speed 5166.46 samples/sec Loss 2.9801 LearningRate 0.0395 Epoch: 7 Global Step: 124020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:35:23,081-Speed 5175.73 samples/sec 
Loss 2.9852 LearningRate 0.0395 Epoch: 7 Global Step: 124030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:35:25,059-Speed 5177.01 samples/sec Loss 2.9442 LearningRate 0.0395 Epoch: 7 Global Step: 124040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:35:27,050-Speed 5145.06 samples/sec Loss 2.9276 LearningRate 0.0395 Epoch: 7 Global Step: 124050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:35:29,031-Speed 5170.69 samples/sec Loss 3.0180 LearningRate 0.0395 Epoch: 7 Global Step: 124060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:35:31,002-Speed 5197.87 samples/sec Loss 2.9169 LearningRate 0.0395 Epoch: 7 Global Step: 124070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:35:33,001-Speed 5123.25 samples/sec Loss 2.9756 LearningRate 0.0395 Epoch: 7 Global Step: 124080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:35:34,987-Speed 5156.54 samples/sec Loss 2.9274 LearningRate 0.0395 Epoch: 7 Global Step: 124090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:35:37,001-Speed 5087.39 samples/sec Loss 2.8738 LearningRate 0.0395 Epoch: 7 Global Step: 124100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:35:38,993-Speed 5143.21 samples/sec Loss 3.0332 LearningRate 0.0395 Epoch: 7 Global Step: 124110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:35:40,984-Speed 5144.23 samples/sec Loss 2.8925 LearningRate 0.0395 Epoch: 7 Global Step: 124120 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:35:42,987-Speed 5113.67 samples/sec Loss 2.9355 LearningRate 0.0395 Epoch: 7 Global Step: 124130 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:35:44,957-Speed 5199.22 samples/sec Loss 2.9859 LearningRate 0.0395 Epoch: 7 Global Step: 124140 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:35:46,930-Speed 5191.89 samples/sec Loss 3.0287 LearningRate 0.0395 Epoch: 7 
Global Step: 124150 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:35:48,917-Speed 5154.41 samples/sec Loss 2.9451 LearningRate 0.0394 Epoch: 7 Global Step: 124160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:35:50,908-Speed 5146.58 samples/sec Loss 3.0028 LearningRate 0.0394 Epoch: 7 Global Step: 124170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:35:52,898-Speed 5147.50 samples/sec Loss 2.9776 LearningRate 0.0394 Epoch: 7 Global Step: 124180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:35:54,905-Speed 5102.33 samples/sec Loss 2.9272 LearningRate 0.0394 Epoch: 7 Global Step: 124190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:35:56,882-Speed 5182.21 samples/sec Loss 3.0214 LearningRate 0.0394 Epoch: 7 Global Step: 124200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:35:58,874-Speed 5141.37 samples/sec Loss 2.9136 LearningRate 0.0394 Epoch: 7 Global Step: 124210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:00,870-Speed 5133.10 samples/sec Loss 2.8671 LearningRate 0.0394 Epoch: 7 Global Step: 124220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:02,844-Speed 5190.69 samples/sec Loss 2.9167 LearningRate 0.0394 Epoch: 7 Global Step: 124230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:04,827-Speed 5164.59 samples/sec Loss 3.0108 LearningRate 0.0394 Epoch: 7 Global Step: 124240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:06,797-Speed 5199.84 samples/sec Loss 2.9556 LearningRate 0.0394 Epoch: 7 Global Step: 124250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:08,769-Speed 5194.52 samples/sec Loss 3.0667 LearningRate 0.0394 Epoch: 7 Global Step: 124260 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:36:10,744-Speed 5184.26 samples/sec Loss 2.9164 LearningRate 0.0394 Epoch: 7 Global Step: 124270 Fp16 Grad Scale: 
65536 Required: 14 hours Training: 2022-04-11 07:36:12,746-Speed 5118.08 samples/sec Loss 2.9047 LearningRate 0.0394 Epoch: 7 Global Step: 124280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:14,750-Speed 5109.99 samples/sec Loss 2.9704 LearningRate 0.0394 Epoch: 7 Global Step: 124290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:16,731-Speed 5171.60 samples/sec Loss 2.9754 LearningRate 0.0394 Epoch: 7 Global Step: 124300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:18,732-Speed 5117.89 samples/sec Loss 2.9759 LearningRate 0.0394 Epoch: 7 Global Step: 124310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:20,708-Speed 5184.52 samples/sec Loss 2.9409 LearningRate 0.0394 Epoch: 7 Global Step: 124320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:22,682-Speed 5192.38 samples/sec Loss 2.9943 LearningRate 0.0394 Epoch: 7 Global Step: 124330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:24,662-Speed 5172.07 samples/sec Loss 2.9671 LearningRate 0.0394 Epoch: 7 Global Step: 124340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:26,649-Speed 5154.96 samples/sec Loss 2.9271 LearningRate 0.0394 Epoch: 7 Global Step: 124350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:28,633-Speed 5161.95 samples/sec Loss 2.9893 LearningRate 0.0394 Epoch: 7 Global Step: 124360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:30,617-Speed 5164.43 samples/sec Loss 3.0019 LearningRate 0.0394 Epoch: 7 Global Step: 124370 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:36:32,588-Speed 5195.41 samples/sec Loss 2.9002 LearningRate 0.0394 Epoch: 7 Global Step: 124380 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:36:34,570-Speed 5168.84 samples/sec Loss 2.9989 LearningRate 0.0394 Epoch: 7 Global Step: 124390 Fp16 Grad Scale: 131072 Required: 14 hours Training: 
2022-04-11 07:36:36,555-Speed 5160.04 samples/sec Loss 2.9767 LearningRate 0.0394 Epoch: 7 Global Step: 124400 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:36:38,542-Speed 5154.92 samples/sec Loss 2.9362 LearningRate 0.0394 Epoch: 7 Global Step: 124410 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:36:40,509-Speed 5208.43 samples/sec Loss 3.0245 LearningRate 0.0393 Epoch: 7 Global Step: 124420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:42,501-Speed 5142.91 samples/sec Loss 3.0793 LearningRate 0.0393 Epoch: 7 Global Step: 124430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:44,491-Speed 5146.67 samples/sec Loss 2.9833 LearningRate 0.0393 Epoch: 7 Global Step: 124440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:46,473-Speed 5167.73 samples/sec Loss 2.9588 LearningRate 0.0393 Epoch: 7 Global Step: 124450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:48,462-Speed 5150.67 samples/sec Loss 2.9143 LearningRate 0.0393 Epoch: 7 Global Step: 124460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:50,456-Speed 5138.16 samples/sec Loss 2.9120 LearningRate 0.0393 Epoch: 7 Global Step: 124470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:52,451-Speed 5134.98 samples/sec Loss 2.9142 LearningRate 0.0393 Epoch: 7 Global Step: 124480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:54,433-Speed 5167.28 samples/sec Loss 3.0339 LearningRate 0.0393 Epoch: 7 Global Step: 124490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:56,405-Speed 5194.65 samples/sec Loss 2.9903 LearningRate 0.0393 Epoch: 7 Global Step: 124500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:36:58,383-Speed 5178.36 samples/sec Loss 2.9257 LearningRate 0.0393 Epoch: 7 Global Step: 124510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:00,378-Speed 5134.51 
samples/sec Loss 2.9563 LearningRate 0.0393 Epoch: 7 Global Step: 124520 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:37:02,365-Speed 5155.18 samples/sec Loss 2.9418 LearningRate 0.0393 Epoch: 7 Global Step: 124530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:04,346-Speed 5171.76 samples/sec Loss 2.8979 LearningRate 0.0393 Epoch: 7 Global Step: 124540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:06,324-Speed 5177.35 samples/sec Loss 2.9918 LearningRate 0.0393 Epoch: 7 Global Step: 124550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:08,301-Speed 5182.17 samples/sec Loss 2.9404 LearningRate 0.0393 Epoch: 7 Global Step: 124560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:10,280-Speed 5175.80 samples/sec Loss 2.8869 LearningRate 0.0393 Epoch: 7 Global Step: 124570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:12,257-Speed 5180.64 samples/sec Loss 2.9832 LearningRate 0.0393 Epoch: 7 Global Step: 124580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:14,230-Speed 5191.82 samples/sec Loss 2.9419 LearningRate 0.0393 Epoch: 7 Global Step: 124590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:16,200-Speed 5200.58 samples/sec Loss 2.9618 LearningRate 0.0393 Epoch: 7 Global Step: 124600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:18,167-Speed 5205.20 samples/sec Loss 2.9045 LearningRate 0.0393 Epoch: 7 Global Step: 124610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:20,139-Speed 5194.99 samples/sec Loss 3.0165 LearningRate 0.0393 Epoch: 7 Global Step: 124620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:22,108-Speed 5201.63 samples/sec Loss 2.9781 LearningRate 0.0393 Epoch: 7 Global Step: 124630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:24,102-Speed 5138.73 samples/sec Loss 2.9486 LearningRate 
0.0393 Epoch: 7 Global Step: 124640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:26,082-Speed 5175.04 samples/sec Loss 3.0045 LearningRate 0.0393 Epoch: 7 Global Step: 124650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:28,065-Speed 5164.64 samples/sec Loss 2.9406 LearningRate 0.0393 Epoch: 7 Global Step: 124660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:30,041-Speed 5184.25 samples/sec Loss 2.9729 LearningRate 0.0393 Epoch: 7 Global Step: 124670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:32,013-Speed 5194.72 samples/sec Loss 2.8820 LearningRate 0.0393 Epoch: 7 Global Step: 124680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:33,984-Speed 5195.77 samples/sec Loss 3.0025 LearningRate 0.0392 Epoch: 7 Global Step: 124690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:35,958-Speed 5188.62 samples/sec Loss 2.9684 LearningRate 0.0392 Epoch: 7 Global Step: 124700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:37,934-Speed 5184.19 samples/sec Loss 2.9951 LearningRate 0.0392 Epoch: 7 Global Step: 124710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:39,910-Speed 5183.89 samples/sec Loss 2.9701 LearningRate 0.0392 Epoch: 7 Global Step: 124720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:41,886-Speed 5182.84 samples/sec Loss 3.0007 LearningRate 0.0392 Epoch: 7 Global Step: 124730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:43,866-Speed 5173.20 samples/sec Loss 2.9507 LearningRate 0.0392 Epoch: 7 Global Step: 124740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:45,845-Speed 5177.19 samples/sec Loss 2.9736 LearningRate 0.0392 Epoch: 7 Global Step: 124750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:47,824-Speed 5175.27 samples/sec Loss 2.9302 LearningRate 0.0392 Epoch: 7 Global Step: 124760 Fp16 
Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:49,813-Speed 5150.95 samples/sec Loss 2.9259 LearningRate 0.0392 Epoch: 7 Global Step: 124770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:51,783-Speed 5200.39 samples/sec Loss 2.9341 LearningRate 0.0392 Epoch: 7 Global Step: 124780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:53,754-Speed 5197.20 samples/sec Loss 2.9526 LearningRate 0.0392 Epoch: 7 Global Step: 124790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:55,723-Speed 5202.36 samples/sec Loss 2.9226 LearningRate 0.0392 Epoch: 7 Global Step: 124800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:57,719-Speed 5130.04 samples/sec Loss 2.9449 LearningRate 0.0392 Epoch: 7 Global Step: 124810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:37:59,686-Speed 5208.90 samples/sec Loss 2.9591 LearningRate 0.0392 Epoch: 7 Global Step: 124820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:38:01,671-Speed 5160.69 samples/sec Loss 2.9703 LearningRate 0.0392 Epoch: 7 Global Step: 124830 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:38:03,645-Speed 5188.41 samples/sec Loss 2.9330 LearningRate 0.0392 Epoch: 7 Global Step: 124840 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:38:05,614-Speed 5202.66 samples/sec Loss 2.9675 LearningRate 0.0392 Epoch: 7 Global Step: 124850 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:38:07,576-Speed 5221.18 samples/sec Loss 2.9195 LearningRate 0.0392 Epoch: 7 Global Step: 124860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:38:09,550-Speed 5187.76 samples/sec Loss 2.8769 LearningRate 0.0392 Epoch: 7 Global Step: 124870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:38:11,532-Speed 5169.64 samples/sec Loss 2.9563 LearningRate 0.0392 Epoch: 7 Global Step: 124880 Fp16 Grad Scale: 65536 Required: 14 hours 
Training: 2022-04-11 07:38:13,532-Speed 5122.09 samples/sec Loss 2.9715 LearningRate 0.0392 Epoch: 7 Global Step: 124890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:38:15,510-Speed 5178.48 samples/sec Loss 2.9687 LearningRate 0.0392 Epoch: 7 Global Step: 124900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:38:17,499-Speed 5150.23 samples/sec Loss 2.9156 LearningRate 0.0392 Epoch: 7 Global Step: 124910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:38:19,466-Speed 5207.05 samples/sec Loss 2.9939 LearningRate 0.0392 Epoch: 7 Global Step: 124920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:38:21,438-Speed 5194.58 samples/sec Loss 2.8627 LearningRate 0.0392 Epoch: 7 Global Step: 124930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:38:23,417-Speed 5176.39 samples/sec Loss 2.9109 LearningRate 0.0392 Epoch: 7 Global Step: 124940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:38:25,398-Speed 5169.10 samples/sec Loss 2.9024 LearningRate 0.0391 Epoch: 7 Global Step: 124950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:38:27,422-Speed 5061.86 samples/sec Loss 3.0091 LearningRate 0.0391 Epoch: 7 Global Step: 124960 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:38:29,396-Speed 5190.62 samples/sec Loss 2.9510 LearningRate 0.0391 Epoch: 7 Global Step: 124970 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:38:31,370-Speed 5188.78 samples/sec Loss 2.9435 LearningRate 0.0391 Epoch: 7 Global Step: 124980 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:38:33,349-Speed 5175.83 samples/sec Loss 2.9247 LearningRate 0.0391 Epoch: 7 Global Step: 124990 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:38:35,328-Speed 5176.40 samples/sec Loss 2.9312 LearningRate 0.0391 Epoch: 7 Global Step: 125000 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 
07:38:37,298-Speed 5198.02 samples/sec Loss 3.0072 LearningRate 0.0391 Epoch: 7 Global Step: 125010 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:38:39,283-Speed 5160.21 samples/sec Loss 3.0064 LearningRate 0.0391 Epoch: 7 Global Step: 125020 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:38:41,284-Speed 5119.78 samples/sec Loss 3.0221 LearningRate 0.0391 Epoch: 7 Global Step: 125030 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:38:43,265-Speed 5169.41 samples/sec Loss 3.0614 LearningRate 0.0391 Epoch: 7 Global Step: 125040 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:38:45,248-Speed 5166.07 samples/sec Loss 2.9398 LearningRate 0.0391 Epoch: 7 Global Step: 125050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:38:47,240-Speed 5144.46 samples/sec Loss 2.9694 LearningRate 0.0391 Epoch: 7 Global Step: 125060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:38:49,236-Speed 5131.39 samples/sec Loss 3.0290 LearningRate 0.0391 Epoch: 7 Global Step: 125070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:38:51,211-Speed 5185.06 samples/sec Loss 2.9365 LearningRate 0.0391 Epoch: 7 Global Step: 125080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:38:53,203-Speed 5144.54 samples/sec Loss 2.9233 LearningRate 0.0391 Epoch: 7 Global Step: 125090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:38:55,192-Speed 5147.86 samples/sec Loss 2.9921 LearningRate 0.0391 Epoch: 7 Global Step: 125100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:38:57,174-Speed 5169.99 samples/sec Loss 2.9330 LearningRate 0.0391 Epoch: 7 Global Step: 125110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:38:59,150-Speed 5183.92 samples/sec Loss 3.0044 LearningRate 0.0391 Epoch: 7 Global Step: 125120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:01,146-Speed 5130.60 samples/sec 
Loss 2.9828 LearningRate 0.0391 Epoch: 7 Global Step: 125130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:03,143-Speed 5130.64 samples/sec Loss 3.0780 LearningRate 0.0391 Epoch: 7 Global Step: 125140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:05,138-Speed 5135.36 samples/sec Loss 2.9351 LearningRate 0.0391 Epoch: 7 Global Step: 125150 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:39:07,130-Speed 5142.57 samples/sec Loss 2.9849 LearningRate 0.0391 Epoch: 7 Global Step: 125160 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:39:09,095-Speed 5212.78 samples/sec Loss 2.9325 LearningRate 0.0391 Epoch: 7 Global Step: 125170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:11,103-Speed 5100.14 samples/sec Loss 2.9310 LearningRate 0.0391 Epoch: 7 Global Step: 125180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:13,071-Speed 5203.85 samples/sec Loss 2.9164 LearningRate 0.0391 Epoch: 7 Global Step: 125190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:15,075-Speed 5112.54 samples/sec Loss 2.8963 LearningRate 0.0391 Epoch: 7 Global Step: 125200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:17,049-Speed 5187.87 samples/sec Loss 2.9302 LearningRate 0.0391 Epoch: 7 Global Step: 125210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:19,028-Speed 5176.71 samples/sec Loss 2.8773 LearningRate 0.0390 Epoch: 7 Global Step: 125220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:21,007-Speed 5176.93 samples/sec Loss 2.9577 LearningRate 0.0390 Epoch: 7 Global Step: 125230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:23,001-Speed 5136.35 samples/sec Loss 3.0071 LearningRate 0.0390 Epoch: 7 Global Step: 125240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:24,996-Speed 5133.42 samples/sec Loss 2.9832 LearningRate 0.0390 Epoch: 7 
Global Step: 125250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:26,990-Speed 5138.38 samples/sec Loss 2.9660 LearningRate 0.0390 Epoch: 7 Global Step: 125260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:28,970-Speed 5173.81 samples/sec Loss 2.9625 LearningRate 0.0390 Epoch: 7 Global Step: 125270 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:39:30,961-Speed 5144.94 samples/sec Loss 2.9548 LearningRate 0.0390 Epoch: 7 Global Step: 125280 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:39:32,933-Speed 5195.91 samples/sec Loss 2.9816 LearningRate 0.0390 Epoch: 7 Global Step: 125290 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:39:34,912-Speed 5175.76 samples/sec Loss 3.0822 LearningRate 0.0390 Epoch: 7 Global Step: 125300 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:39:36,896-Speed 5160.96 samples/sec Loss 2.9503 LearningRate 0.0390 Epoch: 7 Global Step: 125310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:38,875-Speed 5176.12 samples/sec Loss 2.9002 LearningRate 0.0390 Epoch: 7 Global Step: 125320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:40,879-Speed 5112.15 samples/sec Loss 2.9926 LearningRate 0.0390 Epoch: 7 Global Step: 125330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:42,874-Speed 5134.77 samples/sec Loss 2.9209 LearningRate 0.0390 Epoch: 7 Global Step: 125340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:44,862-Speed 5152.75 samples/sec Loss 3.0156 LearningRate 0.0390 Epoch: 7 Global Step: 125350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:46,867-Speed 5108.74 samples/sec Loss 2.9911 LearningRate 0.0390 Epoch: 7 Global Step: 125360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:48,845-Speed 5177.53 samples/sec Loss 2.9507 LearningRate 0.0390 Epoch: 7 Global Step: 125370 Fp16 Grad Scale: 
65536 Required: 14 hours Training: 2022-04-11 07:39:50,814-Speed 5203.02 samples/sec Loss 2.9353 LearningRate 0.0390 Epoch: 7 Global Step: 125380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:52,794-Speed 5174.88 samples/sec Loss 2.9743 LearningRate 0.0390 Epoch: 7 Global Step: 125390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:54,767-Speed 5191.60 samples/sec Loss 2.8859 LearningRate 0.0390 Epoch: 7 Global Step: 125400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:39:56,741-Speed 5188.47 samples/sec Loss 2.9129 LearningRate 0.0390 Epoch: 7 Global Step: 125410 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:39:58,737-Speed 5131.65 samples/sec Loss 2.9887 LearningRate 0.0390 Epoch: 7 Global Step: 125420 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:40:00,733-Speed 5133.63 samples/sec Loss 3.0550 LearningRate 0.0390 Epoch: 7 Global Step: 125430 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:40:02,720-Speed 5155.16 samples/sec Loss 2.8946 LearningRate 0.0390 Epoch: 7 Global Step: 125440 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:40:04,704-Speed 5162.30 samples/sec Loss 2.9083 LearningRate 0.0390 Epoch: 7 Global Step: 125450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:40:06,704-Speed 5121.71 samples/sec Loss 2.8888 LearningRate 0.0390 Epoch: 7 Global Step: 125460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:08,684-Speed 5173.81 samples/sec Loss 2.9273 LearningRate 0.0390 Epoch: 7 Global Step: 125470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:10,662-Speed 5177.80 samples/sec Loss 2.8778 LearningRate 0.0390 Epoch: 7 Global Step: 125480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:12,665-Speed 5113.91 samples/sec Loss 2.9459 LearningRate 0.0389 Epoch: 7 Global Step: 125490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-04-11 07:40:14,656-Speed 5145.97 samples/sec Loss 3.0080 LearningRate 0.0389 Epoch: 7 Global Step: 125500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:16,651-Speed 5133.22 samples/sec Loss 2.9959 LearningRate 0.0389 Epoch: 7 Global Step: 125510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:18,647-Speed 5132.91 samples/sec Loss 2.8912 LearningRate 0.0389 Epoch: 7 Global Step: 125520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:20,625-Speed 5177.70 samples/sec Loss 2.9569 LearningRate 0.0389 Epoch: 7 Global Step: 125530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:22,609-Speed 5163.88 samples/sec Loss 2.9787 LearningRate 0.0389 Epoch: 7 Global Step: 125540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:24,600-Speed 5143.91 samples/sec Loss 2.9487 LearningRate 0.0389 Epoch: 7 Global Step: 125550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:26,620-Speed 5071.36 samples/sec Loss 2.9020 LearningRate 0.0389 Epoch: 7 Global Step: 125560 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:40:28,609-Speed 5150.75 samples/sec Loss 2.9419 LearningRate 0.0389 Epoch: 7 Global Step: 125570 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:40:30,577-Speed 5204.37 samples/sec Loss 3.0060 LearningRate 0.0389 Epoch: 7 Global Step: 125580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:32,559-Speed 5170.15 samples/sec Loss 2.9856 LearningRate 0.0389 Epoch: 7 Global Step: 125590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:34,565-Speed 5106.68 samples/sec Loss 2.9422 LearningRate 0.0389 Epoch: 7 Global Step: 125600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:36,557-Speed 5142.38 samples/sec Loss 2.9087 LearningRate 0.0389 Epoch: 7 Global Step: 125610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:38,566-Speed 5097.68 
samples/sec Loss 2.9419 LearningRate 0.0389 Epoch: 7 Global Step: 125620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:40,545-Speed 5176.13 samples/sec Loss 3.0282 LearningRate 0.0389 Epoch: 7 Global Step: 125630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:42,538-Speed 5140.33 samples/sec Loss 3.0082 LearningRate 0.0389 Epoch: 7 Global Step: 125640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:44,525-Speed 5153.59 samples/sec Loss 2.9503 LearningRate 0.0389 Epoch: 7 Global Step: 125650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:46,532-Speed 5104.24 samples/sec Loss 2.8435 LearningRate 0.0389 Epoch: 7 Global Step: 125660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:48,521-Speed 5150.21 samples/sec Loss 3.0226 LearningRate 0.0389 Epoch: 7 Global Step: 125670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:50,510-Speed 5149.02 samples/sec Loss 2.9255 LearningRate 0.0389 Epoch: 7 Global Step: 125680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:52,494-Speed 5165.41 samples/sec Loss 2.9568 LearningRate 0.0389 Epoch: 7 Global Step: 125690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:54,477-Speed 5163.10 samples/sec Loss 2.9656 LearningRate 0.0389 Epoch: 7 Global Step: 125700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:56,477-Speed 5122.24 samples/sec Loss 2.9207 LearningRate 0.0389 Epoch: 7 Global Step: 125710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:40:58,470-Speed 5140.28 samples/sec Loss 3.0741 LearningRate 0.0389 Epoch: 7 Global Step: 125720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:41:00,456-Speed 5158.66 samples/sec Loss 2.9128 LearningRate 0.0389 Epoch: 7 Global Step: 125730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:41:02,431-Speed 5186.13 samples/sec Loss 2.8965 LearningRate 0.0389 
Epoch: 7 Global Step: 125740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:41:04,415-Speed 5162.92 samples/sec Loss 2.9034 LearningRate 0.0389 Epoch: 7 Global Step: 125750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:41:06,414-Speed 5122.57 samples/sec Loss 2.8874 LearningRate 0.0388 Epoch: 7 Global Step: 125760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:41:08,411-Speed 5130.34 samples/sec Loss 2.9793 LearningRate 0.0388 Epoch: 7 Global Step: 125770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:41:10,410-Speed 5126.58 samples/sec Loss 2.8995 LearningRate 0.0388 Epoch: 7 Global Step: 125780 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:41:12,384-Speed 5186.62 samples/sec Loss 2.9346 LearningRate 0.0388 Epoch: 7 Global Step: 125790 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:41:14,362-Speed 5180.38 samples/sec Loss 2.9174 LearningRate 0.0388 Epoch: 7 Global Step: 125800 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:41:16,357-Speed 5134.21 samples/sec Loss 2.9443 LearningRate 0.0388 Epoch: 7 Global Step: 125810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:41:18,341-Speed 5163.16 samples/sec Loss 2.9628 LearningRate 0.0388 Epoch: 7 Global Step: 125820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:41:20,317-Speed 5182.05 samples/sec Loss 2.9522 LearningRate 0.0388 Epoch: 7 Global Step: 125830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:41:22,313-Speed 5132.93 samples/sec Loss 2.8342 LearningRate 0.0388 Epoch: 7 Global Step: 125840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:41:24,289-Speed 5182.70 samples/sec Loss 2.9679 LearningRate 0.0388 Epoch: 7 Global Step: 125850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:41:26,279-Speed 5147.62 samples/sec Loss 2.9840 LearningRate 0.0388 Epoch: 7 Global Step: 125860 Fp16 Grad 
Scale: 65536 Required: 14 hours Training: 2022-04-11 07:41:28,276-Speed 5131.16 samples/sec Loss 2.9875 LearningRate 0.0388 Epoch: 7 Global Step: 125870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:41:30,260-Speed 5161.10 samples/sec Loss 2.9607 LearningRate 0.0388 Epoch: 7 Global Step: 125880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:41:32,242-Speed 5169.60 samples/sec Loss 3.0467 LearningRate 0.0388 Epoch: 7 Global Step: 125890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:41:34,215-Speed 5193.02 samples/sec Loss 2.8958 LearningRate 0.0388 Epoch: 7 Global Step: 125900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:41:36,201-Speed 5158.36 samples/sec Loss 3.0167 LearningRate 0.0388 Epoch: 7 Global Step: 125910 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:41:38,185-Speed 5161.57 samples/sec Loss 3.0435 LearningRate 0.0388 Epoch: 7 Global Step: 125920 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:41:40,168-Speed 5166.25 samples/sec Loss 2.9267 LearningRate 0.0388 Epoch: 7 Global Step: 125930 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:41:42,142-Speed 5188.78 samples/sec Loss 2.9868 LearningRate 0.0388 Epoch: 7 Global Step: 125940 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:41:44,132-Speed 5146.49 samples/sec Loss 3.0071 LearningRate 0.0388 Epoch: 7 Global Step: 125950 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:41:46,123-Speed 5144.46 samples/sec Loss 2.9355 LearningRate 0.0388 Epoch: 7 Global Step: 125960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:41:48,131-Speed 5102.56 samples/sec Loss 3.0104 LearningRate 0.0388 Epoch: 7 Global Step: 125970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:41:50,159-Speed 5051.04 samples/sec Loss 3.0098 LearningRate 0.0388 Epoch: 7 Global Step: 125980 Fp16 Grad Scale: 65536 Required: 14 hours 
Training: 2022-04-11 07:41:52,135-Speed 5183.69 samples/sec Loss 3.0524 LearningRate 0.0388 Epoch: 7 Global Step: 125990 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-04-11 07:41:54,110-Speed 5186.56 samples/sec Loss 2.9692 LearningRate 0.0388 Epoch: 7 Global Step: 126000 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-04-11 07:42:20,664-[lfw][126000]XNorm: 22.477044
Training: 2022-04-11 07:42:20,664-[lfw][126000]Accuracy-Flip: 0.99783+-0.00259
Training: 2022-04-11 07:42:20,665-[lfw][126000]Accuracy-Highest: 0.99817
Training: 2022-04-11 07:42:51,523-[cfp_fp][126000]XNorm: 20.843356
Training: 2022-04-11 07:42:51,524-[cfp_fp][126000]Accuracy-Flip: 0.98229+-0.00558
Training: 2022-04-11 07:42:51,524-[cfp_fp][126000]Accuracy-Highest: 0.98443
Training: 2022-04-11 07:43:18,253-[agedb_30][126000]XNorm: 22.279129
Training: 2022-04-11 07:43:18,254-[agedb_30][126000]Accuracy-Flip: 0.97933+-0.00704
Training: 2022-04-11 07:43:18,254-[agedb_30][126000]Accuracy-Highest: 0.98150
Training: 2022-04-11 07:43:20,243-Speed 118.89 samples/sec Loss 2.9288 LearningRate 0.0388 Epoch: 7 Global Step: 126010 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-04-11 07:43:22,207-Speed 5216.99 samples/sec Loss 2.9615 LearningRate 0.0387 Epoch: 7 Global Step: 126020 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-04-11 07:43:24,168-Speed 5223.04 samples/sec Loss 2.9461 LearningRate 0.0387 Epoch: 7 Global Step: 126030 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-04-11 07:43:26,150-Speed 5166.47 samples/sec Loss 2.9976 LearningRate 0.0387 Epoch: 7 Global Step: 126040 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-04-11 07:43:28,117-Speed 5210.16 samples/sec Loss 2.9350 LearningRate 0.0387 Epoch: 7 Global Step: 126050 Fp16 Grad Scale: 65536 Required: 14 hours
Training: 2022-04-11 07:43:30,084-Speed 5206.57 samples/sec Loss 2.9874 LearningRate 0.0387 Epoch: 7 Global Step: 126060 Fp16 Grad Scale: 131072 Required: 14 hours
Training: 2022-04-11 
07:43:32,063-Speed 5175.72 samples/sec Loss 2.9474 LearningRate 0.0387 Epoch: 7 Global Step: 126070 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:43:34,045-Speed 5169.11 samples/sec Loss 2.8819 LearningRate 0.0387 Epoch: 7 Global Step: 126080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:43:36,028-Speed 5165.40 samples/sec Loss 3.0024 LearningRate 0.0387 Epoch: 7 Global Step: 126090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:43:38,003-Speed 5184.26 samples/sec Loss 3.0462 LearningRate 0.0387 Epoch: 7 Global Step: 126100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:43:39,973-Speed 5201.39 samples/sec Loss 2.9029 LearningRate 0.0387 Epoch: 7 Global Step: 126110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:43:41,941-Speed 5205.84 samples/sec Loss 2.9224 LearningRate 0.0387 Epoch: 7 Global Step: 126120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:43:43,923-Speed 5168.52 samples/sec Loss 2.9568 LearningRate 0.0387 Epoch: 7 Global Step: 126130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:43:45,891-Speed 5205.22 samples/sec Loss 2.9663 LearningRate 0.0387 Epoch: 7 Global Step: 126140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:43:47,877-Speed 5155.94 samples/sec Loss 2.9653 LearningRate 0.0387 Epoch: 7 Global Step: 126150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:43:49,867-Speed 5147.57 samples/sec Loss 2.9527 LearningRate 0.0387 Epoch: 7 Global Step: 126160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:43:51,855-Speed 5151.20 samples/sec Loss 2.9659 LearningRate 0.0387 Epoch: 7 Global Step: 126170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:43:53,836-Speed 5172.08 samples/sec Loss 2.9469 LearningRate 0.0387 Epoch: 7 Global Step: 126180 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:43:55,811-Speed 5186.79 samples/sec 
Loss 2.9591 LearningRate 0.0387 Epoch: 7 Global Step: 126190 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:43:57,810-Speed 5123.17 samples/sec Loss 2.9285 LearningRate 0.0387 Epoch: 7 Global Step: 126200 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:43:59,807-Speed 5131.77 samples/sec Loss 2.9617 LearningRate 0.0387 Epoch: 7 Global Step: 126210 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:44:01,775-Speed 5205.34 samples/sec Loss 2.9527 LearningRate 0.0387 Epoch: 7 Global Step: 126220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:44:03,748-Speed 5189.92 samples/sec Loss 3.0215 LearningRate 0.0387 Epoch: 7 Global Step: 126230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:44:05,732-Speed 5164.10 samples/sec Loss 2.9299 LearningRate 0.0387 Epoch: 7 Global Step: 126240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:44:07,705-Speed 5191.42 samples/sec Loss 2.8783 LearningRate 0.0387 Epoch: 7 Global Step: 126250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:44:09,686-Speed 5170.83 samples/sec Loss 2.9652 LearningRate 0.0387 Epoch: 7 Global Step: 126260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:44:11,665-Speed 5176.54 samples/sec Loss 3.0080 LearningRate 0.0387 Epoch: 7 Global Step: 126270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:44:13,647-Speed 5166.78 samples/sec Loss 2.9390 LearningRate 0.0387 Epoch: 7 Global Step: 126280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:44:15,619-Speed 5194.23 samples/sec Loss 2.9537 LearningRate 0.0386 Epoch: 7 Global Step: 126290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:44:17,604-Speed 5161.55 samples/sec Loss 2.9784 LearningRate 0.0386 Epoch: 7 Global Step: 126300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:44:19,589-Speed 5160.36 samples/sec Loss 2.9929 LearningRate 0.0386 Epoch: 7 
Global Step: 126310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:44:21,585-Speed 5130.94 samples/sec Loss 2.8959 LearningRate 0.0386 Epoch: 7 Global Step: 126320 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:44:23,583-Speed 5127.98 samples/sec Loss 2.9539 LearningRate 0.0386 Epoch: 7 Global Step: 126330 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:44:25,559-Speed 5184.26 samples/sec Loss 2.9770 LearningRate 0.0386 Epoch: 7 Global Step: 126340 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:44:27,537-Speed 5177.65 samples/sec Loss 2.9761 LearningRate 0.0386 Epoch: 7 Global Step: 126350 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:44:29,515-Speed 5178.61 samples/sec Loss 2.9940 LearningRate 0.0386 Epoch: 7 Global Step: 126360 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:44:31,494-Speed 5178.38 samples/sec Loss 2.9231 LearningRate 0.0386 Epoch: 7 Global Step: 126370 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:44:33,474-Speed 5172.88 samples/sec Loss 2.9088 LearningRate 0.0386 Epoch: 7 Global Step: 126380 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:44:35,486-Speed 5089.89 samples/sec Loss 2.9878 LearningRate 0.0386 Epoch: 7 Global Step: 126390 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:44:37,495-Speed 5098.97 samples/sec Loss 3.0386 LearningRate 0.0386 Epoch: 7 Global Step: 126400 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:44:39,482-Speed 5154.57 samples/sec Loss 2.9265 LearningRate 0.0386 Epoch: 7 Global Step: 126410 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:44:41,487-Speed 5108.72 samples/sec Loss 2.9026 LearningRate 0.0386 Epoch: 7 Global Step: 126420 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:44:43,460-Speed 5193.56 samples/sec Loss 2.9482 LearningRate 0.0386 Epoch: 7 Global Step: 126430 Fp16 Grad 
Scale: 131072 Required: 14 hours Training: 2022-04-11 07:44:45,483-Speed 5063.40 samples/sec Loss 3.0041 LearningRate 0.0386 Epoch: 7 Global Step: 126440 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:44:47,476-Speed 5140.71 samples/sec Loss 2.9102 LearningRate 0.0386 Epoch: 7 Global Step: 126450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:44:49,469-Speed 5139.29 samples/sec Loss 2.9953 LearningRate 0.0386 Epoch: 7 Global Step: 126460 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:44:51,448-Speed 5175.48 samples/sec Loss 3.0022 LearningRate 0.0386 Epoch: 7 Global Step: 126470 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:44:53,417-Speed 5201.63 samples/sec Loss 2.9768 LearningRate 0.0386 Epoch: 7 Global Step: 126480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:44:55,389-Speed 5194.92 samples/sec Loss 2.9238 LearningRate 0.0386 Epoch: 7 Global Step: 126490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:44:57,358-Speed 5200.78 samples/sec Loss 3.0066 LearningRate 0.0386 Epoch: 7 Global Step: 126500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:44:59,331-Speed 5192.51 samples/sec Loss 2.9202 LearningRate 0.0386 Epoch: 7 Global Step: 126510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:01,353-Speed 5066.03 samples/sec Loss 2.9288 LearningRate 0.0386 Epoch: 7 Global Step: 126520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:03,353-Speed 5121.93 samples/sec Loss 2.9530 LearningRate 0.0386 Epoch: 7 Global Step: 126530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:05,327-Speed 5188.98 samples/sec Loss 3.0522 LearningRate 0.0386 Epoch: 7 Global Step: 126540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:07,294-Speed 5208.95 samples/sec Loss 2.9032 LearningRate 0.0386 Epoch: 7 Global Step: 126550 Fp16 Grad Scale: 65536 Required: 14 hours 
Training: 2022-04-11 07:45:09,276-Speed 5166.66 samples/sec Loss 2.8545 LearningRate 0.0385 Epoch: 7 Global Step: 126560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:11,252-Speed 5184.00 samples/sec Loss 3.0169 LearningRate 0.0385 Epoch: 7 Global Step: 126570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:13,294-Speed 5017.20 samples/sec Loss 3.0337 LearningRate 0.0385 Epoch: 7 Global Step: 126580 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:45:15,272-Speed 5178.91 samples/sec Loss 3.0087 LearningRate 0.0385 Epoch: 7 Global Step: 126590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:17,239-Speed 5207.18 samples/sec Loss 2.9609 LearningRate 0.0385 Epoch: 7 Global Step: 126600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:19,211-Speed 5192.48 samples/sec Loss 2.9590 LearningRate 0.0385 Epoch: 7 Global Step: 126610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:21,211-Speed 5123.40 samples/sec Loss 2.9736 LearningRate 0.0385 Epoch: 7 Global Step: 126620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:23,191-Speed 5172.58 samples/sec Loss 2.9708 LearningRate 0.0385 Epoch: 7 Global Step: 126630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:25,196-Speed 5109.94 samples/sec Loss 2.9079 LearningRate 0.0385 Epoch: 7 Global Step: 126640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:27,177-Speed 5170.05 samples/sec Loss 3.0281 LearningRate 0.0385 Epoch: 7 Global Step: 126650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:29,144-Speed 5207.56 samples/sec Loss 2.8868 LearningRate 0.0385 Epoch: 7 Global Step: 126660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:31,136-Speed 5143.52 samples/sec Loss 2.9573 LearningRate 0.0385 Epoch: 7 Global Step: 126670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:33,105-Speed 
5201.41 samples/sec Loss 2.9241 LearningRate 0.0385 Epoch: 7 Global Step: 126680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:35,075-Speed 5200.60 samples/sec Loss 2.9881 LearningRate 0.0385 Epoch: 7 Global Step: 126690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:37,050-Speed 5186.29 samples/sec Loss 3.0256 LearningRate 0.0385 Epoch: 7 Global Step: 126700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:39,035-Speed 5160.77 samples/sec Loss 2.9939 LearningRate 0.0385 Epoch: 7 Global Step: 126710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:41,033-Speed 5124.39 samples/sec Loss 2.9621 LearningRate 0.0385 Epoch: 7 Global Step: 126720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:43,004-Speed 5198.79 samples/sec Loss 2.9004 LearningRate 0.0385 Epoch: 7 Global Step: 126730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:44,975-Speed 5197.05 samples/sec Loss 2.9663 LearningRate 0.0385 Epoch: 7 Global Step: 126740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:46,965-Speed 5147.25 samples/sec Loss 3.0193 LearningRate 0.0385 Epoch: 7 Global Step: 126750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:48,971-Speed 5106.99 samples/sec Loss 2.9826 LearningRate 0.0385 Epoch: 7 Global Step: 126760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:50,937-Speed 5210.76 samples/sec Loss 2.9921 LearningRate 0.0385 Epoch: 7 Global Step: 126770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:52,910-Speed 5191.42 samples/sec Loss 2.9811 LearningRate 0.0385 Epoch: 7 Global Step: 126780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:45:54,880-Speed 5198.17 samples/sec Loss 2.9130 LearningRate 0.0385 Epoch: 7 Global Step: 126790 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:45:56,860-Speed 5173.30 samples/sec Loss 2.9245 
LearningRate 0.0385 Epoch: 7 Global Step: 126800 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:45:58,837-Speed 5183.04 samples/sec Loss 2.9272 LearningRate 0.0385 Epoch: 7 Global Step: 126810 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:46:00,827-Speed 5146.19 samples/sec Loss 2.9695 LearningRate 0.0385 Epoch: 7 Global Step: 126820 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:46:02,820-Speed 5140.68 samples/sec Loss 2.9416 LearningRate 0.0384 Epoch: 7 Global Step: 126830 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:46:04,802-Speed 5168.31 samples/sec Loss 2.9961 LearningRate 0.0384 Epoch: 7 Global Step: 126840 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:46:06,778-Speed 5183.94 samples/sec Loss 3.0614 LearningRate 0.0384 Epoch: 7 Global Step: 126850 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:46:08,762-Speed 5162.87 samples/sec Loss 2.9649 LearningRate 0.0384 Epoch: 7 Global Step: 126860 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:46:10,784-Speed 5066.26 samples/sec Loss 2.9879 LearningRate 0.0384 Epoch: 7 Global Step: 126870 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:46:12,754-Speed 5200.04 samples/sec Loss 2.9446 LearningRate 0.0384 Epoch: 7 Global Step: 126880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:14,726-Speed 5193.07 samples/sec Loss 2.9540 LearningRate 0.0384 Epoch: 7 Global Step: 126890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:16,722-Speed 5132.67 samples/sec Loss 2.9077 LearningRate 0.0384 Epoch: 7 Global Step: 126900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:18,710-Speed 5151.25 samples/sec Loss 2.9685 LearningRate 0.0384 Epoch: 7 Global Step: 126910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:20,686-Speed 5185.87 samples/sec Loss 2.9629 LearningRate 0.0384 Epoch: 7 Global 
Step: 126920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:22,656-Speed 5198.82 samples/sec Loss 2.8949 LearningRate 0.0384 Epoch: 7 Global Step: 126930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:24,636-Speed 5172.57 samples/sec Loss 2.9064 LearningRate 0.0384 Epoch: 7 Global Step: 126940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:26,612-Speed 5185.37 samples/sec Loss 2.9221 LearningRate 0.0384 Epoch: 7 Global Step: 126950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:28,581-Speed 5202.14 samples/sec Loss 3.0367 LearningRate 0.0384 Epoch: 7 Global Step: 126960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:30,550-Speed 5202.22 samples/sec Loss 2.8833 LearningRate 0.0384 Epoch: 7 Global Step: 126970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:32,526-Speed 5184.21 samples/sec Loss 2.9327 LearningRate 0.0384 Epoch: 7 Global Step: 126980 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:46:34,500-Speed 5190.48 samples/sec Loss 2.9061 LearningRate 0.0384 Epoch: 7 Global Step: 126990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:36,482-Speed 5165.82 samples/sec Loss 2.9178 LearningRate 0.0384 Epoch: 7 Global Step: 127000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:38,457-Speed 5188.39 samples/sec Loss 2.9614 LearningRate 0.0384 Epoch: 7 Global Step: 127010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:40,430-Speed 5190.14 samples/sec Loss 2.9452 LearningRate 0.0384 Epoch: 7 Global Step: 127020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:42,409-Speed 5174.71 samples/sec Loss 2.9143 LearningRate 0.0384 Epoch: 7 Global Step: 127030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:44,391-Speed 5170.31 samples/sec Loss 2.9700 LearningRate 0.0384 Epoch: 7 Global Step: 127040 Fp16 Grad Scale: 65536 
Required: 14 hours Training: 2022-04-11 07:46:46,367-Speed 5182.53 samples/sec Loss 2.9986 LearningRate 0.0384 Epoch: 7 Global Step: 127050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:48,390-Speed 5064.33 samples/sec Loss 3.0044 LearningRate 0.0384 Epoch: 7 Global Step: 127060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:50,358-Speed 5205.45 samples/sec Loss 2.9251 LearningRate 0.0384 Epoch: 7 Global Step: 127070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:52,328-Speed 5199.37 samples/sec Loss 2.9762 LearningRate 0.0384 Epoch: 7 Global Step: 127080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:54,297-Speed 5202.90 samples/sec Loss 2.9822 LearningRate 0.0384 Epoch: 7 Global Step: 127090 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:46:56,259-Speed 5219.40 samples/sec Loss 2.9589 LearningRate 0.0383 Epoch: 7 Global Step: 127100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:46:58,240-Speed 5171.91 samples/sec Loss 2.9797 LearningRate 0.0383 Epoch: 7 Global Step: 127110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:00,222-Speed 5168.06 samples/sec Loss 2.9408 LearningRate 0.0383 Epoch: 7 Global Step: 127120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:02,189-Speed 5206.90 samples/sec Loss 3.0202 LearningRate 0.0383 Epoch: 7 Global Step: 127130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:04,165-Speed 5185.67 samples/sec Loss 2.8891 LearningRate 0.0383 Epoch: 7 Global Step: 127140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:06,142-Speed 5179.81 samples/sec Loss 2.9712 LearningRate 0.0383 Epoch: 7 Global Step: 127150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:08,118-Speed 5183.52 samples/sec Loss 3.0383 LearningRate 0.0383 Epoch: 7 Global Step: 127160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 
07:47:10,096-Speed 5179.67 samples/sec Loss 2.9812 LearningRate 0.0383 Epoch: 7 Global Step: 127170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:12,072-Speed 5185.10 samples/sec Loss 2.9924 LearningRate 0.0383 Epoch: 7 Global Step: 127180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:14,050-Speed 5179.15 samples/sec Loss 3.0072 LearningRate 0.0383 Epoch: 7 Global Step: 127190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:16,039-Speed 5149.99 samples/sec Loss 2.9424 LearningRate 0.0383 Epoch: 7 Global Step: 127200 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:47:18,023-Speed 5161.81 samples/sec Loss 2.9337 LearningRate 0.0383 Epoch: 7 Global Step: 127210 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:47:19,993-Speed 5199.52 samples/sec Loss 2.9951 LearningRate 0.0383 Epoch: 7 Global Step: 127220 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:47:21,968-Speed 5187.58 samples/sec Loss 2.9227 LearningRate 0.0383 Epoch: 7 Global Step: 127230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:23,960-Speed 5141.66 samples/sec Loss 2.9693 LearningRate 0.0383 Epoch: 7 Global Step: 127240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:25,960-Speed 5121.64 samples/sec Loss 2.8987 LearningRate 0.0383 Epoch: 7 Global Step: 127250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:27,961-Speed 5119.88 samples/sec Loss 2.9546 LearningRate 0.0383 Epoch: 7 Global Step: 127260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:29,938-Speed 5180.17 samples/sec Loss 2.9609 LearningRate 0.0383 Epoch: 7 Global Step: 127270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:31,933-Speed 5135.09 samples/sec Loss 2.9766 LearningRate 0.0383 Epoch: 7 Global Step: 127280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:33,918-Speed 5160.26 samples/sec 
Loss 2.8612 LearningRate 0.0383 Epoch: 7 Global Step: 127290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:35,912-Speed 5136.92 samples/sec Loss 2.8809 LearningRate 0.0383 Epoch: 7 Global Step: 127300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:37,906-Speed 5138.37 samples/sec Loss 2.9845 LearningRate 0.0383 Epoch: 7 Global Step: 127310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:39,877-Speed 5195.42 samples/sec Loss 3.0229 LearningRate 0.0383 Epoch: 7 Global Step: 127320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:41,851-Speed 5189.77 samples/sec Loss 3.0507 LearningRate 0.0383 Epoch: 7 Global Step: 127330 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:47:43,827-Speed 5182.31 samples/sec Loss 2.9775 LearningRate 0.0383 Epoch: 7 Global Step: 127340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:45,811-Speed 5164.18 samples/sec Loss 2.9112 LearningRate 0.0383 Epoch: 7 Global Step: 127350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:47,796-Speed 5161.74 samples/sec Loss 2.9836 LearningRate 0.0383 Epoch: 7 Global Step: 127360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:49,788-Speed 5140.84 samples/sec Loss 2.8721 LearningRate 0.0382 Epoch: 7 Global Step: 127370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:51,764-Speed 5186.21 samples/sec Loss 2.9883 LearningRate 0.0382 Epoch: 7 Global Step: 127380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:53,741-Speed 5180.37 samples/sec Loss 2.9603 LearningRate 0.0382 Epoch: 7 Global Step: 127390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:55,718-Speed 5181.09 samples/sec Loss 2.9101 LearningRate 0.0382 Epoch: 7 Global Step: 127400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:57,700-Speed 5169.38 samples/sec Loss 2.9738 LearningRate 0.0382 Epoch: 7 
Global Step: 127410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:47:59,695-Speed 5132.62 samples/sec Loss 2.9064 LearningRate 0.0382 Epoch: 7 Global Step: 127420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:01,675-Speed 5174.07 samples/sec Loss 3.0243 LearningRate 0.0382 Epoch: 7 Global Step: 127430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:03,660-Speed 5160.92 samples/sec Loss 2.9323 LearningRate 0.0382 Epoch: 7 Global Step: 127440 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:48:05,655-Speed 5133.51 samples/sec Loss 2.8930 LearningRate 0.0382 Epoch: 7 Global Step: 127450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:07,628-Speed 5193.07 samples/sec Loss 3.0186 LearningRate 0.0382 Epoch: 7 Global Step: 127460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:09,609-Speed 5169.94 samples/sec Loss 2.9865 LearningRate 0.0382 Epoch: 7 Global Step: 127470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:11,610-Speed 5119.92 samples/sec Loss 3.0660 LearningRate 0.0382 Epoch: 7 Global Step: 127480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:13,583-Speed 5192.14 samples/sec Loss 2.9464 LearningRate 0.0382 Epoch: 7 Global Step: 127490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:15,565-Speed 5167.05 samples/sec Loss 2.9453 LearningRate 0.0382 Epoch: 7 Global Step: 127500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:17,553-Speed 5153.26 samples/sec Loss 2.9110 LearningRate 0.0382 Epoch: 7 Global Step: 127510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:19,524-Speed 5195.43 samples/sec Loss 2.9627 LearningRate 0.0382 Epoch: 7 Global Step: 127520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:21,532-Speed 5101.62 samples/sec Loss 2.9949 LearningRate 0.0382 Epoch: 7 Global Step: 127530 Fp16 Grad Scale: 65536 
Required: 14 hours Training: 2022-04-11 07:48:23,522-Speed 5148.24 samples/sec Loss 2.9514 LearningRate 0.0382 Epoch: 7 Global Step: 127540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:25,513-Speed 5145.26 samples/sec Loss 2.8492 LearningRate 0.0382 Epoch: 7 Global Step: 127550 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:48:27,504-Speed 5145.02 samples/sec Loss 2.9580 LearningRate 0.0382 Epoch: 7 Global Step: 127560 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:48:29,486-Speed 5166.22 samples/sec Loss 3.0342 LearningRate 0.0382 Epoch: 7 Global Step: 127570 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:48:31,465-Speed 5177.18 samples/sec Loss 2.9857 LearningRate 0.0382 Epoch: 7 Global Step: 127580 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:48:33,455-Speed 5147.98 samples/sec Loss 2.8645 LearningRate 0.0382 Epoch: 7 Global Step: 127590 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:48:35,461-Speed 5106.93 samples/sec Loss 3.0067 LearningRate 0.0382 Epoch: 7 Global Step: 127600 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:48:37,444-Speed 5164.33 samples/sec Loss 3.0296 LearningRate 0.0382 Epoch: 7 Global Step: 127610 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:48:39,421-Speed 5181.30 samples/sec Loss 2.9571 LearningRate 0.0382 Epoch: 7 Global Step: 127620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:41,395-Speed 5190.14 samples/sec Loss 2.9894 LearningRate 0.0382 Epoch: 7 Global Step: 127630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:43,373-Speed 5177.54 samples/sec Loss 2.9303 LearningRate 0.0381 Epoch: 7 Global Step: 127640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:45,355-Speed 5168.04 samples/sec Loss 2.9961 LearningRate 0.0381 Epoch: 7 Global Step: 127650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-04-11 07:48:47,349-Speed 5136.17 samples/sec Loss 2.9858 LearningRate 0.0381 Epoch: 7 Global Step: 127660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:49,329-Speed 5174.07 samples/sec Loss 3.0218 LearningRate 0.0381 Epoch: 7 Global Step: 127670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:51,330-Speed 5119.44 samples/sec Loss 2.9386 LearningRate 0.0381 Epoch: 7 Global Step: 127680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:53,309-Speed 5176.73 samples/sec Loss 2.9145 LearningRate 0.0381 Epoch: 7 Global Step: 127690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:55,284-Speed 5186.59 samples/sec Loss 2.8694 LearningRate 0.0381 Epoch: 7 Global Step: 127700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:57,255-Speed 5198.04 samples/sec Loss 2.9589 LearningRate 0.0381 Epoch: 7 Global Step: 127710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:48:59,228-Speed 5191.80 samples/sec Loss 2.9861 LearningRate 0.0381 Epoch: 7 Global Step: 127720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:49:01,210-Speed 5167.84 samples/sec Loss 2.9717 LearningRate 0.0381 Epoch: 7 Global Step: 127730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:49:03,184-Speed 5187.31 samples/sec Loss 2.9874 LearningRate 0.0381 Epoch: 7 Global Step: 127740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:49:05,159-Speed 5187.68 samples/sec Loss 2.9213 LearningRate 0.0381 Epoch: 7 Global Step: 127750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:49:07,132-Speed 5192.24 samples/sec Loss 2.9860 LearningRate 0.0381 Epoch: 7 Global Step: 127760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:49:09,116-Speed 5162.99 samples/sec Loss 2.9930 LearningRate 0.0381 Epoch: 7 Global Step: 127770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:49:11,089-Speed 5189.79 
samples/sec Loss 2.9263 LearningRate 0.0381 Epoch: 7 Global Step: 127780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:49:13,073-Speed 5162.93 samples/sec Loss 2.9682 LearningRate 0.0381 Epoch: 7 Global Step: 127790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:49:15,058-Speed 5160.71 samples/sec Loss 2.9356 LearningRate 0.0381 Epoch: 7 Global Step: 127800 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:49:17,031-Speed 5193.79 samples/sec Loss 3.0095 LearningRate 0.0381 Epoch: 7 Global Step: 127810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:49:19,001-Speed 5199.42 samples/sec Loss 2.9170 LearningRate 0.0381 Epoch: 7 Global Step: 127820 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:49:20,975-Speed 5189.71 samples/sec Loss 2.9622 LearningRate 0.0381 Epoch: 7 Global Step: 127830 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:49:22,959-Speed 5162.63 samples/sec Loss 2.9582 LearningRate 0.0381 Epoch: 7 Global Step: 127840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:49:24,955-Speed 5131.12 samples/sec Loss 3.0095 LearningRate 0.0381 Epoch: 7 Global Step: 127850 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:49:26,943-Speed 5153.67 samples/sec Loss 3.0510 LearningRate 0.0381 Epoch: 7 Global Step: 127860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:49:28,919-Speed 5182.13 samples/sec Loss 2.9746 LearningRate 0.0381 Epoch: 7 Global Step: 127870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:49:30,894-Speed 5186.92 samples/sec Loss 2.9660 LearningRate 0.0381 Epoch: 7 Global Step: 127880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:49:32,883-Speed 5152.01 samples/sec Loss 2.9610 LearningRate 0.0381 Epoch: 7 Global Step: 127890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:49:34,867-Speed 5163.96 samples/sec Loss 2.9882 LearningRate 0.0381 
Epoch: 7 Global Step: 127900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:49:36,843-Speed 5183.65 samples/sec Loss 2.9088 LearningRate 0.0380 Epoch: 7 Global Step: 127910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:49:38,847-Speed 5110.23 samples/sec Loss 2.9336 LearningRate 0.0380 Epoch: 7 Global Step: 127920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:49:40,836-Speed 5150.26 samples/sec Loss 2.9405 LearningRate 0.0380 Epoch: 7 Global Step: 127930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:49:42,810-Speed 5189.92 samples/sec Loss 2.9693 LearningRate 0.0380 Epoch: 7 Global Step: 127940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:49:44,795-Speed 5159.18 samples/sec Loss 2.9826 LearningRate 0.0380 Epoch: 7 Global Step: 127950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:49:46,776-Speed 5171.87 samples/sec Loss 2.9811 LearningRate 0.0380 Epoch: 7 Global Step: 127960 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:49:48,770-Speed 5135.38 samples/sec Loss 2.8644 LearningRate 0.0380 Epoch: 7 Global Step: 127970 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:49:50,752-Speed 5168.25 samples/sec Loss 2.9319 LearningRate 0.0380 Epoch: 7 Global Step: 127980 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:49:52,732-Speed 5173.10 samples/sec Loss 2.9743 LearningRate 0.0380 Epoch: 7 Global Step: 127990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:49:54,743-Speed 5095.29 samples/sec Loss 2.9912 LearningRate 0.0380 Epoch: 7 Global Step: 128000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:50:21,274-[lfw][128000]XNorm: 22.200681 Training: 2022-04-11 07:50:21,274-[lfw][128000]Accuracy-Flip: 0.99767+-0.00300 Training: 2022-04-11 07:50:21,275-[lfw][128000]Accuracy-Highest: 0.99817 Training: 2022-04-11 07:50:51,989-[cfp_fp][128000]XNorm: 20.776564 Training: 
2022-04-11 07:50:51,990-[cfp_fp][128000]Accuracy-Flip: 0.98314+-0.00581 Training: 2022-04-11 07:50:51,990-[cfp_fp][128000]Accuracy-Highest: 0.98443 Training: 2022-04-11 07:51:18,585-[agedb_30][128000]XNorm: 22.368251 Training: 2022-04-11 07:51:18,586-[agedb_30][128000]Accuracy-Flip: 0.98100+-0.00624 Training: 2022-04-11 07:51:18,586-[agedb_30][128000]Accuracy-Highest: 0.98150 Training: 2022-04-11 07:51:20,583-Speed 119.29 samples/sec Loss 2.9044 LearningRate 0.0380 Epoch: 7 Global Step: 128010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:51:22,549-Speed 5210.42 samples/sec Loss 2.9356 LearningRate 0.0380 Epoch: 7 Global Step: 128020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:51:24,520-Speed 5198.03 samples/sec Loss 2.9885 LearningRate 0.0380 Epoch: 7 Global Step: 128030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:51:26,489-Speed 5201.96 samples/sec Loss 2.9460 LearningRate 0.0380 Epoch: 7 Global Step: 128040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:51:28,459-Speed 5200.21 samples/sec Loss 2.9139 LearningRate 0.0380 Epoch: 7 Global Step: 128050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:51:30,421-Speed 5220.76 samples/sec Loss 2.8582 LearningRate 0.0380 Epoch: 7 Global Step: 128060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:51:32,387-Speed 5209.51 samples/sec Loss 2.8830 LearningRate 0.0380 Epoch: 7 Global Step: 128070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:51:34,351-Speed 5214.66 samples/sec Loss 2.8912 LearningRate 0.0380 Epoch: 7 Global Step: 128080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:51:36,323-Speed 5194.21 samples/sec Loss 2.9002 LearningRate 0.0380 Epoch: 7 Global Step: 128090 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:51:38,321-Speed 5128.83 samples/sec Loss 3.0052 LearningRate 0.0380 Epoch: 7 Global Step: 128100 Fp16 Grad Scale: 131072 
Required: 14 hours Training: 2022-04-11 07:51:40,289-Speed 5202.99 samples/sec Loss 2.8908 LearningRate 0.0380 Epoch: 7 Global Step: 128110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:51:42,255-Speed 5209.75 samples/sec Loss 2.9894 LearningRate 0.0380 Epoch: 7 Global Step: 128120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:51:44,221-Speed 5210.98 samples/sec Loss 3.0425 LearningRate 0.0380 Epoch: 7 Global Step: 128130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:51:46,195-Speed 5189.75 samples/sec Loss 2.8911 LearningRate 0.0380 Epoch: 7 Global Step: 128140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:51:48,186-Speed 5145.75 samples/sec Loss 2.9617 LearningRate 0.0380 Epoch: 7 Global Step: 128150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:51:50,202-Speed 5080.14 samples/sec Loss 2.9380 LearningRate 0.0380 Epoch: 7 Global Step: 128160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:51:52,191-Speed 5151.62 samples/sec Loss 2.9775 LearningRate 0.0380 Epoch: 7 Global Step: 128170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:51:54,160-Speed 5201.21 samples/sec Loss 2.9323 LearningRate 0.0379 Epoch: 7 Global Step: 128180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:51:56,143-Speed 5165.76 samples/sec Loss 2.9182 LearningRate 0.0379 Epoch: 7 Global Step: 128190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:51:58,117-Speed 5189.84 samples/sec Loss 2.9014 LearningRate 0.0379 Epoch: 7 Global Step: 128200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:00,110-Speed 5139.05 samples/sec Loss 3.0457 LearningRate 0.0379 Epoch: 7 Global Step: 128210 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:52:02,089-Speed 5176.83 samples/sec Loss 2.9886 LearningRate 0.0379 Epoch: 7 Global Step: 128220 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 
07:52:04,087-Speed 5125.63 samples/sec Loss 3.1008 LearningRate 0.0379 Epoch: 7 Global Step: 128230 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:52:06,074-Speed 5155.00 samples/sec Loss 2.9490 LearningRate 0.0379 Epoch: 7 Global Step: 128240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:08,065-Speed 5146.61 samples/sec Loss 3.0040 LearningRate 0.0379 Epoch: 7 Global Step: 128250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:10,039-Speed 5192.29 samples/sec Loss 3.0342 LearningRate 0.0379 Epoch: 7 Global Step: 128260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:12,018-Speed 5174.58 samples/sec Loss 2.9606 LearningRate 0.0379 Epoch: 7 Global Step: 128270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:14,001-Speed 5163.85 samples/sec Loss 2.9423 LearningRate 0.0379 Epoch: 7 Global Step: 128280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:15,968-Speed 5209.89 samples/sec Loss 2.9893 LearningRate 0.0379 Epoch: 7 Global Step: 128290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:17,941-Speed 5192.17 samples/sec Loss 2.9129 LearningRate 0.0379 Epoch: 7 Global Step: 128300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:19,927-Speed 5156.13 samples/sec Loss 2.8545 LearningRate 0.0379 Epoch: 7 Global Step: 128310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:21,914-Speed 5155.18 samples/sec Loss 2.9400 LearningRate 0.0379 Epoch: 7 Global Step: 128320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:23,934-Speed 5073.28 samples/sec Loss 2.9810 LearningRate 0.0379 Epoch: 7 Global Step: 128330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:25,920-Speed 5158.48 samples/sec Loss 3.0032 LearningRate 0.0379 Epoch: 7 Global Step: 128340 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:52:27,911-Speed 5143.10 samples/sec 
Loss 2.9818 LearningRate 0.0379 Epoch: 7 Global Step: 128350 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:52:29,891-Speed 5173.01 samples/sec Loss 3.0389 LearningRate 0.0379 Epoch: 7 Global Step: 128360 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:52:31,864-Speed 5192.34 samples/sec Loss 3.0250 LearningRate 0.0379 Epoch: 7 Global Step: 128370 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:52:33,840-Speed 5185.52 samples/sec Loss 2.9519 LearningRate 0.0379 Epoch: 7 Global Step: 128380 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:52:35,866-Speed 5053.64 samples/sec Loss 2.9895 LearningRate 0.0379 Epoch: 7 Global Step: 128390 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:52:37,858-Speed 5141.81 samples/sec Loss 2.9426 LearningRate 0.0379 Epoch: 7 Global Step: 128400 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:52:39,869-Speed 5093.47 samples/sec Loss 2.9461 LearningRate 0.0379 Epoch: 7 Global Step: 128410 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:52:41,853-Speed 5162.68 samples/sec Loss 2.9413 LearningRate 0.0379 Epoch: 7 Global Step: 128420 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:52:43,829-Speed 5185.05 samples/sec Loss 2.9465 LearningRate 0.0379 Epoch: 7 Global Step: 128430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:45,813-Speed 5162.90 samples/sec Loss 2.9726 LearningRate 0.0379 Epoch: 7 Global Step: 128440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:47,807-Speed 5137.84 samples/sec Loss 2.9859 LearningRate 0.0378 Epoch: 7 Global Step: 128450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:49,821-Speed 5085.42 samples/sec Loss 2.9396 LearningRate 0.0378 Epoch: 7 Global Step: 128460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:51,791-Speed 5201.48 samples/sec Loss 2.9545 LearningRate 0.0378 
Epoch: 7 Global Step: 128470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:53,763-Speed 5193.46 samples/sec Loss 2.9371 LearningRate 0.0378 Epoch: 7 Global Step: 128480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:55,747-Speed 5163.71 samples/sec Loss 2.9633 LearningRate 0.0378 Epoch: 7 Global Step: 128490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:57,753-Speed 5105.92 samples/sec Loss 2.9407 LearningRate 0.0378 Epoch: 7 Global Step: 128500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:52:59,745-Speed 5141.85 samples/sec Loss 2.9917 LearningRate 0.0378 Epoch: 7 Global Step: 128510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:01,733-Speed 5152.40 samples/sec Loss 2.9641 LearningRate 0.0378 Epoch: 7 Global Step: 128520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:03,727-Speed 5138.43 samples/sec Loss 2.9052 LearningRate 0.0378 Epoch: 7 Global Step: 128530 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:53:05,719-Speed 5140.19 samples/sec Loss 2.8776 LearningRate 0.0378 Epoch: 7 Global Step: 128540 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:53:07,700-Speed 5172.43 samples/sec Loss 2.9650 LearningRate 0.0378 Epoch: 7 Global Step: 128550 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:53:09,678-Speed 5178.28 samples/sec Loss 2.9947 LearningRate 0.0378 Epoch: 7 Global Step: 128560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:11,671-Speed 5139.73 samples/sec Loss 2.9236 LearningRate 0.0378 Epoch: 7 Global Step: 128570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:13,655-Speed 5163.07 samples/sec Loss 3.0060 LearningRate 0.0378 Epoch: 7 Global Step: 128580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:15,641-Speed 5156.70 samples/sec Loss 2.9747 LearningRate 0.0378 Epoch: 7 Global Step: 128590 Fp16 Grad 
Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:17,614-Speed 5191.75 samples/sec Loss 2.9719 LearningRate 0.0378 Epoch: 7 Global Step: 128600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:19,583-Speed 5202.46 samples/sec Loss 2.8846 LearningRate 0.0378 Epoch: 7 Global Step: 128610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:21,570-Speed 5154.34 samples/sec Loss 2.9491 LearningRate 0.0378 Epoch: 7 Global Step: 128620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:23,568-Speed 5128.07 samples/sec Loss 2.9394 LearningRate 0.0378 Epoch: 7 Global Step: 128630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:25,540-Speed 5194.10 samples/sec Loss 2.9652 LearningRate 0.0378 Epoch: 7 Global Step: 128640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:27,518-Speed 5181.62 samples/sec Loss 2.8953 LearningRate 0.0378 Epoch: 7 Global Step: 128650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:29,497-Speed 5175.46 samples/sec Loss 3.0020 LearningRate 0.0378 Epoch: 7 Global Step: 128660 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:53:31,465-Speed 5203.65 samples/sec Loss 2.9342 LearningRate 0.0378 Epoch: 7 Global Step: 128670 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:53:33,448-Speed 5166.50 samples/sec Loss 2.9290 LearningRate 0.0378 Epoch: 7 Global Step: 128680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:35,416-Speed 5205.97 samples/sec Loss 2.9746 LearningRate 0.0378 Epoch: 7 Global Step: 128690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:37,396-Speed 5173.61 samples/sec Loss 2.9317 LearningRate 0.0378 Epoch: 7 Global Step: 128700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:39,380-Speed 5162.12 samples/sec Loss 3.0114 LearningRate 0.0378 Epoch: 7 Global Step: 128710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-04-11 07:53:41,359-Speed 5175.74 samples/sec Loss 2.9713 LearningRate 0.0377 Epoch: 7 Global Step: 128720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:43,325-Speed 5209.80 samples/sec Loss 2.9375 LearningRate 0.0377 Epoch: 7 Global Step: 128730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:45,321-Speed 5132.32 samples/sec Loss 2.9328 LearningRate 0.0377 Epoch: 7 Global Step: 128740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:47,329-Speed 5100.98 samples/sec Loss 2.9661 LearningRate 0.0377 Epoch: 7 Global Step: 128750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:49,329-Speed 5122.76 samples/sec Loss 2.9280 LearningRate 0.0377 Epoch: 7 Global Step: 128760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:51,309-Speed 5172.77 samples/sec Loss 2.8824 LearningRate 0.0377 Epoch: 7 Global Step: 128770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:53:53,285-Speed 5184.41 samples/sec Loss 2.9677 LearningRate 0.0377 Epoch: 7 Global Step: 128780 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:53:55,257-Speed 5194.83 samples/sec Loss 2.9396 LearningRate 0.0377 Epoch: 7 Global Step: 128790 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:53:57,239-Speed 5168.58 samples/sec Loss 2.9355 LearningRate 0.0377 Epoch: 7 Global Step: 128800 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:53:59,225-Speed 5155.48 samples/sec Loss 2.9553 LearningRate 0.0377 Epoch: 7 Global Step: 128810 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:54:01,196-Speed 5197.52 samples/sec Loss 2.9668 LearningRate 0.0377 Epoch: 7 Global Step: 128820 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:54:03,180-Speed 5162.88 samples/sec Loss 2.9036 LearningRate 0.0377 Epoch: 7 Global Step: 128830 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:54:05,163-Speed 5167.17 
samples/sec Loss 2.8753 LearningRate 0.0377 Epoch: 7 Global Step: 128840 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:54:07,130-Speed 5208.75 samples/sec Loss 2.9424 LearningRate 0.0377 Epoch: 7 Global Step: 128850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:09,103-Speed 5191.25 samples/sec Loss 2.9160 LearningRate 0.0377 Epoch: 7 Global Step: 128860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:11,082-Speed 5174.96 samples/sec Loss 2.9523 LearningRate 0.0377 Epoch: 7 Global Step: 128870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:13,062-Speed 5173.54 samples/sec Loss 2.9569 LearningRate 0.0377 Epoch: 7 Global Step: 128880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:15,042-Speed 5174.50 samples/sec Loss 2.9818 LearningRate 0.0377 Epoch: 7 Global Step: 128890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:17,011-Speed 5200.71 samples/sec Loss 3.0914 LearningRate 0.0377 Epoch: 7 Global Step: 128900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:18,993-Speed 5168.13 samples/sec Loss 3.0756 LearningRate 0.0377 Epoch: 7 Global Step: 128910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:20,965-Speed 5193.87 samples/sec Loss 2.9769 LearningRate 0.0377 Epoch: 7 Global Step: 128920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:22,954-Speed 5152.06 samples/sec Loss 2.9672 LearningRate 0.0377 Epoch: 7 Global Step: 128930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:24,939-Speed 5160.33 samples/sec Loss 2.9563 LearningRate 0.0377 Epoch: 7 Global Step: 128940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:26,908-Speed 5202.93 samples/sec Loss 2.8896 LearningRate 0.0377 Epoch: 7 Global Step: 128950 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:54:28,910-Speed 5117.54 samples/sec Loss 2.9711 LearningRate 
0.0377 Epoch: 7 Global Step: 128960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:30,879-Speed 5201.93 samples/sec Loss 2.9047 LearningRate 0.0377 Epoch: 7 Global Step: 128970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:32,872-Speed 5138.88 samples/sec Loss 2.9861 LearningRate 0.0377 Epoch: 7 Global Step: 128980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:34,860-Speed 5152.46 samples/sec Loss 2.8716 LearningRate 0.0376 Epoch: 7 Global Step: 128990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:36,873-Speed 5090.11 samples/sec Loss 2.9773 LearningRate 0.0376 Epoch: 7 Global Step: 129000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:38,873-Speed 5119.56 samples/sec Loss 2.9652 LearningRate 0.0376 Epoch: 7 Global Step: 129010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:40,859-Speed 5157.14 samples/sec Loss 2.9754 LearningRate 0.0376 Epoch: 7 Global Step: 129020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:42,849-Speed 5149.37 samples/sec Loss 2.9486 LearningRate 0.0376 Epoch: 7 Global Step: 129030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:44,825-Speed 5183.81 samples/sec Loss 2.8900 LearningRate 0.0376 Epoch: 7 Global Step: 129040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:46,796-Speed 5196.55 samples/sec Loss 2.9523 LearningRate 0.0376 Epoch: 7 Global Step: 129050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:54:48,768-Speed 5194.63 samples/sec Loss 2.9804 LearningRate 0.0376 Epoch: 7 Global Step: 129060 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:54:50,759-Speed 5144.91 samples/sec Loss 2.9191 LearningRate 0.0376 Epoch: 7 Global Step: 129070 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:54:52,718-Speed 5230.05 samples/sec Loss 3.0401 LearningRate 0.0376 Epoch: 7 Global Step: 129080 Fp16 
Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:54:54,691-Speed 5190.14 samples/sec Loss 2.8945 LearningRate 0.0376 Epoch: 7 Global Step: 129090 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:54:56,666-Speed 5185.96 samples/sec Loss 2.9603 LearningRate 0.0376 Epoch: 7 Global Step: 129100 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:54:58,650-Speed 5164.92 samples/sec Loss 2.9429 LearningRate 0.0376 Epoch: 7 Global Step: 129110 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:00,628-Speed 5176.58 samples/sec Loss 2.9796 LearningRate 0.0376 Epoch: 7 Global Step: 129120 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:02,605-Speed 5182.46 samples/sec Loss 2.9532 LearningRate 0.0376 Epoch: 7 Global Step: 129130 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:04,589-Speed 5164.04 samples/sec Loss 3.0034 LearningRate 0.0376 Epoch: 7 Global Step: 129140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:06,565-Speed 5183.69 samples/sec Loss 2.9922 LearningRate 0.0376 Epoch: 7 Global Step: 129150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:08,552-Speed 5154.30 samples/sec Loss 2.9568 LearningRate 0.0376 Epoch: 7 Global Step: 129160 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:10,524-Speed 5194.50 samples/sec Loss 2.9540 LearningRate 0.0376 Epoch: 7 Global Step: 129170 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:12,514-Speed 5146.90 samples/sec Loss 2.9374 LearningRate 0.0376 Epoch: 7 Global Step: 129180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:55:14,534-Speed 5071.68 samples/sec Loss 2.9762 LearningRate 0.0376 Epoch: 7 Global Step: 129190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:55:16,518-Speed 5162.26 samples/sec Loss 2.9365 LearningRate 0.0376 Epoch: 7 Global Step: 129200 Fp16 Grad Scale: 65536 Required: 14 hours 
Training: 2022-04-11 07:55:18,498-Speed 5173.82 samples/sec Loss 3.0071 LearningRate 0.0376 Epoch: 7 Global Step: 129210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:55:20,508-Speed 5097.19 samples/sec Loss 2.9578 LearningRate 0.0376 Epoch: 7 Global Step: 129220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:55:22,484-Speed 5184.62 samples/sec Loss 2.9648 LearningRate 0.0376 Epoch: 7 Global Step: 129230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:55:24,480-Speed 5131.68 samples/sec Loss 2.9355 LearningRate 0.0376 Epoch: 7 Global Step: 129240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:55:26,465-Speed 5159.21 samples/sec Loss 3.0114 LearningRate 0.0376 Epoch: 7 Global Step: 129250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:28,437-Speed 5195.72 samples/sec Loss 2.9995 LearningRate 0.0376 Epoch: 7 Global Step: 129260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:30,433-Speed 5131.53 samples/sec Loss 2.8900 LearningRate 0.0375 Epoch: 7 Global Step: 129270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:32,406-Speed 5191.51 samples/sec Loss 2.9302 LearningRate 0.0375 Epoch: 7 Global Step: 129280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:34,383-Speed 5180.75 samples/sec Loss 2.9836 LearningRate 0.0375 Epoch: 7 Global Step: 129290 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:36,360-Speed 5182.30 samples/sec Loss 2.9974 LearningRate 0.0375 Epoch: 7 Global Step: 129300 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:38,349-Speed 5149.48 samples/sec Loss 2.9212 LearningRate 0.0375 Epoch: 7 Global Step: 129310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:40,334-Speed 5161.32 samples/sec Loss 2.9118 LearningRate 0.0375 Epoch: 7 Global Step: 129320 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:42,309-Speed 
5186.07 samples/sec Loss 2.9038 LearningRate 0.0375 Epoch: 7 Global Step: 129330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:44,282-Speed 5190.76 samples/sec Loss 3.0285 LearningRate 0.0375 Epoch: 7 Global Step: 129340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:46,257-Speed 5186.92 samples/sec Loss 2.9291 LearningRate 0.0375 Epoch: 7 Global Step: 129350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:55:48,232-Speed 5187.86 samples/sec Loss 2.9273 LearningRate 0.0375 Epoch: 7 Global Step: 129360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:55:50,211-Speed 5174.65 samples/sec Loss 3.0219 LearningRate 0.0375 Epoch: 7 Global Step: 129370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:55:52,187-Speed 5183.98 samples/sec Loss 2.9996 LearningRate 0.0375 Epoch: 7 Global Step: 129380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:55:54,153-Speed 5209.46 samples/sec Loss 2.8925 LearningRate 0.0375 Epoch: 7 Global Step: 129390 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:56,141-Speed 5154.45 samples/sec Loss 2.9515 LearningRate 0.0375 Epoch: 7 Global Step: 129400 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:55:58,138-Speed 5128.63 samples/sec Loss 2.9322 LearningRate 0.0375 Epoch: 7 Global Step: 129410 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:56:00,114-Speed 5184.72 samples/sec Loss 2.9726 LearningRate 0.0375 Epoch: 7 Global Step: 129420 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:56:02,109-Speed 5134.27 samples/sec Loss 2.9695 LearningRate 0.0375 Epoch: 7 Global Step: 129430 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:56:04,103-Speed 5137.55 samples/sec Loss 2.9735 LearningRate 0.0375 Epoch: 7 Global Step: 129440 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:56:06,086-Speed 5166.09 samples/sec Loss 2.9951 
LearningRate 0.0375 Epoch: 7 Global Step: 129450 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:56:08,066-Speed 5172.98 samples/sec Loss 2.9962 LearningRate 0.0375 Epoch: 7 Global Step: 129460 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:56:10,052-Speed 5156.46 samples/sec Loss 2.9217 LearningRate 0.0375 Epoch: 7 Global Step: 129470 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:56:12,025-Speed 5192.15 samples/sec Loss 2.9087 LearningRate 0.0375 Epoch: 7 Global Step: 129480 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:56:14,032-Speed 5102.93 samples/sec Loss 2.9530 LearningRate 0.0375 Epoch: 7 Global Step: 129490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:56:16,010-Speed 5179.99 samples/sec Loss 2.9758 LearningRate 0.0375 Epoch: 7 Global Step: 129500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:56:17,988-Speed 5178.65 samples/sec Loss 2.9826 LearningRate 0.0375 Epoch: 7 Global Step: 129510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:56:19,978-Speed 5148.68 samples/sec Loss 2.9219 LearningRate 0.0375 Epoch: 7 Global Step: 129520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:56:21,953-Speed 5185.34 samples/sec Loss 2.9754 LearningRate 0.0375 Epoch: 7 Global Step: 129530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:56:23,925-Speed 5195.88 samples/sec Loss 2.8625 LearningRate 0.0374 Epoch: 7 Global Step: 129540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:56:25,911-Speed 5158.23 samples/sec Loss 3.0004 LearningRate 0.0374 Epoch: 7 Global Step: 129550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:56:27,892-Speed 5169.56 samples/sec Loss 2.8685 LearningRate 0.0374 Epoch: 7 Global Step: 129560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:56:29,869-Speed 5181.68 samples/sec Loss 2.9040 LearningRate 0.0374 Epoch: 7 Global Step: 
129570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:56:31,837-Speed 5204.98 samples/sec Loss 2.9645 LearningRate 0.0374 Epoch: 7 Global Step: 129580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:56:33,832-Speed 5133.13 samples/sec Loss 2.9485 LearningRate 0.0374 Epoch: 7 Global Step: 129590 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:56:35,837-Speed 5109.00 samples/sec Loss 2.9306 LearningRate 0.0374 Epoch: 7 Global Step: 129600 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:56:37,863-Speed 5055.09 samples/sec Loss 2.8912 LearningRate 0.0374 Epoch: 7 Global Step: 129610 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:56:39,857-Speed 5139.86 samples/sec Loss 2.8921 LearningRate 0.0374 Epoch: 7 Global Step: 129620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:56:41,871-Speed 5084.99 samples/sec Loss 2.9120 LearningRate 0.0374 Epoch: 7 Global Step: 129630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:56:43,859-Speed 5151.18 samples/sec Loss 3.0245 LearningRate 0.0374 Epoch: 7 Global Step: 129640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:56:45,842-Speed 5167.05 samples/sec Loss 2.9324 LearningRate 0.0374 Epoch: 7 Global Step: 129650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:56:47,820-Speed 5178.59 samples/sec Loss 2.9142 LearningRate 0.0374 Epoch: 7 Global Step: 129660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:56:49,802-Speed 5168.24 samples/sec Loss 2.9813 LearningRate 0.0374 Epoch: 7 Global Step: 129670 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:56:51,800-Speed 5126.67 samples/sec Loss 2.9150 LearningRate 0.0374 Epoch: 7 Global Step: 129680 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:56:53,785-Speed 5158.54 samples/sec Loss 2.9582 LearningRate 0.0374 Epoch: 7 Global Step: 129690 Fp16 Grad Scale: 32768 Required: 
14 hours Training: 2022-04-11 07:56:55,768-Speed 5167.18 samples/sec Loss 2.9915 LearningRate 0.0374 Epoch: 7 Global Step: 129700 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:56:57,744-Speed 5182.86 samples/sec Loss 2.9563 LearningRate 0.0374 Epoch: 7 Global Step: 129710 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:56:59,714-Speed 5199.63 samples/sec Loss 2.9940 LearningRate 0.0374 Epoch: 7 Global Step: 129720 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:57:01,700-Speed 5159.68 samples/sec Loss 2.9907 LearningRate 0.0374 Epoch: 7 Global Step: 129730 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:57:03,672-Speed 5193.71 samples/sec Loss 2.9184 LearningRate 0.0374 Epoch: 7 Global Step: 129740 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:57:05,642-Speed 5201.24 samples/sec Loss 2.9477 LearningRate 0.0374 Epoch: 7 Global Step: 129750 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:57:07,620-Speed 5177.54 samples/sec Loss 2.8942 LearningRate 0.0374 Epoch: 7 Global Step: 129760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 07:57:09,606-Speed 5158.70 samples/sec Loss 2.9485 LearningRate 0.0374 Epoch: 7 Global Step: 129770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:57:11,581-Speed 5185.22 samples/sec Loss 2.9762 LearningRate 0.0374 Epoch: 7 Global Step: 129780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:57:13,554-Speed 5190.29 samples/sec Loss 2.9800 LearningRate 0.0374 Epoch: 7 Global Step: 129790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:57:15,547-Speed 5140.58 samples/sec Loss 2.9429 LearningRate 0.0374 Epoch: 7 Global Step: 129800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:57:17,524-Speed 5185.46 samples/sec Loss 2.9366 LearningRate 0.0373 Epoch: 7 Global Step: 129810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 
07:57:19,510-Speed 5156.45 samples/sec Loss 2.8973 LearningRate 0.0373 Epoch: 7 Global Step: 129820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:57:21,495-Speed 5159.37 samples/sec Loss 2.9084 LearningRate 0.0373 Epoch: 7 Global Step: 129830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:57:23,488-Speed 5140.87 samples/sec Loss 2.9532 LearningRate 0.0373 Epoch: 7 Global Step: 129840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:57:25,494-Speed 5108.46 samples/sec Loss 2.9892 LearningRate 0.0373 Epoch: 7 Global Step: 129850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:57:27,482-Speed 5151.90 samples/sec Loss 2.9557 LearningRate 0.0373 Epoch: 7 Global Step: 129860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:57:29,473-Speed 5144.46 samples/sec Loss 2.9976 LearningRate 0.0373 Epoch: 7 Global Step: 129870 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:57:31,454-Speed 5171.09 samples/sec Loss 2.9753 LearningRate 0.0373 Epoch: 7 Global Step: 129880 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:57:33,442-Speed 5153.02 samples/sec Loss 2.9666 LearningRate 0.0373 Epoch: 7 Global Step: 129890 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:57:35,423-Speed 5170.04 samples/sec Loss 2.9546 LearningRate 0.0373 Epoch: 7 Global Step: 129900 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:57:37,425-Speed 5115.27 samples/sec Loss 2.9871 LearningRate 0.0373 Epoch: 7 Global Step: 129910 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:57:39,428-Speed 5113.63 samples/sec Loss 2.9533 LearningRate 0.0373 Epoch: 7 Global Step: 129920 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:57:41,420-Speed 5143.16 samples/sec Loss 2.8970 LearningRate 0.0373 Epoch: 7 Global Step: 129930 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:57:43,393-Speed 5192.55 
samples/sec Loss 2.9035 LearningRate 0.0373 Epoch: 7 Global Step: 129940 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:57:45,359-Speed 5209.43 samples/sec Loss 2.9266 LearningRate 0.0373 Epoch: 7 Global Step: 129950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:57:47,349-Speed 5147.40 samples/sec Loss 2.9836 LearningRate 0.0373 Epoch: 7 Global Step: 129960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:57:49,331-Speed 5167.99 samples/sec Loss 2.9551 LearningRate 0.0373 Epoch: 7 Global Step: 129970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:57:51,332-Speed 5120.56 samples/sec Loss 2.8589 LearningRate 0.0373 Epoch: 7 Global Step: 129980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:57:53,304-Speed 5193.32 samples/sec Loss 2.9202 LearningRate 0.0373 Epoch: 7 Global Step: 129990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:57:55,281-Speed 5182.37 samples/sec Loss 2.9513 LearningRate 0.0373 Epoch: 7 Global Step: 130000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:58:21,983-[lfw][130000]XNorm: 22.036494 Training: 2022-04-11 07:58:21,984-[lfw][130000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-04-11 07:58:21,984-[lfw][130000]Accuracy-Highest: 0.99817 Training: 2022-04-11 07:58:52,771-[cfp_fp][130000]XNorm: 20.658339 Training: 2022-04-11 07:58:52,771-[cfp_fp][130000]Accuracy-Flip: 0.98300+-0.00480 Training: 2022-04-11 07:58:52,772-[cfp_fp][130000]Accuracy-Highest: 0.98443 Training: 2022-04-11 07:59:19,380-[agedb_30][130000]XNorm: 22.253582 Training: 2022-04-11 07:59:19,381-[agedb_30][130000]Accuracy-Flip: 0.98033+-0.00752 Training: 2022-04-11 07:59:19,381-[agedb_30][130000]Accuracy-Highest: 0.98150 Training: 2022-04-11 07:59:21,359-Speed 118.96 samples/sec Loss 2.9194 LearningRate 0.0373 Epoch: 7 Global Step: 130010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:59:23,313-Speed 5240.92 samples/sec Loss 2.9173 
LearningRate 0.0373 Epoch: 7 Global Step: 130020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:59:25,283-Speed 5200.87 samples/sec Loss 2.9527 LearningRate 0.0373 Epoch: 7 Global Step: 130030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:59:27,249-Speed 5210.07 samples/sec Loss 2.9484 LearningRate 0.0373 Epoch: 7 Global Step: 130040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:59:29,215-Speed 5211.37 samples/sec Loss 2.8713 LearningRate 0.0373 Epoch: 7 Global Step: 130050 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:59:31,174-Speed 5227.90 samples/sec Loss 3.0047 LearningRate 0.0373 Epoch: 7 Global Step: 130060 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:59:33,149-Speed 5187.96 samples/sec Loss 2.9914 LearningRate 0.0373 Epoch: 7 Global Step: 130070 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:59:35,114-Speed 5212.26 samples/sec Loss 2.9888 LearningRate 0.0373 Epoch: 7 Global Step: 130080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:59:37,077-Speed 5218.58 samples/sec Loss 2.9665 LearningRate 0.0372 Epoch: 7 Global Step: 130090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:59:39,058-Speed 5170.29 samples/sec Loss 2.8275 LearningRate 0.0372 Epoch: 7 Global Step: 130100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:59:41,023-Speed 5212.80 samples/sec Loss 2.9444 LearningRate 0.0372 Epoch: 7 Global Step: 130110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:59:42,987-Speed 5214.47 samples/sec Loss 2.8815 LearningRate 0.0372 Epoch: 7 Global Step: 130120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:59:44,955-Speed 5205.60 samples/sec Loss 2.9257 LearningRate 0.0372 Epoch: 7 Global Step: 130130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:59:46,945-Speed 5147.88 samples/sec Loss 2.9374 LearningRate 0.0372 Epoch: 7 Global 
Step: 130140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:59:48,951-Speed 5105.30 samples/sec Loss 2.9001 LearningRate 0.0372 Epoch: 7 Global Step: 130150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:59:50,931-Speed 5174.67 samples/sec Loss 2.9504 LearningRate 0.0372 Epoch: 7 Global Step: 130160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:59:52,911-Speed 5172.68 samples/sec Loss 2.9030 LearningRate 0.0372 Epoch: 7 Global Step: 130170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 07:59:54,891-Speed 5175.39 samples/sec Loss 2.8756 LearningRate 0.0372 Epoch: 7 Global Step: 130180 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:59:56,857-Speed 5209.31 samples/sec Loss 2.9388 LearningRate 0.0372 Epoch: 7 Global Step: 130190 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 07:59:58,834-Speed 5182.14 samples/sec Loss 2.9544 LearningRate 0.0372 Epoch: 7 Global Step: 130200 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:00:00,807-Speed 5190.59 samples/sec Loss 2.8709 LearningRate 0.0372 Epoch: 7 Global Step: 130210 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:00:02,791-Speed 5164.02 samples/sec Loss 2.8985 LearningRate 0.0372 Epoch: 7 Global Step: 130220 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:00:04,804-Speed 5086.32 samples/sec Loss 2.9287 LearningRate 0.0372 Epoch: 7 Global Step: 130230 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:00:06,773-Speed 5203.38 samples/sec Loss 2.9654 LearningRate 0.0372 Epoch: 7 Global Step: 130240 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:00:08,744-Speed 5197.43 samples/sec Loss 2.9361 LearningRate 0.0372 Epoch: 7 Global Step: 130250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:00:10,717-Speed 5191.18 samples/sec Loss 2.8768 LearningRate 0.0372 Epoch: 7 Global Step: 130260 Fp16 Grad Scale: 32768 
Required: 14 hours Training: 2022-04-11 08:00:12,697-Speed 5175.74 samples/sec Loss 2.9011 LearningRate 0.0372 Epoch: 7 Global Step: 130270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:00:14,682-Speed 5159.39 samples/sec Loss 2.9413 LearningRate 0.0372 Epoch: 7 Global Step: 130280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:00:16,648-Speed 5210.32 samples/sec Loss 2.9662 LearningRate 0.0372 Epoch: 7 Global Step: 130290 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:00:18,619-Speed 5197.46 samples/sec Loss 2.8989 LearningRate 0.0372 Epoch: 7 Global Step: 130300 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:00:20,596-Speed 5180.27 samples/sec Loss 2.9498 LearningRate 0.0372 Epoch: 7 Global Step: 130310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:00:22,566-Speed 5200.69 samples/sec Loss 2.9016 LearningRate 0.0372 Epoch: 7 Global Step: 130320 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:00:24,543-Speed 5180.94 samples/sec Loss 2.9380 LearningRate 0.0372 Epoch: 7 Global Step: 130330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:00:26,547-Speed 5109.88 samples/sec Loss 2.9349 LearningRate 0.0372 Epoch: 7 Global Step: 130340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:00:28,551-Speed 5113.49 samples/sec Loss 2.9001 LearningRate 0.0372 Epoch: 7 Global Step: 130350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:00:30,531-Speed 5172.95 samples/sec Loss 2.9189 LearningRate 0.0371 Epoch: 7 Global Step: 130360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:00:32,504-Speed 5192.34 samples/sec Loss 2.9400 LearningRate 0.0371 Epoch: 7 Global Step: 130370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:00:34,495-Speed 5145.39 samples/sec Loss 2.9282 LearningRate 0.0371 Epoch: 7 Global Step: 130380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 
08:00:36,497-Speed 5117.05 samples/sec Loss 2.8813 LearningRate 0.0371 Epoch: 7 Global Step: 130390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:00:38,489-Speed 5141.57 samples/sec Loss 2.8824 LearningRate 0.0371 Epoch: 7 Global Step: 130400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:00:40,460-Speed 5197.17 samples/sec Loss 2.9346 LearningRate 0.0371 Epoch: 7 Global Step: 130410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:00:42,429-Speed 5202.30 samples/sec Loss 2.9624 LearningRate 0.0371 Epoch: 7 Global Step: 130420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:00:44,407-Speed 5178.51 samples/sec Loss 2.8924 LearningRate 0.0371 Epoch: 7 Global Step: 130430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:00:46,406-Speed 5122.60 samples/sec Loss 2.9089 LearningRate 0.0371 Epoch: 7 Global Step: 130440 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:00:48,390-Speed 5162.96 samples/sec Loss 2.9196 LearningRate 0.0371 Epoch: 7 Global Step: 130450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:00:50,366-Speed 5185.63 samples/sec Loss 3.0068 LearningRate 0.0371 Epoch: 7 Global Step: 130460 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:00:52,349-Speed 5165.11 samples/sec Loss 2.9635 LearningRate 0.0371 Epoch: 7 Global Step: 130470 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:00:54,318-Speed 5202.89 samples/sec Loss 2.9384 LearningRate 0.0371 Epoch: 7 Global Step: 130480 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:00:56,298-Speed 5173.09 samples/sec Loss 3.0632 LearningRate 0.0371 Epoch: 7 Global Step: 130490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:00:58,292-Speed 5136.26 samples/sec Loss 2.9435 LearningRate 0.0371 Epoch: 7 Global Step: 130500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:00,269-Speed 5182.51 samples/sec 
Loss 2.8931 LearningRate 0.0371 Epoch: 7 Global Step: 130510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:02,249-Speed 5173.97 samples/sec Loss 2.9493 LearningRate 0.0371 Epoch: 7 Global Step: 130520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:04,220-Speed 5195.42 samples/sec Loss 2.9539 LearningRate 0.0371 Epoch: 7 Global Step: 130530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:06,201-Speed 5171.08 samples/sec Loss 2.9270 LearningRate 0.0371 Epoch: 7 Global Step: 130540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:08,169-Speed 5206.35 samples/sec Loss 2.9048 LearningRate 0.0371 Epoch: 7 Global Step: 130550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:10,143-Speed 5188.45 samples/sec Loss 2.9690 LearningRate 0.0371 Epoch: 7 Global Step: 130560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:12,126-Speed 5165.77 samples/sec Loss 2.9515 LearningRate 0.0371 Epoch: 7 Global Step: 130570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:14,093-Speed 5207.03 samples/sec Loss 3.0083 LearningRate 0.0371 Epoch: 7 Global Step: 130580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:16,086-Speed 5140.42 samples/sec Loss 2.9739 LearningRate 0.0371 Epoch: 7 Global Step: 130590 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:01:18,072-Speed 5156.95 samples/sec Loss 2.9371 LearningRate 0.0371 Epoch: 7 Global Step: 130600 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:01:20,040-Speed 5204.51 samples/sec Loss 2.9984 LearningRate 0.0371 Epoch: 7 Global Step: 130610 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:01:22,036-Speed 5132.58 samples/sec Loss 2.9343 LearningRate 0.0371 Epoch: 7 Global Step: 130620 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:01:24,045-Speed 5098.76 samples/sec Loss 2.9858 LearningRate 0.0370 Epoch: 
7 Global Step: 130630 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:01:26,048-Speed 5112.81 samples/sec Loss 2.8912 LearningRate 0.0370 Epoch: 7 Global Step: 130640 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:01:28,026-Speed 5179.46 samples/sec Loss 2.8899 LearningRate 0.0370 Epoch: 7 Global Step: 130650 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:01:30,001-Speed 5185.52 samples/sec Loss 2.8782 LearningRate 0.0370 Epoch: 7 Global Step: 130660 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:01:31,968-Speed 5209.40 samples/sec Loss 2.9204 LearningRate 0.0370 Epoch: 7 Global Step: 130670 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:01:33,940-Speed 5193.24 samples/sec Loss 2.9116 LearningRate 0.0370 Epoch: 7 Global Step: 130680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:35,957-Speed 5080.29 samples/sec Loss 2.9351 LearningRate 0.0370 Epoch: 7 Global Step: 130690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:37,925-Speed 5204.82 samples/sec Loss 3.0361 LearningRate 0.0370 Epoch: 7 Global Step: 130700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:39,897-Speed 5195.44 samples/sec Loss 2.9322 LearningRate 0.0370 Epoch: 7 Global Step: 130710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:41,893-Speed 5129.54 samples/sec Loss 2.9348 LearningRate 0.0370 Epoch: 7 Global Step: 130720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:43,886-Speed 5141.54 samples/sec Loss 2.9458 LearningRate 0.0370 Epoch: 7 Global Step: 130730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:45,851-Speed 5212.72 samples/sec Loss 2.9049 LearningRate 0.0370 Epoch: 7 Global Step: 130740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:47,828-Speed 5179.87 samples/sec Loss 2.9380 LearningRate 0.0370 Epoch: 7 Global Step: 130750 Fp16 Grad Scale: 
65536 Required: 14 hours Training: 2022-04-11 08:01:49,815-Speed 5155.70 samples/sec Loss 2.9809 LearningRate 0.0370 Epoch: 7 Global Step: 130760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:51,783-Speed 5204.66 samples/sec Loss 2.9443 LearningRate 0.0370 Epoch: 7 Global Step: 130770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:53,750-Speed 5207.72 samples/sec Loss 2.9776 LearningRate 0.0370 Epoch: 7 Global Step: 130780 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:01:55,723-Speed 5191.23 samples/sec Loss 2.9408 LearningRate 0.0370 Epoch: 7 Global Step: 130790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:57,706-Speed 5167.37 samples/sec Loss 2.9412 LearningRate 0.0370 Epoch: 7 Global Step: 130800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:01:59,686-Speed 5172.32 samples/sec Loss 2.9259 LearningRate 0.0370 Epoch: 7 Global Step: 130810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:01,676-Speed 5147.39 samples/sec Loss 2.9426 LearningRate 0.0370 Epoch: 7 Global Step: 130820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:03,659-Speed 5166.07 samples/sec Loss 2.9013 LearningRate 0.0370 Epoch: 7 Global Step: 130830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:05,644-Speed 5163.10 samples/sec Loss 2.9553 LearningRate 0.0370 Epoch: 7 Global Step: 130840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:07,627-Speed 5163.78 samples/sec Loss 2.9699 LearningRate 0.0370 Epoch: 7 Global Step: 130850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:09,601-Speed 5189.41 samples/sec Loss 2.8875 LearningRate 0.0370 Epoch: 7 Global Step: 130860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:11,582-Speed 5170.18 samples/sec Loss 2.9071 LearningRate 0.0370 Epoch: 7 Global Step: 130870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-04-11 08:02:13,569-Speed 5156.69 samples/sec Loss 2.9318 LearningRate 0.0370 Epoch: 7 Global Step: 130880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:15,548-Speed 5174.61 samples/sec Loss 2.9165 LearningRate 0.0370 Epoch: 7 Global Step: 130890 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:02:17,527-Speed 5177.93 samples/sec Loss 2.9579 LearningRate 0.0370 Epoch: 7 Global Step: 130900 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:02:19,496-Speed 5203.31 samples/sec Loss 2.8328 LearningRate 0.0369 Epoch: 7 Global Step: 130910 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:02:21,481-Speed 5160.64 samples/sec Loss 2.8909 LearningRate 0.0369 Epoch: 7 Global Step: 130920 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:02:23,465-Speed 5161.17 samples/sec Loss 2.9263 LearningRate 0.0369 Epoch: 7 Global Step: 130930 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:02:25,433-Speed 5205.86 samples/sec Loss 2.8907 LearningRate 0.0369 Epoch: 7 Global Step: 130940 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:02:27,423-Speed 5146.81 samples/sec Loss 2.9777 LearningRate 0.0369 Epoch: 7 Global Step: 130950 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:02:29,390-Speed 5209.31 samples/sec Loss 2.9625 LearningRate 0.0369 Epoch: 7 Global Step: 130960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:31,368-Speed 5177.91 samples/sec Loss 2.9600 LearningRate 0.0369 Epoch: 7 Global Step: 130970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:33,338-Speed 5199.93 samples/sec Loss 2.9410 LearningRate 0.0369 Epoch: 7 Global Step: 130980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:35,325-Speed 5155.36 samples/sec Loss 2.9145 LearningRate 0.0369 Epoch: 7 Global Step: 130990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:37,307-Speed 
5168.21 samples/sec Loss 2.9605 LearningRate 0.0369 Epoch: 7 Global Step: 131000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:39,291-Speed 5163.64 samples/sec Loss 2.8547 LearningRate 0.0369 Epoch: 7 Global Step: 131010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:41,266-Speed 5186.74 samples/sec Loss 2.9008 LearningRate 0.0369 Epoch: 7 Global Step: 131020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:43,249-Speed 5165.31 samples/sec Loss 2.8740 LearningRate 0.0369 Epoch: 7 Global Step: 131030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:45,248-Speed 5124.33 samples/sec Loss 2.8733 LearningRate 0.0369 Epoch: 7 Global Step: 131040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:47,226-Speed 5176.35 samples/sec Loss 2.9182 LearningRate 0.0369 Epoch: 7 Global Step: 131050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:49,235-Speed 5098.98 samples/sec Loss 2.9472 LearningRate 0.0369 Epoch: 7 Global Step: 131060 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:02:51,206-Speed 5198.76 samples/sec Loss 2.9131 LearningRate 0.0369 Epoch: 7 Global Step: 131070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:53,190-Speed 5160.66 samples/sec Loss 2.9713 LearningRate 0.0369 Epoch: 7 Global Step: 131080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:55,172-Speed 5168.42 samples/sec Loss 2.8905 LearningRate 0.0369 Epoch: 7 Global Step: 131090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:57,160-Speed 5154.25 samples/sec Loss 2.8818 LearningRate 0.0369 Epoch: 7 Global Step: 131100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:02:59,152-Speed 5140.96 samples/sec Loss 2.8976 LearningRate 0.0369 Epoch: 7 Global Step: 131110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:01,128-Speed 5186.06 samples/sec Loss 2.9584 
LearningRate 0.0369 Epoch: 7 Global Step: 131120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:03,110-Speed 5167.93 samples/sec Loss 2.9356 LearningRate 0.0369 Epoch: 7 Global Step: 131130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:05,092-Speed 5167.31 samples/sec Loss 2.9153 LearningRate 0.0369 Epoch: 7 Global Step: 131140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:07,066-Speed 5189.43 samples/sec Loss 2.8730 LearningRate 0.0369 Epoch: 7 Global Step: 131150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:09,046-Speed 5172.04 samples/sec Loss 2.9610 LearningRate 0.0369 Epoch: 7 Global Step: 131160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:11,042-Speed 5132.07 samples/sec Loss 2.8871 LearningRate 0.0369 Epoch: 7 Global Step: 131170 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:03:13,025-Speed 5166.40 samples/sec Loss 2.9269 LearningRate 0.0368 Epoch: 7 Global Step: 131180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:15,019-Speed 5136.39 samples/sec Loss 2.8934 LearningRate 0.0368 Epoch: 7 Global Step: 131190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:17,009-Speed 5147.53 samples/sec Loss 2.9912 LearningRate 0.0368 Epoch: 7 Global Step: 131200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:18,987-Speed 5178.40 samples/sec Loss 2.9705 LearningRate 0.0368 Epoch: 7 Global Step: 131210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:20,965-Speed 5179.80 samples/sec Loss 2.7988 LearningRate 0.0368 Epoch: 7 Global Step: 131220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:22,940-Speed 5187.47 samples/sec Loss 2.9250 LearningRate 0.0368 Epoch: 7 Global Step: 131230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:24,922-Speed 5166.90 samples/sec Loss 2.9255 LearningRate 0.0368 Epoch: 7 Global Step: 
131240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:26,900-Speed 5179.33 samples/sec Loss 2.9023 LearningRate 0.0368 Epoch: 7 Global Step: 131250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:28,880-Speed 5172.89 samples/sec Loss 2.9586 LearningRate 0.0368 Epoch: 7 Global Step: 131260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:30,868-Speed 5153.80 samples/sec Loss 2.8708 LearningRate 0.0368 Epoch: 7 Global Step: 131270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:32,842-Speed 5189.36 samples/sec Loss 3.0503 LearningRate 0.0368 Epoch: 7 Global Step: 131280 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:03:34,811-Speed 5199.75 samples/sec Loss 2.9150 LearningRate 0.0368 Epoch: 7 Global Step: 131290 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:03:36,823-Speed 5091.98 samples/sec Loss 2.9301 LearningRate 0.0368 Epoch: 7 Global Step: 131300 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:03:38,846-Speed 5064.69 samples/sec Loss 2.8893 LearningRate 0.0368 Epoch: 7 Global Step: 131310 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:03:40,860-Speed 5086.00 samples/sec Loss 3.0065 LearningRate 0.0368 Epoch: 7 Global Step: 131320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:42,834-Speed 5188.55 samples/sec Loss 2.9214 LearningRate 0.0368 Epoch: 7 Global Step: 131330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:44,816-Speed 5167.86 samples/sec Loss 2.9092 LearningRate 0.0368 Epoch: 7 Global Step: 131340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:46,833-Speed 5079.21 samples/sec Loss 2.8950 LearningRate 0.0368 Epoch: 7 Global Step: 131350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:48,862-Speed 5047.86 samples/sec Loss 2.8975 LearningRate 0.0368 Epoch: 7 Global Step: 131360 Fp16 Grad Scale: 65536 Required: 
14 hours Training: 2022-04-11 08:03:50,840-Speed 5180.62 samples/sec Loss 2.9530 LearningRate 0.0368 Epoch: 7 Global Step: 131370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:52,817-Speed 5179.12 samples/sec Loss 2.9092 LearningRate 0.0368 Epoch: 7 Global Step: 131380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:54,791-Speed 5189.30 samples/sec Loss 2.8931 LearningRate 0.0368 Epoch: 7 Global Step: 131390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:56,767-Speed 5183.58 samples/sec Loss 2.9024 LearningRate 0.0368 Epoch: 7 Global Step: 131400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:03:58,736-Speed 5202.20 samples/sec Loss 2.9417 LearningRate 0.0368 Epoch: 7 Global Step: 131410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:04:00,727-Speed 5145.74 samples/sec Loss 2.9235 LearningRate 0.0368 Epoch: 7 Global Step: 131420 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:02,711-Speed 5164.46 samples/sec Loss 2.9713 LearningRate 0.0368 Epoch: 7 Global Step: 131430 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:04,689-Speed 5177.22 samples/sec Loss 2.8849 LearningRate 0.0368 Epoch: 7 Global Step: 131440 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:06,658-Speed 5204.89 samples/sec Loss 2.9870 LearningRate 0.0368 Epoch: 7 Global Step: 131450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:08,637-Speed 5175.39 samples/sec Loss 2.9388 LearningRate 0.0367 Epoch: 7 Global Step: 131460 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:10,630-Speed 5139.84 samples/sec Loss 2.8993 LearningRate 0.0367 Epoch: 7 Global Step: 131470 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:12,621-Speed 5144.79 samples/sec Loss 2.9928 LearningRate 0.0367 Epoch: 7 Global Step: 131480 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 
08:04:14,614-Speed 5138.65 samples/sec Loss 2.8894 LearningRate 0.0367 Epoch: 7 Global Step: 131490 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:16,613-Speed 5124.35 samples/sec Loss 2.8838 LearningRate 0.0367 Epoch: 7 Global Step: 131500 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:18,585-Speed 5195.27 samples/sec Loss 2.9974 LearningRate 0.0367 Epoch: 7 Global Step: 131510 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:20,552-Speed 5206.86 samples/sec Loss 2.9431 LearningRate 0.0367 Epoch: 7 Global Step: 131520 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:22,530-Speed 5177.70 samples/sec Loss 2.9080 LearningRate 0.0367 Epoch: 7 Global Step: 131530 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:24,505-Speed 5186.65 samples/sec Loss 2.9398 LearningRate 0.0367 Epoch: 7 Global Step: 131540 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:26,482-Speed 5183.35 samples/sec Loss 2.9289 LearningRate 0.0367 Epoch: 7 Global Step: 131550 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:28,480-Speed 5128.74 samples/sec Loss 2.9887 LearningRate 0.0367 Epoch: 7 Global Step: 131560 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:30,451-Speed 5195.45 samples/sec Loss 2.9509 LearningRate 0.0367 Epoch: 7 Global Step: 131570 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:32,429-Speed 5180.47 samples/sec Loss 2.9947 LearningRate 0.0367 Epoch: 7 Global Step: 131580 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:34,421-Speed 5140.24 samples/sec Loss 2.9517 LearningRate 0.0367 Epoch: 7 Global Step: 131590 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:36,405-Speed 5164.82 samples/sec Loss 2.8847 LearningRate 0.0367 Epoch: 7 Global Step: 131600 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:38,405-Speed 5120.59 
samples/sec Loss 2.9090 LearningRate 0.0367 Epoch: 7 Global Step: 131610 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:40,371-Speed 5210.58 samples/sec Loss 2.8955 LearningRate 0.0367 Epoch: 7 Global Step: 131620 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:42,373-Speed 5116.31 samples/sec Loss 2.9737 LearningRate 0.0367 Epoch: 7 Global Step: 131630 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:04:44,377-Speed 5112.34 samples/sec Loss 2.9350 LearningRate 0.0367 Epoch: 7 Global Step: 131640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:04:46,354-Speed 5181.78 samples/sec Loss 2.8881 LearningRate 0.0367 Epoch: 7 Global Step: 131650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:04:48,327-Speed 5191.35 samples/sec Loss 2.8739 LearningRate 0.0367 Epoch: 7 Global Step: 131660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:04:50,311-Speed 5161.69 samples/sec Loss 3.0131 LearningRate 0.0367 Epoch: 7 Global Step: 131670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:04:52,298-Speed 5154.74 samples/sec Loss 2.9720 LearningRate 0.0367 Epoch: 7 Global Step: 131680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:04:54,267-Speed 5201.80 samples/sec Loss 2.9311 LearningRate 0.0367 Epoch: 7 Global Step: 131690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:04:56,260-Speed 5140.34 samples/sec Loss 2.9493 LearningRate 0.0367 Epoch: 7 Global Step: 131700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:04:58,242-Speed 5169.07 samples/sec Loss 2.9142 LearningRate 0.0367 Epoch: 7 Global Step: 131710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:00,253-Speed 5092.02 samples/sec Loss 2.9169 LearningRate 0.0367 Epoch: 7 Global Step: 131720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:02,249-Speed 5134.11 samples/sec Loss 2.8847 LearningRate 
0.0366 Epoch: 7 Global Step: 131730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:04,231-Speed 5166.44 samples/sec Loss 2.9397 LearningRate 0.0366 Epoch: 7 Global Step: 131740 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:05:06,213-Speed 5170.85 samples/sec Loss 2.8953 LearningRate 0.0366 Epoch: 7 Global Step: 131750 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:05:08,193-Speed 5173.50 samples/sec Loss 2.9311 LearningRate 0.0366 Epoch: 7 Global Step: 131760 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:05:10,195-Speed 5115.83 samples/sec Loss 2.9576 LearningRate 0.0366 Epoch: 7 Global Step: 131770 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:05:12,184-Speed 5150.53 samples/sec Loss 2.9421 LearningRate 0.0366 Epoch: 7 Global Step: 131780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:14,194-Speed 5095.82 samples/sec Loss 2.8869 LearningRate 0.0366 Epoch: 7 Global Step: 131790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:16,198-Speed 5112.09 samples/sec Loss 2.8629 LearningRate 0.0366 Epoch: 7 Global Step: 131800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:18,179-Speed 5170.31 samples/sec Loss 2.9550 LearningRate 0.0366 Epoch: 7 Global Step: 131810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:20,160-Speed 5169.59 samples/sec Loss 2.8524 LearningRate 0.0366 Epoch: 7 Global Step: 131820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:22,149-Speed 5149.46 samples/sec Loss 2.9805 LearningRate 0.0366 Epoch: 7 Global Step: 131830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:24,133-Speed 5164.00 samples/sec Loss 2.9223 LearningRate 0.0366 Epoch: 7 Global Step: 131840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:26,123-Speed 5148.39 samples/sec Loss 2.8975 LearningRate 0.0366 Epoch: 7 Global Step: 131850 
Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:28,104-Speed 5171.21 samples/sec Loss 2.9044 LearningRate 0.0366 Epoch: 7 Global Step: 131860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:30,076-Speed 5192.52 samples/sec Loss 2.8739 LearningRate 0.0366 Epoch: 7 Global Step: 131870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:32,047-Speed 5197.27 samples/sec Loss 2.8998 LearningRate 0.0366 Epoch: 7 Global Step: 131880 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:05:34,041-Speed 5136.39 samples/sec Loss 2.9239 LearningRate 0.0366 Epoch: 7 Global Step: 131890 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:05:36,043-Speed 5118.25 samples/sec Loss 2.9203 LearningRate 0.0366 Epoch: 7 Global Step: 131900 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:05:38,019-Speed 5183.22 samples/sec Loss 2.9045 LearningRate 0.0366 Epoch: 7 Global Step: 131910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:40,014-Speed 5135.48 samples/sec Loss 2.9038 LearningRate 0.0366 Epoch: 7 Global Step: 131920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:42,002-Speed 5151.65 samples/sec Loss 2.9798 LearningRate 0.0366 Epoch: 7 Global Step: 131930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:43,985-Speed 5167.15 samples/sec Loss 2.9590 LearningRate 0.0366 Epoch: 7 Global Step: 131940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:45,989-Speed 5111.50 samples/sec Loss 2.9460 LearningRate 0.0366 Epoch: 7 Global Step: 131950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:47,985-Speed 5131.38 samples/sec Loss 2.8992 LearningRate 0.0366 Epoch: 7 Global Step: 131960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:49,991-Speed 5107.22 samples/sec Loss 2.8916 LearningRate 0.0366 Epoch: 7 Global Step: 131970 Fp16 Grad Scale: 65536 Required: 14 
hours Training: 2022-04-11 08:05:51,979-Speed 5153.05 samples/sec Loss 2.9016 LearningRate 0.0366 Epoch: 7 Global Step: 131980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:53,957-Speed 5178.21 samples/sec Loss 2.9161 LearningRate 0.0366 Epoch: 7 Global Step: 131990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:05:55,929-Speed 5193.07 samples/sec Loss 3.0014 LearningRate 0.0366 Epoch: 7 Global Step: 132000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:06:22,498-[lfw][132000]XNorm: 23.235540 Training: 2022-04-11 08:06:22,498-[lfw][132000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-04-11 08:06:22,499-[lfw][132000]Accuracy-Highest: 0.99833 Training: 2022-04-11 08:06:53,251-[cfp_fp][132000]XNorm: 21.455735 Training: 2022-04-11 08:06:53,251-[cfp_fp][132000]Accuracy-Flip: 0.98257+-0.00545 Training: 2022-04-11 08:06:53,252-[cfp_fp][132000]Accuracy-Highest: 0.98443 Training: 2022-04-11 08:07:19,847-[agedb_30][132000]XNorm: 23.125158 Training: 2022-04-11 08:07:19,848-[agedb_30][132000]Accuracy-Flip: 0.97967+-0.00562 Training: 2022-04-11 08:07:19,849-[agedb_30][132000]Accuracy-Highest: 0.98150 Training: 2022-04-11 08:07:21,824-Speed 119.22 samples/sec Loss 2.8593 LearningRate 0.0365 Epoch: 7 Global Step: 132010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:07:23,796-Speed 5192.73 samples/sec Loss 2.9025 LearningRate 0.0365 Epoch: 7 Global Step: 132020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:07:25,779-Speed 5165.55 samples/sec Loss 2.8861 LearningRate 0.0365 Epoch: 7 Global Step: 132030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:07:27,773-Speed 5136.43 samples/sec Loss 2.8177 LearningRate 0.0365 Epoch: 7 Global Step: 132040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:07:29,735-Speed 5221.95 samples/sec Loss 2.9670 LearningRate 0.0365 Epoch: 7 Global Step: 132050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-04-11 08:07:31,747-Speed 5092.40 samples/sec Loss 2.9694 LearningRate 0.0365 Epoch: 7 Global Step: 132060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:07:33,705-Speed 5232.16 samples/sec Loss 2.9285 LearningRate 0.0365 Epoch: 7 Global Step: 132070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:07:35,692-Speed 5156.16 samples/sec Loss 2.9714 LearningRate 0.0365 Epoch: 7 Global Step: 132080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:07:37,691-Speed 5123.81 samples/sec Loss 2.9526 LearningRate 0.0365 Epoch: 7 Global Step: 132090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:07:39,662-Speed 5195.37 samples/sec Loss 2.9292 LearningRate 0.0365 Epoch: 7 Global Step: 132100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:07:41,624-Speed 5221.89 samples/sec Loss 2.9162 LearningRate 0.0365 Epoch: 7 Global Step: 132110 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:07:43,577-Speed 5243.49 samples/sec Loss 2.8557 LearningRate 0.0365 Epoch: 7 Global Step: 132120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:07:45,558-Speed 5170.71 samples/sec Loss 2.9275 LearningRate 0.0365 Epoch: 7 Global Step: 132130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:07:47,560-Speed 5118.03 samples/sec Loss 2.9175 LearningRate 0.0365 Epoch: 7 Global Step: 132140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:07:49,527-Speed 5206.96 samples/sec Loss 2.8766 LearningRate 0.0365 Epoch: 7 Global Step: 132150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:07:51,491-Speed 5216.51 samples/sec Loss 2.8893 LearningRate 0.0365 Epoch: 7 Global Step: 132160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:07:53,464-Speed 5191.02 samples/sec Loss 2.9336 LearningRate 0.0365 Epoch: 7 Global Step: 132170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:07:55,430-Speed 5211.01 
samples/sec Loss 2.9163 LearningRate 0.0365 Epoch: 7 Global Step: 132180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:07:57,400-Speed 5200.12 samples/sec Loss 2.9105 LearningRate 0.0365 Epoch: 7 Global Step: 132190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:07:59,370-Speed 5200.03 samples/sec Loss 2.9178 LearningRate 0.0365 Epoch: 7 Global Step: 132200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:08:01,360-Speed 5146.52 samples/sec Loss 2.8890 LearningRate 0.0365 Epoch: 7 Global Step: 132210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:08:03,335-Speed 5187.42 samples/sec Loss 2.8700 LearningRate 0.0365 Epoch: 7 Global Step: 132220 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:08:05,296-Speed 5221.35 samples/sec Loss 2.8474 LearningRate 0.0365 Epoch: 7 Global Step: 132230 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:08:07,262-Speed 5210.40 samples/sec Loss 2.8772 LearningRate 0.0365 Epoch: 7 Global Step: 132240 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-11 08:08:09,233-Speed 5198.82 samples/sec Loss 2.8852 LearningRate 0.0365 Epoch: 7 Global Step: 132250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:08:11,221-Speed 5150.86 samples/sec Loss 2.9522 LearningRate 0.0365 Epoch: 7 Global Step: 132260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:08:13,195-Speed 5190.24 samples/sec Loss 2.8573 LearningRate 0.0365 Epoch: 7 Global Step: 132270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:08:15,180-Speed 5161.37 samples/sec Loss 2.8614 LearningRate 0.0365 Epoch: 7 Global Step: 132280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:08:17,149-Speed 5202.76 samples/sec Loss 2.9377 LearningRate 0.0364 Epoch: 7 Global Step: 132290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:08:19,105-Speed 5236.39 samples/sec Loss 2.9414 LearningRate 
0.0364 Epoch: 7 Global Step: 132300 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:08:21,067-Speed 5221.83 samples/sec Loss 2.8922 LearningRate 0.0364 Epoch: 7 Global Step: 132310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:08:23,051-Speed 5160.86 samples/sec Loss 2.9508 LearningRate 0.0364 Epoch: 7 Global Step: 132320 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:08:25,014-Speed 5218.24 samples/sec Loss 2.9174 LearningRate 0.0364 Epoch: 7 Global Step: 132330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:08:27,003-Speed 5151.14 samples/sec Loss 2.9510 LearningRate 0.0364 Epoch: 7 Global Step: 132340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:08:28,979-Speed 5182.47 samples/sec Loss 2.9075 LearningRate 0.0364 Epoch: 7 Global Step: 132350 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:08:30,954-Speed 5187.82 samples/sec Loss 2.9773 LearningRate 0.0364 Epoch: 7 Global Step: 132360 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:08:32,949-Speed 5134.88 samples/sec Loss 2.8975 LearningRate 0.0364 Epoch: 7 Global Step: 132370 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:08:34,929-Speed 5172.23 samples/sec Loss 2.9275 LearningRate 0.0364 Epoch: 7 Global Step: 132380 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:08:36,941-Speed 5093.56 samples/sec Loss 2.8088 LearningRate 0.0364 Epoch: 7 Global Step: 132390 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:08:38,936-Speed 5133.76 samples/sec Loss 2.9609 LearningRate 0.0364 Epoch: 7 Global Step: 132400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:08:40,905-Speed 5202.43 samples/sec Loss 2.8921 LearningRate 0.0364 Epoch: 7 Global Step: 132410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:08:42,873-Speed 5204.87 samples/sec Loss 2.8852 LearningRate 0.0364 Epoch: 7 Global Step: 132420 Fp16 
Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:08:44,842-Speed 5200.80 samples/sec Loss 2.8912 LearningRate 0.0364 Epoch: 7 Global Step: 132430 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:08:46,818-Speed 5184.47 samples/sec Loss 2.9498 LearningRate 0.0364 Epoch: 7 Global Step: 132440 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:08:48,787-Speed 5202.22 samples/sec Loss 2.9392 LearningRate 0.0364 Epoch: 7 Global Step: 132450 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:08:50,767-Speed 5173.51 samples/sec Loss 2.8859 LearningRate 0.0364 Epoch: 7 Global Step: 132460 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:08:52,741-Speed 5187.75 samples/sec Loss 2.8624 LearningRate 0.0364 Epoch: 7 Global Step: 132470 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:08:54,721-Speed 5174.68 samples/sec Loss 2.9248 LearningRate 0.0364 Epoch: 7 Global Step: 132480 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:08:56,701-Speed 5173.67 samples/sec Loss 2.8940 LearningRate 0.0364 Epoch: 7 Global Step: 132490 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:08:58,684-Speed 5165.47 samples/sec Loss 2.8190 LearningRate 0.0364 Epoch: 7 Global Step: 132500 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:09:00,662-Speed 5180.59 samples/sec Loss 2.9137 LearningRate 0.0364 Epoch: 7 Global Step: 132510 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:09:02,637-Speed 5184.48 samples/sec Loss 2.9162 LearningRate 0.0364 Epoch: 7 Global Step: 132520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:09:04,604-Speed 5207.21 samples/sec Loss 2.8269 LearningRate 0.0364 Epoch: 7 Global Step: 132530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:09:06,572-Speed 5206.33 samples/sec Loss 2.8897 LearningRate 0.0364 Epoch: 7 Global Step: 132540 Fp16 Grad Scale: 65536 Required: 14 hours 
Training: 2022-04-11 08:09:08,538-Speed 5208.84 samples/sec Loss 2.9852 LearningRate 0.0364 Epoch: 7 Global Step: 132550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:09:10,509-Speed 5198.37 samples/sec Loss 2.9794 LearningRate 0.0363 Epoch: 7 Global Step: 132560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:09:12,484-Speed 5186.44 samples/sec Loss 2.9434 LearningRate 0.0363 Epoch: 7 Global Step: 132570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:09:14,455-Speed 5197.66 samples/sec Loss 2.8849 LearningRate 0.0363 Epoch: 7 Global Step: 132580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:09:16,426-Speed 5196.61 samples/sec Loss 2.9419 LearningRate 0.0363 Epoch: 7 Global Step: 132590 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:09:18,392-Speed 5211.89 samples/sec Loss 2.8621 LearningRate 0.0363 Epoch: 7 Global Step: 132600 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:09:20,371-Speed 5174.59 samples/sec Loss 2.9333 LearningRate 0.0363 Epoch: 7 Global Step: 132610 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:09:22,367-Speed 5132.50 samples/sec Loss 2.9712 LearningRate 0.0363 Epoch: 7 Global Step: 132620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:09:24,355-Speed 5152.98 samples/sec Loss 2.8907 LearningRate 0.0363 Epoch: 7 Global Step: 132630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:09:26,350-Speed 5133.01 samples/sec Loss 2.8529 LearningRate 0.0363 Epoch: 7 Global Step: 132640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:09:28,338-Speed 5152.09 samples/sec Loss 2.9142 LearningRate 0.0363 Epoch: 7 Global Step: 132650 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:09:30,308-Speed 5201.37 samples/sec Loss 2.9207 LearningRate 0.0363 Epoch: 7 Global Step: 132660 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:09:32,275-Speed 
5207.47 samples/sec Loss 2.9134 LearningRate 0.0363 Epoch: 7 Global Step: 132670 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:09:34,248-Speed 5191.02 samples/sec Loss 2.8894 LearningRate 0.0363 Epoch: 7 Global Step: 132680 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-11 08:09:36,223-Speed 5186.68 samples/sec Loss 2.9329 LearningRate 0.0363 Epoch: 7 Global Step: 132690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:09:38,207-Speed 5163.46 samples/sec Loss 2.9171 LearningRate 0.0363 Epoch: 7 Global Step: 132700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:09:40,180-Speed 5192.06 samples/sec Loss 2.8668 LearningRate 0.0363 Epoch: 7 Global Step: 132710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:09:42,167-Speed 5155.46 samples/sec Loss 2.9113 LearningRate 0.0363 Epoch: 7 Global Step: 132720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-11 08:09:44,144-Speed 5181.42 samples/sec Loss 2.9172 LearningRate 0.0363 Epoch: 7 Global Step: 132730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:09:46,120-Speed 5184.88 samples/sec Loss 2.8977 LearningRate 0.0363 Epoch: 7 Global Step: 132740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:09:48,095-Speed 5184.62 samples/sec Loss 2.9476 LearningRate 0.0363 Epoch: 7 Global Step: 132750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:09:50,085-Speed 5147.43 samples/sec Loss 2.8424 LearningRate 0.0363 Epoch: 7 Global Step: 132760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:09:52,053-Speed 5205.72 samples/sec Loss 2.8969 LearningRate 0.0363 Epoch: 7 Global Step: 132770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:09:54,016-Speed 5218.94 samples/sec Loss 2.9308 LearningRate 0.0363 Epoch: 7 Global Step: 132780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:09:55,979-Speed 5217.75 samples/sec Loss 2.8877 
LearningRate 0.0363 Epoch: 7 Global Step: 132790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:09:57,937-Speed 5230.46 samples/sec Loss 2.8603 LearningRate 0.0363 Epoch: 7 Global Step: 132800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:09:59,906-Speed 5204.21 samples/sec Loss 2.8946 LearningRate 0.0363 Epoch: 7 Global Step: 132810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:10:01,878-Speed 5194.67 samples/sec Loss 2.9045 LearningRate 0.0363 Epoch: 7 Global Step: 132820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:10:03,863-Speed 5160.93 samples/sec Loss 2.9307 LearningRate 0.0363 Epoch: 7 Global Step: 132830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:10:05,829-Speed 5208.59 samples/sec Loss 2.8401 LearningRate 0.0362 Epoch: 7 Global Step: 132840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:10:07,795-Speed 5211.05 samples/sec Loss 2.9188 LearningRate 0.0362 Epoch: 7 Global Step: 132850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:10:09,776-Speed 5169.93 samples/sec Loss 2.9349 LearningRate 0.0362 Epoch: 7 Global Step: 132860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:10:11,759-Speed 5164.97 samples/sec Loss 2.8775 LearningRate 0.0362 Epoch: 7 Global Step: 132870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:10:13,771-Speed 5092.20 samples/sec Loss 2.8353 LearningRate 0.0362 Epoch: 7 Global Step: 132880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:10:15,751-Speed 5171.92 samples/sec Loss 2.8479 LearningRate 0.0362 Epoch: 7 Global Step: 132890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:10:17,724-Speed 5193.34 samples/sec Loss 2.9955 LearningRate 0.0362 Epoch: 7 Global Step: 132900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:10:19,704-Speed 5174.60 samples/sec Loss 2.8981 LearningRate 0.0362 Epoch: 7 Global Step: 
132910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:10:21,672-Speed 5204.66 samples/sec Loss 2.8785 LearningRate 0.0362 Epoch: 7 Global Step: 132920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:10:23,684-Speed 5090.44 samples/sec Loss 2.9862 LearningRate 0.0362 Epoch: 7 Global Step: 132930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:10:25,681-Speed 5129.46 samples/sec Loss 2.9140 LearningRate 0.0362 Epoch: 7 Global Step: 132940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:10:27,648-Speed 5207.15 samples/sec Loss 2.8620 LearningRate 0.0362 Epoch: 7 Global Step: 132950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:10:29,616-Speed 5205.42 samples/sec Loss 2.9589 LearningRate 0.0362 Epoch: 7 Global Step: 132960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:10:31,579-Speed 5216.78 samples/sec Loss 2.9055 LearningRate 0.0362 Epoch: 7 Global Step: 132970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:10:33,545-Speed 5212.28 samples/sec Loss 2.8524 LearningRate 0.0362 Epoch: 7 Global Step: 132980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:10:35,516-Speed 5197.17 samples/sec Loss 2.9378 LearningRate 0.0362 Epoch: 7 Global Step: 132990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:10:37,490-Speed 5187.50 samples/sec Loss 2.9130 LearningRate 0.0362 Epoch: 7 Global Step: 133000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:10:39,478-Speed 5153.98 samples/sec Loss 2.9430 LearningRate 0.0362 Epoch: 7 Global Step: 133010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:10:41,445-Speed 5209.47 samples/sec Loss 2.8323 LearningRate 0.0362 Epoch: 7 Global Step: 133020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:10:43,413-Speed 5203.45 samples/sec Loss 2.9104 LearningRate 0.0362 Epoch: 7 Global Step: 133030 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-04-11 08:10:45,392-Speed 5175.62 samples/sec Loss 2.8728 LearningRate 0.0362 Epoch: 7 Global Step: 133040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:10:47,362-Speed 5199.12 samples/sec Loss 2.9329 LearningRate 0.0362 Epoch: 7 Global Step: 133050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:10:49,336-Speed 5190.74 samples/sec Loss 2.8966 LearningRate 0.0362 Epoch: 7 Global Step: 133060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:10:51,309-Speed 5189.84 samples/sec Loss 2.9634 LearningRate 0.0362 Epoch: 7 Global Step: 133070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:10:53,279-Speed 5200.56 samples/sec Loss 2.8141 LearningRate 0.0362 Epoch: 7 Global Step: 133080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:10:55,262-Speed 5165.48 samples/sec Loss 2.9827 LearningRate 0.0362 Epoch: 7 Global Step: 133090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:10:57,239-Speed 5181.81 samples/sec Loss 2.9517 LearningRate 0.0362 Epoch: 7 Global Step: 133100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:10:59,219-Speed 5172.82 samples/sec Loss 2.9326 LearningRate 0.0362 Epoch: 7 Global Step: 133110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:01,191-Speed 5195.27 samples/sec Loss 2.9683 LearningRate 0.0361 Epoch: 7 Global Step: 133120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:03,160-Speed 5202.83 samples/sec Loss 2.9611 LearningRate 0.0361 Epoch: 7 Global Step: 133130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:05,131-Speed 5196.69 samples/sec Loss 2.9747 LearningRate 0.0361 Epoch: 7 Global Step: 133140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:07,111-Speed 5171.84 samples/sec Loss 2.9049 LearningRate 0.0361 Epoch: 7 Global Step: 133150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 
08:11:09,075-Speed 5216.83 samples/sec Loss 2.8660 LearningRate 0.0361 Epoch: 7 Global Step: 133160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:11,060-Speed 5159.68 samples/sec Loss 2.8696 LearningRate 0.0361 Epoch: 7 Global Step: 133170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:13,033-Speed 5193.53 samples/sec Loss 2.8702 LearningRate 0.0361 Epoch: 7 Global Step: 133180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:14,997-Speed 5215.54 samples/sec Loss 2.8970 LearningRate 0.0361 Epoch: 7 Global Step: 133190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:16,962-Speed 5212.23 samples/sec Loss 2.9287 LearningRate 0.0361 Epoch: 7 Global Step: 133200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:18,923-Speed 5224.83 samples/sec Loss 2.9040 LearningRate 0.0361 Epoch: 7 Global Step: 133210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:20,894-Speed 5195.92 samples/sec Loss 2.9325 LearningRate 0.0361 Epoch: 7 Global Step: 133220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:22,872-Speed 5180.43 samples/sec Loss 2.9298 LearningRate 0.0361 Epoch: 7 Global Step: 133230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:24,837-Speed 5212.42 samples/sec Loss 2.9241 LearningRate 0.0361 Epoch: 7 Global Step: 133240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:26,825-Speed 5152.62 samples/sec Loss 2.9859 LearningRate 0.0361 Epoch: 7 Global Step: 133250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:28,794-Speed 5199.99 samples/sec Loss 2.9013 LearningRate 0.0361 Epoch: 7 Global Step: 133260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:30,761-Speed 5209.65 samples/sec Loss 2.8395 LearningRate 0.0361 Epoch: 7 Global Step: 133270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:32,724-Speed 5217.54 samples/sec Loss 
2.8903 LearningRate 0.0361 Epoch: 7 Global Step: 133280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:11:34,713-Speed 5150.10 samples/sec Loss 2.9248 LearningRate 0.0361 Epoch: 7 Global Step: 133290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:11:36,684-Speed 5195.27 samples/sec Loss 2.8492 LearningRate 0.0361 Epoch: 7 Global Step: 133300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:11:38,671-Speed 5157.02 samples/sec Loss 2.9286 LearningRate 0.0361 Epoch: 7 Global Step: 133310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:11:40,651-Speed 5171.98 samples/sec Loss 2.9055 LearningRate 0.0361 Epoch: 7 Global Step: 133320 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:11:42,626-Speed 5188.00 samples/sec Loss 2.8874 LearningRate 0.0361 Epoch: 7 Global Step: 133330 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:11:44,596-Speed 5200.52 samples/sec Loss 2.8543 LearningRate 0.0361 Epoch: 7 Global Step: 133340 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:11:46,560-Speed 5213.39 samples/sec Loss 2.8873 LearningRate 0.0361 Epoch: 7 Global Step: 133350 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:11:48,541-Speed 5171.72 samples/sec Loss 2.7867 LearningRate 0.0361 Epoch: 7 Global Step: 133360 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:11:50,520-Speed 5176.55 samples/sec Loss 2.8816 LearningRate 0.0361 Epoch: 7 Global Step: 133370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:11:52,486-Speed 5208.92 samples/sec Loss 2.8890 LearningRate 0.0361 Epoch: 7 Global Step: 133380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:54,454-Speed 5204.68 samples/sec Loss 2.9380 LearningRate 0.0360 Epoch: 7 Global Step: 133390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:56,427-Speed 5192.04 samples/sec Loss 2.8502 LearningRate 0.0360 Epoch: 7 Global 
Step: 133400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:11:58,414-Speed 5156.41 samples/sec Loss 2.9091 LearningRate 0.0360 Epoch: 7 Global Step: 133410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:12:00,383-Speed 5201.01 samples/sec Loss 2.9781 LearningRate 0.0360 Epoch: 7 Global Step: 133420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:12:02,366-Speed 5166.35 samples/sec Loss 2.9272 LearningRate 0.0360 Epoch: 7 Global Step: 133430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:12:04,336-Speed 5199.58 samples/sec Loss 2.8304 LearningRate 0.0360 Epoch: 7 Global Step: 133440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:12:06,307-Speed 5197.82 samples/sec Loss 2.8912 LearningRate 0.0360 Epoch: 7 Global Step: 133450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:12:08,274-Speed 5206.41 samples/sec Loss 2.8520 LearningRate 0.0360 Epoch: 7 Global Step: 133460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:12:10,247-Speed 5192.46 samples/sec Loss 2.9233 LearningRate 0.0360 Epoch: 7 Global Step: 133470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:12:12,226-Speed 5176.39 samples/sec Loss 2.8387 LearningRate 0.0360 Epoch: 7 Global Step: 133480 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:12:14,192-Speed 5209.89 samples/sec Loss 2.8885 LearningRate 0.0360 Epoch: 7 Global Step: 133490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:12:16,170-Speed 5179.54 samples/sec Loss 2.8812 LearningRate 0.0360 Epoch: 7 Global Step: 133500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:12:18,163-Speed 5138.61 samples/sec Loss 2.9487 LearningRate 0.0360 Epoch: 7 Global Step: 133510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:12:20,377-Speed 4628.18 samples/sec Loss 2.9087 LearningRate 0.0360 Epoch: 7 Global Step: 133520 Fp16 Grad Scale: 131072 
Required: 13 hours Training: 2022-04-11 08:12:51,577-Speed 328.22 samples/sec Loss 2.7233 LearningRate 0.0360 Epoch: 8 Global Step: 133530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:12:53,561-Speed 5163.14 samples/sec Loss 2.3680 LearningRate 0.0360 Epoch: 8 Global Step: 133540 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:12:55,555-Speed 5137.70 samples/sec Loss 2.3186 LearningRate 0.0360 Epoch: 8 Global Step: 133550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:12:57,853-Speed 4457.41 samples/sec Loss 2.3047 LearningRate 0.0360 Epoch: 8 Global Step: 133560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:12:59,813-Speed 5226.16 samples/sec Loss 2.2868 LearningRate 0.0360 Epoch: 8 Global Step: 133570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:13:01,959-Speed 4774.10 samples/sec Loss 2.3028 LearningRate 0.0360 Epoch: 8 Global Step: 133580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:13:03,929-Speed 5197.86 samples/sec Loss 2.2491 LearningRate 0.0360 Epoch: 8 Global Step: 133590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:13:05,889-Speed 5226.47 samples/sec Loss 2.2943 LearningRate 0.0360 Epoch: 8 Global Step: 133600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:13:07,875-Speed 5158.55 samples/sec Loss 2.2892 LearningRate 0.0360 Epoch: 8 Global Step: 133610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:13:09,846-Speed 5197.26 samples/sec Loss 2.3208 LearningRate 0.0360 Epoch: 8 Global Step: 133620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:13:11,821-Speed 5187.87 samples/sec Loss 2.3631 LearningRate 0.0360 Epoch: 8 Global Step: 133630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:13:13,819-Speed 5127.28 samples/sec Loss 2.2717 LearningRate 0.0360 Epoch: 8 Global Step: 133640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-04-11 08:13:15,832-Speed 5087.59 samples/sec Loss 2.3004 LearningRate 0.0360 Epoch: 8 Global Step: 133650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:13:17,802-Speed 5199.03 samples/sec Loss 2.3838 LearningRate 0.0360 Epoch: 8 Global Step: 133660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:13:19,788-Speed 5159.13 samples/sec Loss 2.2826 LearningRate 0.0359 Epoch: 8 Global Step: 133670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:13:21,747-Speed 5227.69 samples/sec Loss 2.2989 LearningRate 0.0359 Epoch: 8 Global Step: 133680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:13:23,720-Speed 5192.26 samples/sec Loss 2.3631 LearningRate 0.0359 Epoch: 8 Global Step: 133690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:13:25,683-Speed 5219.27 samples/sec Loss 2.3152 LearningRate 0.0359 Epoch: 8 Global Step: 133700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:13:27,651-Speed 5205.32 samples/sec Loss 2.2866 LearningRate 0.0359 Epoch: 8 Global Step: 133710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:13:29,629-Speed 5178.84 samples/sec Loss 2.3595 LearningRate 0.0359 Epoch: 8 Global Step: 133720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:13:31,602-Speed 5191.40 samples/sec Loss 2.2668 LearningRate 0.0359 Epoch: 8 Global Step: 133730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:13:33,571-Speed 5202.87 samples/sec Loss 2.3022 LearningRate 0.0359 Epoch: 8 Global Step: 133740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:13:35,542-Speed 5196.25 samples/sec Loss 2.2875 LearningRate 0.0359 Epoch: 8 Global Step: 133750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:13:37,532-Speed 5145.77 samples/sec Loss 2.1979 LearningRate 0.0359 Epoch: 8 Global Step: 133760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:13:39,500-Speed 5205.90 
samples/sec Loss 2.2749 LearningRate 0.0359 Epoch: 8 Global Step: 133770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:13:41,470-Speed 5200.21 samples/sec Loss 2.3126 LearningRate 0.0359 Epoch: 8 Global Step: 133780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:13:43,428-Speed 5232.07 samples/sec Loss 2.2625 LearningRate 0.0359 Epoch: 8 Global Step: 133790 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:13:45,437-Speed 5099.62 samples/sec Loss 2.2954 LearningRate 0.0359 Epoch: 8 Global Step: 133800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:13:47,414-Speed 5180.11 samples/sec Loss 2.2890 LearningRate 0.0359 Epoch: 8 Global Step: 133810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:13:49,389-Speed 5185.09 samples/sec Loss 2.3583 LearningRate 0.0359 Epoch: 8 Global Step: 133820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:13:51,401-Speed 5092.53 samples/sec Loss 2.2934 LearningRate 0.0359 Epoch: 8 Global Step: 133830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:13:53,378-Speed 5180.03 samples/sec Loss 2.2780 LearningRate 0.0359 Epoch: 8 Global Step: 133840 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:13:55,347-Speed 5201.99 samples/sec Loss 2.3507 LearningRate 0.0359 Epoch: 8 Global Step: 133850 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:13:57,320-Speed 5194.00 samples/sec Loss 2.2703 LearningRate 0.0359 Epoch: 8 Global Step: 133860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:13:59,294-Speed 5189.27 samples/sec Loss 2.2815 LearningRate 0.0359 Epoch: 8 Global Step: 133870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:14:01,272-Speed 5179.02 samples/sec Loss 2.3103 LearningRate 0.0359 Epoch: 8 Global Step: 133880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:14:03,255-Speed 5165.92 samples/sec Loss 2.3383 LearningRate 0.0359 
Epoch: 8 Global Step: 133890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:14:05,231-Speed 5182.49 samples/sec Loss 2.3486 LearningRate 0.0359 Epoch: 8 Global Step: 133900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:14:07,206-Speed 5186.00 samples/sec Loss 2.2762 LearningRate 0.0359 Epoch: 8 Global Step: 133910 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:14:09,185-Speed 5177.00 samples/sec Loss 2.2988 LearningRate 0.0359 Epoch: 8 Global Step: 133920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:14:11,177-Speed 5141.47 samples/sec Loss 2.3226 LearningRate 0.0359 Epoch: 8 Global Step: 133930 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:14:13,154-Speed 5182.08 samples/sec Loss 2.3127 LearningRate 0.0359 Epoch: 8 Global Step: 133940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:14:15,147-Speed 5138.83 samples/sec Loss 2.3664 LearningRate 0.0358 Epoch: 8 Global Step: 133950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:14:17,132-Speed 5159.68 samples/sec Loss 2.3040 LearningRate 0.0358 Epoch: 8 Global Step: 133960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:14:19,106-Speed 5189.94 samples/sec Loss 2.2979 LearningRate 0.0358 Epoch: 8 Global Step: 133970 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:14:21,092-Speed 5160.12 samples/sec Loss 2.2787 LearningRate 0.0358 Epoch: 8 Global Step: 133980 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:14:23,080-Speed 5151.13 samples/sec Loss 2.3419 LearningRate 0.0358 Epoch: 8 Global Step: 133990 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:14:25,051-Speed 5197.65 samples/sec Loss 2.3790 LearningRate 0.0358 Epoch: 8 Global Step: 134000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:14:51,890-[lfw][134000]XNorm: 22.734898 Training: 2022-04-11 08:14:51,890-[lfw][134000]Accuracy-Flip: 
0.99750+-0.00271 Training: 2022-04-11 08:14:51,891-[lfw][134000]Accuracy-Highest: 0.99833 Training: 2022-04-11 08:15:22,738-[cfp_fp][134000]XNorm: 21.260429 Training: 2022-04-11 08:15:22,738-[cfp_fp][134000]Accuracy-Flip: 0.98257+-0.00502 Training: 2022-04-11 08:15:22,739-[cfp_fp][134000]Accuracy-Highest: 0.98443 Training: 2022-04-11 08:15:49,311-[agedb_30][134000]XNorm: 22.616122 Training: 2022-04-11 08:15:49,311-[agedb_30][134000]Accuracy-Flip: 0.97950+-0.00810 Training: 2022-04-11 08:15:49,312-[agedb_30][134000]Accuracy-Highest: 0.98150 Training: 2022-04-11 08:15:51,582-Speed 118.34 samples/sec Loss 2.3235 LearningRate 0.0358 Epoch: 8 Global Step: 134010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:15:53,673-Speed 4898.16 samples/sec Loss 2.3016 LearningRate 0.0358 Epoch: 8 Global Step: 134020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:15:55,631-Speed 5231.06 samples/sec Loss 2.2592 LearningRate 0.0358 Epoch: 8 Global Step: 134030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:15:57,589-Speed 5233.29 samples/sec Loss 2.2317 LearningRate 0.0358 Epoch: 8 Global Step: 134040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:15:59,549-Speed 5224.74 samples/sec Loss 2.3628 LearningRate 0.0358 Epoch: 8 Global Step: 134050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:16:01,512-Speed 5220.78 samples/sec Loss 2.2515 LearningRate 0.0358 Epoch: 8 Global Step: 134060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:16:03,507-Speed 5133.26 samples/sec Loss 2.3913 LearningRate 0.0358 Epoch: 8 Global Step: 134070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:16:05,465-Speed 5233.21 samples/sec Loss 2.3339 LearningRate 0.0358 Epoch: 8 Global Step: 134080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:16:07,425-Speed 5226.11 samples/sec Loss 2.3381 LearningRate 0.0358 Epoch: 8 Global Step: 134090 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-04-11 08:16:09,525-Speed 4878.13 samples/sec Loss 2.3085 LearningRate 0.0358 Epoch: 8 Global Step: 134100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:16:11,492-Speed 5205.79 samples/sec Loss 2.3717 LearningRate 0.0358 Epoch: 8 Global Step: 134110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:16:13,467-Speed 5187.55 samples/sec Loss 2.3857 LearningRate 0.0358 Epoch: 8 Global Step: 134120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:16:15,427-Speed 5226.37 samples/sec Loss 2.2585 LearningRate 0.0358 Epoch: 8 Global Step: 134130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:16:17,395-Speed 5203.75 samples/sec Loss 2.3796 LearningRate 0.0358 Epoch: 8 Global Step: 134140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:16:19,356-Speed 5222.60 samples/sec Loss 2.3471 LearningRate 0.0358 Epoch: 8 Global Step: 134150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:16:21,323-Speed 5207.91 samples/sec Loss 2.2744 LearningRate 0.0358 Epoch: 8 Global Step: 134160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:16:23,303-Speed 5175.11 samples/sec Loss 2.3798 LearningRate 0.0358 Epoch: 8 Global Step: 134170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:16:25,272-Speed 5200.34 samples/sec Loss 2.3426 LearningRate 0.0358 Epoch: 8 Global Step: 134180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:16:27,239-Speed 5207.97 samples/sec Loss 2.3934 LearningRate 0.0358 Epoch: 8 Global Step: 134190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:16:29,202-Speed 5218.42 samples/sec Loss 2.3013 LearningRate 0.0358 Epoch: 8 Global Step: 134200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:16:31,161-Speed 5229.48 samples/sec Loss 2.3609 LearningRate 0.0358 Epoch: 8 Global Step: 134210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-04-11 08:16:33,136-Speed 5186.73 samples/sec Loss 2.3223 LearningRate 0.0358 Epoch: 8 Global Step: 134220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:16:35,108-Speed 5194.50 samples/sec Loss 2.3360 LearningRate 0.0357 Epoch: 8 Global Step: 134230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:16:37,091-Speed 5165.29 samples/sec Loss 2.3944 LearningRate 0.0357 Epoch: 8 Global Step: 134240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:16:39,050-Speed 5229.82 samples/sec Loss 2.3592 LearningRate 0.0357 Epoch: 8 Global Step: 134250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:16:41,030-Speed 5174.17 samples/sec Loss 2.3259 LearningRate 0.0357 Epoch: 8 Global Step: 134260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:16:43,004-Speed 5187.05 samples/sec Loss 2.2526 LearningRate 0.0357 Epoch: 8 Global Step: 134270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:16:44,985-Speed 5171.02 samples/sec Loss 2.3538 LearningRate 0.0357 Epoch: 8 Global Step: 134280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:16:47,008-Speed 5063.13 samples/sec Loss 2.3986 LearningRate 0.0357 Epoch: 8 Global Step: 134290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:16:48,994-Speed 5157.27 samples/sec Loss 2.4078 LearningRate 0.0357 Epoch: 8 Global Step: 134300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:16:50,969-Speed 5189.16 samples/sec Loss 2.3214 LearningRate 0.0357 Epoch: 8 Global Step: 134310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:16:52,930-Speed 5223.10 samples/sec Loss 2.3891 LearningRate 0.0357 Epoch: 8 Global Step: 134320 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:16:54,906-Speed 5185.07 samples/sec Loss 2.4132 LearningRate 0.0357 Epoch: 8 Global Step: 134330 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:16:56,877-Speed 5194.81 
samples/sec Loss 2.4036 LearningRate 0.0357 Epoch: 8 Global Step: 134340 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:16:58,851-Speed 5189.74 samples/sec Loss 2.3743 LearningRate 0.0357 Epoch: 8 Global Step: 134350 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:17:00,831-Speed 5173.18 samples/sec Loss 2.3368 LearningRate 0.0357 Epoch: 8 Global Step: 134360 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:17:02,790-Speed 5229.21 samples/sec Loss 2.2764 LearningRate 0.0357 Epoch: 8 Global Step: 134370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:17:04,755-Speed 5214.07 samples/sec Loss 2.3026 LearningRate 0.0357 Epoch: 8 Global Step: 134380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:06,711-Speed 5235.68 samples/sec Loss 2.3381 LearningRate 0.0357 Epoch: 8 Global Step: 134390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:08,674-Speed 5220.12 samples/sec Loss 2.3608 LearningRate 0.0357 Epoch: 8 Global Step: 134400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:10,663-Speed 5149.69 samples/sec Loss 2.3597 LearningRate 0.0357 Epoch: 8 Global Step: 134410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:12,629-Speed 5210.22 samples/sec Loss 2.3892 LearningRate 0.0357 Epoch: 8 Global Step: 134420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:14,611-Speed 5168.36 samples/sec Loss 2.3514 LearningRate 0.0357 Epoch: 8 Global Step: 134430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:16,577-Speed 5209.42 samples/sec Loss 2.3190 LearningRate 0.0357 Epoch: 8 Global Step: 134440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:18,536-Speed 5230.71 samples/sec Loss 2.3933 LearningRate 0.0357 Epoch: 8 Global Step: 134450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:20,512-Speed 5183.76 samples/sec Loss 2.3493 LearningRate 0.0357 
Epoch: 8 Global Step: 134460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:22,516-Speed 5110.67 samples/sec Loss 2.3686 LearningRate 0.0357 Epoch: 8 Global Step: 134470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:24,514-Speed 5127.69 samples/sec Loss 2.3850 LearningRate 0.0357 Epoch: 8 Global Step: 134480 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:17:26,490-Speed 5183.77 samples/sec Loss 2.4611 LearningRate 0.0357 Epoch: 8 Global Step: 134490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:17:28,491-Speed 5119.05 samples/sec Loss 2.3844 LearningRate 0.0357 Epoch: 8 Global Step: 134500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:30,454-Speed 5215.91 samples/sec Loss 2.4269 LearningRate 0.0356 Epoch: 8 Global Step: 134510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:32,424-Speed 5201.85 samples/sec Loss 2.3145 LearningRate 0.0356 Epoch: 8 Global Step: 134520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:34,388-Speed 5215.29 samples/sec Loss 2.3634 LearningRate 0.0356 Epoch: 8 Global Step: 134530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:36,380-Speed 5143.30 samples/sec Loss 2.4248 LearningRate 0.0356 Epoch: 8 Global Step: 134540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:38,360-Speed 5182.79 samples/sec Loss 2.4263 LearningRate 0.0356 Epoch: 8 Global Step: 134550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:40,332-Speed 5194.04 samples/sec Loss 2.4176 LearningRate 0.0356 Epoch: 8 Global Step: 134560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:42,294-Speed 5221.16 samples/sec Loss 2.4105 LearningRate 0.0356 Epoch: 8 Global Step: 134570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:44,261-Speed 5206.34 samples/sec Loss 2.3950 LearningRate 0.0356 Epoch: 8 Global Step: 134580 Fp16 Grad 
Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:46,223-Speed 5221.97 samples/sec Loss 2.3751 LearningRate 0.0356 Epoch: 8 Global Step: 134590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:48,217-Speed 5137.10 samples/sec Loss 2.3734 LearningRate 0.0356 Epoch: 8 Global Step: 134600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:17:50,184-Speed 5205.10 samples/sec Loss 2.3746 LearningRate 0.0356 Epoch: 8 Global Step: 134610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:52,165-Speed 5171.98 samples/sec Loss 2.3542 LearningRate 0.0356 Epoch: 8 Global Step: 134620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:54,138-Speed 5193.53 samples/sec Loss 2.4030 LearningRate 0.0356 Epoch: 8 Global Step: 134630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:56,096-Speed 5231.56 samples/sec Loss 2.4485 LearningRate 0.0356 Epoch: 8 Global Step: 134640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:17:58,055-Speed 5226.33 samples/sec Loss 2.3697 LearningRate 0.0356 Epoch: 8 Global Step: 134650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:18:00,035-Speed 5175.16 samples/sec Loss 2.3536 LearningRate 0.0356 Epoch: 8 Global Step: 134660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:18:02,013-Speed 5178.31 samples/sec Loss 2.3997 LearningRate 0.0356 Epoch: 8 Global Step: 134670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:18:03,992-Speed 5176.90 samples/sec Loss 2.3637 LearningRate 0.0356 Epoch: 8 Global Step: 134680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:18:05,955-Speed 5215.70 samples/sec Loss 2.4006 LearningRate 0.0356 Epoch: 8 Global Step: 134690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:18:07,919-Speed 5217.94 samples/sec Loss 2.4034 LearningRate 0.0356 Epoch: 8 Global Step: 134700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-04-11 08:18:09,896-Speed 5181.17 samples/sec Loss 2.4324 LearningRate 0.0356 Epoch: 8 Global Step: 134710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:18:11,885-Speed 5148.01 samples/sec Loss 2.4089 LearningRate 0.0356 Epoch: 8 Global Step: 134720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:18:13,860-Speed 5188.65 samples/sec Loss 2.3302 LearningRate 0.0356 Epoch: 8 Global Step: 134730 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:18:15,829-Speed 5203.49 samples/sec Loss 2.3610 LearningRate 0.0356 Epoch: 8 Global Step: 134740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:18:17,792-Speed 5216.45 samples/sec Loss 2.3897 LearningRate 0.0356 Epoch: 8 Global Step: 134750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:18:19,760-Speed 5206.20 samples/sec Loss 2.3973 LearningRate 0.0356 Epoch: 8 Global Step: 134760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:18:21,779-Speed 5072.82 samples/sec Loss 2.3884 LearningRate 0.0356 Epoch: 8 Global Step: 134770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:18:23,746-Speed 5207.10 samples/sec Loss 2.4687 LearningRate 0.0356 Epoch: 8 Global Step: 134780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:18:25,743-Speed 5130.57 samples/sec Loss 2.4476 LearningRate 0.0355 Epoch: 8 Global Step: 134790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:18:27,707-Speed 5214.20 samples/sec Loss 2.3418 LearningRate 0.0355 Epoch: 8 Global Step: 134800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:18:29,680-Speed 5192.88 samples/sec Loss 2.4294 LearningRate 0.0355 Epoch: 8 Global Step: 134810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:18:31,658-Speed 5177.55 samples/sec Loss 2.4065 LearningRate 0.0355 Epoch: 8 Global Step: 134820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:18:33,641-Speed 
5165.60 samples/sec Loss 2.3882 LearningRate 0.0355 Epoch: 8 Global Step: 134830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:18:35,619-Speed 5180.31 samples/sec Loss 2.3980 LearningRate 0.0355 Epoch: 8 Global Step: 134840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:18:37,608-Speed 5150.99 samples/sec Loss 2.5119 LearningRate 0.0355 Epoch: 8 Global Step: 134850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:18:39,587-Speed 5175.24 samples/sec Loss 2.3652 LearningRate 0.0355 Epoch: 8 Global Step: 134860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:18:41,550-Speed 5218.40 samples/sec Loss 2.3999 LearningRate 0.0355 Epoch: 8 Global Step: 134870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:18:43,530-Speed 5172.78 samples/sec Loss 2.3989 LearningRate 0.0355 Epoch: 8 Global Step: 134880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:18:45,515-Speed 5162.35 samples/sec Loss 2.3397 LearningRate 0.0355 Epoch: 8 Global Step: 134890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:18:47,484-Speed 5202.23 samples/sec Loss 2.3835 LearningRate 0.0355 Epoch: 8 Global Step: 134900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:18:49,465-Speed 5169.98 samples/sec Loss 2.3734 LearningRate 0.0355 Epoch: 8 Global Step: 134910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:18:51,443-Speed 5177.53 samples/sec Loss 2.3509 LearningRate 0.0355 Epoch: 8 Global Step: 134920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:18:53,408-Speed 5213.69 samples/sec Loss 2.4491 LearningRate 0.0355 Epoch: 8 Global Step: 134930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:18:55,392-Speed 5161.08 samples/sec Loss 2.3505 LearningRate 0.0355 Epoch: 8 Global Step: 134940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:18:57,367-Speed 5188.10 samples/sec Loss 2.4538 
LearningRate 0.0355 Epoch: 8 Global Step: 134950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:18:59,342-Speed 5187.26 samples/sec Loss 2.5241 LearningRate 0.0355 Epoch: 8 Global Step: 134960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:01,324-Speed 5167.24 samples/sec Loss 2.3225 LearningRate 0.0355 Epoch: 8 Global Step: 134970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:03,296-Speed 5193.97 samples/sec Loss 2.4163 LearningRate 0.0355 Epoch: 8 Global Step: 134980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:05,282-Speed 5158.46 samples/sec Loss 2.4676 LearningRate 0.0355 Epoch: 8 Global Step: 134990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:07,249-Speed 5208.52 samples/sec Loss 2.4069 LearningRate 0.0355 Epoch: 8 Global Step: 135000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:09,232-Speed 5164.10 samples/sec Loss 2.4671 LearningRate 0.0355 Epoch: 8 Global Step: 135010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:11,212-Speed 5175.46 samples/sec Loss 2.4029 LearningRate 0.0355 Epoch: 8 Global Step: 135020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:13,211-Speed 5123.18 samples/sec Loss 2.4166 LearningRate 0.0355 Epoch: 8 Global Step: 135030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:19:15,178-Speed 5207.94 samples/sec Loss 2.3679 LearningRate 0.0355 Epoch: 8 Global Step: 135040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:19:17,144-Speed 5211.12 samples/sec Loss 2.4237 LearningRate 0.0355 Epoch: 8 Global Step: 135050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:19:19,111-Speed 5207.05 samples/sec Loss 2.3839 LearningRate 0.0355 Epoch: 8 Global Step: 135060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:19:21,089-Speed 5179.94 samples/sec Loss 2.4058 LearningRate 0.0354 Epoch: 8 Global 
Step: 135070 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:19:23,066-Speed 5179.43 samples/sec Loss 2.4359 LearningRate 0.0354 Epoch: 8 Global Step: 135080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:19:25,055-Speed 5150.19 samples/sec Loss 2.4619 LearningRate 0.0354 Epoch: 8 Global Step: 135090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:19:27,028-Speed 5192.38 samples/sec Loss 2.3827 LearningRate 0.0354 Epoch: 8 Global Step: 135100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:19:28,993-Speed 5211.72 samples/sec Loss 2.4109 LearningRate 0.0354 Epoch: 8 Global Step: 135110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:19:30,960-Speed 5207.73 samples/sec Loss 2.4308 LearningRate 0.0354 Epoch: 8 Global Step: 135120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:19:32,920-Speed 5227.23 samples/sec Loss 2.4975 LearningRate 0.0354 Epoch: 8 Global Step: 135130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:34,892-Speed 5193.75 samples/sec Loss 2.4178 LearningRate 0.0354 Epoch: 8 Global Step: 135140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:36,892-Speed 5123.51 samples/sec Loss 2.4375 LearningRate 0.0354 Epoch: 8 Global Step: 135150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:38,893-Speed 5119.55 samples/sec Loss 2.4676 LearningRate 0.0354 Epoch: 8 Global Step: 135160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:40,857-Speed 5215.09 samples/sec Loss 2.4930 LearningRate 0.0354 Epoch: 8 Global Step: 135170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:42,822-Speed 5211.25 samples/sec Loss 2.4415 LearningRate 0.0354 Epoch: 8 Global Step: 135180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:44,788-Speed 5211.73 samples/sec Loss 2.4594 LearningRate 0.0354 Epoch: 8 Global Step: 135190 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-04-11 08:19:46,757-Speed 5202.39 samples/sec Loss 2.4840 LearningRate 0.0354 Epoch: 8 Global Step: 135200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:48,726-Speed 5202.49 samples/sec Loss 2.4616 LearningRate 0.0354 Epoch: 8 Global Step: 135210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:50,696-Speed 5200.05 samples/sec Loss 2.4410 LearningRate 0.0354 Epoch: 8 Global Step: 135220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:52,661-Speed 5212.11 samples/sec Loss 2.4598 LearningRate 0.0354 Epoch: 8 Global Step: 135230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:54,625-Speed 5216.22 samples/sec Loss 2.4384 LearningRate 0.0354 Epoch: 8 Global Step: 135240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:56,587-Speed 5218.69 samples/sec Loss 2.5185 LearningRate 0.0354 Epoch: 8 Global Step: 135250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:19:58,569-Speed 5172.26 samples/sec Loss 2.4282 LearningRate 0.0354 Epoch: 8 Global Step: 135260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:20:00,548-Speed 5177.06 samples/sec Loss 2.4965 LearningRate 0.0354 Epoch: 8 Global Step: 135270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:20:02,512-Speed 5213.10 samples/sec Loss 2.4935 LearningRate 0.0354 Epoch: 8 Global Step: 135280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:20:04,489-Speed 5182.29 samples/sec Loss 2.5226 LearningRate 0.0354 Epoch: 8 Global Step: 135290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:20:06,455-Speed 5210.95 samples/sec Loss 2.4720 LearningRate 0.0354 Epoch: 8 Global Step: 135300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:20:08,420-Speed 5213.56 samples/sec Loss 2.4327 LearningRate 0.0354 Epoch: 8 Global Step: 135310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 
08:20:10,386-Speed 5209.73 samples/sec Loss 2.4188 LearningRate 0.0354 Epoch: 8 Global Step: 135320 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:20:12,352-Speed 5208.91 samples/sec Loss 2.5072 LearningRate 0.0354 Epoch: 8 Global Step: 135330 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:20:14,334-Speed 5168.92 samples/sec Loss 2.4283 LearningRate 0.0354 Epoch: 8 Global Step: 135340 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:20:16,312-Speed 5178.34 samples/sec Loss 2.4844 LearningRate 0.0353 Epoch: 8 Global Step: 135350 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:20:18,289-Speed 5180.65 samples/sec Loss 2.5211 LearningRate 0.0353 Epoch: 8 Global Step: 135360 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:20:20,261-Speed 5195.70 samples/sec Loss 2.4861 LearningRate 0.0353 Epoch: 8 Global Step: 135370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:20:22,227-Speed 5211.06 samples/sec Loss 2.4846 LearningRate 0.0353 Epoch: 8 Global Step: 135380 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:20:24,203-Speed 5183.42 samples/sec Loss 2.4532 LearningRate 0.0353 Epoch: 8 Global Step: 135390 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:20:26,191-Speed 5151.29 samples/sec Loss 2.4119 LearningRate 0.0353 Epoch: 8 Global Step: 135400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:20:28,167-Speed 5185.61 samples/sec Loss 2.5068 LearningRate 0.0353 Epoch: 8 Global Step: 135410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:20:30,130-Speed 5218.02 samples/sec Loss 2.4438 LearningRate 0.0353 Epoch: 8 Global Step: 135420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:20:32,098-Speed 5204.66 samples/sec Loss 2.4413 LearningRate 0.0353 Epoch: 8 Global Step: 135430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:20:34,073-Speed 5186.95 samples/sec Loss 
2.4334 LearningRate 0.0353 Epoch: 8 Global Step: 135440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:20:36,060-Speed 5155.82 samples/sec Loss 2.4845 LearningRate 0.0353 Epoch: 8 Global Step: 135450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:20:38,040-Speed 5174.66 samples/sec Loss 2.5442 LearningRate 0.0353 Epoch: 8 Global Step: 135460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:20:40,021-Speed 5170.26 samples/sec Loss 2.4363 LearningRate 0.0353 Epoch: 8 Global Step: 135470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:20:42,003-Speed 5166.76 samples/sec Loss 2.5382 LearningRate 0.0353 Epoch: 8 Global Step: 135480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:20:43,978-Speed 5187.82 samples/sec Loss 2.5158 LearningRate 0.0353 Epoch: 8 Global Step: 135490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:20:45,984-Speed 5107.16 samples/sec Loss 2.3847 LearningRate 0.0353 Epoch: 8 Global Step: 135500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:20:48,037-Speed 4989.08 samples/sec Loss 2.4514 LearningRate 0.0353 Epoch: 8 Global Step: 135510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:20:50,011-Speed 5188.22 samples/sec Loss 2.4445 LearningRate 0.0353 Epoch: 8 Global Step: 135520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:20:51,995-Speed 5164.79 samples/sec Loss 2.4630 LearningRate 0.0353 Epoch: 8 Global Step: 135530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:20:53,966-Speed 5196.37 samples/sec Loss 2.4722 LearningRate 0.0353 Epoch: 8 Global Step: 135540 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:20:55,941-Speed 5187.30 samples/sec Loss 2.4324 LearningRate 0.0353 Epoch: 8 Global Step: 135550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:20:57,920-Speed 5174.11 samples/sec Loss 2.4471 LearningRate 0.0353 Epoch: 8 
Global Step: 135560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:20:59,896-Speed 5183.27 samples/sec Loss 2.4819 LearningRate 0.0353 Epoch: 8 Global Step: 135570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:21:01,870-Speed 5189.79 samples/sec Loss 2.5182 LearningRate 0.0353 Epoch: 8 Global Step: 135580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:21:03,837-Speed 5208.10 samples/sec Loss 2.4669 LearningRate 0.0353 Epoch: 8 Global Step: 135590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:21:05,814-Speed 5180.39 samples/sec Loss 2.5056 LearningRate 0.0353 Epoch: 8 Global Step: 135600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:21:07,776-Speed 5221.23 samples/sec Loss 2.4879 LearningRate 0.0353 Epoch: 8 Global Step: 135610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:21:09,760-Speed 5162.43 samples/sec Loss 2.4418 LearningRate 0.0353 Epoch: 8 Global Step: 135620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:21:11,756-Speed 5134.12 samples/sec Loss 2.5268 LearningRate 0.0352 Epoch: 8 Global Step: 135630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:21:13,720-Speed 5215.42 samples/sec Loss 2.5571 LearningRate 0.0352 Epoch: 8 Global Step: 135640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:21:15,728-Speed 5100.43 samples/sec Loss 2.5163 LearningRate 0.0352 Epoch: 8 Global Step: 135650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:21:17,740-Speed 5093.24 samples/sec Loss 2.4869 LearningRate 0.0352 Epoch: 8 Global Step: 135660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:21:19,717-Speed 5180.19 samples/sec Loss 2.5101 LearningRate 0.0352 Epoch: 8 Global Step: 135670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:21:21,689-Speed 5193.66 samples/sec Loss 2.4920 LearningRate 0.0352 Epoch: 8 Global Step: 135680 Fp16 Grad Scale: 
65536 Required: 13 hours Training: 2022-04-11 08:21:23,675-Speed 5157.31 samples/sec Loss 2.5208 LearningRate 0.0352 Epoch: 8 Global Step: 135690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:21:25,665-Speed 5149.00 samples/sec Loss 2.5504 LearningRate 0.0352 Epoch: 8 Global Step: 135700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:21:27,655-Speed 5146.94 samples/sec Loss 2.4189 LearningRate 0.0352 Epoch: 8 Global Step: 135710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:21:29,642-Speed 5155.82 samples/sec Loss 2.4339 LearningRate 0.0352 Epoch: 8 Global Step: 135720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:21:31,619-Speed 5179.79 samples/sec Loss 2.5255 LearningRate 0.0352 Epoch: 8 Global Step: 135730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:21:33,607-Speed 5152.53 samples/sec Loss 2.5418 LearningRate 0.0352 Epoch: 8 Global Step: 135740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:21:35,604-Speed 5130.53 samples/sec Loss 2.4937 LearningRate 0.0352 Epoch: 8 Global Step: 135750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:21:37,579-Speed 5187.52 samples/sec Loss 2.4444 LearningRate 0.0352 Epoch: 8 Global Step: 135760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:21:39,554-Speed 5186.20 samples/sec Loss 2.4273 LearningRate 0.0352 Epoch: 8 Global Step: 135770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:21:41,522-Speed 5203.43 samples/sec Loss 2.4401 LearningRate 0.0352 Epoch: 8 Global Step: 135780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:21:43,490-Speed 5205.74 samples/sec Loss 2.5204 LearningRate 0.0352 Epoch: 8 Global Step: 135790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:21:45,480-Speed 5148.08 samples/sec Loss 2.5142 LearningRate 0.0352 Epoch: 8 Global Step: 135800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-04-11 08:21:47,491-Speed 5093.77 samples/sec Loss 2.5204 LearningRate 0.0352 Epoch: 8 Global Step: 135810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:21:49,481-Speed 5145.84 samples/sec Loss 2.4984 LearningRate 0.0352 Epoch: 8 Global Step: 135820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:21:51,480-Speed 5125.15 samples/sec Loss 2.4380 LearningRate 0.0352 Epoch: 8 Global Step: 135830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:21:53,449-Speed 5201.98 samples/sec Loss 2.4908 LearningRate 0.0352 Epoch: 8 Global Step: 135840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:21:55,420-Speed 5198.76 samples/sec Loss 2.4734 LearningRate 0.0352 Epoch: 8 Global Step: 135850 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:21:57,389-Speed 5201.75 samples/sec Loss 2.4735 LearningRate 0.0352 Epoch: 8 Global Step: 135860 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:21:59,364-Speed 5186.29 samples/sec Loss 2.4343 LearningRate 0.0352 Epoch: 8 Global Step: 135870 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:22:01,331-Speed 5207.39 samples/sec Loss 2.4970 LearningRate 0.0352 Epoch: 8 Global Step: 135880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:22:03,298-Speed 5207.77 samples/sec Loss 2.4908 LearningRate 0.0352 Epoch: 8 Global Step: 135890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:22:05,279-Speed 5174.00 samples/sec Loss 2.5040 LearningRate 0.0352 Epoch: 8 Global Step: 135900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:22:07,252-Speed 5192.66 samples/sec Loss 2.6076 LearningRate 0.0351 Epoch: 8 Global Step: 135910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:22:09,248-Speed 5132.05 samples/sec Loss 2.5521 LearningRate 0.0351 Epoch: 8 Global Step: 135920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:22:11,222-Speed 5187.87 
samples/sec Loss 2.4984 LearningRate 0.0351 Epoch: 8 Global Step: 135930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:22:13,209-Speed 5196.89 samples/sec Loss 2.5196 LearningRate 0.0351 Epoch: 8 Global Step: 135940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:22:15,188-Speed 5177.63 samples/sec Loss 2.5102 LearningRate 0.0351 Epoch: 8 Global Step: 135950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:22:17,168-Speed 5171.51 samples/sec Loss 2.5310 LearningRate 0.0351 Epoch: 8 Global Step: 135960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:22:19,137-Speed 5204.27 samples/sec Loss 2.4654 LearningRate 0.0351 Epoch: 8 Global Step: 135970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:22:21,118-Speed 5202.91 samples/sec Loss 2.4893 LearningRate 0.0351 Epoch: 8 Global Step: 135980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:22:23,093-Speed 5185.67 samples/sec Loss 2.5317 LearningRate 0.0351 Epoch: 8 Global Step: 135990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:22:25,066-Speed 5190.80 samples/sec Loss 2.5118 LearningRate 0.0351 Epoch: 8 Global Step: 136000 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:22:51,678-[lfw][136000]XNorm: 24.044638 Training: 2022-04-11 08:22:51,682-[lfw][136000]Accuracy-Flip: 0.99733+-0.00291 Training: 2022-04-11 08:22:51,682-[lfw][136000]Accuracy-Highest: 0.99833 Training: 2022-04-11 08:23:22,459-[cfp_fp][136000]XNorm: 22.601899 Training: 2022-04-11 08:23:22,670-[cfp_fp][136000]Accuracy-Flip: 0.98086+-0.00649 Training: 2022-04-11 08:23:22,671-[cfp_fp][136000]Accuracy-Highest: 0.98443 Training: 2022-04-11 08:23:49,265-[agedb_30][136000]XNorm: 23.842799 Training: 2022-04-11 08:23:49,265-[agedb_30][136000]Accuracy-Flip: 0.97850+-0.00754 Training: 2022-04-11 08:23:49,266-[agedb_30][136000]Accuracy-Highest: 0.98150 Training: 2022-04-11 08:23:51,245-Speed 118.82 samples/sec Loss 
2.4929 LearningRate 0.0351 Epoch: 8 Global Step: 136010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:23:53,197-Speed 5247.15 samples/sec Loss 2.5972 LearningRate 0.0351 Epoch: 8 Global Step: 136020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:23:55,161-Speed 5213.61 samples/sec Loss 2.4712 LearningRate 0.0351 Epoch: 8 Global Step: 136030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:23:57,125-Speed 5216.04 samples/sec Loss 2.4964 LearningRate 0.0351 Epoch: 8 Global Step: 136040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:23:59,092-Speed 5207.73 samples/sec Loss 2.5210 LearningRate 0.0351 Epoch: 8 Global Step: 136050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:24:01,071-Speed 5178.13 samples/sec Loss 2.4346 LearningRate 0.0351 Epoch: 8 Global Step: 136060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:24:03,034-Speed 5217.78 samples/sec Loss 2.5846 LearningRate 0.0351 Epoch: 8 Global Step: 136070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:24:04,993-Speed 5227.96 samples/sec Loss 2.5309 LearningRate 0.0351 Epoch: 8 Global Step: 136080 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:06,958-Speed 5214.33 samples/sec Loss 2.4954 LearningRate 0.0351 Epoch: 8 Global Step: 136090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:08,921-Speed 5217.07 samples/sec Loss 2.4830 LearningRate 0.0351 Epoch: 8 Global Step: 136100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:10,907-Speed 5157.07 samples/sec Loss 2.4711 LearningRate 0.0351 Epoch: 8 Global Step: 136110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:12,875-Speed 5205.41 samples/sec Loss 2.4966 LearningRate 0.0351 Epoch: 8 Global Step: 136120 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:14,870-Speed 5134.90 samples/sec Loss 2.5433 LearningRate 0.0351 Epoch: 8 Global 
Step: 136130 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:16,873-Speed 5113.48 samples/sec Loss 2.5495 LearningRate 0.0351 Epoch: 8 Global Step: 136140 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:18,842-Speed 5201.53 samples/sec Loss 2.4911 LearningRate 0.0351 Epoch: 8 Global Step: 136150 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:20,822-Speed 5174.76 samples/sec Loss 2.4618 LearningRate 0.0351 Epoch: 8 Global Step: 136160 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:22,812-Speed 5147.39 samples/sec Loss 2.4694 LearningRate 0.0351 Epoch: 8 Global Step: 136170 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:24,782-Speed 5201.25 samples/sec Loss 2.5384 LearningRate 0.0351 Epoch: 8 Global Step: 136180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:24:26,753-Speed 5195.19 samples/sec Loss 2.4703 LearningRate 0.0350 Epoch: 8 Global Step: 136190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:24:28,722-Speed 5203.31 samples/sec Loss 2.5194 LearningRate 0.0350 Epoch: 8 Global Step: 136200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:24:30,694-Speed 5194.75 samples/sec Loss 2.5041 LearningRate 0.0350 Epoch: 8 Global Step: 136210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:24:32,657-Speed 5216.21 samples/sec Loss 2.5141 LearningRate 0.0350 Epoch: 8 Global Step: 136220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:34,628-Speed 5197.40 samples/sec Loss 2.5885 LearningRate 0.0350 Epoch: 8 Global Step: 136230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:36,611-Speed 5166.96 samples/sec Loss 2.5166 LearningRate 0.0350 Epoch: 8 Global Step: 136240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:38,591-Speed 5172.13 samples/sec Loss 2.5036 LearningRate 0.0350 Epoch: 8 Global Step: 136250 Fp16 Grad Scale: 32768 
Required: 13 hours Training: 2022-04-11 08:24:40,564-Speed 5193.83 samples/sec Loss 2.5337 LearningRate 0.0350 Epoch: 8 Global Step: 136260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:42,543-Speed 5175.91 samples/sec Loss 2.4924 LearningRate 0.0350 Epoch: 8 Global Step: 136270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:44,518-Speed 5187.81 samples/sec Loss 2.4994 LearningRate 0.0350 Epoch: 8 Global Step: 136280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:46,492-Speed 5187.61 samples/sec Loss 2.5108 LearningRate 0.0350 Epoch: 8 Global Step: 136290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:48,481-Speed 5152.13 samples/sec Loss 2.5554 LearningRate 0.0350 Epoch: 8 Global Step: 136300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:50,457-Speed 5184.16 samples/sec Loss 2.5285 LearningRate 0.0350 Epoch: 8 Global Step: 136310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:24:52,463-Speed 5104.44 samples/sec Loss 2.4756 LearningRate 0.0350 Epoch: 8 Global Step: 136320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:24:54,433-Speed 5200.25 samples/sec Loss 2.5147 LearningRate 0.0350 Epoch: 8 Global Step: 136330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:24:56,403-Speed 5199.55 samples/sec Loss 2.4281 LearningRate 0.0350 Epoch: 8 Global Step: 136340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:24:58,376-Speed 5191.63 samples/sec Loss 2.4924 LearningRate 0.0350 Epoch: 8 Global Step: 136350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:00,365-Speed 5151.65 samples/sec Loss 2.4489 LearningRate 0.0350 Epoch: 8 Global Step: 136360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:02,331-Speed 5210.73 samples/sec Loss 2.4888 LearningRate 0.0350 Epoch: 8 Global Step: 136370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 
08:25:04,311-Speed 5172.07 samples/sec Loss 2.4864 LearningRate 0.0350 Epoch: 8 Global Step: 136380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:06,286-Speed 5186.73 samples/sec Loss 2.5311 LearningRate 0.0350 Epoch: 8 Global Step: 136390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:08,257-Speed 5197.00 samples/sec Loss 2.4705 LearningRate 0.0350 Epoch: 8 Global Step: 136400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:10,233-Speed 5182.67 samples/sec Loss 2.6046 LearningRate 0.0350 Epoch: 8 Global Step: 136410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:12,224-Speed 5146.66 samples/sec Loss 2.5314 LearningRate 0.0350 Epoch: 8 Global Step: 136420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:25:14,196-Speed 5193.19 samples/sec Loss 2.5170 LearningRate 0.0350 Epoch: 8 Global Step: 136430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:16,177-Speed 5172.39 samples/sec Loss 2.5107 LearningRate 0.0350 Epoch: 8 Global Step: 136440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:18,161-Speed 5162.30 samples/sec Loss 2.4485 LearningRate 0.0350 Epoch: 8 Global Step: 136450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:20,127-Speed 5209.94 samples/sec Loss 2.5334 LearningRate 0.0350 Epoch: 8 Global Step: 136460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:22,112-Speed 5160.52 samples/sec Loss 2.4865 LearningRate 0.0350 Epoch: 8 Global Step: 136470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:24,108-Speed 5133.43 samples/sec Loss 2.5425 LearningRate 0.0349 Epoch: 8 Global Step: 136480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:26,101-Speed 5138.25 samples/sec Loss 2.5851 LearningRate 0.0349 Epoch: 8 Global Step: 136490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:28,081-Speed 5175.54 samples/sec 
Loss 2.5236 LearningRate 0.0349 Epoch: 8 Global Step: 136500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:30,068-Speed 5154.46 samples/sec Loss 2.4839 LearningRate 0.0349 Epoch: 8 Global Step: 136510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:32,037-Speed 5202.25 samples/sec Loss 2.5408 LearningRate 0.0349 Epoch: 8 Global Step: 136520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:34,016-Speed 5173.90 samples/sec Loss 2.5894 LearningRate 0.0349 Epoch: 8 Global Step: 136530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:35,988-Speed 5196.15 samples/sec Loss 2.5004 LearningRate 0.0349 Epoch: 8 Global Step: 136540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:37,964-Speed 5184.08 samples/sec Loss 2.4819 LearningRate 0.0349 Epoch: 8 Global Step: 136550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:39,936-Speed 5196.50 samples/sec Loss 2.5655 LearningRate 0.0349 Epoch: 8 Global Step: 136560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:41,917-Speed 5168.18 samples/sec Loss 2.5341 LearningRate 0.0349 Epoch: 8 Global Step: 136570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:43,922-Speed 5108.82 samples/sec Loss 2.5942 LearningRate 0.0349 Epoch: 8 Global Step: 136580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:45,906-Speed 5164.65 samples/sec Loss 2.5445 LearningRate 0.0349 Epoch: 8 Global Step: 136590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:47,870-Speed 5215.00 samples/sec Loss 2.4408 LearningRate 0.0349 Epoch: 8 Global Step: 136600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:49,845-Speed 5185.16 samples/sec Loss 2.5050 LearningRate 0.0349 Epoch: 8 Global Step: 136610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:51,813-Speed 5207.17 samples/sec Loss 2.5638 LearningRate 0.0349 Epoch: 8 
Global Step: 136620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:25:53,790-Speed 5178.98 samples/sec Loss 2.5572 LearningRate 0.0349 Epoch: 8 Global Step: 136630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:25:55,775-Speed 5162.13 samples/sec Loss 2.5879 LearningRate 0.0349 Epoch: 8 Global Step: 136640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:25:57,753-Speed 5178.51 samples/sec Loss 2.5295 LearningRate 0.0349 Epoch: 8 Global Step: 136650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:25:59,726-Speed 5192.90 samples/sec Loss 2.4964 LearningRate 0.0349 Epoch: 8 Global Step: 136660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:26:01,712-Speed 5158.21 samples/sec Loss 2.5634 LearningRate 0.0349 Epoch: 8 Global Step: 136670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:26:03,700-Speed 5150.55 samples/sec Loss 2.5680 LearningRate 0.0349 Epoch: 8 Global Step: 136680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:26:05,675-Speed 5187.55 samples/sec Loss 2.5970 LearningRate 0.0349 Epoch: 8 Global Step: 136690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:26:07,642-Speed 5206.19 samples/sec Loss 2.5139 LearningRate 0.0349 Epoch: 8 Global Step: 136700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:26:09,623-Speed 5170.78 samples/sec Loss 2.5418 LearningRate 0.0349 Epoch: 8 Global Step: 136710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:26:11,601-Speed 5178.31 samples/sec Loss 2.5019 LearningRate 0.0349 Epoch: 8 Global Step: 136720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:13,593-Speed 5144.13 samples/sec Loss 2.5286 LearningRate 0.0349 Epoch: 8 Global Step: 136730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:15,563-Speed 5198.45 samples/sec Loss 2.5207 LearningRate 0.0349 Epoch: 8 Global Step: 136740 Fp16 Grad 
Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:17,538-Speed 5187.16 samples/sec Loss 2.4789 LearningRate 0.0349 Epoch: 8 Global Step: 136750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:19,517-Speed 5175.47 samples/sec Loss 2.4545 LearningRate 0.0348 Epoch: 8 Global Step: 136760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:21,500-Speed 5166.78 samples/sec Loss 2.5267 LearningRate 0.0348 Epoch: 8 Global Step: 136770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:23,469-Speed 5202.85 samples/sec Loss 2.5889 LearningRate 0.0348 Epoch: 8 Global Step: 136780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:25,441-Speed 5194.59 samples/sec Loss 2.5332 LearningRate 0.0348 Epoch: 8 Global Step: 136790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:27,410-Speed 5200.45 samples/sec Loss 2.5183 LearningRate 0.0348 Epoch: 8 Global Step: 136800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:29,388-Speed 5180.78 samples/sec Loss 2.5538 LearningRate 0.0348 Epoch: 8 Global Step: 136810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:31,362-Speed 5187.15 samples/sec Loss 2.5966 LearningRate 0.0348 Epoch: 8 Global Step: 136820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:26:33,337-Speed 5187.30 samples/sec Loss 2.4886 LearningRate 0.0348 Epoch: 8 Global Step: 136830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:26:35,305-Speed 5205.39 samples/sec Loss 2.6023 LearningRate 0.0348 Epoch: 8 Global Step: 136840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:37,279-Speed 5187.76 samples/sec Loss 2.5665 LearningRate 0.0348 Epoch: 8 Global Step: 136850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:39,262-Speed 5169.39 samples/sec Loss 2.5578 LearningRate 0.0348 Epoch: 8 Global Step: 136860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-04-11 08:26:41,231-Speed 5203.11 samples/sec Loss 2.5242 LearningRate 0.0348 Epoch: 8 Global Step: 136870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:43,197-Speed 5210.34 samples/sec Loss 2.5803 LearningRate 0.0348 Epoch: 8 Global Step: 136880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:45,167-Speed 5200.93 samples/sec Loss 2.5492 LearningRate 0.0348 Epoch: 8 Global Step: 136890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:47,140-Speed 5189.11 samples/sec Loss 2.5714 LearningRate 0.0348 Epoch: 8 Global Step: 136900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:49,109-Speed 5204.16 samples/sec Loss 2.6017 LearningRate 0.0348 Epoch: 8 Global Step: 136910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:51,077-Speed 5205.65 samples/sec Loss 2.7050 LearningRate 0.0348 Epoch: 8 Global Step: 136920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:53,068-Speed 5143.18 samples/sec Loss 2.5389 LearningRate 0.0348 Epoch: 8 Global Step: 136930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:26:55,041-Speed 5190.79 samples/sec Loss 2.5657 LearningRate 0.0348 Epoch: 8 Global Step: 136940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:26:57,011-Speed 5199.75 samples/sec Loss 2.5359 LearningRate 0.0348 Epoch: 8 Global Step: 136950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:26:58,988-Speed 5183.26 samples/sec Loss 2.6152 LearningRate 0.0348 Epoch: 8 Global Step: 136960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:27:00,969-Speed 5171.39 samples/sec Loss 2.6287 LearningRate 0.0348 Epoch: 8 Global Step: 136970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:27:02,940-Speed 5195.24 samples/sec Loss 2.5249 LearningRate 0.0348 Epoch: 8 Global Step: 136980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:27:04,920-Speed 5175.31 
samples/sec Loss 2.5065 LearningRate 0.0348 Epoch: 8 Global Step: 136990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:27:06,879-Speed 5227.65 samples/sec Loss 2.6154 LearningRate 0.0348 Epoch: 8 Global Step: 137000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:08,854-Speed 5185.83 samples/sec Loss 2.6637 LearningRate 0.0348 Epoch: 8 Global Step: 137010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:10,822-Speed 5205.28 samples/sec Loss 2.6368 LearningRate 0.0348 Epoch: 8 Global Step: 137020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:12,790-Speed 5204.75 samples/sec Loss 2.5888 LearningRate 0.0348 Epoch: 8 Global Step: 137030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:14,774-Speed 5163.98 samples/sec Loss 2.5271 LearningRate 0.0347 Epoch: 8 Global Step: 137040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:16,761-Speed 5155.82 samples/sec Loss 2.5653 LearningRate 0.0347 Epoch: 8 Global Step: 137050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:18,733-Speed 5194.87 samples/sec Loss 2.5347 LearningRate 0.0347 Epoch: 8 Global Step: 137060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:20,726-Speed 5139.37 samples/sec Loss 2.5977 LearningRate 0.0347 Epoch: 8 Global Step: 137070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:22,705-Speed 5176.80 samples/sec Loss 2.5696 LearningRate 0.0347 Epoch: 8 Global Step: 137080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:24,677-Speed 5193.89 samples/sec Loss 2.5699 LearningRate 0.0347 Epoch: 8 Global Step: 137090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:26,643-Speed 5211.63 samples/sec Loss 2.5256 LearningRate 0.0347 Epoch: 8 Global Step: 137100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:28,612-Speed 5201.56 samples/sec Loss 2.5586 LearningRate 
0.0347 Epoch: 8 Global Step: 137110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:30,583-Speed 5197.50 samples/sec Loss 2.5646 LearningRate 0.0347 Epoch: 8 Global Step: 137120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:32,555-Speed 5194.44 samples/sec Loss 2.6086 LearningRate 0.0347 Epoch: 8 Global Step: 137130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:34,524-Speed 5201.97 samples/sec Loss 2.5658 LearningRate 0.0347 Epoch: 8 Global Step: 137140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:36,509-Speed 5159.74 samples/sec Loss 2.6165 LearningRate 0.0347 Epoch: 8 Global Step: 137150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:38,493-Speed 5162.63 samples/sec Loss 2.6220 LearningRate 0.0347 Epoch: 8 Global Step: 137160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:40,483-Speed 5148.05 samples/sec Loss 2.6104 LearningRate 0.0347 Epoch: 8 Global Step: 137170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:42,453-Speed 5199.46 samples/sec Loss 2.6175 LearningRate 0.0347 Epoch: 8 Global Step: 137180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:44,421-Speed 5206.28 samples/sec Loss 2.5426 LearningRate 0.0347 Epoch: 8 Global Step: 137190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:27:46,407-Speed 5157.74 samples/sec Loss 2.5508 LearningRate 0.0347 Epoch: 8 Global Step: 137200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:27:48,415-Speed 5101.25 samples/sec Loss 2.5739 LearningRate 0.0347 Epoch: 8 Global Step: 137210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:27:50,413-Speed 5125.56 samples/sec Loss 2.5824 LearningRate 0.0347 Epoch: 8 Global Step: 137220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:27:52,398-Speed 5160.68 samples/sec Loss 2.6075 LearningRate 0.0347 Epoch: 8 Global Step: 137230 Fp16 
Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:27:54,378-Speed 5171.96 samples/sec Loss 2.5968 LearningRate 0.0347 Epoch: 8 Global Step: 137240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:27:56,355-Speed 5182.66 samples/sec Loss 2.5956 LearningRate 0.0347 Epoch: 8 Global Step: 137250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:27:58,340-Speed 5161.69 samples/sec Loss 2.4827 LearningRate 0.0347 Epoch: 8 Global Step: 137260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:28:00,336-Speed 5129.89 samples/sec Loss 2.5432 LearningRate 0.0347 Epoch: 8 Global Step: 137270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:28:02,311-Speed 5187.06 samples/sec Loss 2.6029 LearningRate 0.0347 Epoch: 8 Global Step: 137280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:04,296-Speed 5162.25 samples/sec Loss 2.6462 LearningRate 0.0347 Epoch: 8 Global Step: 137290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:06,271-Speed 5186.25 samples/sec Loss 2.5810 LearningRate 0.0347 Epoch: 8 Global Step: 137300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:08,243-Speed 5194.37 samples/sec Loss 2.6019 LearningRate 0.0347 Epoch: 8 Global Step: 137310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:10,213-Speed 5198.25 samples/sec Loss 2.5466 LearningRate 0.0346 Epoch: 8 Global Step: 137320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:12,182-Speed 5203.13 samples/sec Loss 2.5858 LearningRate 0.0346 Epoch: 8 Global Step: 137330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:14,151-Speed 5202.06 samples/sec Loss 2.5336 LearningRate 0.0346 Epoch: 8 Global Step: 137340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:16,130-Speed 5177.01 samples/sec Loss 2.6344 LearningRate 0.0346 Epoch: 8 Global Step: 137350 Fp16 Grad Scale: 65536 Required: 13 hours 
Training: 2022-04-11 08:28:18,113-Speed 5165.47 samples/sec Loss 2.6219 LearningRate 0.0346 Epoch: 8 Global Step: 137360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:20,091-Speed 5176.59 samples/sec Loss 2.5548 LearningRate 0.0346 Epoch: 8 Global Step: 137370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:22,085-Speed 5139.67 samples/sec Loss 2.5497 LearningRate 0.0346 Epoch: 8 Global Step: 137380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:28:24,052-Speed 5206.36 samples/sec Loss 2.5552 LearningRate 0.0346 Epoch: 8 Global Step: 137390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:26,037-Speed 5160.89 samples/sec Loss 2.6199 LearningRate 0.0346 Epoch: 8 Global Step: 137400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:28,020-Speed 5165.55 samples/sec Loss 2.5199 LearningRate 0.0346 Epoch: 8 Global Step: 137410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:30,001-Speed 5171.82 samples/sec Loss 2.5906 LearningRate 0.0346 Epoch: 8 Global Step: 137420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:31,970-Speed 5202.24 samples/sec Loss 2.5137 LearningRate 0.0346 Epoch: 8 Global Step: 137430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:33,938-Speed 5203.71 samples/sec Loss 2.5551 LearningRate 0.0346 Epoch: 8 Global Step: 137440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:35,909-Speed 5197.45 samples/sec Loss 2.5387 LearningRate 0.0346 Epoch: 8 Global Step: 137450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:37,891-Speed 5168.66 samples/sec Loss 2.5428 LearningRate 0.0346 Epoch: 8 Global Step: 137460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:39,880-Speed 5151.14 samples/sec Loss 2.5539 LearningRate 0.0346 Epoch: 8 Global Step: 137470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:41,873-Speed 
5138.45 samples/sec Loss 2.5964 LearningRate 0.0346 Epoch: 8 Global Step: 137480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:43,845-Speed 5193.94 samples/sec Loss 2.6119 LearningRate 0.0346 Epoch: 8 Global Step: 137490 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:28:45,818-Speed 5192.68 samples/sec Loss 2.5754 LearningRate 0.0346 Epoch: 8 Global Step: 137500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:28:47,806-Speed 5153.76 samples/sec Loss 2.5886 LearningRate 0.0346 Epoch: 8 Global Step: 137510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:28:49,799-Speed 5139.06 samples/sec Loss 2.5873 LearningRate 0.0346 Epoch: 8 Global Step: 137520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:28:51,786-Speed 5153.96 samples/sec Loss 2.5906 LearningRate 0.0346 Epoch: 8 Global Step: 137530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:28:53,752-Speed 5211.83 samples/sec Loss 2.6194 LearningRate 0.0346 Epoch: 8 Global Step: 137540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:55,725-Speed 5190.95 samples/sec Loss 2.5712 LearningRate 0.0346 Epoch: 8 Global Step: 137550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:57,701-Speed 5183.60 samples/sec Loss 2.6073 LearningRate 0.0346 Epoch: 8 Global Step: 137560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:28:59,682-Speed 5172.32 samples/sec Loss 2.5867 LearningRate 0.0346 Epoch: 8 Global Step: 137570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:01,657-Speed 5184.04 samples/sec Loss 2.5735 LearningRate 0.0346 Epoch: 8 Global Step: 137580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:03,627-Speed 5199.59 samples/sec Loss 2.5583 LearningRate 0.0346 Epoch: 8 Global Step: 137590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:05,605-Speed 5178.96 samples/sec Loss 2.5828 
LearningRate 0.0346 Epoch: 8 Global Step: 137600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:07,594-Speed 5150.25 samples/sec Loss 2.5652 LearningRate 0.0345 Epoch: 8 Global Step: 137610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:09,565-Speed 5199.27 samples/sec Loss 2.5473 LearningRate 0.0345 Epoch: 8 Global Step: 137620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:11,540-Speed 5186.25 samples/sec Loss 2.6152 LearningRate 0.0345 Epoch: 8 Global Step: 137630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:13,521-Speed 5169.45 samples/sec Loss 2.6252 LearningRate 0.0345 Epoch: 8 Global Step: 137640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:29:15,502-Speed 5172.12 samples/sec Loss 2.4673 LearningRate 0.0345 Epoch: 8 Global Step: 137650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:29:17,486-Speed 5161.41 samples/sec Loss 2.5276 LearningRate 0.0345 Epoch: 8 Global Step: 137660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:29:19,471-Speed 5162.10 samples/sec Loss 2.5564 LearningRate 0.0345 Epoch: 8 Global Step: 137670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:29:21,460-Speed 5148.24 samples/sec Loss 2.5941 LearningRate 0.0345 Epoch: 8 Global Step: 137680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:29:23,504-Speed 5011.96 samples/sec Loss 2.6925 LearningRate 0.0345 Epoch: 8 Global Step: 137690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:29:25,503-Speed 5124.14 samples/sec Loss 2.6142 LearningRate 0.0345 Epoch: 8 Global Step: 137700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:29:27,531-Speed 5052.20 samples/sec Loss 2.5837 LearningRate 0.0345 Epoch: 8 Global Step: 137710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:29:29,505-Speed 5188.85 samples/sec Loss 2.6074 LearningRate 0.0345 Epoch: 8 Global 
Step: 137720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:31,475-Speed 5198.92 samples/sec Loss 2.5630 LearningRate 0.0345 Epoch: 8 Global Step: 137730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:33,446-Speed 5197.34 samples/sec Loss 2.6043 LearningRate 0.0345 Epoch: 8 Global Step: 137740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:35,434-Speed 5152.11 samples/sec Loss 2.6220 LearningRate 0.0345 Epoch: 8 Global Step: 137750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:37,407-Speed 5192.01 samples/sec Loss 2.5781 LearningRate 0.0345 Epoch: 8 Global Step: 137760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:39,381-Speed 5190.98 samples/sec Loss 2.5979 LearningRate 0.0345 Epoch: 8 Global Step: 137770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:41,357-Speed 5183.40 samples/sec Loss 2.5387 LearningRate 0.0345 Epoch: 8 Global Step: 137780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:43,332-Speed 5186.16 samples/sec Loss 2.5552 LearningRate 0.0345 Epoch: 8 Global Step: 137790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:45,325-Speed 5138.76 samples/sec Loss 2.5584 LearningRate 0.0345 Epoch: 8 Global Step: 137800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:47,318-Speed 5140.83 samples/sec Loss 2.5942 LearningRate 0.0345 Epoch: 8 Global Step: 137810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:49,301-Speed 5165.25 samples/sec Loss 2.5795 LearningRate 0.0345 Epoch: 8 Global Step: 137820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:29:51,271-Speed 5199.18 samples/sec Loss 2.5919 LearningRate 0.0345 Epoch: 8 Global Step: 137830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:53,246-Speed 5186.88 samples/sec Loss 2.5972 LearningRate 0.0345 Epoch: 8 Global Step: 137840 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-04-11 08:29:55,224-Speed 5179.56 samples/sec Loss 2.5836 LearningRate 0.0345 Epoch: 8 Global Step: 137850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:57,199-Speed 5186.38 samples/sec Loss 2.6237 LearningRate 0.0345 Epoch: 8 Global Step: 137860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:29:59,175-Speed 5184.05 samples/sec Loss 2.5971 LearningRate 0.0345 Epoch: 8 Global Step: 137870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:30:01,147-Speed 5193.49 samples/sec Loss 2.6186 LearningRate 0.0345 Epoch: 8 Global Step: 137880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:30:03,152-Speed 5109.91 samples/sec Loss 2.6071 LearningRate 0.0344 Epoch: 8 Global Step: 137890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:30:05,130-Speed 5177.18 samples/sec Loss 2.5391 LearningRate 0.0344 Epoch: 8 Global Step: 137900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:30:07,113-Speed 5165.01 samples/sec Loss 2.5926 LearningRate 0.0344 Epoch: 8 Global Step: 137910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:30:09,088-Speed 5187.94 samples/sec Loss 2.5742 LearningRate 0.0344 Epoch: 8 Global Step: 137920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:30:11,090-Speed 5115.93 samples/sec Loss 2.5964 LearningRate 0.0344 Epoch: 8 Global Step: 137930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:30:13,068-Speed 5179.54 samples/sec Loss 2.5449 LearningRate 0.0344 Epoch: 8 Global Step: 137940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:30:15,040-Speed 5193.51 samples/sec Loss 2.6677 LearningRate 0.0344 Epoch: 8 Global Step: 137950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:30:17,011-Speed 5198.02 samples/sec Loss 2.5693 LearningRate 0.0344 Epoch: 8 Global Step: 137960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 
08:30:18,986-Speed 5185.80 samples/sec Loss 2.5983 LearningRate 0.0344 Epoch: 8 Global Step: 137970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:30:20,967-Speed 5171.20 samples/sec Loss 2.6489 LearningRate 0.0344 Epoch: 8 Global Step: 137980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:30:22,957-Speed 5147.15 samples/sec Loss 2.6272 LearningRate 0.0344 Epoch: 8 Global Step: 137990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:30:24,948-Speed 5145.75 samples/sec Loss 2.5903 LearningRate 0.0344 Epoch: 8 Global Step: 138000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:30:51,539-[lfw][138000]XNorm: 22.828375 Training: 2022-04-11 08:30:51,539-[lfw][138000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 08:30:51,540-[lfw][138000]Accuracy-Highest: 0.99833 Training: 2022-04-11 08:31:22,310-[cfp_fp][138000]XNorm: 21.440767 Training: 2022-04-11 08:31:22,310-[cfp_fp][138000]Accuracy-Flip: 0.98371+-0.00583 Training: 2022-04-11 08:31:22,311-[cfp_fp][138000]Accuracy-Highest: 0.98443 Training: 2022-04-11 08:31:48,808-[agedb_30][138000]XNorm: 23.020997 Training: 2022-04-11 08:31:48,808-[agedb_30][138000]Accuracy-Flip: 0.97733+-0.00793 Training: 2022-04-11 08:31:48,809-[agedb_30][138000]Accuracy-Highest: 0.98150 Training: 2022-04-11 08:31:50,802-Speed 119.27 samples/sec Loss 2.6442 LearningRate 0.0344 Epoch: 8 Global Step: 138010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:31:52,760-Speed 5232.55 samples/sec Loss 2.6162 LearningRate 0.0344 Epoch: 8 Global Step: 138020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:31:54,718-Speed 5230.54 samples/sec Loss 2.6271 LearningRate 0.0344 Epoch: 8 Global Step: 138030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:31:56,680-Speed 5221.98 samples/sec Loss 2.6588 LearningRate 0.0344 Epoch: 8 Global Step: 138040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:31:58,640-Speed 
5224.62 samples/sec Loss 2.5877 LearningRate 0.0344 Epoch: 8 Global Step: 138050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:00,606-Speed 5212.06 samples/sec Loss 2.6394 LearningRate 0.0344 Epoch: 8 Global Step: 138060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:02,577-Speed 5196.15 samples/sec Loss 2.5583 LearningRate 0.0344 Epoch: 8 Global Step: 138070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:04,542-Speed 5214.24 samples/sec Loss 2.6229 LearningRate 0.0344 Epoch: 8 Global Step: 138080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:06,503-Speed 5223.31 samples/sec Loss 2.6359 LearningRate 0.0344 Epoch: 8 Global Step: 138090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:08,467-Speed 5215.24 samples/sec Loss 2.5784 LearningRate 0.0344 Epoch: 8 Global Step: 138100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:10,438-Speed 5196.42 samples/sec Loss 2.5602 LearningRate 0.0344 Epoch: 8 Global Step: 138110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:12,432-Speed 5137.60 samples/sec Loss 2.5672 LearningRate 0.0344 Epoch: 8 Global Step: 138120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:14,401-Speed 5202.05 samples/sec Loss 2.6103 LearningRate 0.0344 Epoch: 8 Global Step: 138130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:16,366-Speed 5211.75 samples/sec Loss 2.6017 LearningRate 0.0344 Epoch: 8 Global Step: 138140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:18,335-Speed 5204.42 samples/sec Loss 2.6064 LearningRate 0.0344 Epoch: 8 Global Step: 138150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:20,301-Speed 5210.72 samples/sec Loss 2.5993 LearningRate 0.0344 Epoch: 8 Global Step: 138160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:32:22,277-Speed 5181.54 samples/sec Loss 2.6040 
LearningRate 0.0344 Epoch: 8 Global Step: 138170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:32:24,240-Speed 5219.36 samples/sec Loss 2.5781 LearningRate 0.0343 Epoch: 8 Global Step: 138180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:26,224-Speed 5162.09 samples/sec Loss 2.6816 LearningRate 0.0343 Epoch: 8 Global Step: 138190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:28,198-Speed 5190.25 samples/sec Loss 2.5962 LearningRate 0.0343 Epoch: 8 Global Step: 138200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:30,178-Speed 5173.04 samples/sec Loss 2.6085 LearningRate 0.0343 Epoch: 8 Global Step: 138210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:32,149-Speed 5197.27 samples/sec Loss 2.6101 LearningRate 0.0343 Epoch: 8 Global Step: 138220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:34,128-Speed 5174.66 samples/sec Loss 2.6527 LearningRate 0.0343 Epoch: 8 Global Step: 138230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:36,112-Speed 5165.61 samples/sec Loss 2.6306 LearningRate 0.0343 Epoch: 8 Global Step: 138240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:38,091-Speed 5175.41 samples/sec Loss 2.6447 LearningRate 0.0343 Epoch: 8 Global Step: 138250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:40,058-Speed 5207.33 samples/sec Loss 2.5423 LearningRate 0.0343 Epoch: 8 Global Step: 138260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:42,042-Speed 5162.45 samples/sec Loss 2.6518 LearningRate 0.0343 Epoch: 8 Global Step: 138270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:32:44,009-Speed 5206.77 samples/sec Loss 2.6665 LearningRate 0.0343 Epoch: 8 Global Step: 138280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:32:45,980-Speed 5199.17 samples/sec Loss 2.6162 LearningRate 0.0343 Epoch: 8 Global Step: 
138290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:32:47,949-Speed 5201.69 samples/sec Loss 2.6154 LearningRate 0.0343 Epoch: 8 Global Step: 138300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:32:49,960-Speed 5094.58 samples/sec Loss 2.6683 LearningRate 0.0343 Epoch: 8 Global Step: 138310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:32:51,939-Speed 5173.90 samples/sec Loss 2.6468 LearningRate 0.0343 Epoch: 8 Global Step: 138320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:32:53,904-Speed 5212.84 samples/sec Loss 2.6742 LearningRate 0.0343 Epoch: 8 Global Step: 138330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:32:55,871-Speed 5209.08 samples/sec Loss 2.5982 LearningRate 0.0343 Epoch: 8 Global Step: 138340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:32:57,844-Speed 5191.40 samples/sec Loss 2.6851 LearningRate 0.0343 Epoch: 8 Global Step: 138350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:32:59,815-Speed 5198.83 samples/sec Loss 2.6084 LearningRate 0.0343 Epoch: 8 Global Step: 138360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:33:01,802-Speed 5152.98 samples/sec Loss 2.5545 LearningRate 0.0343 Epoch: 8 Global Step: 138370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:03,789-Speed 5156.16 samples/sec Loss 2.6420 LearningRate 0.0343 Epoch: 8 Global Step: 138380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:05,755-Speed 5210.47 samples/sec Loss 2.6095 LearningRate 0.0343 Epoch: 8 Global Step: 138390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:07,724-Speed 5200.21 samples/sec Loss 2.6410 LearningRate 0.0343 Epoch: 8 Global Step: 138400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:09,689-Speed 5213.51 samples/sec Loss 2.6238 LearningRate 0.0343 Epoch: 8 Global Step: 138410 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-04-11 08:33:11,657-Speed 5207.28 samples/sec Loss 2.5742 LearningRate 0.0343 Epoch: 8 Global Step: 138420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:13,654-Speed 5127.86 samples/sec Loss 2.6247 LearningRate 0.0343 Epoch: 8 Global Step: 138430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:15,635-Speed 5171.53 samples/sec Loss 2.6179 LearningRate 0.0343 Epoch: 8 Global Step: 138440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:17,603-Speed 5205.18 samples/sec Loss 2.6217 LearningRate 0.0343 Epoch: 8 Global Step: 138450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:19,595-Speed 5142.20 samples/sec Loss 2.6155 LearningRate 0.0342 Epoch: 8 Global Step: 138460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:21,568-Speed 5191.88 samples/sec Loss 2.6008 LearningRate 0.0342 Epoch: 8 Global Step: 138470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:23,547-Speed 5176.07 samples/sec Loss 2.5229 LearningRate 0.0342 Epoch: 8 Global Step: 138480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:25,540-Speed 5140.45 samples/sec Loss 2.6198 LearningRate 0.0342 Epoch: 8 Global Step: 138490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:27,512-Speed 5193.41 samples/sec Loss 2.6127 LearningRate 0.0342 Epoch: 8 Global Step: 138500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:29,489-Speed 5180.73 samples/sec Loss 2.6814 LearningRate 0.0342 Epoch: 8 Global Step: 138510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:31,453-Speed 5217.27 samples/sec Loss 2.5784 LearningRate 0.0342 Epoch: 8 Global Step: 138520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:33,425-Speed 5195.50 samples/sec Loss 2.6531 LearningRate 0.0342 Epoch: 8 Global Step: 138530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 
08:33:35,392-Speed 5207.90 samples/sec Loss 2.6737 LearningRate 0.0342 Epoch: 8 Global Step: 138540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:37,359-Speed 5205.34 samples/sec Loss 2.5749 LearningRate 0.0342 Epoch: 8 Global Step: 138550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:39,331-Speed 5195.02 samples/sec Loss 2.5680 LearningRate 0.0342 Epoch: 8 Global Step: 138560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:41,302-Speed 5198.13 samples/sec Loss 2.6304 LearningRate 0.0342 Epoch: 8 Global Step: 138570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:33:43,269-Speed 5206.51 samples/sec Loss 2.6571 LearningRate 0.0342 Epoch: 8 Global Step: 138580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:33:45,247-Speed 5179.73 samples/sec Loss 2.5810 LearningRate 0.0342 Epoch: 8 Global Step: 138590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:33:47,209-Speed 5219.95 samples/sec Loss 2.6726 LearningRate 0.0342 Epoch: 8 Global Step: 138600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:49,184-Speed 5186.07 samples/sec Loss 2.6020 LearningRate 0.0342 Epoch: 8 Global Step: 138610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:51,156-Speed 5195.68 samples/sec Loss 2.5239 LearningRate 0.0342 Epoch: 8 Global Step: 138620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:53,145-Speed 5149.26 samples/sec Loss 2.5783 LearningRate 0.0342 Epoch: 8 Global Step: 138630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:55,110-Speed 5212.30 samples/sec Loss 2.6177 LearningRate 0.0342 Epoch: 8 Global Step: 138640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:57,076-Speed 5212.07 samples/sec Loss 2.6006 LearningRate 0.0342 Epoch: 8 Global Step: 138650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:33:59,042-Speed 5210.45 samples/sec 
Loss 2.6122 LearningRate 0.0342 Epoch: 8 Global Step: 138660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:01,007-Speed 5212.44 samples/sec Loss 2.5723 LearningRate 0.0342 Epoch: 8 Global Step: 138670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:02,985-Speed 5179.09 samples/sec Loss 2.6041 LearningRate 0.0342 Epoch: 8 Global Step: 138680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:04,964-Speed 5174.45 samples/sec Loss 2.6648 LearningRate 0.0342 Epoch: 8 Global Step: 138690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:06,927-Speed 5217.77 samples/sec Loss 2.6855 LearningRate 0.0342 Epoch: 8 Global Step: 138700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:34:08,893-Speed 5210.91 samples/sec Loss 2.6566 LearningRate 0.0342 Epoch: 8 Global Step: 138710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:34:10,874-Speed 5170.83 samples/sec Loss 2.6658 LearningRate 0.0342 Epoch: 8 Global Step: 138720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:34:12,848-Speed 5191.24 samples/sec Loss 2.6906 LearningRate 0.0342 Epoch: 8 Global Step: 138730 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:34:14,805-Speed 5234.41 samples/sec Loss 2.5592 LearningRate 0.0342 Epoch: 8 Global Step: 138740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:16,770-Speed 5211.57 samples/sec Loss 2.6470 LearningRate 0.0341 Epoch: 8 Global Step: 138750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:18,738-Speed 5204.54 samples/sec Loss 2.5972 LearningRate 0.0341 Epoch: 8 Global Step: 138760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:20,717-Speed 5177.21 samples/sec Loss 2.6248 LearningRate 0.0341 Epoch: 8 Global Step: 138770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:22,684-Speed 5206.42 samples/sec Loss 2.5695 LearningRate 0.0341 Epoch: 
8 Global Step: 138780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:24,654-Speed 5201.36 samples/sec Loss 2.6471 LearningRate 0.0341 Epoch: 8 Global Step: 138790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:26,621-Speed 5207.25 samples/sec Loss 2.6546 LearningRate 0.0341 Epoch: 8 Global Step: 138800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:28,594-Speed 5190.64 samples/sec Loss 2.5618 LearningRate 0.0341 Epoch: 8 Global Step: 138810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:30,563-Speed 5204.42 samples/sec Loss 2.6770 LearningRate 0.0341 Epoch: 8 Global Step: 138820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:32,541-Speed 5177.20 samples/sec Loss 2.6159 LearningRate 0.0341 Epoch: 8 Global Step: 138830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:34,513-Speed 5196.09 samples/sec Loss 2.6639 LearningRate 0.0341 Epoch: 8 Global Step: 138840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:34:36,495-Speed 5166.81 samples/sec Loss 2.6326 LearningRate 0.0341 Epoch: 8 Global Step: 138850 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:34:38,471-Speed 5184.32 samples/sec Loss 2.6187 LearningRate 0.0341 Epoch: 8 Global Step: 138860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:40,445-Speed 5188.40 samples/sec Loss 2.6636 LearningRate 0.0341 Epoch: 8 Global Step: 138870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:42,410-Speed 5212.21 samples/sec Loss 2.6384 LearningRate 0.0341 Epoch: 8 Global Step: 138880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:44,382-Speed 5194.79 samples/sec Loss 2.6151 LearningRate 0.0341 Epoch: 8 Global Step: 138890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:46,368-Speed 5160.26 samples/sec Loss 2.5764 LearningRate 0.0341 Epoch: 8 Global Step: 138900 Fp16 Grad Scale: 
65536 Required: 13 hours Training: 2022-04-11 08:34:48,365-Speed 5128.91 samples/sec Loss 2.6154 LearningRate 0.0341 Epoch: 8 Global Step: 138910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:34:50,338-Speed 5189.62 samples/sec Loss 2.5244 LearningRate 0.0341 Epoch: 8 Global Step: 138920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:34:52,317-Speed 5177.13 samples/sec Loss 2.5503 LearningRate 0.0341 Epoch: 8 Global Step: 138930 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:34:54,298-Speed 5173.79 samples/sec Loss 2.6657 LearningRate 0.0341 Epoch: 8 Global Step: 138940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:34:56,266-Speed 5206.58 samples/sec Loss 2.5569 LearningRate 0.0341 Epoch: 8 Global Step: 138950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:34:58,234-Speed 5204.24 samples/sec Loss 2.6504 LearningRate 0.0341 Epoch: 8 Global Step: 138960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:00,218-Speed 5162.75 samples/sec Loss 2.6000 LearningRate 0.0341 Epoch: 8 Global Step: 138970 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:02,202-Speed 5164.12 samples/sec Loss 2.6334 LearningRate 0.0341 Epoch: 8 Global Step: 138980 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:04,184-Speed 5167.09 samples/sec Loss 2.5912 LearningRate 0.0341 Epoch: 8 Global Step: 138990 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:06,149-Speed 5211.72 samples/sec Loss 2.6317 LearningRate 0.0341 Epoch: 8 Global Step: 139000 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:08,117-Speed 5204.31 samples/sec Loss 2.6085 LearningRate 0.0341 Epoch: 8 Global Step: 139010 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:10,104-Speed 5157.64 samples/sec Loss 2.6161 LearningRate 0.0341 Epoch: 8 Global Step: 139020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-04-11 08:35:12,074-Speed 5199.73 samples/sec Loss 2.6012 LearningRate 0.0340 Epoch: 8 Global Step: 139030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:35:14,059-Speed 5159.58 samples/sec Loss 2.6441 LearningRate 0.0340 Epoch: 8 Global Step: 139040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:35:16,024-Speed 5215.07 samples/sec Loss 2.7265 LearningRate 0.0340 Epoch: 8 Global Step: 139050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:35:17,993-Speed 5201.20 samples/sec Loss 2.6234 LearningRate 0.0340 Epoch: 8 Global Step: 139060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:35:19,973-Speed 5172.50 samples/sec Loss 2.6058 LearningRate 0.0340 Epoch: 8 Global Step: 139070 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:21,944-Speed 5198.14 samples/sec Loss 2.6648 LearningRate 0.0340 Epoch: 8 Global Step: 139080 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:23,935-Speed 5144.75 samples/sec Loss 2.6388 LearningRate 0.0340 Epoch: 8 Global Step: 139090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:25,904-Speed 5201.76 samples/sec Loss 2.6533 LearningRate 0.0340 Epoch: 8 Global Step: 139100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:27,874-Speed 5200.17 samples/sec Loss 2.6231 LearningRate 0.0340 Epoch: 8 Global Step: 139110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:29,841-Speed 5206.41 samples/sec Loss 2.6808 LearningRate 0.0340 Epoch: 8 Global Step: 139120 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:31,821-Speed 5173.90 samples/sec Loss 2.5889 LearningRate 0.0340 Epoch: 8 Global Step: 139130 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:33,792-Speed 5198.42 samples/sec Loss 2.6360 LearningRate 0.0340 Epoch: 8 Global Step: 139140 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:35,759-Speed 5206.08 
samples/sec Loss 2.6453 LearningRate 0.0340 Epoch: 8 Global Step: 139150 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:37,733-Speed 5191.36 samples/sec Loss 2.5288 LearningRate 0.0340 Epoch: 8 Global Step: 139160 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:39,701-Speed 5204.21 samples/sec Loss 2.6323 LearningRate 0.0340 Epoch: 8 Global Step: 139170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:35:41,667-Speed 5209.99 samples/sec Loss 2.6374 LearningRate 0.0340 Epoch: 8 Global Step: 139180 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:43,633-Speed 5210.73 samples/sec Loss 2.6444 LearningRate 0.0340 Epoch: 8 Global Step: 139190 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:45,616-Speed 5163.93 samples/sec Loss 2.6105 LearningRate 0.0340 Epoch: 8 Global Step: 139200 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:47,605-Speed 5150.68 samples/sec Loss 2.6009 LearningRate 0.0340 Epoch: 8 Global Step: 139210 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:49,585-Speed 5172.01 samples/sec Loss 2.6330 LearningRate 0.0340 Epoch: 8 Global Step: 139220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:51,568-Speed 5168.18 samples/sec Loss 2.6114 LearningRate 0.0340 Epoch: 8 Global Step: 139230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:53,536-Speed 5203.71 samples/sec Loss 2.7081 LearningRate 0.0340 Epoch: 8 Global Step: 139240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:55,503-Speed 5209.70 samples/sec Loss 2.6190 LearningRate 0.0340 Epoch: 8 Global Step: 139250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:57,481-Speed 5176.20 samples/sec Loss 2.5924 LearningRate 0.0340 Epoch: 8 Global Step: 139260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:35:59,451-Speed 5201.23 samples/sec Loss 2.6952 LearningRate 0.0340 
Epoch: 8 Global Step: 139270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:36:01,421-Speed 5199.49 samples/sec Loss 2.6503 LearningRate 0.0340 Epoch: 8 Global Step: 139280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:03,398-Speed 5180.43 samples/sec Loss 2.6278 LearningRate 0.0340 Epoch: 8 Global Step: 139290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:05,365-Speed 5207.07 samples/sec Loss 2.6594 LearningRate 0.0340 Epoch: 8 Global Step: 139300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:07,332-Speed 5208.33 samples/sec Loss 2.6253 LearningRate 0.0340 Epoch: 8 Global Step: 139310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:09,311-Speed 5176.09 samples/sec Loss 2.6213 LearningRate 0.0339 Epoch: 8 Global Step: 139320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:11,308-Speed 5128.55 samples/sec Loss 2.5775 LearningRate 0.0339 Epoch: 8 Global Step: 139330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:13,287-Speed 5176.57 samples/sec Loss 2.6113 LearningRate 0.0339 Epoch: 8 Global Step: 139340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:15,256-Speed 5204.02 samples/sec Loss 2.6829 LearningRate 0.0339 Epoch: 8 Global Step: 139350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:17,225-Speed 5200.89 samples/sec Loss 2.6154 LearningRate 0.0339 Epoch: 8 Global Step: 139360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:19,195-Speed 5200.46 samples/sec Loss 2.5832 LearningRate 0.0339 Epoch: 8 Global Step: 139370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:21,179-Speed 5163.91 samples/sec Loss 2.5887 LearningRate 0.0339 Epoch: 8 Global Step: 139380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:36:23,157-Speed 5177.16 samples/sec Loss 2.5717 LearningRate 0.0339 Epoch: 8 Global Step: 139390 Fp16 Grad 
Scale: 131072 Required: 13 hours Training: 2022-04-11 08:36:25,129-Speed 5195.99 samples/sec Loss 2.6872 LearningRate 0.0339 Epoch: 8 Global Step: 139400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:27,108-Speed 5174.52 samples/sec Loss 2.6126 LearningRate 0.0339 Epoch: 8 Global Step: 139410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:29,090-Speed 5169.57 samples/sec Loss 2.5993 LearningRate 0.0339 Epoch: 8 Global Step: 139420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:31,069-Speed 5173.94 samples/sec Loss 2.6136 LearningRate 0.0339 Epoch: 8 Global Step: 139430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:33,043-Speed 5191.62 samples/sec Loss 2.5975 LearningRate 0.0339 Epoch: 8 Global Step: 139440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:35,014-Speed 5197.43 samples/sec Loss 2.6369 LearningRate 0.0339 Epoch: 8 Global Step: 139450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:36,994-Speed 5171.64 samples/sec Loss 2.6806 LearningRate 0.0339 Epoch: 8 Global Step: 139460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:38,963-Speed 5202.64 samples/sec Loss 2.7149 LearningRate 0.0339 Epoch: 8 Global Step: 139470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:40,939-Speed 5185.13 samples/sec Loss 2.7049 LearningRate 0.0339 Epoch: 8 Global Step: 139480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:42,914-Speed 5185.04 samples/sec Loss 2.6145 LearningRate 0.0339 Epoch: 8 Global Step: 139490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:44,883-Speed 5202.25 samples/sec Loss 2.6745 LearningRate 0.0339 Epoch: 8 Global Step: 139500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:36:46,862-Speed 5175.96 samples/sec Loss 2.6108 LearningRate 0.0339 Epoch: 8 Global Step: 139510 Fp16 Grad Scale: 131072 Required: 13 hours 
Training: 2022-04-11 08:36:48,841-Speed 5177.24 samples/sec Loss 2.6348 LearningRate 0.0339 Epoch: 8 Global Step: 139520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:36:50,826-Speed 5159.41 samples/sec Loss 2.6244 LearningRate 0.0339 Epoch: 8 Global Step: 139530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:36:52,821-Speed 5135.82 samples/sec Loss 2.6257 LearningRate 0.0339 Epoch: 8 Global Step: 139540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:54,807-Speed 5157.30 samples/sec Loss 2.7237 LearningRate 0.0339 Epoch: 8 Global Step: 139550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:56,776-Speed 5204.71 samples/sec Loss 2.6140 LearningRate 0.0339 Epoch: 8 Global Step: 139560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:36:58,754-Speed 5177.78 samples/sec Loss 2.7367 LearningRate 0.0339 Epoch: 8 Global Step: 139570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:00,721-Speed 5206.64 samples/sec Loss 2.6941 LearningRate 0.0339 Epoch: 8 Global Step: 139580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:02,704-Speed 5166.15 samples/sec Loss 2.6366 LearningRate 0.0339 Epoch: 8 Global Step: 139590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:04,675-Speed 5196.49 samples/sec Loss 2.6873 LearningRate 0.0339 Epoch: 8 Global Step: 139600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:06,645-Speed 5200.72 samples/sec Loss 2.5781 LearningRate 0.0338 Epoch: 8 Global Step: 139610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:08,612-Speed 5205.53 samples/sec Loss 2.6566 LearningRate 0.0338 Epoch: 8 Global Step: 139620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:10,592-Speed 5173.32 samples/sec Loss 2.6576 LearningRate 0.0338 Epoch: 8 Global Step: 139630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:12,567-Speed 
5187.53 samples/sec Loss 2.6044 LearningRate 0.0338 Epoch: 8 Global Step: 139640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:37:14,536-Speed 5202.15 samples/sec Loss 2.6894 LearningRate 0.0338 Epoch: 8 Global Step: 139650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:37:16,520-Speed 5163.13 samples/sec Loss 2.6209 LearningRate 0.0338 Epoch: 8 Global Step: 139660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:37:18,495-Speed 5188.32 samples/sec Loss 2.6120 LearningRate 0.0338 Epoch: 8 Global Step: 139670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:37:20,467-Speed 5193.80 samples/sec Loss 2.6099 LearningRate 0.0338 Epoch: 8 Global Step: 139680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:37:22,439-Speed 5193.64 samples/sec Loss 2.6542 LearningRate 0.0338 Epoch: 8 Global Step: 139690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:37:24,409-Speed 5198.84 samples/sec Loss 2.6617 LearningRate 0.0338 Epoch: 8 Global Step: 139700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:37:26,393-Speed 5163.46 samples/sec Loss 2.6032 LearningRate 0.0338 Epoch: 8 Global Step: 139710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:28,368-Speed 5186.95 samples/sec Loss 2.6605 LearningRate 0.0338 Epoch: 8 Global Step: 139720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:30,339-Speed 5197.57 samples/sec Loss 2.6313 LearningRate 0.0338 Epoch: 8 Global Step: 139730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:32,316-Speed 5181.08 samples/sec Loss 2.6566 LearningRate 0.0338 Epoch: 8 Global Step: 139740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:34,298-Speed 5167.12 samples/sec Loss 2.6413 LearningRate 0.0338 Epoch: 8 Global Step: 139750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:36,269-Speed 5198.36 samples/sec Loss 2.6655 
LearningRate 0.0338 Epoch: 8 Global Step: 139760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:38,255-Speed 5158.07 samples/sec Loss 2.6386 LearningRate 0.0338 Epoch: 8 Global Step: 139770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:40,225-Speed 5199.76 samples/sec Loss 2.6617 LearningRate 0.0338 Epoch: 8 Global Step: 139780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:42,196-Speed 5197.90 samples/sec Loss 2.6309 LearningRate 0.0338 Epoch: 8 Global Step: 139790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:44,173-Speed 5178.98 samples/sec Loss 2.6856 LearningRate 0.0338 Epoch: 8 Global Step: 139800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:46,148-Speed 5187.30 samples/sec Loss 2.7112 LearningRate 0.0338 Epoch: 8 Global Step: 139810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:37:48,119-Speed 5197.75 samples/sec Loss 2.6288 LearningRate 0.0338 Epoch: 8 Global Step: 139820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:37:50,147-Speed 5049.94 samples/sec Loss 2.6360 LearningRate 0.0338 Epoch: 8 Global Step: 139830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:52,133-Speed 5158.08 samples/sec Loss 2.6101 LearningRate 0.0338 Epoch: 8 Global Step: 139840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:54,103-Speed 5199.99 samples/sec Loss 2.6498 LearningRate 0.0338 Epoch: 8 Global Step: 139850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:56,082-Speed 5176.55 samples/sec Loss 2.7311 LearningRate 0.0338 Epoch: 8 Global Step: 139860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:37:58,068-Speed 5156.93 samples/sec Loss 2.6523 LearningRate 0.0338 Epoch: 8 Global Step: 139870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:38:00,041-Speed 5193.40 samples/sec Loss 2.6205 LearningRate 0.0338 Epoch: 8 Global Step: 
139880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:38:02,015-Speed 5188.90 samples/sec Loss 2.6515 LearningRate 0.0337 Epoch: 8 Global Step: 139890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:38:03,989-Speed 5188.19 samples/sec Loss 2.6220 LearningRate 0.0337 Epoch: 8 Global Step: 139900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:38:05,963-Speed 5190.03 samples/sec Loss 2.5927 LearningRate 0.0337 Epoch: 8 Global Step: 139910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:38:07,951-Speed 5152.80 samples/sec Loss 2.6111 LearningRate 0.0337 Epoch: 8 Global Step: 139920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:38:09,925-Speed 5188.63 samples/sec Loss 2.6053 LearningRate 0.0337 Epoch: 8 Global Step: 139930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:38:11,904-Speed 5175.59 samples/sec Loss 2.6439 LearningRate 0.0337 Epoch: 8 Global Step: 139940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:38:13,873-Speed 5202.21 samples/sec Loss 2.6477 LearningRate 0.0337 Epoch: 8 Global Step: 139950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:38:15,869-Speed 5131.35 samples/sec Loss 2.6320 LearningRate 0.0337 Epoch: 8 Global Step: 139960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:38:17,855-Speed 5158.36 samples/sec Loss 2.7454 LearningRate 0.0337 Epoch: 8 Global Step: 139970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:38:19,850-Speed 5135.59 samples/sec Loss 2.6532 LearningRate 0.0337 Epoch: 8 Global Step: 139980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:38:21,823-Speed 5191.09 samples/sec Loss 2.6144 LearningRate 0.0337 Epoch: 8 Global Step: 139990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:38:23,822-Speed 5125.70 samples/sec Loss 2.6498 LearningRate 0.0337 Epoch: 8 Global Step: 140000 Fp16 Grad Scale: 65536 Required: 
13 hours Training: 2022-04-11 08:38:50,434-[lfw][140000]XNorm: 22.636766 Training: 2022-04-11 08:38:50,434-[lfw][140000]Accuracy-Flip: 0.99767+-0.00281 Training: 2022-04-11 08:38:50,435-[lfw][140000]Accuracy-Highest: 0.99833 Training: 2022-04-11 08:39:21,234-[cfp_fp][140000]XNorm: 21.232768 Training: 2022-04-11 08:39:21,234-[cfp_fp][140000]Accuracy-Flip: 0.98329+-0.00503 Training: 2022-04-11 08:39:21,234-[cfp_fp][140000]Accuracy-Highest: 0.98443 Training: 2022-04-11 08:39:47,847-[agedb_30][140000]XNorm: 22.378309 Training: 2022-04-11 08:39:47,848-[agedb_30][140000]Accuracy-Flip: 0.98050+-0.00796 Training: 2022-04-11 08:39:47,848-[agedb_30][140000]Accuracy-Highest: 0.98150 Training: 2022-04-11 08:39:49,840-Speed 119.05 samples/sec Loss 2.7046 LearningRate 0.0337 Epoch: 8 Global Step: 140010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:39:51,801-Speed 5222.35 samples/sec Loss 2.7128 LearningRate 0.0337 Epoch: 8 Global Step: 140020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:39:53,770-Speed 5202.94 samples/sec Loss 2.6111 LearningRate 0.0337 Epoch: 8 Global Step: 140030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:39:55,735-Speed 5212.76 samples/sec Loss 2.6480 LearningRate 0.0337 Epoch: 8 Global Step: 140040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:39:57,711-Speed 5182.59 samples/sec Loss 2.6725 LearningRate 0.0337 Epoch: 8 Global Step: 140050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:39:59,702-Speed 5147.16 samples/sec Loss 2.6449 LearningRate 0.0337 Epoch: 8 Global Step: 140060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:40:01,667-Speed 5210.29 samples/sec Loss 2.6087 LearningRate 0.0337 Epoch: 8 Global Step: 140070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:40:03,636-Speed 5202.13 samples/sec Loss 2.6743 LearningRate 0.0337 Epoch: 8 Global Step: 140080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-04-11 08:40:05,606-Speed 5200.21 samples/sec Loss 2.6747 LearningRate 0.0337 Epoch: 8 Global Step: 140090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:40:07,579-Speed 5193.33 samples/sec Loss 2.6812 LearningRate 0.0337 Epoch: 8 Global Step: 140100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:40:09,550-Speed 5195.74 samples/sec Loss 2.7110 LearningRate 0.0337 Epoch: 8 Global Step: 140110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:40:11,521-Speed 5198.51 samples/sec Loss 2.5993 LearningRate 0.0337 Epoch: 8 Global Step: 140120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:40:13,490-Speed 5202.97 samples/sec Loss 2.6198 LearningRate 0.0337 Epoch: 8 Global Step: 140130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:40:15,479-Speed 5148.61 samples/sec Loss 2.6721 LearningRate 0.0337 Epoch: 8 Global Step: 140140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:40:17,449-Speed 5200.77 samples/sec Loss 2.6726 LearningRate 0.0337 Epoch: 8 Global Step: 140150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:19,419-Speed 5198.35 samples/sec Loss 2.6910 LearningRate 0.0337 Epoch: 8 Global Step: 140160 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:21,393-Speed 5188.59 samples/sec Loss 2.7178 LearningRate 0.0337 Epoch: 8 Global Step: 140170 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:23,379-Speed 5160.01 samples/sec Loss 2.6814 LearningRate 0.0336 Epoch: 8 Global Step: 140180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:25,373-Speed 5135.99 samples/sec Loss 2.6343 LearningRate 0.0336 Epoch: 8 Global Step: 140190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:27,370-Speed 5128.66 samples/sec Loss 2.7093 LearningRate 0.0336 Epoch: 8 Global Step: 140200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:29,343-Speed 5193.00 
samples/sec Loss 2.6630 LearningRate 0.0336 Epoch: 8 Global Step: 140210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:31,323-Speed 5173.19 samples/sec Loss 2.7207 LearningRate 0.0336 Epoch: 8 Global Step: 140220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:33,318-Speed 5133.91 samples/sec Loss 2.7018 LearningRate 0.0336 Epoch: 8 Global Step: 140230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:35,301-Speed 5165.65 samples/sec Loss 2.6941 LearningRate 0.0336 Epoch: 8 Global Step: 140240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:37,288-Speed 5156.32 samples/sec Loss 2.7317 LearningRate 0.0336 Epoch: 8 Global Step: 140250 Fp16 Grad Scale: 262144 Required: 13 hours Training: 2022-04-11 08:40:39,274-Speed 5156.15 samples/sec Loss 2.6838 LearningRate 0.0336 Epoch: 8 Global Step: 140260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:41,281-Speed 5105.72 samples/sec Loss 2.6957 LearningRate 0.0336 Epoch: 8 Global Step: 140270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:43,256-Speed 5185.19 samples/sec Loss 2.6691 LearningRate 0.0336 Epoch: 8 Global Step: 140280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:45,252-Speed 5133.27 samples/sec Loss 2.6356 LearningRate 0.0336 Epoch: 8 Global Step: 140290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:47,237-Speed 5158.68 samples/sec Loss 2.6953 LearningRate 0.0336 Epoch: 8 Global Step: 140300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:49,231-Speed 5137.65 samples/sec Loss 2.6425 LearningRate 0.0336 Epoch: 8 Global Step: 140310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:51,234-Speed 5114.12 samples/sec Loss 2.6981 LearningRate 0.0336 Epoch: 8 Global Step: 140320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:53,225-Speed 5145.73 samples/sec Loss 2.7333 
LearningRate 0.0336 Epoch: 8 Global Step: 140330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:55,210-Speed 5159.34 samples/sec Loss 2.6056 LearningRate 0.0336 Epoch: 8 Global Step: 140340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:40:57,177-Speed 5209.63 samples/sec Loss 2.6901 LearningRate 0.0336 Epoch: 8 Global Step: 140350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:40:59,154-Speed 5180.65 samples/sec Loss 2.7050 LearningRate 0.0336 Epoch: 8 Global Step: 140360 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:41:01,166-Speed 5091.05 samples/sec Loss 2.6367 LearningRate 0.0336 Epoch: 8 Global Step: 140370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:41:03,140-Speed 5188.41 samples/sec Loss 2.7425 LearningRate 0.0336 Epoch: 8 Global Step: 140380 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:41:05,117-Speed 5181.94 samples/sec Loss 2.6802 LearningRate 0.0336 Epoch: 8 Global Step: 140390 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:41:07,095-Speed 5178.95 samples/sec Loss 2.6178 LearningRate 0.0336 Epoch: 8 Global Step: 140400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:41:09,063-Speed 5204.99 samples/sec Loss 2.6827 LearningRate 0.0336 Epoch: 8 Global Step: 140410 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:41:11,034-Speed 5195.09 samples/sec Loss 2.7149 LearningRate 0.0336 Epoch: 8 Global Step: 140420 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:41:13,027-Speed 5140.95 samples/sec Loss 2.6263 LearningRate 0.0336 Epoch: 8 Global Step: 140430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:41:14,999-Speed 5196.37 samples/sec Loss 2.7180 LearningRate 0.0336 Epoch: 8 Global Step: 140440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:41:16,982-Speed 5165.74 samples/sec Loss 2.7506 LearningRate 0.0336 Epoch: 8 Global Step: 
140450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:41:18,952-Speed 5197.91 samples/sec Loss 2.6559 LearningRate 0.0336 Epoch: 8 Global Step: 140460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:41:20,935-Speed 5164.85 samples/sec Loss 2.6699 LearningRate 0.0335 Epoch: 8 Global Step: 140470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:41:22,907-Speed 5194.44 samples/sec Loss 2.7670 LearningRate 0.0335 Epoch: 8 Global Step: 140480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:41:24,879-Speed 5196.16 samples/sec Loss 2.6763 LearningRate 0.0335 Epoch: 8 Global Step: 140490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:41:26,848-Speed 5201.60 samples/sec Loss 2.6879 LearningRate 0.0335 Epoch: 8 Global Step: 140500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:41:28,833-Speed 5158.80 samples/sec Loss 2.6996 LearningRate 0.0335 Epoch: 8 Global Step: 140510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:41:30,807-Speed 5191.10 samples/sec Loss 2.5824 LearningRate 0.0335 Epoch: 8 Global Step: 140520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:41:32,776-Speed 5200.40 samples/sec Loss 2.7499 LearningRate 0.0335 Epoch: 8 Global Step: 140530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:41:34,749-Speed 5191.68 samples/sec Loss 2.7412 LearningRate 0.0335 Epoch: 8 Global Step: 140540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:41:36,731-Speed 5169.16 samples/sec Loss 2.6336 LearningRate 0.0335 Epoch: 8 Global Step: 140550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:41:38,702-Speed 5198.60 samples/sec Loss 2.6260 LearningRate 0.0335 Epoch: 8 Global Step: 140560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:41:40,684-Speed 5168.19 samples/sec Loss 2.6274 LearningRate 0.0335 Epoch: 8 Global Step: 140570 Fp16 Grad Scale: 131072 Required: 
13 hours Training: 2022-04-11 08:41:42,644-Speed 5226.76 samples/sec Loss 2.6659 LearningRate 0.0335 Epoch: 8 Global Step: 140580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:41:44,599-Speed 5237.49 samples/sec Loss 2.7265 LearningRate 0.0335 Epoch: 8 Global Step: 140590 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:41:46,602-Speed 5115.50 samples/sec Loss 2.6391 LearningRate 0.0335 Epoch: 8 Global Step: 140600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:41:48,570-Speed 5204.68 samples/sec Loss 2.6504 LearningRate 0.0335 Epoch: 8 Global Step: 140610 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:41:50,547-Speed 5179.28 samples/sec Loss 2.7480 LearningRate 0.0335 Epoch: 8 Global Step: 140620 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:41:52,531-Speed 5165.06 samples/sec Loss 2.6872 LearningRate 0.0335 Epoch: 8 Global Step: 140630 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:41:54,511-Speed 5171.19 samples/sec Loss 2.7470 LearningRate 0.0335 Epoch: 8 Global Step: 140640 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:41:56,493-Speed 5169.55 samples/sec Loss 2.6661 LearningRate 0.0335 Epoch: 8 Global Step: 140650 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:41:58,475-Speed 5169.33 samples/sec Loss 2.6168 LearningRate 0.0335 Epoch: 8 Global Step: 140660 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:42:00,445-Speed 5200.13 samples/sec Loss 2.6479 LearningRate 0.0335 Epoch: 8 Global Step: 140670 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:42:02,411-Speed 5208.96 samples/sec Loss 2.7354 LearningRate 0.0335 Epoch: 8 Global Step: 140680 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:42:04,396-Speed 5160.32 samples/sec Loss 2.6069 LearningRate 0.0335 Epoch: 8 Global Step: 140690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 
08:42:06,369-Speed 5192.65 samples/sec Loss 2.6656 LearningRate 0.0335 Epoch: 8 Global Step: 140700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:08,338-Speed 5200.85 samples/sec Loss 2.6556 LearningRate 0.0335 Epoch: 8 Global Step: 140710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:10,305-Speed 5207.09 samples/sec Loss 2.6177 LearningRate 0.0335 Epoch: 8 Global Step: 140720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:12,284-Speed 5178.59 samples/sec Loss 2.6501 LearningRate 0.0335 Epoch: 8 Global Step: 140730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:14,256-Speed 5194.37 samples/sec Loss 2.7280 LearningRate 0.0335 Epoch: 8 Global Step: 140740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:16,241-Speed 5159.32 samples/sec Loss 2.6636 LearningRate 0.0335 Epoch: 8 Global Step: 140750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:18,232-Speed 5146.21 samples/sec Loss 2.7479 LearningRate 0.0334 Epoch: 8 Global Step: 140760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:20,201-Speed 5201.88 samples/sec Loss 2.5900 LearningRate 0.0334 Epoch: 8 Global Step: 140770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:22,172-Speed 5197.27 samples/sec Loss 2.6784 LearningRate 0.0334 Epoch: 8 Global Step: 140780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:24,166-Speed 5136.13 samples/sec Loss 2.7238 LearningRate 0.0334 Epoch: 8 Global Step: 140790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:42:26,145-Speed 5177.74 samples/sec Loss 2.6693 LearningRate 0.0334 Epoch: 8 Global Step: 140800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:42:28,120-Speed 5185.08 samples/sec Loss 2.5829 LearningRate 0.0334 Epoch: 8 Global Step: 140810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:42:30,088-Speed 5205.94 samples/sec 
Loss 2.6045 LearningRate 0.0334 Epoch: 8 Global Step: 140820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:42:32,062-Speed 5189.16 samples/sec Loss 2.6430 LearningRate 0.0334 Epoch: 8 Global Step: 140830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:34,046-Speed 5161.71 samples/sec Loss 2.7004 LearningRate 0.0334 Epoch: 8 Global Step: 140840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:36,023-Speed 5181.85 samples/sec Loss 2.6713 LearningRate 0.0334 Epoch: 8 Global Step: 140850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:38,010-Speed 5156.79 samples/sec Loss 2.6417 LearningRate 0.0334 Epoch: 8 Global Step: 140860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:39,996-Speed 5158.01 samples/sec Loss 2.7426 LearningRate 0.0334 Epoch: 8 Global Step: 140870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:41,978-Speed 5167.45 samples/sec Loss 2.7645 LearningRate 0.0334 Epoch: 8 Global Step: 140880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:43,951-Speed 5192.23 samples/sec Loss 2.6934 LearningRate 0.0334 Epoch: 8 Global Step: 140890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:45,921-Speed 5197.71 samples/sec Loss 2.6341 LearningRate 0.0334 Epoch: 8 Global Step: 140900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:47,888-Speed 5207.77 samples/sec Loss 2.6098 LearningRate 0.0334 Epoch: 8 Global Step: 140910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:49,862-Speed 5190.88 samples/sec Loss 2.6254 LearningRate 0.0334 Epoch: 8 Global Step: 140920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:51,850-Speed 5152.36 samples/sec Loss 2.6622 LearningRate 0.0334 Epoch: 8 Global Step: 140930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:42:53,821-Speed 5197.50 samples/sec Loss 2.6816 LearningRate 0.0334 Epoch: 8 
Global Step: 140940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:42:55,780-Speed 5226.75 samples/sec Loss 2.6396 LearningRate 0.0334 Epoch: 8 Global Step: 140950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:57,760-Speed 5174.27 samples/sec Loss 2.5933 LearningRate 0.0334 Epoch: 8 Global Step: 140960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:42:59,731-Speed 5197.56 samples/sec Loss 2.6483 LearningRate 0.0334 Epoch: 8 Global Step: 140970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:43:01,716-Speed 5161.82 samples/sec Loss 2.7033 LearningRate 0.0334 Epoch: 8 Global Step: 140980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:43:03,689-Speed 5191.62 samples/sec Loss 2.6559 LearningRate 0.0334 Epoch: 8 Global Step: 140990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:43:05,659-Speed 5199.89 samples/sec Loss 2.6729 LearningRate 0.0334 Epoch: 8 Global Step: 141000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:43:07,653-Speed 5135.04 samples/sec Loss 2.6551 LearningRate 0.0334 Epoch: 8 Global Step: 141010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:43:09,622-Speed 5203.47 samples/sec Loss 2.7060 LearningRate 0.0334 Epoch: 8 Global Step: 141020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:43:11,601-Speed 5176.44 samples/sec Loss 2.6685 LearningRate 0.0334 Epoch: 8 Global Step: 141030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:43:13,568-Speed 5205.98 samples/sec Loss 2.6982 LearningRate 0.0334 Epoch: 8 Global Step: 141040 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:43:15,562-Speed 5136.95 samples/sec Loss 2.6731 LearningRate 0.0333 Epoch: 8 Global Step: 141050 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:43:17,530-Speed 5205.51 samples/sec Loss 2.6691 LearningRate 0.0333 Epoch: 8 Global Step: 141060 Fp16 Grad Scale: 32768 
Required: 13 hours Training: 2022-04-11 08:43:19,514-Speed 5163.60 samples/sec Loss 2.6628 LearningRate 0.0333 Epoch: 8 Global Step: 141070 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:43:21,482-Speed 5205.87 samples/sec Loss 2.5862 LearningRate 0.0333 Epoch: 8 Global Step: 141080 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:43:23,466-Speed 5162.72 samples/sec Loss 2.6354 LearningRate 0.0333 Epoch: 8 Global Step: 141090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:43:25,450-Speed 5161.76 samples/sec Loss 2.6683 LearningRate 0.0333 Epoch: 8 Global Step: 141100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:43:27,433-Speed 5166.49 samples/sec Loss 2.6641 LearningRate 0.0333 Epoch: 8 Global Step: 141110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:43:29,405-Speed 5194.02 samples/sec Loss 2.6154 LearningRate 0.0333 Epoch: 8 Global Step: 141120 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:43:31,372-Speed 5207.61 samples/sec Loss 2.6810 LearningRate 0.0333 Epoch: 8 Global Step: 141130 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:43:33,349-Speed 5180.44 samples/sec Loss 2.6883 LearningRate 0.0333 Epoch: 8 Global Step: 141140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:43:35,329-Speed 5175.48 samples/sec Loss 2.7086 LearningRate 0.0333 Epoch: 8 Global Step: 141150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:43:37,319-Speed 5145.46 samples/sec Loss 2.6878 LearningRate 0.0333 Epoch: 8 Global Step: 141160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:43:39,292-Speed 5193.04 samples/sec Loss 2.7227 LearningRate 0.0333 Epoch: 8 Global Step: 141170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:43:41,263-Speed 5196.50 samples/sec Loss 2.6496 LearningRate 0.0333 Epoch: 8 Global Step: 141180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 
08:43:43,244-Speed 5172.78 samples/sec Loss 2.7191 LearningRate 0.0333 Epoch: 8 Global Step: 141190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:43:45,214-Speed 5198.94 samples/sec Loss 2.6466 LearningRate 0.0333 Epoch: 8 Global Step: 141200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:43:47,200-Speed 5157.72 samples/sec Loss 2.6628 LearningRate 0.0333 Epoch: 8 Global Step: 141210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:43:49,204-Speed 5112.20 samples/sec Loss 2.6390 LearningRate 0.0333 Epoch: 8 Global Step: 141220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:43:51,219-Speed 5083.79 samples/sec Loss 2.6283 LearningRate 0.0333 Epoch: 8 Global Step: 141230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:43:53,208-Speed 5147.68 samples/sec Loss 2.6925 LearningRate 0.0333 Epoch: 8 Global Step: 141240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:43:55,171-Speed 5219.86 samples/sec Loss 2.7057 LearningRate 0.0333 Epoch: 8 Global Step: 141250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:43:57,147-Speed 5182.48 samples/sec Loss 2.6874 LearningRate 0.0333 Epoch: 8 Global Step: 141260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:43:59,181-Speed 5036.70 samples/sec Loss 2.6617 LearningRate 0.0333 Epoch: 8 Global Step: 141270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:01,243-Speed 4968.92 samples/sec Loss 2.6170 LearningRate 0.0333 Epoch: 8 Global Step: 141280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:03,233-Speed 5149.02 samples/sec Loss 2.6892 LearningRate 0.0333 Epoch: 8 Global Step: 141290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:05,202-Speed 5201.08 samples/sec Loss 2.6872 LearningRate 0.0333 Epoch: 8 Global Step: 141300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:07,174-Speed 5194.51 samples/sec 
Loss 2.6755 LearningRate 0.0333 Epoch: 8 Global Step: 141310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:09,147-Speed 5194.19 samples/sec Loss 2.6379 LearningRate 0.0333 Epoch: 8 Global Step: 141320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:11,125-Speed 5176.96 samples/sec Loss 2.6503 LearningRate 0.0332 Epoch: 8 Global Step: 141330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:13,135-Speed 5095.15 samples/sec Loss 2.7362 LearningRate 0.0332 Epoch: 8 Global Step: 141340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:15,124-Speed 5151.60 samples/sec Loss 2.6573 LearningRate 0.0332 Epoch: 8 Global Step: 141350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:44:17,095-Speed 5196.61 samples/sec Loss 2.6157 LearningRate 0.0332 Epoch: 8 Global Step: 141360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:44:19,065-Speed 5200.42 samples/sec Loss 2.6487 LearningRate 0.0332 Epoch: 8 Global Step: 141370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:44:21,042-Speed 5181.87 samples/sec Loss 2.6937 LearningRate 0.0332 Epoch: 8 Global Step: 141380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:44:23,046-Speed 5111.37 samples/sec Loss 2.6456 LearningRate 0.0332 Epoch: 8 Global Step: 141390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:44:25,019-Speed 5189.96 samples/sec Loss 2.6460 LearningRate 0.0332 Epoch: 8 Global Step: 141400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:44:26,987-Speed 5207.31 samples/sec Loss 2.6522 LearningRate 0.0332 Epoch: 8 Global Step: 141410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:28,959-Speed 5193.54 samples/sec Loss 2.6243 LearningRate 0.0332 Epoch: 8 Global Step: 141420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:30,930-Speed 5197.54 samples/sec Loss 2.6737 LearningRate 0.0332 
Epoch: 8 Global Step: 141430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:32,927-Speed 5128.43 samples/sec Loss 2.6638 LearningRate 0.0332 Epoch: 8 Global Step: 141440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:34,917-Speed 5147.02 samples/sec Loss 2.6844 LearningRate 0.0332 Epoch: 8 Global Step: 141450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:36,916-Speed 5124.24 samples/sec Loss 2.5521 LearningRate 0.0332 Epoch: 8 Global Step: 141460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:38,889-Speed 5190.65 samples/sec Loss 2.6630 LearningRate 0.0332 Epoch: 8 Global Step: 141470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:40,867-Speed 5179.07 samples/sec Loss 2.6914 LearningRate 0.0332 Epoch: 8 Global Step: 141480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:42,838-Speed 5198.67 samples/sec Loss 2.6714 LearningRate 0.0332 Epoch: 8 Global Step: 141490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:44,808-Speed 5199.48 samples/sec Loss 2.6827 LearningRate 0.0332 Epoch: 8 Global Step: 141500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:46,779-Speed 5197.92 samples/sec Loss 2.6519 LearningRate 0.0332 Epoch: 8 Global Step: 141510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:44:48,756-Speed 5182.05 samples/sec Loss 2.6941 LearningRate 0.0332 Epoch: 8 Global Step: 141520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:44:50,753-Speed 5127.02 samples/sec Loss 2.6804 LearningRate 0.0332 Epoch: 8 Global Step: 141530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:44:52,734-Speed 5171.64 samples/sec Loss 2.7061 LearningRate 0.0332 Epoch: 8 Global Step: 141540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:54,706-Speed 5194.73 samples/sec Loss 2.7502 LearningRate 0.0332 Epoch: 8 Global Step: 141550 Fp16 Grad 
Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:56,688-Speed 5168.16 samples/sec Loss 2.6913 LearningRate 0.0332 Epoch: 8 Global Step: 141560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:44:58,684-Speed 5131.68 samples/sec Loss 2.7293 LearningRate 0.0332 Epoch: 8 Global Step: 141570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:00,659-Speed 5185.15 samples/sec Loss 2.6721 LearningRate 0.0332 Epoch: 8 Global Step: 141580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:02,645-Speed 5160.59 samples/sec Loss 2.8100 LearningRate 0.0332 Epoch: 8 Global Step: 141590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:04,614-Speed 5201.03 samples/sec Loss 2.7024 LearningRate 0.0332 Epoch: 8 Global Step: 141600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:06,593-Speed 5176.31 samples/sec Loss 2.6242 LearningRate 0.0332 Epoch: 8 Global Step: 141610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:08,566-Speed 5191.77 samples/sec Loss 2.7317 LearningRate 0.0331 Epoch: 8 Global Step: 141620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:10,562-Speed 5133.65 samples/sec Loss 2.6268 LearningRate 0.0331 Epoch: 8 Global Step: 141630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:12,525-Speed 5216.35 samples/sec Loss 2.7325 LearningRate 0.0331 Epoch: 8 Global Step: 141640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:14,501-Speed 5184.93 samples/sec Loss 2.6100 LearningRate 0.0331 Epoch: 8 Global Step: 141650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:16,479-Speed 5177.20 samples/sec Loss 2.6334 LearningRate 0.0331 Epoch: 8 Global Step: 141660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:18,450-Speed 5197.37 samples/sec Loss 2.6767 LearningRate 0.0331 Epoch: 8 Global Step: 141670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-04-11 08:45:20,423-Speed 5191.24 samples/sec Loss 2.6608 LearningRate 0.0331 Epoch: 8 Global Step: 141680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:22,414-Speed 5147.09 samples/sec Loss 2.6909 LearningRate 0.0331 Epoch: 8 Global Step: 141690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:24,398-Speed 5161.86 samples/sec Loss 2.6070 LearningRate 0.0331 Epoch: 8 Global Step: 141700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:26,378-Speed 5173.12 samples/sec Loss 2.6891 LearningRate 0.0331 Epoch: 8 Global Step: 141710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:28,352-Speed 5190.46 samples/sec Loss 2.6167 LearningRate 0.0331 Epoch: 8 Global Step: 141720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:30,333-Speed 5169.92 samples/sec Loss 2.5589 LearningRate 0.0331 Epoch: 8 Global Step: 141730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:32,304-Speed 5197.70 samples/sec Loss 2.6787 LearningRate 0.0331 Epoch: 8 Global Step: 141740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:45:34,277-Speed 5191.34 samples/sec Loss 2.7711 LearningRate 0.0331 Epoch: 8 Global Step: 141750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:45:36,257-Speed 5173.37 samples/sec Loss 2.6385 LearningRate 0.0331 Epoch: 8 Global Step: 141760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:45:38,232-Speed 5185.81 samples/sec Loss 2.6674 LearningRate 0.0331 Epoch: 8 Global Step: 141770 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:45:40,207-Speed 5191.01 samples/sec Loss 2.7143 LearningRate 0.0331 Epoch: 8 Global Step: 141780 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:45:42,178-Speed 5196.09 samples/sec Loss 2.6469 LearningRate 0.0331 Epoch: 8 Global Step: 141790 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:45:44,149-Speed 5196.77 
samples/sec Loss 2.7074 LearningRate 0.0331 Epoch: 8 Global Step: 141800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:45:46,126-Speed 5181.13 samples/sec Loss 2.7418 LearningRate 0.0331 Epoch: 8 Global Step: 141810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:45:48,115-Speed 5151.73 samples/sec Loss 2.6710 LearningRate 0.0331 Epoch: 8 Global Step: 141820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:45:50,100-Speed 5159.16 samples/sec Loss 2.6444 LearningRate 0.0331 Epoch: 8 Global Step: 141830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:45:52,067-Speed 5209.52 samples/sec Loss 2.7091 LearningRate 0.0331 Epoch: 8 Global Step: 141840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:54,039-Speed 5193.72 samples/sec Loss 2.6514 LearningRate 0.0331 Epoch: 8 Global Step: 141850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:56,005-Speed 5208.71 samples/sec Loss 2.6819 LearningRate 0.0331 Epoch: 8 Global Step: 141860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:57,986-Speed 5172.31 samples/sec Loss 2.6436 LearningRate 0.0331 Epoch: 8 Global Step: 141870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:45:59,972-Speed 5157.59 samples/sec Loss 2.6562 LearningRate 0.0331 Epoch: 8 Global Step: 141880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:46:01,968-Speed 5130.41 samples/sec Loss 2.7228 LearningRate 0.0331 Epoch: 8 Global Step: 141890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:46:03,956-Speed 5155.43 samples/sec Loss 2.6958 LearningRate 0.0331 Epoch: 8 Global Step: 141900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:46:05,928-Speed 5192.37 samples/sec Loss 2.7503 LearningRate 0.0330 Epoch: 8 Global Step: 141910 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:46:07,901-Speed 5193.59 samples/sec Loss 2.6759 LearningRate 
0.0330 Epoch: 8 Global Step: 141920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:46:09,877-Speed 5183.28 samples/sec Loss 2.7486 LearningRate 0.0330 Epoch: 8 Global Step: 141930 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:46:11,852-Speed 5187.66 samples/sec Loss 2.6719 LearningRate 0.0330 Epoch: 8 Global Step: 141940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:46:13,840-Speed 5152.15 samples/sec Loss 2.6767 LearningRate 0.0330 Epoch: 8 Global Step: 141950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:46:15,821-Speed 5169.46 samples/sec Loss 2.7175 LearningRate 0.0330 Epoch: 8 Global Step: 141960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:46:17,790-Speed 5203.88 samples/sec Loss 2.6195 LearningRate 0.0330 Epoch: 8 Global Step: 141970 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:46:19,759-Speed 5200.37 samples/sec Loss 2.6806 LearningRate 0.0330 Epoch: 8 Global Step: 141980 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:46:21,730-Speed 5198.75 samples/sec Loss 2.6970 LearningRate 0.0330 Epoch: 8 Global Step: 141990 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:46:23,727-Speed 5129.97 samples/sec Loss 2.7402 LearningRate 0.0330 Epoch: 8 Global Step: 142000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:46:50,332-[lfw][142000]XNorm: 21.981454 Training: 2022-04-11 08:46:50,333-[lfw][142000]Accuracy-Flip: 0.99767+-0.00271 Training: 2022-04-11 08:46:50,334-[lfw][142000]Accuracy-Highest: 0.99833 Training: 2022-04-11 08:47:20,945-[cfp_fp][142000]XNorm: 20.775246 Training: 2022-04-11 08:47:20,946-[cfp_fp][142000]Accuracy-Flip: 0.98357+-0.00597 Training: 2022-04-11 08:47:20,946-[cfp_fp][142000]Accuracy-Highest: 0.98443 Training: 2022-04-11 08:47:47,508-[agedb_30][142000]XNorm: 21.878390 Training: 2022-04-11 08:47:47,509-[agedb_30][142000]Accuracy-Flip: 0.98083+-0.00793 Training: 2022-04-11 
08:47:47,509-[agedb_30][142000]Accuracy-Highest: 0.98150 Training: 2022-04-11 08:47:49,492-Speed 119.40 samples/sec Loss 2.6777 LearningRate 0.0330 Epoch: 8 Global Step: 142010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:47:51,475-Speed 5166.81 samples/sec Loss 2.6467 LearningRate 0.0330 Epoch: 8 Global Step: 142020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:47:53,445-Speed 5199.72 samples/sec Loss 2.7117 LearningRate 0.0330 Epoch: 8 Global Step: 142030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:47:55,420-Speed 5184.40 samples/sec Loss 2.7346 LearningRate 0.0330 Epoch: 8 Global Step: 142040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:47:57,395-Speed 5189.16 samples/sec Loss 2.6747 LearningRate 0.0330 Epoch: 8 Global Step: 142050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:47:59,365-Speed 5198.25 samples/sec Loss 2.6948 LearningRate 0.0330 Epoch: 8 Global Step: 142060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:01,340-Speed 5186.69 samples/sec Loss 2.7758 LearningRate 0.0330 Epoch: 8 Global Step: 142070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:03,299-Speed 5229.21 samples/sec Loss 2.7075 LearningRate 0.0330 Epoch: 8 Global Step: 142080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:05,260-Speed 5223.76 samples/sec Loss 2.6636 LearningRate 0.0330 Epoch: 8 Global Step: 142090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:07,219-Speed 5227.66 samples/sec Loss 2.7476 LearningRate 0.0330 Epoch: 8 Global Step: 142100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:48:09,187-Speed 5205.07 samples/sec Loss 2.7204 LearningRate 0.0330 Epoch: 8 Global Step: 142110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:48:11,173-Speed 5158.51 samples/sec Loss 2.7377 LearningRate 0.0330 Epoch: 8 Global Step: 142120 Fp16 Grad Scale: 131072 
Required: 13 hours Training: 2022-04-11 08:48:13,198-Speed 5060.82 samples/sec Loss 2.6568 LearningRate 0.0330 Epoch: 8 Global Step: 142130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:15,177-Speed 5175.31 samples/sec Loss 2.5806 LearningRate 0.0330 Epoch: 8 Global Step: 142140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:17,140-Speed 5219.00 samples/sec Loss 2.6867 LearningRate 0.0330 Epoch: 8 Global Step: 142150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:19,101-Speed 5223.06 samples/sec Loss 2.7197 LearningRate 0.0330 Epoch: 8 Global Step: 142160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:21,066-Speed 5213.21 samples/sec Loss 2.7166 LearningRate 0.0330 Epoch: 8 Global Step: 142170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:23,027-Speed 5224.01 samples/sec Loss 2.6073 LearningRate 0.0330 Epoch: 8 Global Step: 142180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:25,023-Speed 5130.94 samples/sec Loss 2.6691 LearningRate 0.0330 Epoch: 8 Global Step: 142190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:27,007-Speed 5163.43 samples/sec Loss 2.6631 LearningRate 0.0330 Epoch: 8 Global Step: 142200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:28,973-Speed 5210.83 samples/sec Loss 2.6493 LearningRate 0.0329 Epoch: 8 Global Step: 142210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:30,940-Speed 5207.41 samples/sec Loss 2.6666 LearningRate 0.0329 Epoch: 8 Global Step: 142220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:32,913-Speed 5189.48 samples/sec Loss 2.6796 LearningRate 0.0329 Epoch: 8 Global Step: 142230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:34,909-Speed 5133.41 samples/sec Loss 2.6846 LearningRate 0.0329 Epoch: 8 Global Step: 142240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 
08:48:36,881-Speed 5194.50 samples/sec Loss 2.6575 LearningRate 0.0329 Epoch: 8 Global Step: 142250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:38,863-Speed 5168.75 samples/sec Loss 2.6947 LearningRate 0.0329 Epoch: 8 Global Step: 142260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:40,855-Speed 5142.00 samples/sec Loss 2.6256 LearningRate 0.0329 Epoch: 8 Global Step: 142270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:42,823-Speed 5205.66 samples/sec Loss 2.7070 LearningRate 0.0329 Epoch: 8 Global Step: 142280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:44,794-Speed 5196.10 samples/sec Loss 2.7164 LearningRate 0.0329 Epoch: 8 Global Step: 142290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:46,773-Speed 5177.17 samples/sec Loss 2.6648 LearningRate 0.0329 Epoch: 8 Global Step: 142300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:48,753-Speed 5172.03 samples/sec Loss 2.6906 LearningRate 0.0329 Epoch: 8 Global Step: 142310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:50,729-Speed 5185.60 samples/sec Loss 2.7013 LearningRate 0.0329 Epoch: 8 Global Step: 142320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:52,704-Speed 5184.96 samples/sec Loss 2.6620 LearningRate 0.0329 Epoch: 8 Global Step: 142330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:48:54,676-Speed 5194.94 samples/sec Loss 2.7266 LearningRate 0.0329 Epoch: 8 Global Step: 142340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:56,657-Speed 5171.84 samples/sec Loss 2.7082 LearningRate 0.0329 Epoch: 8 Global Step: 142350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:48:58,629-Speed 5193.17 samples/sec Loss 2.7159 LearningRate 0.0329 Epoch: 8 Global Step: 142360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:00,610-Speed 5170.56 samples/sec 
Loss 2.6198 LearningRate 0.0329 Epoch: 8 Global Step: 142370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:02,584-Speed 5188.85 samples/sec Loss 2.6618 LearningRate 0.0329 Epoch: 8 Global Step: 142380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:04,558-Speed 5190.98 samples/sec Loss 2.6510 LearningRate 0.0329 Epoch: 8 Global Step: 142390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:06,549-Speed 5144.11 samples/sec Loss 2.6635 LearningRate 0.0329 Epoch: 8 Global Step: 142400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:08,529-Speed 5172.49 samples/sec Loss 2.6441 LearningRate 0.0329 Epoch: 8 Global Step: 142410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:10,511-Speed 5169.65 samples/sec Loss 2.7295 LearningRate 0.0329 Epoch: 8 Global Step: 142420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:12,492-Speed 5168.92 samples/sec Loss 2.6485 LearningRate 0.0329 Epoch: 8 Global Step: 142430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:14,463-Speed 5197.91 samples/sec Loss 2.7431 LearningRate 0.0329 Epoch: 8 Global Step: 142440 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:49:16,447-Speed 5162.59 samples/sec Loss 2.6600 LearningRate 0.0329 Epoch: 8 Global Step: 142450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:49:18,414-Speed 5210.14 samples/sec Loss 2.7332 LearningRate 0.0329 Epoch: 8 Global Step: 142460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:20,382-Speed 5203.16 samples/sec Loss 2.6798 LearningRate 0.0329 Epoch: 8 Global Step: 142470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:22,353-Speed 5198.49 samples/sec Loss 2.6833 LearningRate 0.0329 Epoch: 8 Global Step: 142480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:24,320-Speed 5207.72 samples/sec Loss 2.7301 LearningRate 0.0329 Epoch: 8 
Global Step: 142490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:26,300-Speed 5172.76 samples/sec Loss 2.6760 LearningRate 0.0328 Epoch: 8 Global Step: 142500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:28,267-Speed 5207.31 samples/sec Loss 2.6679 LearningRate 0.0328 Epoch: 8 Global Step: 142510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:30,234-Speed 5208.13 samples/sec Loss 2.6726 LearningRate 0.0328 Epoch: 8 Global Step: 142520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:32,217-Speed 5166.22 samples/sec Loss 2.6316 LearningRate 0.0328 Epoch: 8 Global Step: 142530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:34,195-Speed 5178.50 samples/sec Loss 2.7095 LearningRate 0.0328 Epoch: 8 Global Step: 142540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:36,164-Speed 5201.08 samples/sec Loss 2.6961 LearningRate 0.0328 Epoch: 8 Global Step: 142550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:38,141-Speed 5182.43 samples/sec Loss 2.6559 LearningRate 0.0328 Epoch: 8 Global Step: 142560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:49:40,111-Speed 5200.72 samples/sec Loss 2.6916 LearningRate 0.0328 Epoch: 8 Global Step: 142570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:49:42,084-Speed 5190.30 samples/sec Loss 2.7006 LearningRate 0.0328 Epoch: 8 Global Step: 142580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:49:44,047-Speed 5217.64 samples/sec Loss 2.6885 LearningRate 0.0328 Epoch: 8 Global Step: 142590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:49:46,026-Speed 5175.86 samples/sec Loss 2.6540 LearningRate 0.0328 Epoch: 8 Global Step: 142600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:49:48,005-Speed 5176.76 samples/sec Loss 2.6779 LearningRate 0.0328 Epoch: 8 Global Step: 142610 Fp16 Grad Scale: 
131072 Required: 13 hours Training: 2022-04-11 08:49:49,978-Speed 5192.76 samples/sec Loss 2.6297 LearningRate 0.0328 Epoch: 8 Global Step: 142620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:51,969-Speed 5144.93 samples/sec Loss 2.6855 LearningRate 0.0328 Epoch: 8 Global Step: 142630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:53,933-Speed 5214.16 samples/sec Loss 2.6801 LearningRate 0.0328 Epoch: 8 Global Step: 142640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:55,913-Speed 5173.43 samples/sec Loss 2.6492 LearningRate 0.0328 Epoch: 8 Global Step: 142650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:57,889-Speed 5184.34 samples/sec Loss 2.6893 LearningRate 0.0328 Epoch: 8 Global Step: 142660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:49:59,856-Speed 5208.49 samples/sec Loss 2.7385 LearningRate 0.0328 Epoch: 8 Global Step: 142670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:01,836-Speed 5172.42 samples/sec Loss 2.7137 LearningRate 0.0328 Epoch: 8 Global Step: 142680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:03,806-Speed 5200.94 samples/sec Loss 2.6914 LearningRate 0.0328 Epoch: 8 Global Step: 142690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:05,773-Speed 5208.68 samples/sec Loss 2.7150 LearningRate 0.0328 Epoch: 8 Global Step: 142700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:07,742-Speed 5201.80 samples/sec Loss 2.7022 LearningRate 0.0328 Epoch: 8 Global Step: 142710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:09,710-Speed 5205.03 samples/sec Loss 2.7108 LearningRate 0.0328 Epoch: 8 Global Step: 142720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:50:11,705-Speed 5133.55 samples/sec Loss 2.6176 LearningRate 0.0328 Epoch: 8 Global Step: 142730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-04-11 08:50:13,687-Speed 5167.26 samples/sec Loss 2.6352 LearningRate 0.0328 Epoch: 8 Global Step: 142740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:15,658-Speed 5196.15 samples/sec Loss 2.7191 LearningRate 0.0328 Epoch: 8 Global Step: 142750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:17,636-Speed 5179.40 samples/sec Loss 2.6606 LearningRate 0.0328 Epoch: 8 Global Step: 142760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:19,599-Speed 5219.56 samples/sec Loss 2.6320 LearningRate 0.0328 Epoch: 8 Global Step: 142770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:21,566-Speed 5208.93 samples/sec Loss 2.7605 LearningRate 0.0328 Epoch: 8 Global Step: 142780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:23,542-Speed 5183.16 samples/sec Loss 2.6202 LearningRate 0.0327 Epoch: 8 Global Step: 142790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:25,517-Speed 5186.21 samples/sec Loss 2.6793 LearningRate 0.0327 Epoch: 8 Global Step: 142800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:27,508-Speed 5143.83 samples/sec Loss 2.6514 LearningRate 0.0327 Epoch: 8 Global Step: 142810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:29,482-Speed 5190.40 samples/sec Loss 2.7297 LearningRate 0.0327 Epoch: 8 Global Step: 142820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:31,468-Speed 5156.74 samples/sec Loss 2.6717 LearningRate 0.0327 Epoch: 8 Global Step: 142830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:50:33,436-Speed 5205.12 samples/sec Loss 2.6880 LearningRate 0.0327 Epoch: 8 Global Step: 142840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:50:35,406-Speed 5201.52 samples/sec Loss 2.7290 LearningRate 0.0327 Epoch: 8 Global Step: 142850 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:50:37,394-Speed 5152.43 
samples/sec Loss 2.6701 LearningRate 0.0327 Epoch: 8 Global Step: 142860 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:50:39,380-Speed 5158.48 samples/sec Loss 2.6885 LearningRate 0.0327 Epoch: 8 Global Step: 142870 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:50:41,346-Speed 5210.06 samples/sec Loss 2.6359 LearningRate 0.0327 Epoch: 8 Global Step: 142880 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:50:43,329-Speed 5165.73 samples/sec Loss 2.7172 LearningRate 0.0327 Epoch: 8 Global Step: 142890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:50:45,300-Speed 5196.08 samples/sec Loss 2.6618 LearningRate 0.0327 Epoch: 8 Global Step: 142900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:50:47,267-Speed 5207.27 samples/sec Loss 2.7043 LearningRate 0.0327 Epoch: 8 Global Step: 142910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:49,240-Speed 5191.85 samples/sec Loss 2.6476 LearningRate 0.0327 Epoch: 8 Global Step: 142920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:51,210-Speed 5200.01 samples/sec Loss 2.6833 LearningRate 0.0327 Epoch: 8 Global Step: 142930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:53,179-Speed 5201.66 samples/sec Loss 2.7356 LearningRate 0.0327 Epoch: 8 Global Step: 142940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:55,146-Speed 5206.51 samples/sec Loss 2.6905 LearningRate 0.0327 Epoch: 8 Global Step: 142950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:57,120-Speed 5190.53 samples/sec Loss 2.7289 LearningRate 0.0327 Epoch: 8 Global Step: 142960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:50:59,092-Speed 5195.13 samples/sec Loss 2.6253 LearningRate 0.0327 Epoch: 8 Global Step: 142970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:01,097-Speed 5108.65 samples/sec Loss 2.6479 LearningRate 
0.0327 Epoch: 8 Global Step: 142980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:03,090-Speed 5140.80 samples/sec Loss 2.6600 LearningRate 0.0327 Epoch: 8 Global Step: 142990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:05,060-Speed 5199.84 samples/sec Loss 2.6580 LearningRate 0.0327 Epoch: 8 Global Step: 143000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:07,043-Speed 5163.50 samples/sec Loss 2.6694 LearningRate 0.0327 Epoch: 8 Global Step: 143010 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:51:09,012-Speed 5203.99 samples/sec Loss 2.6827 LearningRate 0.0327 Epoch: 8 Global Step: 143020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:51:10,997-Speed 5159.88 samples/sec Loss 2.7758 LearningRate 0.0327 Epoch: 8 Global Step: 143030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:51:13,017-Speed 5069.85 samples/sec Loss 2.6257 LearningRate 0.0327 Epoch: 8 Global Step: 143040 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:51:15,016-Speed 5124.83 samples/sec Loss 2.6606 LearningRate 0.0327 Epoch: 8 Global Step: 143050 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:51:16,990-Speed 5188.41 samples/sec Loss 2.6827 LearningRate 0.0327 Epoch: 8 Global Step: 143060 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:51:18,960-Speed 5201.02 samples/sec Loss 2.6281 LearningRate 0.0327 Epoch: 8 Global Step: 143070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:20,926-Speed 5210.84 samples/sec Loss 2.6508 LearningRate 0.0326 Epoch: 8 Global Step: 143080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:22,894-Speed 5203.92 samples/sec Loss 2.7085 LearningRate 0.0326 Epoch: 8 Global Step: 143090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:24,861-Speed 5207.52 samples/sec Loss 2.7358 LearningRate 0.0326 Epoch: 8 Global Step: 143100 
Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:26,826-Speed 5214.29 samples/sec Loss 2.5887 LearningRate 0.0326 Epoch: 8 Global Step: 143110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:28,793-Speed 5208.06 samples/sec Loss 2.7007 LearningRate 0.0326 Epoch: 8 Global Step: 143120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:30,777-Speed 5161.62 samples/sec Loss 2.7748 LearningRate 0.0326 Epoch: 8 Global Step: 143130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:32,755-Speed 5179.66 samples/sec Loss 2.6980 LearningRate 0.0326 Epoch: 8 Global Step: 143140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:34,737-Speed 5167.27 samples/sec Loss 2.6450 LearningRate 0.0326 Epoch: 8 Global Step: 143150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:36,703-Speed 5210.92 samples/sec Loss 2.6696 LearningRate 0.0326 Epoch: 8 Global Step: 143160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:38,671-Speed 5203.63 samples/sec Loss 2.7007 LearningRate 0.0326 Epoch: 8 Global Step: 143170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:40,640-Speed 5203.42 samples/sec Loss 2.6134 LearningRate 0.0326 Epoch: 8 Global Step: 143180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:42,628-Speed 5153.15 samples/sec Loss 2.6158 LearningRate 0.0326 Epoch: 8 Global Step: 143190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:44,593-Speed 5212.74 samples/sec Loss 2.7320 LearningRate 0.0326 Epoch: 8 Global Step: 143200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:46,570-Speed 5180.00 samples/sec Loss 2.7339 LearningRate 0.0326 Epoch: 8 Global Step: 143210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:48,549-Speed 5177.37 samples/sec Loss 2.7615 LearningRate 0.0326 Epoch: 8 Global Step: 143220 Fp16 Grad Scale: 65536 Required: 13 hours 
Training: 2022-04-11 08:51:50,520-Speed 5196.18 samples/sec Loss 2.7038 LearningRate 0.0326 Epoch: 8 Global Step: 143230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:52,488-Speed 5206.64 samples/sec Loss 2.6706 LearningRate 0.0326 Epoch: 8 Global Step: 143240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:54,474-Speed 5156.30 samples/sec Loss 2.7605 LearningRate 0.0326 Epoch: 8 Global Step: 143250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:56,465-Speed 5145.52 samples/sec Loss 2.6836 LearningRate 0.0326 Epoch: 8 Global Step: 143260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:51:58,470-Speed 5109.50 samples/sec Loss 2.6340 LearningRate 0.0326 Epoch: 8 Global Step: 143270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:52:00,463-Speed 5140.47 samples/sec Loss 2.7198 LearningRate 0.0326 Epoch: 8 Global Step: 143280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:02,463-Speed 5119.90 samples/sec Loss 2.6126 LearningRate 0.0326 Epoch: 8 Global Step: 143290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:04,431-Speed 5205.97 samples/sec Loss 2.7249 LearningRate 0.0326 Epoch: 8 Global Step: 143300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:06,405-Speed 5189.78 samples/sec Loss 2.7010 LearningRate 0.0326 Epoch: 8 Global Step: 143310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:08,389-Speed 5161.98 samples/sec Loss 2.6416 LearningRate 0.0326 Epoch: 8 Global Step: 143320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:10,365-Speed 5183.76 samples/sec Loss 2.7199 LearningRate 0.0326 Epoch: 8 Global Step: 143330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:12,360-Speed 5135.78 samples/sec Loss 2.7520 LearningRate 0.0326 Epoch: 8 Global Step: 143340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:14,333-Speed 
5191.11 samples/sec Loss 2.6422 LearningRate 0.0326 Epoch: 8 Global Step: 143350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:16,301-Speed 5204.89 samples/sec Loss 2.6614 LearningRate 0.0326 Epoch: 8 Global Step: 143360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:18,276-Speed 5186.12 samples/sec Loss 2.6822 LearningRate 0.0325 Epoch: 8 Global Step: 143370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:20,257-Speed 5170.72 samples/sec Loss 2.6914 LearningRate 0.0325 Epoch: 8 Global Step: 143380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:52:22,250-Speed 5141.34 samples/sec Loss 2.6888 LearningRate 0.0325 Epoch: 8 Global Step: 143390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:52:24,208-Speed 5231.22 samples/sec Loss 2.6857 LearningRate 0.0325 Epoch: 8 Global Step: 143400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:26,188-Speed 5173.43 samples/sec Loss 2.6162 LearningRate 0.0325 Epoch: 8 Global Step: 143410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:28,165-Speed 5181.32 samples/sec Loss 2.6552 LearningRate 0.0325 Epoch: 8 Global Step: 143420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:30,132-Speed 5206.40 samples/sec Loss 2.6442 LearningRate 0.0325 Epoch: 8 Global Step: 143430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:32,100-Speed 5206.50 samples/sec Loss 2.7477 LearningRate 0.0325 Epoch: 8 Global Step: 143440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:34,070-Speed 5197.35 samples/sec Loss 2.7147 LearningRate 0.0325 Epoch: 8 Global Step: 143450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:36,037-Speed 5207.97 samples/sec Loss 2.7484 LearningRate 0.0325 Epoch: 8 Global Step: 143460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:38,004-Speed 5210.27 samples/sec Loss 2.6680 
LearningRate 0.0325 Epoch: 8 Global Step: 143470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:39,975-Speed 5197.51 samples/sec Loss 2.6683 LearningRate 0.0325 Epoch: 8 Global Step: 143480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:41,951-Speed 5181.36 samples/sec Loss 2.6632 LearningRate 0.0325 Epoch: 8 Global Step: 143490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:43,916-Speed 5215.36 samples/sec Loss 2.6732 LearningRate 0.0325 Epoch: 8 Global Step: 143500 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:52:45,891-Speed 5186.20 samples/sec Loss 2.7176 LearningRate 0.0325 Epoch: 8 Global Step: 143510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:52:47,866-Speed 5184.89 samples/sec Loss 2.6523 LearningRate 0.0325 Epoch: 8 Global Step: 143520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:52:49,837-Speed 5196.76 samples/sec Loss 2.6694 LearningRate 0.0325 Epoch: 8 Global Step: 143530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:52:51,813-Speed 5184.48 samples/sec Loss 2.7790 LearningRate 0.0325 Epoch: 8 Global Step: 143540 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:52:53,784-Speed 5196.69 samples/sec Loss 2.7058 LearningRate 0.0325 Epoch: 8 Global Step: 143550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:55,757-Speed 5191.75 samples/sec Loss 2.6454 LearningRate 0.0325 Epoch: 8 Global Step: 143560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:57,740-Speed 5167.26 samples/sec Loss 2.6668 LearningRate 0.0325 Epoch: 8 Global Step: 143570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:52:59,728-Speed 5152.50 samples/sec Loss 2.7284 LearningRate 0.0325 Epoch: 8 Global Step: 143580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:01,703-Speed 5186.04 samples/sec Loss 2.6904 LearningRate 0.0325 Epoch: 8 Global 
Step: 143590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:03,671-Speed 5203.66 samples/sec Loss 2.6753 LearningRate 0.0325 Epoch: 8 Global Step: 143600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:05,639-Speed 5205.63 samples/sec Loss 2.6851 LearningRate 0.0325 Epoch: 8 Global Step: 143610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:07,605-Speed 5211.43 samples/sec Loss 2.7340 LearningRate 0.0325 Epoch: 8 Global Step: 143620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:09,603-Speed 5126.95 samples/sec Loss 2.6780 LearningRate 0.0325 Epoch: 8 Global Step: 143630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:11,570-Speed 5206.88 samples/sec Loss 2.7193 LearningRate 0.0325 Epoch: 8 Global Step: 143640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:13,545-Speed 5185.53 samples/sec Loss 2.6884 LearningRate 0.0325 Epoch: 8 Global Step: 143650 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:53:15,513-Speed 5204.57 samples/sec Loss 2.7089 LearningRate 0.0324 Epoch: 8 Global Step: 143660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:53:17,479-Speed 5212.33 samples/sec Loss 2.7063 LearningRate 0.0324 Epoch: 8 Global Step: 143670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:53:19,460-Speed 5172.19 samples/sec Loss 2.7431 LearningRate 0.0324 Epoch: 8 Global Step: 143680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:53:21,463-Speed 5112.09 samples/sec Loss 2.6848 LearningRate 0.0324 Epoch: 8 Global Step: 143690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:53:23,446-Speed 5165.80 samples/sec Loss 2.7418 LearningRate 0.0324 Epoch: 8 Global Step: 143700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:25,422-Speed 5185.06 samples/sec Loss 2.6878 LearningRate 0.0324 Epoch: 8 Global Step: 143710 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-04-11 08:53:27,392-Speed 5200.46 samples/sec Loss 2.7745 LearningRate 0.0324 Epoch: 8 Global Step: 143720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:29,369-Speed 5180.93 samples/sec Loss 2.6790 LearningRate 0.0324 Epoch: 8 Global Step: 143730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:31,333-Speed 5213.54 samples/sec Loss 2.7188 LearningRate 0.0324 Epoch: 8 Global Step: 143740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:33,299-Speed 5210.37 samples/sec Loss 2.7475 LearningRate 0.0324 Epoch: 8 Global Step: 143750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:35,272-Speed 5191.82 samples/sec Loss 2.7396 LearningRate 0.0324 Epoch: 8 Global Step: 143760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:37,250-Speed 5179.56 samples/sec Loss 2.7046 LearningRate 0.0324 Epoch: 8 Global Step: 143770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:39,223-Speed 5191.67 samples/sec Loss 2.6204 LearningRate 0.0324 Epoch: 8 Global Step: 143780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:41,190-Speed 5207.77 samples/sec Loss 2.7302 LearningRate 0.0324 Epoch: 8 Global Step: 143790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:43,159-Speed 5203.60 samples/sec Loss 2.6362 LearningRate 0.0324 Epoch: 8 Global Step: 143800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:53:45,120-Speed 5223.33 samples/sec Loss 2.6703 LearningRate 0.0324 Epoch: 8 Global Step: 143810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:47,103-Speed 5164.94 samples/sec Loss 2.6041 LearningRate 0.0324 Epoch: 8 Global Step: 143820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:49,077-Speed 5189.62 samples/sec Loss 2.6968 LearningRate 0.0324 Epoch: 8 Global Step: 143830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 
08:53:51,055-Speed 5177.45 samples/sec Loss 2.6537 LearningRate 0.0324 Epoch: 8 Global Step: 143840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:53,033-Speed 5178.28 samples/sec Loss 2.6635 LearningRate 0.0324 Epoch: 8 Global Step: 143850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:55,000-Speed 5207.71 samples/sec Loss 2.6823 LearningRate 0.0324 Epoch: 8 Global Step: 143860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:56,973-Speed 5193.77 samples/sec Loss 2.6970 LearningRate 0.0324 Epoch: 8 Global Step: 143870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:53:58,951-Speed 5177.48 samples/sec Loss 2.6513 LearningRate 0.0324 Epoch: 8 Global Step: 143880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:54:00,935-Speed 5162.88 samples/sec Loss 2.6357 LearningRate 0.0324 Epoch: 8 Global Step: 143890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:54:02,912-Speed 5181.27 samples/sec Loss 2.6644 LearningRate 0.0324 Epoch: 8 Global Step: 143900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:54:04,887-Speed 5188.39 samples/sec Loss 2.5997 LearningRate 0.0324 Epoch: 8 Global Step: 143910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:54:06,852-Speed 5213.48 samples/sec Loss 2.6505 LearningRate 0.0324 Epoch: 8 Global Step: 143920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:54:08,844-Speed 5139.93 samples/sec Loss 2.6664 LearningRate 0.0324 Epoch: 8 Global Step: 143930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:54:10,821-Speed 5181.46 samples/sec Loss 2.6321 LearningRate 0.0324 Epoch: 8 Global Step: 143940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:54:12,820-Speed 5125.74 samples/sec Loss 2.6611 LearningRate 0.0324 Epoch: 8 Global Step: 143950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:54:14,800-Speed 5171.73 samples/sec 
Loss 2.6872 LearningRate 0.0323 Epoch: 8 Global Step: 143960 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-04-11 08:54:16,778-Speed 5178.70 samples/sec Loss 2.6691 LearningRate 0.0323 Epoch: 8 Global Step: 143970 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-04-11 08:54:18,768-Speed 5148.40 samples/sec Loss 2.6784 LearningRate 0.0323 Epoch: 8 Global Step: 143980 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-04-11 08:54:20,755-Speed 5153.95 samples/sec Loss 2.7050 LearningRate 0.0323 Epoch: 8 Global Step: 143990 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-04-11 08:54:22,724-Speed 5203.57 samples/sec Loss 2.6936 LearningRate 0.0323 Epoch: 8 Global Step: 144000 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-04-11 08:54:49,457-[lfw][144000]XNorm: 22.824690
Training: 2022-04-11 08:54:49,457-[lfw][144000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 08:54:49,458-[lfw][144000]Accuracy-Highest: 0.99833
Training: 2022-04-11 08:55:20,150-[cfp_fp][144000]XNorm: 20.965273
Training: 2022-04-11 08:55:20,151-[cfp_fp][144000]Accuracy-Flip: 0.98329+-0.00491
Training: 2022-04-11 08:55:20,151-[cfp_fp][144000]Accuracy-Highest: 0.98443
Training: 2022-04-11 08:55:46,803-[agedb_30][144000]XNorm: 22.786177
Training: 2022-04-11 08:55:46,804-[agedb_30][144000]Accuracy-Flip: 0.97867+-0.00774
Training: 2022-04-11 08:55:46,804-[agedb_30][144000]Accuracy-Highest: 0.98150
Training: 2022-04-11 08:55:48,797-Speed 118.97 samples/sec Loss 2.7053 LearningRate 0.0323 Epoch: 8 Global Step: 144010 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-04-11 08:55:50,760-Speed 5217.66 samples/sec Loss 2.7077 LearningRate 0.0323 Epoch: 8 Global Step: 144020 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-04-11 08:55:52,726-Speed 5208.99 samples/sec Loss 2.6819 LearningRate 0.0323 Epoch: 8 Global Step: 144030 Fp16 Grad Scale: 32768 Required: 13 hours
Training: 2022-04-11 08:55:54,691-Speed 5213.01 samples/sec Loss 2.6821 LearningRate 
0.0323 Epoch: 8 Global Step: 144040 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:55:56,659-Speed 5206.17 samples/sec Loss 2.6099 LearningRate 0.0323 Epoch: 8 Global Step: 144050 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:55:58,626-Speed 5206.46 samples/sec Loss 2.6495 LearningRate 0.0323 Epoch: 8 Global Step: 144060 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:56:00,617-Speed 5145.25 samples/sec Loss 2.6895 LearningRate 0.0323 Epoch: 8 Global Step: 144070 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:56:02,604-Speed 5154.58 samples/sec Loss 2.7287 LearningRate 0.0323 Epoch: 8 Global Step: 144080 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:56:04,580-Speed 5185.68 samples/sec Loss 2.6337 LearningRate 0.0323 Epoch: 8 Global Step: 144090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:56:06,556-Speed 5184.48 samples/sec Loss 2.6032 LearningRate 0.0323 Epoch: 8 Global Step: 144100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:08,541-Speed 5158.97 samples/sec Loss 2.7658 LearningRate 0.0323 Epoch: 8 Global Step: 144110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:10,536-Speed 5134.44 samples/sec Loss 2.6451 LearningRate 0.0323 Epoch: 8 Global Step: 144120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:12,510-Speed 5190.74 samples/sec Loss 2.6441 LearningRate 0.0323 Epoch: 8 Global Step: 144130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:14,483-Speed 5190.33 samples/sec Loss 2.7522 LearningRate 0.0323 Epoch: 8 Global Step: 144140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:16,467-Speed 5164.27 samples/sec Loss 2.7066 LearningRate 0.0323 Epoch: 8 Global Step: 144150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:18,453-Speed 5156.15 samples/sec Loss 2.7423 LearningRate 0.0323 Epoch: 8 Global Step: 144160 Fp16 
Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:20,433-Speed 5173.79 samples/sec Loss 2.6664 LearningRate 0.0323 Epoch: 8 Global Step: 144170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:22,419-Speed 5158.75 samples/sec Loss 2.7437 LearningRate 0.0323 Epoch: 8 Global Step: 144180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:24,401-Speed 5166.70 samples/sec Loss 2.7826 LearningRate 0.0323 Epoch: 8 Global Step: 144190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:26,392-Speed 5144.62 samples/sec Loss 2.6772 LearningRate 0.0323 Epoch: 8 Global Step: 144200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:56:28,380-Speed 5153.09 samples/sec Loss 2.6810 LearningRate 0.0323 Epoch: 8 Global Step: 144210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:56:30,355-Speed 5187.94 samples/sec Loss 2.6333 LearningRate 0.0323 Epoch: 8 Global Step: 144220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:56:32,359-Speed 5112.95 samples/sec Loss 2.6196 LearningRate 0.0323 Epoch: 8 Global Step: 144230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:56:34,337-Speed 5176.93 samples/sec Loss 2.6585 LearningRate 0.0323 Epoch: 8 Global Step: 144240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:36,327-Speed 5149.12 samples/sec Loss 2.6754 LearningRate 0.0322 Epoch: 8 Global Step: 144250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:38,302-Speed 5184.56 samples/sec Loss 2.6929 LearningRate 0.0322 Epoch: 8 Global Step: 144260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:40,285-Speed 5166.30 samples/sec Loss 2.7092 LearningRate 0.0322 Epoch: 8 Global Step: 144270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:42,258-Speed 5191.75 samples/sec Loss 2.6389 LearningRate 0.0322 Epoch: 8 Global Step: 144280 Fp16 Grad Scale: 65536 Required: 13 hours 
Training: 2022-04-11 08:56:44,235-Speed 5181.82 samples/sec Loss 2.6733 LearningRate 0.0322 Epoch: 8 Global Step: 144290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:46,233-Speed 5125.85 samples/sec Loss 2.6415 LearningRate 0.0322 Epoch: 8 Global Step: 144300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:48,222-Speed 5149.14 samples/sec Loss 2.6472 LearningRate 0.0322 Epoch: 8 Global Step: 144310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:50,223-Speed 5119.36 samples/sec Loss 2.7114 LearningRate 0.0322 Epoch: 8 Global Step: 144320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:52,220-Speed 5131.88 samples/sec Loss 2.6925 LearningRate 0.0322 Epoch: 8 Global Step: 144330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:54,191-Speed 5196.65 samples/sec Loss 2.7397 LearningRate 0.0322 Epoch: 8 Global Step: 144340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:56,173-Speed 5167.13 samples/sec Loss 2.7065 LearningRate 0.0322 Epoch: 8 Global Step: 144350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:56:58,153-Speed 5175.21 samples/sec Loss 2.7030 LearningRate 0.0322 Epoch: 8 Global Step: 144360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:00,142-Speed 5149.35 samples/sec Loss 2.6432 LearningRate 0.0322 Epoch: 8 Global Step: 144370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:02,127-Speed 5160.58 samples/sec Loss 2.7043 LearningRate 0.0322 Epoch: 8 Global Step: 144380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:04,099-Speed 5192.97 samples/sec Loss 2.6934 LearningRate 0.0322 Epoch: 8 Global Step: 144390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:06,072-Speed 5192.80 samples/sec Loss 2.6952 LearningRate 0.0322 Epoch: 8 Global Step: 144400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:08,053-Speed 
5169.63 samples/sec Loss 2.7883 LearningRate 0.0322 Epoch: 8 Global Step: 144410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:10,063-Speed 5097.74 samples/sec Loss 2.6905 LearningRate 0.0322 Epoch: 8 Global Step: 144420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:12,039-Speed 5182.09 samples/sec Loss 2.6442 LearningRate 0.0322 Epoch: 8 Global Step: 144430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:14,027-Speed 5153.16 samples/sec Loss 2.6913 LearningRate 0.0322 Epoch: 8 Global Step: 144440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:16,002-Speed 5188.37 samples/sec Loss 2.6365 LearningRate 0.0322 Epoch: 8 Global Step: 144450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:17,987-Speed 5158.71 samples/sec Loss 2.6827 LearningRate 0.0322 Epoch: 8 Global Step: 144460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:19,961-Speed 5190.96 samples/sec Loss 2.7578 LearningRate 0.0322 Epoch: 8 Global Step: 144470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:21,926-Speed 5211.16 samples/sec Loss 2.7365 LearningRate 0.0322 Epoch: 8 Global Step: 144480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:57:23,898-Speed 5194.89 samples/sec Loss 2.6943 LearningRate 0.0322 Epoch: 8 Global Step: 144490 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:57:25,868-Speed 5200.84 samples/sec Loss 2.6965 LearningRate 0.0322 Epoch: 8 Global Step: 144500 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:57:27,844-Speed 5182.45 samples/sec Loss 2.7157 LearningRate 0.0322 Epoch: 8 Global Step: 144510 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:57:29,812-Speed 5204.77 samples/sec Loss 2.5876 LearningRate 0.0322 Epoch: 8 Global Step: 144520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:57:31,798-Speed 5158.34 samples/sec Loss 2.6865 
LearningRate 0.0322 Epoch: 8 Global Step: 144530 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:57:33,777-Speed 5178.04 samples/sec Loss 2.7172 LearningRate 0.0322 Epoch: 8 Global Step: 144540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:57:35,769-Speed 5142.26 samples/sec Loss 2.7655 LearningRate 0.0321 Epoch: 8 Global Step: 144550 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:57:37,740-Speed 5197.10 samples/sec Loss 2.6519 LearningRate 0.0321 Epoch: 8 Global Step: 144560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:57:39,718-Speed 5176.61 samples/sec Loss 2.6738 LearningRate 0.0321 Epoch: 8 Global Step: 144570 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:57:41,694-Speed 5185.86 samples/sec Loss 2.7022 LearningRate 0.0321 Epoch: 8 Global Step: 144580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:43,663-Speed 5201.48 samples/sec Loss 2.6557 LearningRate 0.0321 Epoch: 8 Global Step: 144590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:45,630-Speed 5206.90 samples/sec Loss 2.6502 LearningRate 0.0321 Epoch: 8 Global Step: 144600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:47,597-Speed 5209.15 samples/sec Loss 2.6488 LearningRate 0.0321 Epoch: 8 Global Step: 144610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:49,564-Speed 5207.35 samples/sec Loss 2.6674 LearningRate 0.0321 Epoch: 8 Global Step: 144620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:51,539-Speed 5185.65 samples/sec Loss 2.6867 LearningRate 0.0321 Epoch: 8 Global Step: 144630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:53,513-Speed 5190.28 samples/sec Loss 2.7317 LearningRate 0.0321 Epoch: 8 Global Step: 144640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:55,480-Speed 5206.64 samples/sec Loss 2.7401 LearningRate 0.0321 Epoch: 8 Global Step: 
144650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:57,447-Speed 5209.21 samples/sec Loss 2.6133 LearningRate 0.0321 Epoch: 8 Global Step: 144660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:57:59,419-Speed 5192.53 samples/sec Loss 2.6837 LearningRate 0.0321 Epoch: 8 Global Step: 144670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:01,402-Speed 5167.45 samples/sec Loss 2.6585 LearningRate 0.0321 Epoch: 8 Global Step: 144680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:58:03,382-Speed 5172.52 samples/sec Loss 2.6887 LearningRate 0.0321 Epoch: 8 Global Step: 144690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:58:05,349-Speed 5209.10 samples/sec Loss 2.7003 LearningRate 0.0321 Epoch: 8 Global Step: 144700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:58:07,317-Speed 5203.67 samples/sec Loss 2.6739 LearningRate 0.0321 Epoch: 8 Global Step: 144710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:58:09,291-Speed 5190.91 samples/sec Loss 2.6526 LearningRate 0.0321 Epoch: 8 Global Step: 144720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:11,286-Speed 5134.22 samples/sec Loss 2.6709 LearningRate 0.0321 Epoch: 8 Global Step: 144730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:13,266-Speed 5172.48 samples/sec Loss 2.6197 LearningRate 0.0321 Epoch: 8 Global Step: 144740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:15,237-Speed 5196.46 samples/sec Loss 2.7348 LearningRate 0.0321 Epoch: 8 Global Step: 144750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:17,219-Speed 5168.30 samples/sec Loss 2.7141 LearningRate 0.0321 Epoch: 8 Global Step: 144760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:19,202-Speed 5166.42 samples/sec Loss 2.7139 LearningRate 0.0321 Epoch: 8 Global Step: 144770 Fp16 Grad Scale: 65536 Required: 
13 hours Training: 2022-04-11 08:58:21,168-Speed 5210.59 samples/sec Loss 2.7313 LearningRate 0.0321 Epoch: 8 Global Step: 144780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:23,152-Speed 5163.44 samples/sec Loss 2.7279 LearningRate 0.0321 Epoch: 8 Global Step: 144790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:25,129-Speed 5180.61 samples/sec Loss 2.6552 LearningRate 0.0321 Epoch: 8 Global Step: 144800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:27,110-Speed 5170.30 samples/sec Loss 2.6882 LearningRate 0.0321 Epoch: 8 Global Step: 144810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:29,084-Speed 5189.49 samples/sec Loss 2.6350 LearningRate 0.0321 Epoch: 8 Global Step: 144820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:58:31,051-Speed 5206.51 samples/sec Loss 2.6582 LearningRate 0.0321 Epoch: 8 Global Step: 144830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:58:33,021-Speed 5200.37 samples/sec Loss 2.6921 LearningRate 0.0320 Epoch: 8 Global Step: 144840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:58:35,007-Speed 5157.37 samples/sec Loss 2.7558 LearningRate 0.0320 Epoch: 8 Global Step: 144850 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:58:36,976-Speed 5203.05 samples/sec Loss 2.6859 LearningRate 0.0320 Epoch: 8 Global Step: 144860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:38,958-Speed 5166.97 samples/sec Loss 2.7001 LearningRate 0.0320 Epoch: 8 Global Step: 144870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:40,929-Speed 5198.40 samples/sec Loss 2.7062 LearningRate 0.0320 Epoch: 8 Global Step: 144880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:42,892-Speed 5216.85 samples/sec Loss 2.7737 LearningRate 0.0320 Epoch: 8 Global Step: 144890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 
08:58:44,888-Speed 5132.32 samples/sec Loss 2.6837 LearningRate 0.0320 Epoch: 8 Global Step: 144900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:46,863-Speed 5188.78 samples/sec Loss 2.7283 LearningRate 0.0320 Epoch: 8 Global Step: 144910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:48,834-Speed 5195.39 samples/sec Loss 2.6555 LearningRate 0.0320 Epoch: 8 Global Step: 144920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:50,816-Speed 5167.96 samples/sec Loss 2.5954 LearningRate 0.0320 Epoch: 8 Global Step: 144930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:52,810-Speed 5138.02 samples/sec Loss 2.7179 LearningRate 0.0320 Epoch: 8 Global Step: 144940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:54,782-Speed 5194.31 samples/sec Loss 2.7122 LearningRate 0.0320 Epoch: 8 Global Step: 144950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:58:56,749-Speed 5208.36 samples/sec Loss 2.7065 LearningRate 0.0320 Epoch: 8 Global Step: 144960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:58:58,734-Speed 5161.67 samples/sec Loss 2.7007 LearningRate 0.0320 Epoch: 8 Global Step: 144970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:59:00,724-Speed 5146.84 samples/sec Loss 2.6336 LearningRate 0.0320 Epoch: 8 Global Step: 144980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:59:02,714-Speed 5145.34 samples/sec Loss 2.6604 LearningRate 0.0320 Epoch: 8 Global Step: 144990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:59:04,688-Speed 5190.90 samples/sec Loss 2.6308 LearningRate 0.0320 Epoch: 8 Global Step: 145000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:59:06,662-Speed 5189.06 samples/sec Loss 2.6478 LearningRate 0.0320 Epoch: 8 Global Step: 145010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:59:08,638-Speed 5184.79 samples/sec 
Loss 2.6743 LearningRate 0.0320 Epoch: 8 Global Step: 145020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:59:10,622-Speed 5162.25 samples/sec Loss 2.6965 LearningRate 0.0320 Epoch: 8 Global Step: 145030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:59:12,606-Speed 5162.79 samples/sec Loss 2.6814 LearningRate 0.0320 Epoch: 8 Global Step: 145040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:59:14,588-Speed 5169.25 samples/sec Loss 2.6937 LearningRate 0.0320 Epoch: 8 Global Step: 145050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:59:16,565-Speed 5181.79 samples/sec Loss 2.7060 LearningRate 0.0320 Epoch: 8 Global Step: 145060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:59:18,540-Speed 5185.15 samples/sec Loss 2.6471 LearningRate 0.0320 Epoch: 8 Global Step: 145070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:59:20,519-Speed 5176.07 samples/sec Loss 2.6186 LearningRate 0.0320 Epoch: 8 Global Step: 145080 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:59:22,502-Speed 5165.02 samples/sec Loss 2.6490 LearningRate 0.0320 Epoch: 8 Global Step: 145090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:59:24,490-Speed 5154.12 samples/sec Loss 2.7356 LearningRate 0.0320 Epoch: 8 Global Step: 145100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:59:26,505-Speed 5082.92 samples/sec Loss 2.7380 LearningRate 0.0320 Epoch: 8 Global Step: 145110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:59:28,479-Speed 5190.69 samples/sec Loss 2.7501 LearningRate 0.0320 Epoch: 8 Global Step: 145120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 08:59:30,448-Speed 5201.33 samples/sec Loss 2.7477 LearningRate 0.0320 Epoch: 8 Global Step: 145130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:59:32,415-Speed 5208.17 samples/sec Loss 2.6447 LearningRate 0.0319 Epoch: 
8 Global Step: 145140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:59:34,385-Speed 5198.04 samples/sec Loss 2.6185 LearningRate 0.0319 Epoch: 8 Global Step: 145150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:59:36,367-Speed 5169.87 samples/sec Loss 2.6817 LearningRate 0.0319 Epoch: 8 Global Step: 145160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:59:38,341-Speed 5190.05 samples/sec Loss 2.6381 LearningRate 0.0319 Epoch: 8 Global Step: 145170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:59:40,312-Speed 5194.66 samples/sec Loss 2.6598 LearningRate 0.0319 Epoch: 8 Global Step: 145180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 08:59:42,286-Speed 5191.21 samples/sec Loss 2.6167 LearningRate 0.0319 Epoch: 8 Global Step: 145190 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:59:44,259-Speed 5191.59 samples/sec Loss 2.6769 LearningRate 0.0319 Epoch: 8 Global Step: 145200 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:59:46,252-Speed 5138.39 samples/sec Loss 2.6689 LearningRate 0.0319 Epoch: 8 Global Step: 145210 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:59:48,221-Speed 5201.94 samples/sec Loss 2.7119 LearningRate 0.0319 Epoch: 8 Global Step: 145220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:59:50,199-Speed 5180.10 samples/sec Loss 2.7351 LearningRate 0.0319 Epoch: 8 Global Step: 145230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:59:52,177-Speed 5178.94 samples/sec Loss 2.7560 LearningRate 0.0319 Epoch: 8 Global Step: 145240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:59:54,146-Speed 5203.03 samples/sec Loss 2.7532 LearningRate 0.0319 Epoch: 8 Global Step: 145250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 08:59:56,115-Speed 5201.87 samples/sec Loss 2.6692 LearningRate 0.0319 Epoch: 8 Global Step: 145260 Fp16 Grad Scale: 
32768 Required: 13 hours Training: 2022-04-11 08:59:58,084-Speed 5201.32 samples/sec Loss 2.6732 LearningRate 0.0319 Epoch: 8 Global Step: 145270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:00:00,068-Speed 5162.94 samples/sec Loss 2.6784 LearningRate 0.0319 Epoch: 8 Global Step: 145280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:00:02,039-Speed 5197.00 samples/sec Loss 2.6712 LearningRate 0.0319 Epoch: 8 Global Step: 145290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:04,033-Speed 5136.12 samples/sec Loss 2.6870 LearningRate 0.0319 Epoch: 8 Global Step: 145300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:06,003-Speed 5200.65 samples/sec Loss 2.6703 LearningRate 0.0319 Epoch: 8 Global Step: 145310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:07,974-Speed 5196.55 samples/sec Loss 2.6389 LearningRate 0.0319 Epoch: 8 Global Step: 145320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:09,946-Speed 5195.49 samples/sec Loss 2.7351 LearningRate 0.0319 Epoch: 8 Global Step: 145330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:11,919-Speed 5192.21 samples/sec Loss 2.6529 LearningRate 0.0319 Epoch: 8 Global Step: 145340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:13,887-Speed 5204.56 samples/sec Loss 2.6543 LearningRate 0.0319 Epoch: 8 Global Step: 145350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:15,872-Speed 5160.70 samples/sec Loss 2.6377 LearningRate 0.0319 Epoch: 8 Global Step: 145360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:17,839-Speed 5207.04 samples/sec Loss 2.6699 LearningRate 0.0319 Epoch: 8 Global Step: 145370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:19,811-Speed 5196.56 samples/sec Loss 2.6844 LearningRate 0.0319 Epoch: 8 Global Step: 145380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-04-11 09:00:21,809-Speed 5125.35 samples/sec Loss 2.6454 LearningRate 0.0319 Epoch: 8 Global Step: 145390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:00:23,804-Speed 5135.40 samples/sec Loss 2.6405 LearningRate 0.0319 Epoch: 8 Global Step: 145400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:00:25,797-Speed 5137.59 samples/sec Loss 2.6970 LearningRate 0.0319 Epoch: 8 Global Step: 145410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:27,776-Speed 5176.97 samples/sec Loss 2.6227 LearningRate 0.0319 Epoch: 8 Global Step: 145420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:29,771-Speed 5135.95 samples/sec Loss 2.6755 LearningRate 0.0318 Epoch: 8 Global Step: 145430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:31,744-Speed 5191.26 samples/sec Loss 2.6936 LearningRate 0.0318 Epoch: 8 Global Step: 145440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:33,715-Speed 5197.31 samples/sec Loss 2.6936 LearningRate 0.0318 Epoch: 8 Global Step: 145450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:35,684-Speed 5203.69 samples/sec Loss 2.6391 LearningRate 0.0318 Epoch: 8 Global Step: 145460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:37,673-Speed 5149.59 samples/sec Loss 2.6623 LearningRate 0.0318 Epoch: 8 Global Step: 145470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:39,670-Speed 5128.22 samples/sec Loss 2.6682 LearningRate 0.0318 Epoch: 8 Global Step: 145480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:41,657-Speed 5155.81 samples/sec Loss 2.6507 LearningRate 0.0318 Epoch: 8 Global Step: 145490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:43,640-Speed 5165.60 samples/sec Loss 2.6794 LearningRate 0.0318 Epoch: 8 Global Step: 145500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:45,622-Speed 5169.05 
samples/sec Loss 2.7157 LearningRate 0.0318 Epoch: 8 Global Step: 145510 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:00:47,628-Speed 5105.37 samples/sec Loss 2.6766 LearningRate 0.0318 Epoch: 8 Global Step: 145520 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:00:49,608-Speed 5173.07 samples/sec Loss 2.6801 LearningRate 0.0318 Epoch: 8 Global Step: 145530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:51,587-Speed 5180.28 samples/sec Loss 2.7186 LearningRate 0.0318 Epoch: 8 Global Step: 145540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:53,559-Speed 5193.80 samples/sec Loss 2.7120 LearningRate 0.0318 Epoch: 8 Global Step: 145550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:55,531-Speed 5193.14 samples/sec Loss 2.7322 LearningRate 0.0318 Epoch: 8 Global Step: 145560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:57,499-Speed 5205.92 samples/sec Loss 2.6646 LearningRate 0.0318 Epoch: 8 Global Step: 145570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:00:59,499-Speed 5121.03 samples/sec Loss 2.6929 LearningRate 0.0318 Epoch: 8 Global Step: 145580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:01,476-Speed 5182.15 samples/sec Loss 2.6984 LearningRate 0.0318 Epoch: 8 Global Step: 145590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:03,452-Speed 5183.44 samples/sec Loss 2.6301 LearningRate 0.0318 Epoch: 8 Global Step: 145600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:05,437-Speed 5161.57 samples/sec Loss 2.6509 LearningRate 0.0318 Epoch: 8 Global Step: 145610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:07,416-Speed 5175.66 samples/sec Loss 2.7545 LearningRate 0.0318 Epoch: 8 Global Step: 145620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:09,387-Speed 5198.53 samples/sec Loss 2.5922 LearningRate 
0.0318 Epoch: 8 Global Step: 145630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:01:11,354-Speed 5206.50 samples/sec Loss 2.7321 LearningRate 0.0318 Epoch: 8 Global Step: 145640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:13,352-Speed 5126.89 samples/sec Loss 2.6775 LearningRate 0.0318 Epoch: 8 Global Step: 145650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:15,340-Speed 5152.84 samples/sec Loss 2.7156 LearningRate 0.0318 Epoch: 8 Global Step: 145660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:17,321-Speed 5170.70 samples/sec Loss 2.7169 LearningRate 0.0318 Epoch: 8 Global Step: 145670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:19,288-Speed 5207.07 samples/sec Loss 2.5894 LearningRate 0.0318 Epoch: 8 Global Step: 145680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:21,261-Speed 5190.48 samples/sec Loss 2.6980 LearningRate 0.0318 Epoch: 8 Global Step: 145690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:23,239-Speed 5180.02 samples/sec Loss 2.7133 LearningRate 0.0318 Epoch: 8 Global Step: 145700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:25,221-Speed 5169.46 samples/sec Loss 2.6292 LearningRate 0.0318 Epoch: 8 Global Step: 145710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:27,207-Speed 5158.33 samples/sec Loss 2.6136 LearningRate 0.0318 Epoch: 8 Global Step: 145720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:29,188-Speed 5170.19 samples/sec Loss 2.6920 LearningRate 0.0317 Epoch: 8 Global Step: 145730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:31,167-Speed 5175.26 samples/sec Loss 2.6408 LearningRate 0.0317 Epoch: 8 Global Step: 145740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:33,152-Speed 5160.81 samples/sec Loss 2.6462 LearningRate 0.0317 Epoch: 8 Global Step: 145750 Fp16 
Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:35,141-Speed 5149.47 samples/sec Loss 2.6566 LearningRate 0.0317 Epoch: 8 Global Step: 145760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:37,115-Speed 5189.78 samples/sec Loss 2.6648 LearningRate 0.0317 Epoch: 8 Global Step: 145770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:39,086-Speed 5195.91 samples/sec Loss 2.6459 LearningRate 0.0317 Epoch: 8 Global Step: 145780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:41,064-Speed 5180.35 samples/sec Loss 2.6925 LearningRate 0.0317 Epoch: 8 Global Step: 145790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:43,052-Speed 5152.17 samples/sec Loss 2.6247 LearningRate 0.0317 Epoch: 8 Global Step: 145800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:45,031-Speed 5176.64 samples/sec Loss 2.6431 LearningRate 0.0317 Epoch: 8 Global Step: 145810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:47,017-Speed 5156.52 samples/sec Loss 2.7061 LearningRate 0.0317 Epoch: 8 Global Step: 145820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:48,997-Speed 5174.44 samples/sec Loss 2.7529 LearningRate 0.0317 Epoch: 8 Global Step: 145830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:01:50,978-Speed 5170.45 samples/sec Loss 2.6478 LearningRate 0.0317 Epoch: 8 Global Step: 145840 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:01:52,957-Speed 5175.47 samples/sec Loss 2.6339 LearningRate 0.0317 Epoch: 8 Global Step: 145850 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:01:54,932-Speed 5186.14 samples/sec Loss 2.6106 LearningRate 0.0317 Epoch: 8 Global Step: 145860 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:01:56,907-Speed 5186.81 samples/sec Loss 2.7269 LearningRate 0.0317 Epoch: 8 Global Step: 145870 Fp16 Grad Scale: 131072 Required: 13 hours 
Training: 2022-04-11 09:01:58,878-Speed 5198.54 samples/sec Loss 2.6691 LearningRate 0.0317 Epoch: 8 Global Step: 145880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:02:00,863-Speed 5158.99 samples/sec Loss 2.6961 LearningRate 0.0317 Epoch: 8 Global Step: 145890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:02:02,840-Speed 5181.88 samples/sec Loss 2.7604 LearningRate 0.0317 Epoch: 8 Global Step: 145900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:02:04,811-Speed 5198.86 samples/sec Loss 2.6556 LearningRate 0.0317 Epoch: 8 Global Step: 145910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:02:06,794-Speed 5163.78 samples/sec Loss 2.7467 LearningRate 0.0317 Epoch: 8 Global Step: 145920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:02:08,772-Speed 5180.33 samples/sec Loss 2.6613 LearningRate 0.0317 Epoch: 8 Global Step: 145930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:02:10,753-Speed 5170.43 samples/sec Loss 2.6945 LearningRate 0.0317 Epoch: 8 Global Step: 145940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:02:12,786-Speed 5038.69 samples/sec Loss 2.7055 LearningRate 0.0317 Epoch: 8 Global Step: 145950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:02:14,764-Speed 5176.92 samples/sec Loss 2.6821 LearningRate 0.0317 Epoch: 8 Global Step: 145960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:02:16,739-Speed 5187.59 samples/sec Loss 2.7119 LearningRate 0.0317 Epoch: 8 Global Step: 145970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:02:18,710-Speed 5197.07 samples/sec Loss 2.7238 LearningRate 0.0317 Epoch: 8 Global Step: 145980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:02:20,681-Speed 5194.97 samples/sec Loss 2.7023 LearningRate 0.0317 Epoch: 8 Global Step: 145990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:02:22,656-Speed 
5186.63 samples/sec Loss 2.6726 LearningRate 0.0317 Epoch: 8 Global Step: 146000 Fp16 Grad Scale: 131072 Required: 13 hours
Training: 2022-04-11 09:02:49,363-[lfw][146000]XNorm: 22.834220
Training: 2022-04-11 09:02:49,363-[lfw][146000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 09:02:49,364-[lfw][146000]Accuracy-Highest: 0.99833
Training: 2022-04-11 09:03:20,166-[cfp_fp][146000]XNorm: 21.354291
Training: 2022-04-11 09:03:20,166-[cfp_fp][146000]Accuracy-Flip: 0.98200+-0.00554
Training: 2022-04-11 09:03:20,167-[cfp_fp][146000]Accuracy-Highest: 0.98443
Training: 2022-04-11 09:03:46,679-[agedb_30][146000]XNorm: 22.774719
Training: 2022-04-11 09:03:46,680-[agedb_30][146000]Accuracy-Flip: 0.98100+-0.00847
Training: 2022-04-11 09:03:46,680-[agedb_30][146000]Accuracy-Highest: 0.98150
Training: 2022-04-11 09:03:48,677-Speed 119.04 samples/sec Loss 2.7478 LearningRate 0.0317 Epoch: 8 Global Step: 146010 Fp16 Grad Scale: 131072 Required: 13 hours
Training: 2022-04-11 09:03:50,650-Speed 5192.32 samples/sec Loss 2.7599 LearningRate 0.0316 Epoch: 8 Global Step: 146020 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-04-11 09:03:52,628-Speed 5178.70 samples/sec Loss 2.6718 LearningRate 0.0316 Epoch: 8 Global Step: 146030 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-04-11 09:03:54,593-Speed 5213.18 samples/sec Loss 2.8066 LearningRate 0.0316 Epoch: 8 Global Step: 146040 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-04-11 09:03:56,560-Speed 5206.94 samples/sec Loss 2.6747 LearningRate 0.0316 Epoch: 8 Global Step: 146050 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-04-11 09:03:58,541-Speed 5172.13 samples/sec Loss 2.6864 LearningRate 0.0316 Epoch: 8 Global Step: 146060 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-04-11 09:04:00,520-Speed 5175.38 samples/sec Loss 2.6565 LearningRate 0.0316 Epoch: 8 Global Step: 146070 Fp16 Grad Scale: 65536 Required: 13 hours
Training: 2022-04-11 09:04:02,501-Speed 5169.93 samples/sec 
Loss 2.7200 LearningRate 0.0316 Epoch: 8 Global Step: 146080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:04:04,469-Speed 5204.93 samples/sec Loss 2.6528 LearningRate 0.0316 Epoch: 8 Global Step: 146090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:04:06,442-Speed 5191.92 samples/sec Loss 2.6551 LearningRate 0.0316 Epoch: 8 Global Step: 146100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:04:08,410-Speed 5204.90 samples/sec Loss 2.7087 LearningRate 0.0316 Epoch: 8 Global Step: 146110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:04:10,393-Speed 5166.76 samples/sec Loss 2.6458 LearningRate 0.0316 Epoch: 8 Global Step: 146120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:04:12,368-Speed 5186.62 samples/sec Loss 2.6521 LearningRate 0.0316 Epoch: 8 Global Step: 146130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:04:14,356-Speed 5152.56 samples/sec Loss 2.6881 LearningRate 0.0316 Epoch: 8 Global Step: 146140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:04:16,326-Speed 5200.39 samples/sec Loss 2.6637 LearningRate 0.0316 Epoch: 8 Global Step: 146150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:04:18,306-Speed 5172.06 samples/sec Loss 2.6987 LearningRate 0.0316 Epoch: 8 Global Step: 146160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:04:20,283-Speed 5180.99 samples/sec Loss 2.6345 LearningRate 0.0316 Epoch: 8 Global Step: 146170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:04:22,256-Speed 5193.07 samples/sec Loss 2.7687 LearningRate 0.0316 Epoch: 8 Global Step: 146180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:04:24,230-Speed 5188.48 samples/sec Loss 2.6874 LearningRate 0.0316 Epoch: 8 Global Step: 146190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:04:26,204-Speed 5187.94 samples/sec Loss 2.6885 LearningRate 0.0316 Epoch: 8 
Global Step: 146200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:04:28,193-Speed 5151.85 samples/sec Loss 2.6983 LearningRate 0.0316 Epoch: 8 Global Step: 146210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:04:30,186-Speed 5138.72 samples/sec Loss 2.7244 LearningRate 0.0316 Epoch: 8 Global Step: 146220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:04:32,159-Speed 5191.62 samples/sec Loss 2.6533 LearningRate 0.0316 Epoch: 8 Global Step: 146230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:04:34,132-Speed 5192.54 samples/sec Loss 2.6345 LearningRate 0.0316 Epoch: 8 Global Step: 146240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:04:36,108-Speed 5184.58 samples/sec Loss 2.6633 LearningRate 0.0316 Epoch: 8 Global Step: 146250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:04:38,082-Speed 5188.63 samples/sec Loss 2.7372 LearningRate 0.0316 Epoch: 8 Global Step: 146260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:04:40,064-Speed 5168.82 samples/sec Loss 2.6968 LearningRate 0.0316 Epoch: 8 Global Step: 146270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:04:42,047-Speed 5165.72 samples/sec Loss 2.7394 LearningRate 0.0316 Epoch: 8 Global Step: 146280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:04:44,026-Speed 5176.38 samples/sec Loss 2.6402 LearningRate 0.0316 Epoch: 8 Global Step: 146290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:04:46,006-Speed 5172.37 samples/sec Loss 2.7268 LearningRate 0.0316 Epoch: 8 Global Step: 146300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:04:48,003-Speed 5129.72 samples/sec Loss 2.7387 LearningRate 0.0316 Epoch: 8 Global Step: 146310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:04:50,006-Speed 5113.00 samples/sec Loss 2.7055 LearningRate 0.0315 Epoch: 8 Global Step: 146320 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-04-11 09:04:51,995-Speed 5150.47 samples/sec Loss 2.6446 LearningRate 0.0315 Epoch: 8 Global Step: 146330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:04:53,993-Speed 5126.83 samples/sec Loss 2.6015 LearningRate 0.0315 Epoch: 8 Global Step: 146340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:04:55,968-Speed 5186.62 samples/sec Loss 2.5926 LearningRate 0.0315 Epoch: 8 Global Step: 146350 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:04:57,947-Speed 5176.53 samples/sec Loss 2.6780 LearningRate 0.0315 Epoch: 8 Global Step: 146360 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:04:59,936-Speed 5150.05 samples/sec Loss 2.7276 LearningRate 0.0315 Epoch: 8 Global Step: 146370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:05:01,916-Speed 5172.65 samples/sec Loss 2.5445 LearningRate 0.0315 Epoch: 8 Global Step: 146380 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:05:03,907-Speed 5146.15 samples/sec Loss 2.6739 LearningRate 0.0315 Epoch: 8 Global Step: 146390 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:05:05,884-Speed 5180.20 samples/sec Loss 2.6582 LearningRate 0.0315 Epoch: 8 Global Step: 146400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:05:07,863-Speed 5176.15 samples/sec Loss 2.7536 LearningRate 0.0315 Epoch: 8 Global Step: 146410 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:05:09,855-Speed 5142.05 samples/sec Loss 2.7313 LearningRate 0.0315 Epoch: 8 Global Step: 146420 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:05:11,847-Speed 5141.92 samples/sec Loss 2.7541 LearningRate 0.0315 Epoch: 8 Global Step: 146430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:05:13,837-Speed 5147.32 samples/sec Loss 2.6791 LearningRate 0.0315 Epoch: 8 Global Step: 146440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 
09:05:15,827-Speed 5149.72 samples/sec Loss 2.6764 LearningRate 0.0315 Epoch: 8 Global Step: 146450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:05:17,806-Speed 5173.67 samples/sec Loss 2.6357 LearningRate 0.0315 Epoch: 8 Global Step: 146460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:05:19,789-Speed 5167.62 samples/sec Loss 2.6590 LearningRate 0.0315 Epoch: 8 Global Step: 146470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:05:21,764-Speed 5184.44 samples/sec Loss 2.6549 LearningRate 0.0315 Epoch: 8 Global Step: 146480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:05:23,749-Speed 5161.94 samples/sec Loss 2.6935 LearningRate 0.0315 Epoch: 8 Global Step: 146490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:05:25,739-Speed 5146.55 samples/sec Loss 2.6920 LearningRate 0.0315 Epoch: 8 Global Step: 146500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:05:27,724-Speed 5160.83 samples/sec Loss 2.6560 LearningRate 0.0315 Epoch: 8 Global Step: 146510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:05:29,702-Speed 5177.61 samples/sec Loss 2.6694 LearningRate 0.0315 Epoch: 8 Global Step: 146520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:05:31,678-Speed 5185.41 samples/sec Loss 2.6675 LearningRate 0.0315 Epoch: 8 Global Step: 146530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:05:33,651-Speed 5191.36 samples/sec Loss 2.7383 LearningRate 0.0315 Epoch: 8 Global Step: 146540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:05:35,653-Speed 5117.93 samples/sec Loss 2.7164 LearningRate 0.0315 Epoch: 8 Global Step: 146550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:05:37,645-Speed 5142.43 samples/sec Loss 2.6986 LearningRate 0.0315 Epoch: 8 Global Step: 146560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:05:39,620-Speed 5187.94 samples/sec 
Loss 2.7150 LearningRate 0.0315 Epoch: 8 Global Step: 146570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:05:41,593-Speed 5191.00 samples/sec Loss 2.7074 LearningRate 0.0315 Epoch: 8 Global Step: 146580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:05:43,568-Speed 5186.91 samples/sec Loss 2.7347 LearningRate 0.0315 Epoch: 8 Global Step: 146590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:05:45,549-Speed 5170.24 samples/sec Loss 2.6264 LearningRate 0.0315 Epoch: 8 Global Step: 146600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:05:47,523-Speed 5187.98 samples/sec Loss 2.6712 LearningRate 0.0315 Epoch: 8 Global Step: 146610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:05:49,512-Speed 5151.10 samples/sec Loss 2.6208 LearningRate 0.0314 Epoch: 8 Global Step: 146620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:05:51,485-Speed 5190.48 samples/sec Loss 2.6690 LearningRate 0.0314 Epoch: 8 Global Step: 146630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:05:53,461-Speed 5185.43 samples/sec Loss 2.7386 LearningRate 0.0314 Epoch: 8 Global Step: 146640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:05:55,432-Speed 5196.26 samples/sec Loss 2.7333 LearningRate 0.0314 Epoch: 8 Global Step: 146650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:05:57,405-Speed 5192.94 samples/sec Loss 2.7181 LearningRate 0.0314 Epoch: 8 Global Step: 146660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:05:59,381-Speed 5184.22 samples/sec Loss 2.7124 LearningRate 0.0314 Epoch: 8 Global Step: 146670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:06:01,367-Speed 5176.06 samples/sec Loss 2.7427 LearningRate 0.0314 Epoch: 8 Global Step: 146680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:06:03,338-Speed 5196.28 samples/sec Loss 2.6212 LearningRate 0.0314 Epoch: 8 
Global Step: 146690 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:06:05,320-Speed 5168.08 samples/sec Loss 2.6785 LearningRate 0.0314 Epoch: 8 Global Step: 146700 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:06:07,306-Speed 5159.22 samples/sec Loss 2.7546 LearningRate 0.0314 Epoch: 8 Global Step: 146710 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:06:09,277-Speed 5197.29 samples/sec Loss 2.6996 LearningRate 0.0314 Epoch: 8 Global Step: 146720 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:06:11,256-Speed 5175.27 samples/sec Loss 2.7306 LearningRate 0.0314 Epoch: 8 Global Step: 146730 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:06:13,292-Speed 5030.42 samples/sec Loss 2.6536 LearningRate 0.0314 Epoch: 8 Global Step: 146740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:06:15,268-Speed 5183.38 samples/sec Loss 2.5961 LearningRate 0.0314 Epoch: 8 Global Step: 146750 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:06:17,253-Speed 5162.19 samples/sec Loss 2.6378 LearningRate 0.0314 Epoch: 8 Global Step: 146760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:06:19,227-Speed 5190.46 samples/sec Loss 2.6860 LearningRate 0.0314 Epoch: 8 Global Step: 146770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:06:21,217-Speed 5147.05 samples/sec Loss 2.6692 LearningRate 0.0314 Epoch: 8 Global Step: 146780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-11 09:06:23,193-Speed 5182.68 samples/sec Loss 2.6573 LearningRate 0.0314 Epoch: 8 Global Step: 146790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:06:25,177-Speed 5162.66 samples/sec Loss 2.6284 LearningRate 0.0314 Epoch: 8 Global Step: 146800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:06:27,167-Speed 5147.68 samples/sec Loss 2.6907 LearningRate 0.0314 Epoch: 8 Global Step: 146810 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-04-11 09:06:29,143-Speed 5184.13 samples/sec Loss 2.6151 LearningRate 0.0314 Epoch: 8 Global Step: 146820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:06:31,115-Speed 5193.54 samples/sec Loss 2.7435 LearningRate 0.0314 Epoch: 8 Global Step: 146830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:06:33,099-Speed 5162.98 samples/sec Loss 2.6567 LearningRate 0.0314 Epoch: 8 Global Step: 146840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:06:35,104-Speed 5109.28 samples/sec Loss 2.6227 LearningRate 0.0314 Epoch: 8 Global Step: 146850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:06:37,099-Speed 5136.45 samples/sec Loss 2.6219 LearningRate 0.0314 Epoch: 8 Global Step: 146860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:06:39,078-Speed 5176.23 samples/sec Loss 2.6636 LearningRate 0.0314 Epoch: 8 Global Step: 146870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:06:41,083-Speed 5108.48 samples/sec Loss 2.7553 LearningRate 0.0314 Epoch: 8 Global Step: 146880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:06:43,064-Speed 5170.58 samples/sec Loss 2.6690 LearningRate 0.0314 Epoch: 8 Global Step: 146890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:06:45,048-Speed 5162.52 samples/sec Loss 2.6927 LearningRate 0.0314 Epoch: 8 Global Step: 146900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:06:47,023-Speed 5188.21 samples/sec Loss 2.7002 LearningRate 0.0314 Epoch: 8 Global Step: 146910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:06:49,005-Speed 5167.97 samples/sec Loss 2.6524 LearningRate 0.0313 Epoch: 8 Global Step: 146920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:06:50,985-Speed 5173.53 samples/sec Loss 2.6800 LearningRate 0.0313 Epoch: 8 Global Step: 146930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 
2022-04-11 09:06:52,967-Speed 5168.11 samples/sec Loss 2.7349 LearningRate 0.0313 Epoch: 8 Global Step: 146940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:06:54,939-Speed 5192.35 samples/sec Loss 2.6288 LearningRate 0.0313 Epoch: 8 Global Step: 146950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:06:56,918-Speed 5176.66 samples/sec Loss 2.6699 LearningRate 0.0313 Epoch: 8 Global Step: 146960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:06:58,902-Speed 5164.04 samples/sec Loss 2.6486 LearningRate 0.0313 Epoch: 8 Global Step: 146970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:07:00,875-Speed 5192.69 samples/sec Loss 2.7310 LearningRate 0.0313 Epoch: 8 Global Step: 146980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:02,864-Speed 5150.21 samples/sec Loss 2.7148 LearningRate 0.0313 Epoch: 8 Global Step: 146990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:04,842-Speed 5177.03 samples/sec Loss 2.6084 LearningRate 0.0313 Epoch: 8 Global Step: 147000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:06,824-Speed 5169.70 samples/sec Loss 2.6761 LearningRate 0.0313 Epoch: 8 Global Step: 147010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:08,812-Speed 5151.35 samples/sec Loss 2.6550 LearningRate 0.0313 Epoch: 8 Global Step: 147020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:10,791-Speed 5176.75 samples/sec Loss 2.6500 LearningRate 0.0313 Epoch: 8 Global Step: 147030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:12,798-Speed 5103.36 samples/sec Loss 2.6799 LearningRate 0.0313 Epoch: 8 Global Step: 147040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:14,779-Speed 5169.93 samples/sec Loss 2.6782 LearningRate 0.0313 Epoch: 8 Global Step: 147050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:16,763-Speed 5163.93 
samples/sec Loss 2.6699 LearningRate 0.0313 Epoch: 8 Global Step: 147060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:18,746-Speed 5164.72 samples/sec Loss 2.7030 LearningRate 0.0313 Epoch: 8 Global Step: 147070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:20,728-Speed 5169.70 samples/sec Loss 2.6246 LearningRate 0.0313 Epoch: 8 Global Step: 147080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:22,732-Speed 5112.41 samples/sec Loss 2.6773 LearningRate 0.0313 Epoch: 8 Global Step: 147090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:24,724-Speed 5140.34 samples/sec Loss 2.7091 LearningRate 0.0313 Epoch: 8 Global Step: 147100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:26,711-Speed 5155.43 samples/sec Loss 2.7030 LearningRate 0.0313 Epoch: 8 Global Step: 147110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:28,697-Speed 5157.34 samples/sec Loss 2.6749 LearningRate 0.0313 Epoch: 8 Global Step: 147120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:30,679-Speed 5169.16 samples/sec Loss 2.7282 LearningRate 0.0313 Epoch: 8 Global Step: 147130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:32,659-Speed 5171.92 samples/sec Loss 2.6722 LearningRate 0.0313 Epoch: 8 Global Step: 147140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:34,633-Speed 5189.65 samples/sec Loss 2.6339 LearningRate 0.0313 Epoch: 8 Global Step: 147150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:36,611-Speed 5178.67 samples/sec Loss 2.6635 LearningRate 0.0313 Epoch: 8 Global Step: 147160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:38,597-Speed 5157.79 samples/sec Loss 2.6541 LearningRate 0.0313 Epoch: 8 Global Step: 147170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:40,574-Speed 5181.35 samples/sec Loss 2.6893 LearningRate 0.0313 
Epoch: 8 Global Step: 147180 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:07:42,562-Speed 5153.61 samples/sec Loss 2.7127 LearningRate 0.0313 Epoch: 8 Global Step: 147190 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:07:44,538-Speed 5184.34 samples/sec Loss 2.6210 LearningRate 0.0313 Epoch: 8 Global Step: 147200 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:07:46,512-Speed 5190.44 samples/sec Loss 2.6829 LearningRate 0.0312 Epoch: 8 Global Step: 147210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:48,492-Speed 5172.13 samples/sec Loss 2.6923 LearningRate 0.0312 Epoch: 8 Global Step: 147220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:50,468-Speed 5182.53 samples/sec Loss 2.6899 LearningRate 0.0312 Epoch: 8 Global Step: 147230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:52,463-Speed 5134.44 samples/sec Loss 2.6766 LearningRate 0.0312 Epoch: 8 Global Step: 147240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:54,451-Speed 5153.29 samples/sec Loss 2.7543 LearningRate 0.0312 Epoch: 8 Global Step: 147250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:56,427-Speed 5184.71 samples/sec Loss 2.6728 LearningRate 0.0312 Epoch: 8 Global Step: 147260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:07:58,418-Speed 5142.83 samples/sec Loss 2.7012 LearningRate 0.0312 Epoch: 8 Global Step: 147270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:08:00,402-Speed 5164.97 samples/sec Loss 2.7001 LearningRate 0.0312 Epoch: 8 Global Step: 147280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:08:02,383-Speed 5171.36 samples/sec Loss 2.6801 LearningRate 0.0312 Epoch: 8 Global Step: 147290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:08:04,356-Speed 5192.41 samples/sec Loss 2.6940 LearningRate 0.0312 Epoch: 8 Global Step: 147300 Fp16 Grad 
Scale: 65536 Required: 13 hours Training: 2022-04-11 09:08:06,331-Speed 5185.93 samples/sec Loss 2.5698 LearningRate 0.0312 Epoch: 8 Global Step: 147310 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:08:08,308-Speed 5180.89 samples/sec Loss 2.7519 LearningRate 0.0312 Epoch: 8 Global Step: 147320 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:08:10,284-Speed 5185.04 samples/sec Loss 2.7008 LearningRate 0.0312 Epoch: 8 Global Step: 147330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:08:12,265-Speed 5170.39 samples/sec Loss 2.6746 LearningRate 0.0312 Epoch: 8 Global Step: 147340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:08:14,250-Speed 5159.30 samples/sec Loss 2.6245 LearningRate 0.0312 Epoch: 8 Global Step: 147350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:08:16,225-Speed 5189.09 samples/sec Loss 2.6410 LearningRate 0.0312 Epoch: 8 Global Step: 147360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:08:18,211-Speed 5156.81 samples/sec Loss 2.6610 LearningRate 0.0312 Epoch: 8 Global Step: 147370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:08:20,187-Speed 5182.21 samples/sec Loss 2.6976 LearningRate 0.0312 Epoch: 8 Global Step: 147380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:08:22,173-Speed 5159.91 samples/sec Loss 2.7171 LearningRate 0.0312 Epoch: 8 Global Step: 147390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-11 09:08:24,161-Speed 5150.93 samples/sec Loss 2.6166 LearningRate 0.0312 Epoch: 8 Global Step: 147400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:08:26,140-Speed 5176.44 samples/sec Loss 2.6960 LearningRate 0.0312 Epoch: 8 Global Step: 147410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:08:28,120-Speed 5174.93 samples/sec Loss 2.7141 LearningRate 0.0312 Epoch: 8 Global Step: 147420 Fp16 Grad Scale: 65536 Required: 13 hours 
Training: 2022-04-11 09:08:30,095-Speed 5187.58 samples/sec Loss 2.7094 LearningRate 0.0312 Epoch: 8 Global Step: 147430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:08:32,077-Speed 5166.45 samples/sec Loss 2.6239 LearningRate 0.0312 Epoch: 8 Global Step: 147440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:08:34,067-Speed 5148.75 samples/sec Loss 2.7079 LearningRate 0.0312 Epoch: 8 Global Step: 147450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:08:36,055-Speed 5152.15 samples/sec Loss 2.6645 LearningRate 0.0312 Epoch: 8 Global Step: 147460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-11 09:08:38,041-Speed 5158.01 samples/sec Loss 2.6601 LearningRate 0.0312 Epoch: 8 Global Step: 147470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:08:40,017-Speed 5182.04 samples/sec Loss 2.6424 LearningRate 0.0312 Epoch: 8 Global Step: 147480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:08:42,003-Speed 5158.58 samples/sec Loss 2.6834 LearningRate 0.0312 Epoch: 8 Global Step: 147490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:08:43,988-Speed 5159.29 samples/sec Loss 2.6671 LearningRate 0.0312 Epoch: 8 Global Step: 147500 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:08:45,979-Speed 5146.64 samples/sec Loss 2.5924 LearningRate 0.0311 Epoch: 8 Global Step: 147510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:08:47,958-Speed 5175.99 samples/sec Loss 2.7029 LearningRate 0.0311 Epoch: 8 Global Step: 147520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:08:49,935-Speed 5181.38 samples/sec Loss 2.6242 LearningRate 0.0311 Epoch: 8 Global Step: 147530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:08:51,912-Speed 5180.30 samples/sec Loss 2.7415 LearningRate 0.0311 Epoch: 8 Global Step: 147540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:08:53,904-Speed 
5142.58 samples/sec Loss 2.6719 LearningRate 0.0311 Epoch: 8 Global Step: 147550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:08:55,897-Speed 5138.97 samples/sec Loss 2.7107 LearningRate 0.0311 Epoch: 8 Global Step: 147560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:08:57,884-Speed 5156.15 samples/sec Loss 2.6724 LearningRate 0.0311 Epoch: 8 Global Step: 147570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:08:59,886-Speed 5117.52 samples/sec Loss 2.7653 LearningRate 0.0311 Epoch: 8 Global Step: 147580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:01,868-Speed 5167.59 samples/sec Loss 2.6266 LearningRate 0.0311 Epoch: 8 Global Step: 147590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:03,860-Speed 5141.88 samples/sec Loss 2.7096 LearningRate 0.0311 Epoch: 8 Global Step: 147600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:05,843-Speed 5167.01 samples/sec Loss 2.7509 LearningRate 0.0311 Epoch: 8 Global Step: 147610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:07,818-Speed 5186.64 samples/sec Loss 2.6710 LearningRate 0.0311 Epoch: 8 Global Step: 147620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:09,802-Speed 5163.58 samples/sec Loss 2.6620 LearningRate 0.0311 Epoch: 8 Global Step: 147630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:11,779-Speed 5179.75 samples/sec Loss 2.6797 LearningRate 0.0311 Epoch: 8 Global Step: 147640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:13,768-Speed 5149.32 samples/sec Loss 2.6815 LearningRate 0.0311 Epoch: 8 Global Step: 147650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:15,776-Speed 5102.24 samples/sec Loss 2.6260 LearningRate 0.0311 Epoch: 8 Global Step: 147660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:17,744-Speed 5211.08 samples/sec Loss 2.5961 
LearningRate 0.0311 Epoch: 8 Global Step: 147670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:19,726-Speed 5166.27 samples/sec Loss 2.6862 LearningRate 0.0311 Epoch: 8 Global Step: 147680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:21,713-Speed 5156.68 samples/sec Loss 2.7028 LearningRate 0.0311 Epoch: 8 Global Step: 147690 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:23,695-Speed 5166.30 samples/sec Loss 2.6545 LearningRate 0.0311 Epoch: 8 Global Step: 147700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:25,693-Speed 5126.65 samples/sec Loss 2.5926 LearningRate 0.0311 Epoch: 8 Global Step: 147710 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:27,676-Speed 5167.08 samples/sec Loss 2.6309 LearningRate 0.0311 Epoch: 8 Global Step: 147720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:29,652-Speed 5183.83 samples/sec Loss 2.6634 LearningRate 0.0311 Epoch: 8 Global Step: 147730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:31,629-Speed 5182.16 samples/sec Loss 2.6418 LearningRate 0.0311 Epoch: 8 Global Step: 147740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:33,606-Speed 5179.96 samples/sec Loss 2.6907 LearningRate 0.0311 Epoch: 8 Global Step: 147750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:35,585-Speed 5177.35 samples/sec Loss 2.6676 LearningRate 0.0311 Epoch: 8 Global Step: 147760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:37,567-Speed 5168.44 samples/sec Loss 2.6822 LearningRate 0.0311 Epoch: 8 Global Step: 147770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:09:39,548-Speed 5170.40 samples/sec Loss 2.6751 LearningRate 0.0311 Epoch: 8 Global Step: 147780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:09:41,527-Speed 5175.41 samples/sec Loss 2.6766 LearningRate 0.0311 Epoch: 8 Global Step: 
147790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:09:43,501-Speed 5190.01 samples/sec Loss 2.6568 LearningRate 0.0311 Epoch: 8 Global Step: 147800 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:09:45,479-Speed 5177.29 samples/sec Loss 2.6627 LearningRate 0.0310 Epoch: 8 Global Step: 147810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:09:47,473-Speed 5138.69 samples/sec Loss 2.6146 LearningRate 0.0310 Epoch: 8 Global Step: 147820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:09:49,477-Speed 5112.18 samples/sec Loss 2.6611 LearningRate 0.0310 Epoch: 8 Global Step: 147830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:09:51,449-Speed 5192.92 samples/sec Loss 2.6571 LearningRate 0.0310 Epoch: 8 Global Step: 147840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:09:53,437-Speed 5154.79 samples/sec Loss 2.6620 LearningRate 0.0310 Epoch: 8 Global Step: 147850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:09:55,415-Speed 5177.23 samples/sec Loss 2.7097 LearningRate 0.0310 Epoch: 8 Global Step: 147860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:09:57,390-Speed 5187.60 samples/sec Loss 2.6373 LearningRate 0.0310 Epoch: 8 Global Step: 147870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:09:59,377-Speed 5154.53 samples/sec Loss 2.6624 LearningRate 0.0310 Epoch: 8 Global Step: 147880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:10:01,353-Speed 5182.31 samples/sec Loss 2.6242 LearningRate 0.0310 Epoch: 8 Global Step: 147890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:10:03,345-Speed 5143.39 samples/sec Loss 2.6714 LearningRate 0.0310 Epoch: 8 Global Step: 147900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:10:05,325-Speed 5174.46 samples/sec Loss 2.6465 LearningRate 0.0310 Epoch: 8 Global Step: 147910 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-11 09:10:07,301-Speed 5183.32 samples/sec Loss 2.6884 LearningRate 0.0310 Epoch: 8 Global Step: 147920 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:10:09,292-Speed 5145.23 samples/sec Loss 2.6763 LearningRate 0.0310 Epoch: 8 Global Step: 147930 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:10:11,300-Speed 5100.95 samples/sec Loss 2.6487 LearningRate 0.0310 Epoch: 8 Global Step: 147940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:10:13,286-Speed 5159.40 samples/sec Loss 2.6307 LearningRate 0.0310 Epoch: 8 Global Step: 147950 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:10:15,264-Speed 5179.33 samples/sec Loss 2.6480 LearningRate 0.0310 Epoch: 8 Global Step: 147960 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:10:17,244-Speed 5171.60 samples/sec Loss 2.6156 LearningRate 0.0310 Epoch: 8 Global Step: 147970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:10:19,217-Speed 5191.94 samples/sec Loss 2.6167 LearningRate 0.0310 Epoch: 8 Global Step: 147980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:10:21,194-Speed 5181.40 samples/sec Loss 2.6308 LearningRate 0.0310 Epoch: 8 Global Step: 147990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:10:23,190-Speed 5131.97 samples/sec Loss 2.6683 LearningRate 0.0310 Epoch: 8 Global Step: 148000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:10:50,023-[lfw][148000]XNorm: 22.099081 Training: 2022-04-11 09:10:50,024-[lfw][148000]Accuracy-Flip: 0.99817+-0.00229 Training: 2022-04-11 09:10:50,024-[lfw][148000]Accuracy-Highest: 0.99833 Training: 2022-04-11 09:11:20,970-[cfp_fp][148000]XNorm: 20.557952 Training: 2022-04-11 09:11:20,970-[cfp_fp][148000]Accuracy-Flip: 0.98400+-0.00518 Training: 2022-04-11 09:11:20,971-[cfp_fp][148000]Accuracy-Highest: 0.98443 Training: 2022-04-11 09:11:47,428-[agedb_30][148000]XNorm: 22.214169 
Training: 2022-04-11 09:11:47,428-[agedb_30][148000]Accuracy-Flip: 0.98083+-0.00772 Training: 2022-04-11 09:11:47,429-[agedb_30][148000]Accuracy-Highest: 0.98150 Training: 2022-04-11 09:11:49,416-Speed 118.76 samples/sec Loss 2.6032 LearningRate 0.0310 Epoch: 8 Global Step: 148010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:11:51,398-Speed 5165.95 samples/sec Loss 2.6515 LearningRate 0.0310 Epoch: 8 Global Step: 148020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:11:53,396-Speed 5129.23 samples/sec Loss 2.6331 LearningRate 0.0310 Epoch: 8 Global Step: 148030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:11:55,364-Speed 5204.79 samples/sec Loss 2.7177 LearningRate 0.0310 Epoch: 8 Global Step: 148040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:11:57,327-Speed 5217.28 samples/sec Loss 2.6472 LearningRate 0.0310 Epoch: 8 Global Step: 148050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:11:59,301-Speed 5190.32 samples/sec Loss 2.7019 LearningRate 0.0310 Epoch: 8 Global Step: 148060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:01,281-Speed 5174.06 samples/sec Loss 2.5866 LearningRate 0.0310 Epoch: 8 Global Step: 148070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:03,253-Speed 5193.54 samples/sec Loss 2.6460 LearningRate 0.0310 Epoch: 8 Global Step: 148080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:05,242-Speed 5148.37 samples/sec Loss 2.6634 LearningRate 0.0310 Epoch: 8 Global Step: 148090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:07,216-Speed 5190.11 samples/sec Loss 2.7606 LearningRate 0.0310 Epoch: 8 Global Step: 148100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:09,197-Speed 5170.09 samples/sec Loss 2.6492 LearningRate 0.0309 Epoch: 8 Global Step: 148110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:11,167-Speed 
5201.01 samples/sec Loss 2.5624 LearningRate 0.0309 Epoch: 8 Global Step: 148120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:13,134-Speed 5205.86 samples/sec Loss 2.7067 LearningRate 0.0309 Epoch: 8 Global Step: 148130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:15,104-Speed 5200.74 samples/sec Loss 2.6814 LearningRate 0.0309 Epoch: 8 Global Step: 148140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:17,086-Speed 5167.04 samples/sec Loss 2.6549 LearningRate 0.0309 Epoch: 8 Global Step: 148150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:19,069-Speed 5164.77 samples/sec Loss 2.6010 LearningRate 0.0309 Epoch: 8 Global Step: 148160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:21,058-Speed 5150.46 samples/sec Loss 2.6903 LearningRate 0.0309 Epoch: 8 Global Step: 148170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:23,038-Speed 5175.60 samples/sec Loss 2.6728 LearningRate 0.0309 Epoch: 8 Global Step: 148180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:12:25,016-Speed 5179.19 samples/sec Loss 2.6745 LearningRate 0.0309 Epoch: 8 Global Step: 148190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:12:26,990-Speed 5189.44 samples/sec Loss 2.7011 LearningRate 0.0309 Epoch: 8 Global Step: 148200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:12:28,962-Speed 5192.29 samples/sec Loss 2.5941 LearningRate 0.0309 Epoch: 8 Global Step: 148210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:12:30,934-Speed 5195.56 samples/sec Loss 2.6845 LearningRate 0.0309 Epoch: 8 Global Step: 148220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:12:32,924-Speed 5146.86 samples/sec Loss 2.6200 LearningRate 0.0309 Epoch: 8 Global Step: 148230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:12:34,898-Speed 5188.65 samples/sec Loss 2.6927 
LearningRate 0.0309 Epoch: 8 Global Step: 148240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:12:36,878-Speed 5174.12 samples/sec Loss 2.6810 LearningRate 0.0309 Epoch: 8 Global Step: 148250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:12:38,853-Speed 5186.31 samples/sec Loss 2.6495 LearningRate 0.0309 Epoch: 8 Global Step: 148260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:12:40,838-Speed 5161.04 samples/sec Loss 2.7207 LearningRate 0.0309 Epoch: 8 Global Step: 148270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:12:42,802-Speed 5213.48 samples/sec Loss 2.6250 LearningRate 0.0309 Epoch: 8 Global Step: 148280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:12:44,809-Speed 5104.63 samples/sec Loss 2.7285 LearningRate 0.0309 Epoch: 8 Global Step: 148290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:46,807-Speed 5128.52 samples/sec Loss 2.6280 LearningRate 0.0309 Epoch: 8 Global Step: 148300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:48,792-Speed 5159.27 samples/sec Loss 2.7000 LearningRate 0.0309 Epoch: 8 Global Step: 148310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:50,770-Speed 5177.60 samples/sec Loss 2.6623 LearningRate 0.0309 Epoch: 8 Global Step: 148320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:52,749-Speed 5177.14 samples/sec Loss 2.6605 LearningRate 0.0309 Epoch: 8 Global Step: 148330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:54,724-Speed 5186.88 samples/sec Loss 2.6619 LearningRate 0.0309 Epoch: 8 Global Step: 148340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:56,697-Speed 5191.93 samples/sec Loss 2.6210 LearningRate 0.0309 Epoch: 8 Global Step: 148350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:12:58,692-Speed 5132.84 samples/sec Loss 2.6944 LearningRate 0.0309 Epoch: 8 Global 
Step: 148360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:00,686-Speed 5138.72 samples/sec Loss 2.5947 LearningRate 0.0309 Epoch: 8 Global Step: 148370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:02,684-Speed 5126.35 samples/sec Loss 2.6595 LearningRate 0.0309 Epoch: 8 Global Step: 148380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:04,658-Speed 5190.21 samples/sec Loss 2.6940 LearningRate 0.0309 Epoch: 8 Global Step: 148390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:13:06,650-Speed 5141.33 samples/sec Loss 2.6158 LearningRate 0.0309 Epoch: 8 Global Step: 148400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:13:08,628-Speed 5178.90 samples/sec Loss 2.6354 LearningRate 0.0308 Epoch: 8 Global Step: 148410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:13:10,609-Speed 5171.37 samples/sec Loss 2.6477 LearningRate 0.0308 Epoch: 8 Global Step: 148420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:13:12,587-Speed 5176.83 samples/sec Loss 2.6805 LearningRate 0.0308 Epoch: 8 Global Step: 148430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:13:14,561-Speed 5192.63 samples/sec Loss 2.6742 LearningRate 0.0308 Epoch: 8 Global Step: 148440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:13:16,541-Speed 5171.51 samples/sec Loss 2.6651 LearningRate 0.0308 Epoch: 8 Global Step: 148450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:13:18,525-Speed 5163.52 samples/sec Loss 2.6050 LearningRate 0.0308 Epoch: 8 Global Step: 148460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:13:20,513-Speed 5151.33 samples/sec Loss 2.7181 LearningRate 0.0308 Epoch: 8 Global Step: 148470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:13:22,496-Speed 5167.80 samples/sec Loss 2.6256 LearningRate 0.0308 Epoch: 8 Global Step: 148480 Fp16 Grad Scale: 
65536 Required: 12 hours Training: 2022-04-11 09:13:24,491-Speed 5134.58 samples/sec Loss 2.7017 LearningRate 0.0308 Epoch: 8 Global Step: 148490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:26,474-Speed 5164.27 samples/sec Loss 2.6430 LearningRate 0.0308 Epoch: 8 Global Step: 148500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:28,449-Speed 5186.48 samples/sec Loss 2.6309 LearningRate 0.0308 Epoch: 8 Global Step: 148510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:30,422-Speed 5192.29 samples/sec Loss 2.7098 LearningRate 0.0308 Epoch: 8 Global Step: 148520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:32,400-Speed 5178.76 samples/sec Loss 2.6538 LearningRate 0.0308 Epoch: 8 Global Step: 148530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:34,382-Speed 5168.70 samples/sec Loss 2.7307 LearningRate 0.0308 Epoch: 8 Global Step: 148540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:36,354-Speed 5192.90 samples/sec Loss 2.6647 LearningRate 0.0308 Epoch: 8 Global Step: 148550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:38,353-Speed 5124.86 samples/sec Loss 2.6740 LearningRate 0.0308 Epoch: 8 Global Step: 148560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:40,350-Speed 5130.73 samples/sec Loss 2.5731 LearningRate 0.0308 Epoch: 8 Global Step: 148570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:42,324-Speed 5188.87 samples/sec Loss 2.6934 LearningRate 0.0308 Epoch: 8 Global Step: 148580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:44,318-Speed 5136.89 samples/sec Loss 2.7524 LearningRate 0.0308 Epoch: 8 Global Step: 148590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:46,306-Speed 5153.04 samples/sec Loss 2.6317 LearningRate 0.0308 Epoch: 8 Global Step: 148600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 
2022-04-11 09:13:48,281-Speed 5187.47 samples/sec Loss 2.6580 LearningRate 0.0308 Epoch: 8 Global Step: 148610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:50,278-Speed 5126.90 samples/sec Loss 2.6870 LearningRate 0.0308 Epoch: 8 Global Step: 148620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:52,251-Speed 5193.57 samples/sec Loss 2.6459 LearningRate 0.0308 Epoch: 8 Global Step: 148630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:54,225-Speed 5188.80 samples/sec Loss 2.7302 LearningRate 0.0308 Epoch: 8 Global Step: 148640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:56,202-Speed 5181.14 samples/sec Loss 2.6213 LearningRate 0.0308 Epoch: 8 Global Step: 148650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:13:58,194-Speed 5142.70 samples/sec Loss 2.7282 LearningRate 0.0308 Epoch: 8 Global Step: 148660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:14:00,169-Speed 5184.83 samples/sec Loss 2.7023 LearningRate 0.0308 Epoch: 8 Global Step: 148670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:14:02,157-Speed 5153.58 samples/sec Loss 2.7068 LearningRate 0.0308 Epoch: 8 Global Step: 148680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:14:04,129-Speed 5192.85 samples/sec Loss 2.6195 LearningRate 0.0308 Epoch: 8 Global Step: 148690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:14:06,103-Speed 5191.06 samples/sec Loss 2.6654 LearningRate 0.0308 Epoch: 8 Global Step: 148700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:14:08,072-Speed 5203.14 samples/sec Loss 2.6977 LearningRate 0.0307 Epoch: 8 Global Step: 148710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:14:10,057-Speed 5160.54 samples/sec Loss 2.5620 LearningRate 0.0307 Epoch: 8 Global Step: 148720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:14:12,060-Speed 5112.74 
samples/sec Loss 2.6369 LearningRate 0.0307 Epoch: 8 Global Step: 148730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:14:14,039-Speed 5178.15 samples/sec Loss 2.6058 LearningRate 0.0307 Epoch: 8 Global Step: 148740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:14:16,068-Speed 5047.74 samples/sec Loss 2.6578 LearningRate 0.0307 Epoch: 8 Global Step: 148750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:14:18,040-Speed 5194.59 samples/sec Loss 2.6463 LearningRate 0.0307 Epoch: 8 Global Step: 148760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:14:20,010-Speed 5198.84 samples/sec Loss 2.6210 LearningRate 0.0307 Epoch: 8 Global Step: 148770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:14:21,985-Speed 5187.31 samples/sec Loss 2.6175 LearningRate 0.0307 Epoch: 8 Global Step: 148780 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:14:23,960-Speed 5186.59 samples/sec Loss 2.6449 LearningRate 0.0307 Epoch: 8 Global Step: 148790 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:14:25,936-Speed 5184.28 samples/sec Loss 2.6447 LearningRate 0.0307 Epoch: 8 Global Step: 148800 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:14:27,925-Speed 5149.22 samples/sec Loss 2.6643 LearningRate 0.0307 Epoch: 8 Global Step: 148810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:14:29,899-Speed 5188.88 samples/sec Loss 2.6695 LearningRate 0.0307 Epoch: 8 Global Step: 148820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:14:31,867-Speed 5204.96 samples/sec Loss 2.7023 LearningRate 0.0307 Epoch: 8 Global Step: 148830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:14:33,837-Speed 5200.32 samples/sec Loss 2.6596 LearningRate 0.0307 Epoch: 8 Global Step: 148840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:14:35,821-Speed 5163.68 samples/sec Loss 2.7315 LearningRate 0.0307 
Epoch: 8 Global Step: 148850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:14:37,803-Speed 5166.65 samples/sec Loss 2.6625 LearningRate 0.0307 Epoch: 8 Global Step: 148860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:14:39,801-Speed 5128.24 samples/sec Loss 2.6592 LearningRate 0.0307 Epoch: 8 Global Step: 148870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:14:41,774-Speed 5190.52 samples/sec Loss 2.6413 LearningRate 0.0307 Epoch: 8 Global Step: 148880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:14:43,755-Speed 5171.71 samples/sec Loss 2.6549 LearningRate 0.0307 Epoch: 8 Global Step: 148890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:14:45,740-Speed 5160.25 samples/sec Loss 2.6799 LearningRate 0.0307 Epoch: 8 Global Step: 148900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:14:47,769-Speed 5049.19 samples/sec Loss 2.6231 LearningRate 0.0307 Epoch: 8 Global Step: 148910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:14:49,756-Speed 5154.36 samples/sec Loss 2.6695 LearningRate 0.0307 Epoch: 8 Global Step: 148920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:14:51,748-Speed 5142.78 samples/sec Loss 2.6768 LearningRate 0.0307 Epoch: 8 Global Step: 148930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:14:53,719-Speed 5197.26 samples/sec Loss 2.6493 LearningRate 0.0307 Epoch: 8 Global Step: 148940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:14:55,688-Speed 5202.03 samples/sec Loss 2.6127 LearningRate 0.0307 Epoch: 8 Global Step: 148950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:14:57,663-Speed 5186.68 samples/sec Loss 2.5869 LearningRate 0.0307 Epoch: 8 Global Step: 148960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:14:59,651-Speed 5151.38 samples/sec Loss 2.6502 LearningRate 0.0307 Epoch: 8 Global Step: 148970 Fp16 
Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:15:01,633-Speed 5169.96 samples/sec Loss 2.6876 LearningRate 0.0307 Epoch: 8 Global Step: 148980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:15:03,610-Speed 5181.16 samples/sec Loss 2.6264 LearningRate 0.0307 Epoch: 8 Global Step: 148990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:15:05,574-Speed 5213.60 samples/sec Loss 2.6196 LearningRate 0.0307 Epoch: 8 Global Step: 149000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:07,546-Speed 5194.30 samples/sec Loss 2.6425 LearningRate 0.0306 Epoch: 8 Global Step: 149010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:09,548-Speed 5117.86 samples/sec Loss 2.5815 LearningRate 0.0306 Epoch: 8 Global Step: 149020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:11,534-Speed 5156.29 samples/sec Loss 2.6414 LearningRate 0.0306 Epoch: 8 Global Step: 149030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:13,509-Speed 5189.54 samples/sec Loss 2.6800 LearningRate 0.0306 Epoch: 8 Global Step: 149040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:15,483-Speed 5188.78 samples/sec Loss 2.6869 LearningRate 0.0306 Epoch: 8 Global Step: 149050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:17,457-Speed 5188.75 samples/sec Loss 2.6463 LearningRate 0.0306 Epoch: 8 Global Step: 149060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:19,441-Speed 5163.69 samples/sec Loss 2.6586 LearningRate 0.0306 Epoch: 8 Global Step: 149070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:21,421-Speed 5171.98 samples/sec Loss 2.6271 LearningRate 0.0306 Epoch: 8 Global Step: 149080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:23,396-Speed 5186.50 samples/sec Loss 2.5955 LearningRate 0.0306 Epoch: 8 Global Step: 149090 Fp16 Grad Scale: 65536 Required: 12 hours 
Training: 2022-04-11 09:15:25,397-Speed 5118.53 samples/sec Loss 2.6946 LearningRate 0.0306 Epoch: 8 Global Step: 149100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:15:27,382-Speed 5161.46 samples/sec Loss 2.6948 LearningRate 0.0306 Epoch: 8 Global Step: 149110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:29,368-Speed 5156.41 samples/sec Loss 2.6694 LearningRate 0.0306 Epoch: 8 Global Step: 149120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:31,364-Speed 5133.34 samples/sec Loss 2.6937 LearningRate 0.0306 Epoch: 8 Global Step: 149130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:33,354-Speed 5148.16 samples/sec Loss 2.7194 LearningRate 0.0306 Epoch: 8 Global Step: 149140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:35,340-Speed 5157.12 samples/sec Loss 2.6820 LearningRate 0.0306 Epoch: 8 Global Step: 149150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:37,343-Speed 5114.93 samples/sec Loss 2.6775 LearningRate 0.0306 Epoch: 8 Global Step: 149160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:39,357-Speed 5085.84 samples/sec Loss 2.6514 LearningRate 0.0306 Epoch: 8 Global Step: 149170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:41,349-Speed 5142.60 samples/sec Loss 2.6556 LearningRate 0.0306 Epoch: 8 Global Step: 149180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:43,324-Speed 5185.28 samples/sec Loss 2.5888 LearningRate 0.0306 Epoch: 8 Global Step: 149190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:45,310-Speed 5159.77 samples/sec Loss 2.6088 LearningRate 0.0306 Epoch: 8 Global Step: 149200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:47,309-Speed 5122.77 samples/sec Loss 2.6451 LearningRate 0.0306 Epoch: 8 Global Step: 149210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:15:49,292-Speed 
5166.03 samples/sec Loss 2.6396 LearningRate 0.0306 Epoch: 8 Global Step: 149220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:51,286-Speed 5135.39 samples/sec Loss 2.6796 LearningRate 0.0306 Epoch: 8 Global Step: 149230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:53,278-Speed 5144.61 samples/sec Loss 2.6272 LearningRate 0.0306 Epoch: 8 Global Step: 149240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:55,252-Speed 5190.24 samples/sec Loss 2.5825 LearningRate 0.0306 Epoch: 8 Global Step: 149250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:57,223-Speed 5195.38 samples/sec Loss 2.5791 LearningRate 0.0306 Epoch: 8 Global Step: 149260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:15:59,224-Speed 5119.72 samples/sec Loss 2.6218 LearningRate 0.0306 Epoch: 8 Global Step: 149270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:16:01,199-Speed 5186.15 samples/sec Loss 2.6362 LearningRate 0.0306 Epoch: 8 Global Step: 149280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:16:03,172-Speed 5191.62 samples/sec Loss 2.6407 LearningRate 0.0306 Epoch: 8 Global Step: 149290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:16:05,147-Speed 5186.97 samples/sec Loss 2.7197 LearningRate 0.0306 Epoch: 8 Global Step: 149300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:16:07,121-Speed 5189.26 samples/sec Loss 2.6950 LearningRate 0.0306 Epoch: 8 Global Step: 149310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:16:09,105-Speed 5164.47 samples/sec Loss 2.6144 LearningRate 0.0305 Epoch: 8 Global Step: 149320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:16:11,088-Speed 5164.88 samples/sec Loss 2.6179 LearningRate 0.0305 Epoch: 8 Global Step: 149330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:16:13,076-Speed 5150.55 samples/sec Loss 2.7534 
LearningRate 0.0305 Epoch: 8 Global Step: 149340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:16:15,053-Speed 5182.50 samples/sec Loss 2.6694 LearningRate 0.0305 Epoch: 8 Global Step: 149350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:16:17,034-Speed 5171.91 samples/sec Loss 2.6789 LearningRate 0.0305 Epoch: 8 Global Step: 149360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:16:19,008-Speed 5187.95 samples/sec Loss 2.6132 LearningRate 0.0305 Epoch: 8 Global Step: 149370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:16:20,984-Speed 5185.66 samples/sec Loss 2.6903 LearningRate 0.0305 Epoch: 8 Global Step: 149380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:16:22,974-Speed 5147.83 samples/sec Loss 2.6545 LearningRate 0.0305 Epoch: 8 Global Step: 149390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:16:24,953-Speed 5175.18 samples/sec Loss 2.6142 LearningRate 0.0305 Epoch: 8 Global Step: 149400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:16:26,927-Speed 5188.93 samples/sec Loss 2.6195 LearningRate 0.0305 Epoch: 8 Global Step: 149410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:16:28,911-Speed 5163.39 samples/sec Loss 2.6335 LearningRate 0.0305 Epoch: 8 Global Step: 149420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:16:30,891-Speed 5172.38 samples/sec Loss 2.6906 LearningRate 0.0305 Epoch: 8 Global Step: 149430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:16:32,862-Speed 5196.94 samples/sec Loss 2.6577 LearningRate 0.0305 Epoch: 8 Global Step: 149440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:16:34,839-Speed 5180.58 samples/sec Loss 2.6091 LearningRate 0.0305 Epoch: 8 Global Step: 149450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:16:36,827-Speed 5153.86 samples/sec Loss 2.6468 LearningRate 0.0305 Epoch: 8 
Global Step: 149460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:16:38,806-Speed 5177.49 samples/sec Loss 2.6648 LearningRate 0.0305 Epoch: 8 Global Step: 149470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:16:40,800-Speed 5136.81 samples/sec Loss 2.6390 LearningRate 0.0305 Epoch: 8 Global Step: 149480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:16:42,775-Speed 5186.86 samples/sec Loss 2.6690 LearningRate 0.0305 Epoch: 8 Global Step: 149490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:16:44,754-Speed 5174.89 samples/sec Loss 2.5951 LearningRate 0.0305 Epoch: 8 Global Step: 149500 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:16:46,746-Speed 5142.03 samples/sec Loss 2.6574 LearningRate 0.0305 Epoch: 8 Global Step: 149510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:16:48,738-Speed 5142.94 samples/sec Loss 2.6354 LearningRate 0.0305 Epoch: 8 Global Step: 149520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:16:50,716-Speed 5176.77 samples/sec Loss 2.6660 LearningRate 0.0305 Epoch: 8 Global Step: 149530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:16:52,704-Speed 5153.64 samples/sec Loss 2.6139 LearningRate 0.0305 Epoch: 8 Global Step: 149540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:16:54,679-Speed 5186.57 samples/sec Loss 2.6033 LearningRate 0.0305 Epoch: 8 Global Step: 149550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:16:56,671-Speed 5143.42 samples/sec Loss 2.5968 LearningRate 0.0305 Epoch: 8 Global Step: 149560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:16:58,653-Speed 5167.28 samples/sec Loss 2.5953 LearningRate 0.0305 Epoch: 8 Global Step: 149570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:00,648-Speed 5136.22 samples/sec Loss 2.6569 LearningRate 0.0305 Epoch: 8 Global Step: 149580 Fp16 Grad Scale: 65536 
Required: 12 hours Training: 2022-04-11 09:17:02,630-Speed 5167.03 samples/sec Loss 2.6512 LearningRate 0.0305 Epoch: 8 Global Step: 149590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:04,624-Speed 5137.76 samples/sec Loss 2.5896 LearningRate 0.0305 Epoch: 8 Global Step: 149600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:06,606-Speed 5168.25 samples/sec Loss 2.6301 LearningRate 0.0305 Epoch: 8 Global Step: 149610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:08,577-Speed 5196.09 samples/sec Loss 2.5611 LearningRate 0.0304 Epoch: 8 Global Step: 149620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:10,557-Speed 5174.95 samples/sec Loss 2.7074 LearningRate 0.0304 Epoch: 8 Global Step: 149630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:12,535-Speed 5177.20 samples/sec Loss 2.6585 LearningRate 0.0304 Epoch: 8 Global Step: 149640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:14,512-Speed 5182.97 samples/sec Loss 2.6008 LearningRate 0.0304 Epoch: 8 Global Step: 149650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:16,496-Speed 5162.20 samples/sec Loss 2.6691 LearningRate 0.0304 Epoch: 8 Global Step: 149660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:18,468-Speed 5194.65 samples/sec Loss 2.6279 LearningRate 0.0304 Epoch: 8 Global Step: 149670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:20,449-Speed 5171.73 samples/sec Loss 2.6911 LearningRate 0.0304 Epoch: 8 Global Step: 149680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:17:22,426-Speed 5183.10 samples/sec Loss 2.6344 LearningRate 0.0304 Epoch: 8 Global Step: 149690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:17:24,416-Speed 5146.59 samples/sec Loss 2.5966 LearningRate 0.0304 Epoch: 8 Global Step: 149700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 
09:17:26,387-Speed 5198.02 samples/sec Loss 2.7145 LearningRate 0.0304 Epoch: 8 Global Step: 149710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:17:28,361-Speed 5189.00 samples/sec Loss 2.6561 LearningRate 0.0304 Epoch: 8 Global Step: 149720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:17:30,331-Speed 5197.38 samples/sec Loss 2.6305 LearningRate 0.0304 Epoch: 8 Global Step: 149730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:17:32,314-Speed 5167.16 samples/sec Loss 2.6183 LearningRate 0.0304 Epoch: 8 Global Step: 149740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:17:34,302-Speed 5150.93 samples/sec Loss 2.5502 LearningRate 0.0304 Epoch: 8 Global Step: 149750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:17:36,277-Speed 5188.22 samples/sec Loss 2.6538 LearningRate 0.0304 Epoch: 8 Global Step: 149760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:17:38,264-Speed 5154.37 samples/sec Loss 2.6142 LearningRate 0.0304 Epoch: 8 Global Step: 149770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:17:40,242-Speed 5180.61 samples/sec Loss 2.6054 LearningRate 0.0304 Epoch: 8 Global Step: 149780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:42,216-Speed 5189.01 samples/sec Loss 2.6093 LearningRate 0.0304 Epoch: 8 Global Step: 149790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:44,189-Speed 5190.71 samples/sec Loss 2.6482 LearningRate 0.0304 Epoch: 8 Global Step: 149800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:46,165-Speed 5184.85 samples/sec Loss 2.7102 LearningRate 0.0304 Epoch: 8 Global Step: 149810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:48,173-Speed 5101.55 samples/sec Loss 2.6452 LearningRate 0.0304 Epoch: 8 Global Step: 149820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:50,161-Speed 5151.39 samples/sec Loss 
2.6879 LearningRate 0.0304 Epoch: 8 Global Step: 149830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:52,138-Speed 5181.41 samples/sec Loss 2.7413 LearningRate 0.0304 Epoch: 8 Global Step: 149840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:54,112-Speed 5188.97 samples/sec Loss 2.6371 LearningRate 0.0304 Epoch: 8 Global Step: 149850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:56,096-Speed 5164.09 samples/sec Loss 2.6730 LearningRate 0.0304 Epoch: 8 Global Step: 149860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:17:58,085-Speed 5148.40 samples/sec Loss 2.7114 LearningRate 0.0304 Epoch: 8 Global Step: 149870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:18:00,053-Speed 5206.50 samples/sec Loss 2.5538 LearningRate 0.0304 Epoch: 8 Global Step: 149880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:18:02,038-Speed 5159.73 samples/sec Loss 2.6497 LearningRate 0.0304 Epoch: 8 Global Step: 149890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:18:04,018-Speed 5172.53 samples/sec Loss 2.6018 LearningRate 0.0304 Epoch: 8 Global Step: 149900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:18:06,001-Speed 5167.68 samples/sec Loss 2.6605 LearningRate 0.0304 Epoch: 8 Global Step: 149910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:18:07,975-Speed 5188.79 samples/sec Loss 2.6012 LearningRate 0.0303 Epoch: 8 Global Step: 149920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:18:09,971-Speed 5129.85 samples/sec Loss 2.6205 LearningRate 0.0303 Epoch: 8 Global Step: 149930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:18:11,961-Speed 5147.21 samples/sec Loss 2.6315 LearningRate 0.0303 Epoch: 8 Global Step: 149940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:18:13,952-Speed 5147.33 samples/sec Loss 2.6920 LearningRate 0.0303 Epoch: 8 Global 
Step: 149950 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-04-11 09:18:15,937-Speed 5159.44 samples/sec Loss 2.6054 LearningRate 0.0303 Epoch: 8 Global Step: 149960 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-04-11 09:18:17,918-Speed 5169.83 samples/sec Loss 2.6284 LearningRate 0.0303 Epoch: 8 Global Step: 149970 Fp16 Grad Scale: 65536 Required: 12 hours
Training: 2022-04-11 09:18:19,895-Speed 5182.78 samples/sec Loss 2.6734 LearningRate 0.0303 Epoch: 8 Global Step: 149980 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 09:18:21,882-Speed 5159.07 samples/sec Loss 2.6281 LearningRate 0.0303 Epoch: 8 Global Step: 149990 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 09:18:23,880-Speed 5126.17 samples/sec Loss 2.6295 LearningRate 0.0303 Epoch: 8 Global Step: 150000 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 09:18:50,404-[lfw][150000]XNorm: 22.678578
Training: 2022-04-11 09:18:50,405-[lfw][150000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 09:18:50,406-[lfw][150000]Accuracy-Highest: 0.99833
Training: 2022-04-11 09:19:21,124-[cfp_fp][150000]XNorm: 21.126419
Training: 2022-04-11 09:19:21,125-[cfp_fp][150000]Accuracy-Flip: 0.98200+-0.00357
Training: 2022-04-11 09:19:21,125-[cfp_fp][150000]Accuracy-Highest: 0.98443
Training: 2022-04-11 09:19:47,616-[agedb_30][150000]XNorm: 22.996035
Training: 2022-04-11 09:19:47,617-[agedb_30][150000]Accuracy-Flip: 0.97983+-0.00917
Training: 2022-04-11 09:19:47,617-[agedb_30][150000]Accuracy-Highest: 0.98150
Training: 2022-04-11 09:19:49,600-Speed 119.46 samples/sec Loss 2.6264 LearningRate 0.0303 Epoch: 8 Global Step: 150010 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 09:19:51,580-Speed 5174.61 samples/sec Loss 2.5915 LearningRate 0.0303 Epoch: 8 Global Step: 150020 Fp16 Grad Scale: 131072 Required: 12 hours
Training: 2022-04-11 09:19:53,600-Speed 5070.19 samples/sec Loss 2.6728 LearningRate 0.0303 Epoch: 8 Global Step: 150030 Fp16
Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:19:55,566-Speed 5208.84 samples/sec Loss 2.5696 LearningRate 0.0303 Epoch: 8 Global Step: 150040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:19:57,525-Speed 5231.16 samples/sec Loss 2.7099 LearningRate 0.0303 Epoch: 8 Global Step: 150050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:19:59,499-Speed 5189.28 samples/sec Loss 2.5835 LearningRate 0.0303 Epoch: 8 Global Step: 150060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:20:01,475-Speed 5183.11 samples/sec Loss 2.6313 LearningRate 0.0303 Epoch: 8 Global Step: 150070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:20:03,449-Speed 5189.03 samples/sec Loss 2.6802 LearningRate 0.0303 Epoch: 8 Global Step: 150080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:20:05,435-Speed 5157.95 samples/sec Loss 2.6591 LearningRate 0.0303 Epoch: 8 Global Step: 150090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:20:07,410-Speed 5185.15 samples/sec Loss 2.6517 LearningRate 0.0303 Epoch: 8 Global Step: 150100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:20:09,393-Speed 5165.21 samples/sec Loss 2.7208 LearningRate 0.0303 Epoch: 8 Global Step: 150110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:20:11,368-Speed 5188.72 samples/sec Loss 2.5947 LearningRate 0.0303 Epoch: 8 Global Step: 150120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:20:13,354-Speed 5157.72 samples/sec Loss 2.5305 LearningRate 0.0303 Epoch: 8 Global Step: 150130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:20:15,331-Speed 5180.45 samples/sec Loss 2.5796 LearningRate 0.0303 Epoch: 8 Global Step: 150140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:20:17,302-Speed 5197.60 samples/sec Loss 2.6516 LearningRate 0.0303 Epoch: 8 Global Step: 150150 Fp16 Grad Scale: 65536 Required: 12 hours 
Training: 2022-04-11 09:20:19,271-Speed 5202.66 samples/sec Loss 2.6458 LearningRate 0.0303 Epoch: 8 Global Step: 150160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:20:21,240-Speed 5201.16 samples/sec Loss 2.6417 LearningRate 0.0303 Epoch: 8 Global Step: 150170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:20:23,217-Speed 5182.93 samples/sec Loss 2.6422 LearningRate 0.0303 Epoch: 8 Global Step: 150180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:20:25,191-Speed 5188.24 samples/sec Loss 2.6786 LearningRate 0.0303 Epoch: 8 Global Step: 150190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:20:27,158-Speed 5208.22 samples/sec Loss 2.6422 LearningRate 0.0303 Epoch: 8 Global Step: 150200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:20:29,383-Speed 4603.36 samples/sec Loss 2.6182 LearningRate 0.0303 Epoch: 8 Global Step: 150210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:01,379-Speed 320.05 samples/sec Loss 2.5479 LearningRate 0.0302 Epoch: 9 Global Step: 150220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:03,409-Speed 5047.89 samples/sec Loss 2.0756 LearningRate 0.0302 Epoch: 9 Global Step: 150230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:05,396-Speed 5154.43 samples/sec Loss 2.0514 LearningRate 0.0302 Epoch: 9 Global Step: 150240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:07,353-Speed 5235.23 samples/sec Loss 2.1117 LearningRate 0.0302 Epoch: 9 Global Step: 150250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:21:10,163-Speed 3644.32 samples/sec Loss 2.0281 LearningRate 0.0302 Epoch: 9 Global Step: 150260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:12,245-Speed 4919.73 samples/sec Loss 1.9754 LearningRate 0.0302 Epoch: 9 Global Step: 150270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:14,205-Speed 
5227.63 samples/sec Loss 2.0515 LearningRate 0.0302 Epoch: 9 Global Step: 150280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:16,191-Speed 5156.22 samples/sec Loss 2.1113 LearningRate 0.0302 Epoch: 9 Global Step: 150290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:18,161-Speed 5200.69 samples/sec Loss 2.1060 LearningRate 0.0302 Epoch: 9 Global Step: 150300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:20,142-Speed 5170.90 samples/sec Loss 2.1157 LearningRate 0.0302 Epoch: 9 Global Step: 150310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:22,123-Speed 5171.77 samples/sec Loss 2.0812 LearningRate 0.0302 Epoch: 9 Global Step: 150320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:24,090-Speed 5205.74 samples/sec Loss 2.0543 LearningRate 0.0302 Epoch: 9 Global Step: 150330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:26,084-Speed 5138.73 samples/sec Loss 2.0241 LearningRate 0.0302 Epoch: 9 Global Step: 150340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:28,056-Speed 5193.77 samples/sec Loss 2.0520 LearningRate 0.0302 Epoch: 9 Global Step: 150350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:30,036-Speed 5173.46 samples/sec Loss 2.0855 LearningRate 0.0302 Epoch: 9 Global Step: 150360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:21:31,999-Speed 5217.29 samples/sec Loss 2.0402 LearningRate 0.0302 Epoch: 9 Global Step: 150370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:33,966-Speed 5206.87 samples/sec Loss 2.0720 LearningRate 0.0302 Epoch: 9 Global Step: 150380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:36,077-Speed 4854.06 samples/sec Loss 2.0526 LearningRate 0.0302 Epoch: 9 Global Step: 150390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:38,050-Speed 5190.74 samples/sec Loss 2.0015 
LearningRate 0.0302 Epoch: 9 Global Step: 150400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:40,022-Speed 5195.88 samples/sec Loss 2.0103 LearningRate 0.0302 Epoch: 9 Global Step: 150410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:42,010-Speed 5151.28 samples/sec Loss 2.0527 LearningRate 0.0302 Epoch: 9 Global Step: 150420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:44,578-Speed 3988.45 samples/sec Loss 2.1065 LearningRate 0.0302 Epoch: 9 Global Step: 150430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:46,546-Speed 5205.36 samples/sec Loss 2.0601 LearningRate 0.0302 Epoch: 9 Global Step: 150440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:48,514-Speed 5206.14 samples/sec Loss 2.0967 LearningRate 0.0302 Epoch: 9 Global Step: 150450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:50,487-Speed 5191.09 samples/sec Loss 2.1382 LearningRate 0.0302 Epoch: 9 Global Step: 150460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:52,447-Speed 5225.73 samples/sec Loss 2.0853 LearningRate 0.0302 Epoch: 9 Global Step: 150470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:54,426-Speed 5177.14 samples/sec Loss 2.0533 LearningRate 0.0302 Epoch: 9 Global Step: 150480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:56,401-Speed 5185.19 samples/sec Loss 1.9686 LearningRate 0.0302 Epoch: 9 Global Step: 150490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:21:58,379-Speed 5179.41 samples/sec Loss 2.0205 LearningRate 0.0302 Epoch: 9 Global Step: 150500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:00,378-Speed 5124.63 samples/sec Loss 2.0590 LearningRate 0.0302 Epoch: 9 Global Step: 150510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:02,356-Speed 5180.47 samples/sec Loss 2.0904 LearningRate 0.0302 Epoch: 9 Global Step: 
150520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:04,330-Speed 5188.34 samples/sec Loss 2.0529 LearningRate 0.0301 Epoch: 9 Global Step: 150530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:06,298-Speed 5203.24 samples/sec Loss 2.0693 LearningRate 0.0301 Epoch: 9 Global Step: 150540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:08,270-Speed 5195.46 samples/sec Loss 2.0480 LearningRate 0.0301 Epoch: 9 Global Step: 150550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:10,260-Speed 5147.88 samples/sec Loss 2.0493 LearningRate 0.0301 Epoch: 9 Global Step: 150560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:12,230-Speed 5199.60 samples/sec Loss 2.0528 LearningRate 0.0301 Epoch: 9 Global Step: 150570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:22:14,214-Speed 5162.01 samples/sec Loss 2.0530 LearningRate 0.0301 Epoch: 9 Global Step: 150580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:22:16,186-Speed 5194.30 samples/sec Loss 2.0297 LearningRate 0.0301 Epoch: 9 Global Step: 150590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:22:18,165-Speed 5178.12 samples/sec Loss 2.0694 LearningRate 0.0301 Epoch: 9 Global Step: 150600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:22:20,155-Speed 5146.40 samples/sec Loss 2.0954 LearningRate 0.0301 Epoch: 9 Global Step: 150610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:22:22,146-Speed 5144.02 samples/sec Loss 2.1035 LearningRate 0.0301 Epoch: 9 Global Step: 150620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:22:24,130-Speed 5164.28 samples/sec Loss 2.0573 LearningRate 0.0301 Epoch: 9 Global Step: 150630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:26,108-Speed 5179.33 samples/sec Loss 2.1034 LearningRate 0.0301 Epoch: 9 Global Step: 150640 Fp16 Grad Scale: 65536 
Required: 12 hours Training: 2022-04-11 09:22:28,113-Speed 5107.31 samples/sec Loss 2.0808 LearningRate 0.0301 Epoch: 9 Global Step: 150650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:30,120-Speed 5103.66 samples/sec Loss 2.0277 LearningRate 0.0301 Epoch: 9 Global Step: 150660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:32,096-Speed 5185.16 samples/sec Loss 2.1093 LearningRate 0.0301 Epoch: 9 Global Step: 150670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:34,075-Speed 5176.86 samples/sec Loss 2.0562 LearningRate 0.0301 Epoch: 9 Global Step: 150680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:36,067-Speed 5140.92 samples/sec Loss 2.0675 LearningRate 0.0301 Epoch: 9 Global Step: 150690 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:38,071-Speed 5111.82 samples/sec Loss 2.0539 LearningRate 0.0301 Epoch: 9 Global Step: 150700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:40,051-Speed 5174.83 samples/sec Loss 2.1031 LearningRate 0.0301 Epoch: 9 Global Step: 150710 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:42,028-Speed 5180.27 samples/sec Loss 2.0511 LearningRate 0.0301 Epoch: 9 Global Step: 150720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:44,015-Speed 5154.34 samples/sec Loss 2.0842 LearningRate 0.0301 Epoch: 9 Global Step: 150730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:22:45,995-Speed 5175.84 samples/sec Loss 2.1406 LearningRate 0.0301 Epoch: 9 Global Step: 150740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:22:47,986-Speed 5143.00 samples/sec Loss 2.1429 LearningRate 0.0301 Epoch: 9 Global Step: 150750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:22:49,965-Speed 5175.46 samples/sec Loss 2.1281 LearningRate 0.0301 Epoch: 9 Global Step: 150760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 
09:22:51,945-Speed 5173.51 samples/sec Loss 2.0811 LearningRate 0.0301 Epoch: 9 Global Step: 150770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:53,919-Speed 5190.05 samples/sec Loss 2.0779 LearningRate 0.0301 Epoch: 9 Global Step: 150780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:55,893-Speed 5188.16 samples/sec Loss 2.1446 LearningRate 0.0301 Epoch: 9 Global Step: 150790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:57,880-Speed 5156.55 samples/sec Loss 2.1014 LearningRate 0.0301 Epoch: 9 Global Step: 150800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:22:59,864-Speed 5163.32 samples/sec Loss 2.0522 LearningRate 0.0301 Epoch: 9 Global Step: 150810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:23:01,847-Speed 5166.74 samples/sec Loss 2.1561 LearningRate 0.0301 Epoch: 9 Global Step: 150820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:23:03,824-Speed 5180.27 samples/sec Loss 2.1577 LearningRate 0.0300 Epoch: 9 Global Step: 150830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:23:05,797-Speed 5190.14 samples/sec Loss 2.1211 LearningRate 0.0300 Epoch: 9 Global Step: 150840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:23:07,772-Speed 5188.16 samples/sec Loss 2.1106 LearningRate 0.0300 Epoch: 9 Global Step: 150850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:23:09,748-Speed 5183.76 samples/sec Loss 2.1337 LearningRate 0.0300 Epoch: 9 Global Step: 150860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:11,723-Speed 5185.19 samples/sec Loss 2.1551 LearningRate 0.0300 Epoch: 9 Global Step: 150870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:13,698-Speed 5187.58 samples/sec Loss 2.1593 LearningRate 0.0300 Epoch: 9 Global Step: 150880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:15,695-Speed 5128.59 samples/sec 
Loss 2.1075 LearningRate 0.0300 Epoch: 9 Global Step: 150890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:17,672-Speed 5182.41 samples/sec Loss 2.1148 LearningRate 0.0300 Epoch: 9 Global Step: 150900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:19,660-Speed 5154.02 samples/sec Loss 2.1668 LearningRate 0.0300 Epoch: 9 Global Step: 150910 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:21,644-Speed 5162.95 samples/sec Loss 2.1143 LearningRate 0.0300 Epoch: 9 Global Step: 150920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:23,624-Speed 5173.63 samples/sec Loss 2.0510 LearningRate 0.0300 Epoch: 9 Global Step: 150930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:25,607-Speed 5164.06 samples/sec Loss 2.1571 LearningRate 0.0300 Epoch: 9 Global Step: 150940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:27,586-Speed 5176.60 samples/sec Loss 2.0821 LearningRate 0.0300 Epoch: 9 Global Step: 150950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:29,564-Speed 5178.77 samples/sec Loss 2.1489 LearningRate 0.0300 Epoch: 9 Global Step: 150960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:31,536-Speed 5195.65 samples/sec Loss 2.1284 LearningRate 0.0300 Epoch: 9 Global Step: 150970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:33,512-Speed 5181.77 samples/sec Loss 2.0832 LearningRate 0.0300 Epoch: 9 Global Step: 150980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:35,496-Speed 5163.87 samples/sec Loss 2.1274 LearningRate 0.0300 Epoch: 9 Global Step: 150990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:37,479-Speed 5165.53 samples/sec Loss 2.1503 LearningRate 0.0300 Epoch: 9 Global Step: 151000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:39,463-Speed 5163.00 samples/sec Loss 2.1566 LearningRate 0.0300 
Epoch: 9 Global Step: 151010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:41,437-Speed 5189.69 samples/sec Loss 2.1300 LearningRate 0.0300 Epoch: 9 Global Step: 151020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:43,409-Speed 5194.41 samples/sec Loss 2.0435 LearningRate 0.0300 Epoch: 9 Global Step: 151030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:45,386-Speed 5180.68 samples/sec Loss 2.1100 LearningRate 0.0300 Epoch: 9 Global Step: 151040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:47,364-Speed 5181.81 samples/sec Loss 2.0795 LearningRate 0.0300 Epoch: 9 Global Step: 151050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:49,328-Speed 5214.94 samples/sec Loss 2.1098 LearningRate 0.0300 Epoch: 9 Global Step: 151060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:51,319-Speed 5144.33 samples/sec Loss 2.1203 LearningRate 0.0300 Epoch: 9 Global Step: 151070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:53,303-Speed 5163.74 samples/sec Loss 2.1961 LearningRate 0.0300 Epoch: 9 Global Step: 151080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:55,285-Speed 5166.53 samples/sec Loss 2.1000 LearningRate 0.0300 Epoch: 9 Global Step: 151090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:57,284-Speed 5125.71 samples/sec Loss 2.1340 LearningRate 0.0300 Epoch: 9 Global Step: 151100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:23:59,265-Speed 5170.13 samples/sec Loss 2.0817 LearningRate 0.0300 Epoch: 9 Global Step: 151110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:24:01,246-Speed 5172.16 samples/sec Loss 2.0397 LearningRate 0.0300 Epoch: 9 Global Step: 151120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:24:03,232-Speed 5156.16 samples/sec Loss 2.1464 LearningRate 0.0300 Epoch: 9 Global Step: 151130 
Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:24:05,205-Speed 5193.49 samples/sec Loss 2.1336 LearningRate 0.0299 Epoch: 9 Global Step: 151140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:07,187-Speed 5167.77 samples/sec Loss 2.1412 LearningRate 0.0299 Epoch: 9 Global Step: 151150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:09,161-Speed 5188.62 samples/sec Loss 2.1595 LearningRate 0.0299 Epoch: 9 Global Step: 151160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:11,147-Speed 5156.92 samples/sec Loss 2.1040 LearningRate 0.0299 Epoch: 9 Global Step: 151170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:13,124-Speed 5183.47 samples/sec Loss 2.1509 LearningRate 0.0299 Epoch: 9 Global Step: 151180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:15,097-Speed 5189.07 samples/sec Loss 2.1233 LearningRate 0.0299 Epoch: 9 Global Step: 151190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:17,074-Speed 5184.36 samples/sec Loss 2.1207 LearningRate 0.0299 Epoch: 9 Global Step: 151200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:19,063-Speed 5148.86 samples/sec Loss 2.1085 LearningRate 0.0299 Epoch: 9 Global Step: 151210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:21,049-Speed 5158.83 samples/sec Loss 2.2157 LearningRate 0.0299 Epoch: 9 Global Step: 151220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:23,039-Speed 5147.26 samples/sec Loss 2.1737 LearningRate 0.0299 Epoch: 9 Global Step: 151230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:25,028-Speed 5150.24 samples/sec Loss 2.1071 LearningRate 0.0299 Epoch: 9 Global Step: 151240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:24:27,008-Speed 5172.40 samples/sec Loss 2.1390 LearningRate 0.0299 Epoch: 9 Global Step: 151250 Fp16 Grad Scale: 131072 Required: 12 
hours Training: 2022-04-11 09:24:28,985-Speed 5182.27 samples/sec Loss 2.2044 LearningRate 0.0299 Epoch: 9 Global Step: 151260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:24:30,967-Speed 5168.47 samples/sec Loss 2.0450 LearningRate 0.0299 Epoch: 9 Global Step: 151270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:24:32,935-Speed 5202.78 samples/sec Loss 2.1357 LearningRate 0.0299 Epoch: 9 Global Step: 151280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:34,912-Speed 5180.81 samples/sec Loss 2.1472 LearningRate 0.0299 Epoch: 9 Global Step: 151290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:36,908-Speed 5132.45 samples/sec Loss 2.1716 LearningRate 0.0299 Epoch: 9 Global Step: 151300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:38,902-Speed 5138.80 samples/sec Loss 2.1187 LearningRate 0.0299 Epoch: 9 Global Step: 151310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:40,874-Speed 5194.36 samples/sec Loss 2.0955 LearningRate 0.0299 Epoch: 9 Global Step: 151320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:42,848-Speed 5187.65 samples/sec Loss 2.1187 LearningRate 0.0299 Epoch: 9 Global Step: 151330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:44,855-Speed 5105.32 samples/sec Loss 2.0788 LearningRate 0.0299 Epoch: 9 Global Step: 151340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:46,846-Speed 5144.79 samples/sec Loss 2.1065 LearningRate 0.0299 Epoch: 9 Global Step: 151350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:48,871-Speed 5059.29 samples/sec Loss 2.2018 LearningRate 0.0299 Epoch: 9 Global Step: 151360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:50,879-Speed 5100.43 samples/sec Loss 2.1720 LearningRate 0.0299 Epoch: 9 Global Step: 151370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 
09:24:52,880-Speed 5119.18 samples/sec Loss 2.1237 LearningRate 0.0299 Epoch: 9 Global Step: 151380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:24:54,864-Speed 5161.84 samples/sec Loss 2.1265 LearningRate 0.0299 Epoch: 9 Global Step: 151390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:24:56,846-Speed 5168.84 samples/sec Loss 2.1121 LearningRate 0.0299 Epoch: 9 Global Step: 151400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:24:58,836-Speed 5148.67 samples/sec Loss 2.0991 LearningRate 0.0299 Epoch: 9 Global Step: 151410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:00,834-Speed 5126.40 samples/sec Loss 2.1178 LearningRate 0.0299 Epoch: 9 Global Step: 151420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:02,825-Speed 5144.15 samples/sec Loss 2.1070 LearningRate 0.0299 Epoch: 9 Global Step: 151430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:04,826-Speed 5118.74 samples/sec Loss 2.1749 LearningRate 0.0298 Epoch: 9 Global Step: 151440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:06,805-Speed 5176.86 samples/sec Loss 2.1407 LearningRate 0.0298 Epoch: 9 Global Step: 151450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:08,796-Speed 5144.63 samples/sec Loss 2.1232 LearningRate 0.0298 Epoch: 9 Global Step: 151460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:10,781-Speed 5160.14 samples/sec Loss 2.1507 LearningRate 0.0298 Epoch: 9 Global Step: 151470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:12,771-Speed 5147.79 samples/sec Loss 2.1313 LearningRate 0.0298 Epoch: 9 Global Step: 151480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:14,761-Speed 5149.38 samples/sec Loss 2.1377 LearningRate 0.0298 Epoch: 9 Global Step: 151490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:16,737-Speed 5182.94 samples/sec 
Loss 2.2067 LearningRate 0.0298 Epoch: 9 Global Step: 151500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:18,710-Speed 5192.59 samples/sec Loss 2.1286 LearningRate 0.0298 Epoch: 9 Global Step: 151510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:20,690-Speed 5173.79 samples/sec Loss 2.0502 LearningRate 0.0298 Epoch: 9 Global Step: 151520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:22,663-Speed 5192.38 samples/sec Loss 2.2216 LearningRate 0.0298 Epoch: 9 Global Step: 151530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:24,641-Speed 5176.98 samples/sec Loss 2.1962 LearningRate 0.0298 Epoch: 9 Global Step: 151540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:26,636-Speed 5134.56 samples/sec Loss 2.1465 LearningRate 0.0298 Epoch: 9 Global Step: 151550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:28,611-Speed 5186.13 samples/sec Loss 2.2128 LearningRate 0.0298 Epoch: 9 Global Step: 151560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:30,582-Speed 5198.70 samples/sec Loss 2.1517 LearningRate 0.0298 Epoch: 9 Global Step: 151570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:32,554-Speed 5193.23 samples/sec Loss 2.1228 LearningRate 0.0298 Epoch: 9 Global Step: 151580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:34,536-Speed 5169.80 samples/sec Loss 2.1005 LearningRate 0.0298 Epoch: 9 Global Step: 151590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:25:36,512-Speed 5182.42 samples/sec Loss 2.1405 LearningRate 0.0298 Epoch: 9 Global Step: 151600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:25:38,491-Speed 5177.88 samples/sec Loss 2.1606 LearningRate 0.0298 Epoch: 9 Global Step: 151610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:25:40,456-Speed 5211.89 samples/sec Loss 2.1640 LearningRate 0.0298 Epoch: 9 
Global Step: 151620 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:25:42,429-Speed 5193.07 samples/sec Loss 2.2010 LearningRate 0.0298 Epoch: 9 Global Step: 151630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:25:44,402-Speed 5191.85 samples/sec Loss 2.1250 LearningRate 0.0298 Epoch: 9 Global Step: 151640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:25:46,389-Speed 5155.66 samples/sec Loss 2.1933 LearningRate 0.0298 Epoch: 9 Global Step: 151650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:25:48,378-Speed 5149.62 samples/sec Loss 2.1822 LearningRate 0.0298 Epoch: 9 Global Step: 151660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:25:50,355-Speed 5180.31 samples/sec Loss 2.1520 LearningRate 0.0298 Epoch: 9 Global Step: 151670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:25:52,344-Speed 5149.41 samples/sec Loss 2.1664 LearningRate 0.0298 Epoch: 9 Global Step: 151680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:25:54,318-Speed 5189.52 samples/sec Loss 2.2327 LearningRate 0.0298 Epoch: 9 Global Step: 151690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:25:56,295-Speed 5182.22 samples/sec Loss 2.1157 LearningRate 0.0298 Epoch: 9 Global Step: 151700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:25:58,270-Speed 5187.22 samples/sec Loss 2.1527 LearningRate 0.0298 Epoch: 9 Global Step: 151710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:26:00,255-Speed 5160.15 samples/sec Loss 2.1174 LearningRate 0.0298 Epoch: 9 Global Step: 151720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:02,276-Speed 5066.87 samples/sec Loss 2.1678 LearningRate 0.0298 Epoch: 9 Global Step: 151730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:04,265-Speed 5150.87 samples/sec Loss 2.1379 LearningRate 0.0298 Epoch: 9 Global Step: 151740 Fp16 Grad Scale: 65536 
Required: 12 hours Training: 2022-04-11 09:26:06,242-Speed 5180.32 samples/sec Loss 2.1721 LearningRate 0.0297 Epoch: 9 Global Step: 151750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:08,218-Speed 5185.56 samples/sec Loss 2.2360 LearningRate 0.0297 Epoch: 9 Global Step: 151760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:10,197-Speed 5175.95 samples/sec Loss 2.1964 LearningRate 0.0297 Epoch: 9 Global Step: 151770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:12,190-Speed 5139.21 samples/sec Loss 2.1316 LearningRate 0.0297 Epoch: 9 Global Step: 151780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:14,166-Speed 5183.94 samples/sec Loss 2.1479 LearningRate 0.0297 Epoch: 9 Global Step: 151790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:16,157-Speed 5145.32 samples/sec Loss 2.1639 LearningRate 0.0297 Epoch: 9 Global Step: 151800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:18,158-Speed 5120.14 samples/sec Loss 2.2031 LearningRate 0.0297 Epoch: 9 Global Step: 151810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:20,141-Speed 5165.67 samples/sec Loss 2.2095 LearningRate 0.0297 Epoch: 9 Global Step: 151820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:26:22,113-Speed 5193.53 samples/sec Loss 2.1710 LearningRate 0.0297 Epoch: 9 Global Step: 151830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:26:24,104-Speed 5143.66 samples/sec Loss 2.1950 LearningRate 0.0297 Epoch: 9 Global Step: 151840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:26,091-Speed 5155.49 samples/sec Loss 2.2027 LearningRate 0.0297 Epoch: 9 Global Step: 151850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:28,083-Speed 5142.24 samples/sec Loss 2.1462 LearningRate 0.0297 Epoch: 9 Global Step: 151860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 
09:26:30,060-Speed 5180.23 samples/sec Loss 2.1730 LearningRate 0.0297 Epoch: 9 Global Step: 151870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:32,049-Speed 5151.49 samples/sec Loss 2.1204 LearningRate 0.0297 Epoch: 9 Global Step: 151880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:34,049-Speed 5121.37 samples/sec Loss 2.1902 LearningRate 0.0297 Epoch: 9 Global Step: 151890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:36,025-Speed 5184.72 samples/sec Loss 2.2237 LearningRate 0.0297 Epoch: 9 Global Step: 151900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:38,001-Speed 5184.35 samples/sec Loss 2.1982 LearningRate 0.0297 Epoch: 9 Global Step: 151910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:39,991-Speed 5148.28 samples/sec Loss 2.1614 LearningRate 0.0297 Epoch: 9 Global Step: 151920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:41,964-Speed 5191.69 samples/sec Loss 2.2102 LearningRate 0.0297 Epoch: 9 Global Step: 151930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:43,938-Speed 5187.27 samples/sec Loss 2.1863 LearningRate 0.0297 Epoch: 9 Global Step: 151940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:26:45,918-Speed 5174.85 samples/sec Loss 2.1554 LearningRate 0.0297 Epoch: 9 Global Step: 151950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:47,900-Speed 5168.34 samples/sec Loss 2.2019 LearningRate 0.0297 Epoch: 9 Global Step: 151960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:49,883-Speed 5165.25 samples/sec Loss 2.2178 LearningRate 0.0297 Epoch: 9 Global Step: 151970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:51,919-Speed 5032.36 samples/sec Loss 2.2406 LearningRate 0.0297 Epoch: 9 Global Step: 151980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:53,895-Speed 5184.42 samples/sec 
Loss 2.1457 LearningRate 0.0297 Epoch: 9 Global Step: 151990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:26:55,868-Speed 5189.78 samples/sec Loss 2.1742 LearningRate 0.0297 Epoch: 9 Global Step: 152000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:27:22,360-[lfw][152000]XNorm: 21.434037 Training: 2022-04-11 09:27:22,360-[lfw][152000]Accuracy-Flip: 0.99767+-0.00271 Training: 2022-04-11 09:27:22,361-[lfw][152000]Accuracy-Highest: 0.99833 Training: 2022-04-11 09:27:53,190-[cfp_fp][152000]XNorm: 20.313161 Training: 2022-04-11 09:27:53,190-[cfp_fp][152000]Accuracy-Flip: 0.98157+-0.00566 Training: 2022-04-11 09:27:53,191-[cfp_fp][152000]Accuracy-Highest: 0.98443 Training: 2022-04-11 09:28:19,857-[agedb_30][152000]XNorm: 21.480131 Training: 2022-04-11 09:28:19,857-[agedb_30][152000]Accuracy-Flip: 0.97967+-0.00816 Training: 2022-04-11 09:28:19,858-[agedb_30][152000]Accuracy-Highest: 0.98150 Training: 2022-04-11 09:28:21,853-Speed 119.09 samples/sec Loss 2.1250 LearningRate 0.0297 Epoch: 9 Global Step: 152010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:28:23,818-Speed 5212.48 samples/sec Loss 2.2226 LearningRate 0.0297 Epoch: 9 Global Step: 152020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:28:25,811-Speed 5141.96 samples/sec Loss 2.1049 LearningRate 0.0297 Epoch: 9 Global Step: 152030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:28:27,785-Speed 5188.24 samples/sec Loss 2.1742 LearningRate 0.0297 Epoch: 9 Global Step: 152040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:28:29,757-Speed 5194.57 samples/sec Loss 2.1762 LearningRate 0.0296 Epoch: 9 Global Step: 152050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:28:31,721-Speed 5215.09 samples/sec Loss 2.2709 LearningRate 0.0296 Epoch: 9 Global Step: 152060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:28:33,697-Speed 5183.76 samples/sec Loss 2.2037 LearningRate 
0.0296 Epoch: 9 Global Step: 152070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:28:35,674-Speed 5182.53 samples/sec Loss 2.2599 LearningRate 0.0296 Epoch: 9 Global Step: 152080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:28:37,646-Speed 5195.70 samples/sec Loss 2.2481 LearningRate 0.0296 Epoch: 9 Global Step: 152090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:28:39,612-Speed 5209.67 samples/sec Loss 2.1883 LearningRate 0.0296 Epoch: 9 Global Step: 152100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:28:41,602-Speed 5145.76 samples/sec Loss 2.2145 LearningRate 0.0296 Epoch: 9 Global Step: 152110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:28:43,589-Speed 5156.42 samples/sec Loss 2.1924 LearningRate 0.0296 Epoch: 9 Global Step: 152120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:28:45,574-Speed 5158.23 samples/sec Loss 2.1589 LearningRate 0.0296 Epoch: 9 Global Step: 152130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:28:47,586-Speed 5092.70 samples/sec Loss 2.2163 LearningRate 0.0296 Epoch: 9 Global Step: 152140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:28:49,582-Speed 5132.62 samples/sec Loss 2.1718 LearningRate 0.0296 Epoch: 9 Global Step: 152150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:28:51,582-Speed 5120.73 samples/sec Loss 2.2425 LearningRate 0.0296 Epoch: 9 Global Step: 152160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:28:53,558-Speed 5183.69 samples/sec Loss 2.1873 LearningRate 0.0296 Epoch: 9 Global Step: 152170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:28:55,537-Speed 5177.00 samples/sec Loss 2.2375 LearningRate 0.0296 Epoch: 9 Global Step: 152180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:28:57,519-Speed 5167.12 samples/sec Loss 2.2035 LearningRate 0.0296 Epoch: 9 Global Step: 152190 Fp16 
Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:28:59,498-Speed 5178.23 samples/sec Loss 2.2088 LearningRate 0.0296 Epoch: 9 Global Step: 152200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:29:01,476-Speed 5178.10 samples/sec Loss 2.2160 LearningRate 0.0296 Epoch: 9 Global Step: 152210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:29:03,458-Speed 5166.09 samples/sec Loss 2.1782 LearningRate 0.0296 Epoch: 9 Global Step: 152220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:29:05,446-Speed 5152.66 samples/sec Loss 2.2096 LearningRate 0.0296 Epoch: 9 Global Step: 152230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:29:07,437-Speed 5146.39 samples/sec Loss 2.2667 LearningRate 0.0296 Epoch: 9 Global Step: 152240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:29:09,412-Speed 5186.02 samples/sec Loss 2.1953 LearningRate 0.0296 Epoch: 9 Global Step: 152250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:29:11,427-Speed 5083.24 samples/sec Loss 2.2173 LearningRate 0.0296 Epoch: 9 Global Step: 152260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:29:13,413-Speed 5158.97 samples/sec Loss 2.2613 LearningRate 0.0296 Epoch: 9 Global Step: 152270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:29:15,395-Speed 5166.95 samples/sec Loss 2.2709 LearningRate 0.0296 Epoch: 9 Global Step: 152280 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:29:17,376-Speed 5172.37 samples/sec Loss 2.1840 LearningRate 0.0296 Epoch: 9 Global Step: 152290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:29:19,355-Speed 5175.51 samples/sec Loss 2.1987 LearningRate 0.0296 Epoch: 9 Global Step: 152300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:29:21,333-Speed 5178.34 samples/sec Loss 2.2111 LearningRate 0.0296 Epoch: 9 Global Step: 152310 Fp16 Grad Scale: 32768 Required: 12 hours 
Training: 2022-04-11 09:29:23,318-Speed 5161.07 samples/sec Loss 2.1989 LearningRate 0.0296 Epoch: 9 Global Step: 152320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:29:25,311-Speed 5139.47 samples/sec Loss 2.2373 LearningRate 0.0296 Epoch: 9 Global Step: 152330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:29:27,300-Speed 5150.17 samples/sec Loss 2.1599 LearningRate 0.0296 Epoch: 9 Global Step: 152340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:29:29,290-Speed 5147.18 samples/sec Loss 2.1605 LearningRate 0.0296 Epoch: 9 Global Step: 152350 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:29:31,282-Speed 5143.38 samples/sec Loss 2.2180 LearningRate 0.0295 Epoch: 9 Global Step: 152360 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:29:33,292-Speed 5097.15 samples/sec Loss 2.2475 LearningRate 0.0295 Epoch: 9 Global Step: 152370 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:29:35,272-Speed 5171.63 samples/sec Loss 2.2413 LearningRate 0.0295 Epoch: 9 Global Step: 152380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:29:37,265-Speed 5141.54 samples/sec Loss 2.2451 LearningRate 0.0295 Epoch: 9 Global Step: 152390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:29:39,248-Speed 5163.63 samples/sec Loss 2.2219 LearningRate 0.0295 Epoch: 9 Global Step: 152400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:29:41,246-Speed 5126.09 samples/sec Loss 2.2451 LearningRate 0.0295 Epoch: 9 Global Step: 152410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:29:43,219-Speed 5191.91 samples/sec Loss 2.1761 LearningRate 0.0295 Epoch: 9 Global Step: 152420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:29:45,211-Speed 5143.84 samples/sec Loss 2.2210 LearningRate 0.0295 Epoch: 9 Global Step: 152430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:29:47,188-Speed 
5180.66 samples/sec Loss 2.2322 LearningRate 0.0295 Epoch: 9 Global Step: 152440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:29:49,172-Speed 5162.03 samples/sec Loss 2.2418 LearningRate 0.0295 Epoch: 9 Global Step: 152450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:29:51,150-Speed 5179.69 samples/sec Loss 2.2616 LearningRate 0.0295 Epoch: 9 Global Step: 152460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:29:53,131-Speed 5170.44 samples/sec Loss 2.2851 LearningRate 0.0295 Epoch: 9 Global Step: 152470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:29:55,106-Speed 5188.36 samples/sec Loss 2.2123 LearningRate 0.0295 Epoch: 9 Global Step: 152480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:29:57,077-Speed 5196.99 samples/sec Loss 2.2744 LearningRate 0.0295 Epoch: 9 Global Step: 152490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:29:59,065-Speed 5152.85 samples/sec Loss 2.1839 LearningRate 0.0295 Epoch: 9 Global Step: 152500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:30:01,040-Speed 5186.58 samples/sec Loss 2.1493 LearningRate 0.0295 Epoch: 9 Global Step: 152510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:30:03,023-Speed 5163.95 samples/sec Loss 2.2196 LearningRate 0.0295 Epoch: 9 Global Step: 152520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:30:05,002-Speed 5175.29 samples/sec Loss 2.2033 LearningRate 0.0295 Epoch: 9 Global Step: 152530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:30:06,974-Speed 5195.37 samples/sec Loss 2.2099 LearningRate 0.0295 Epoch: 9 Global Step: 152540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:30:08,956-Speed 5170.01 samples/sec Loss 2.2118 LearningRate 0.0295 Epoch: 9 Global Step: 152550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:30:10,928-Speed 5193.37 samples/sec Loss 2.1876 
LearningRate 0.0295 Epoch: 9 Global Step: 152560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:30:12,925-Speed 5130.56 samples/sec Loss 2.1893 LearningRate 0.0295 Epoch: 9 Global Step: 152570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:30:14,915-Speed 5147.50 samples/sec Loss 2.2430 LearningRate 0.0295 Epoch: 9 Global Step: 152580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:30:16,917-Speed 5114.90 samples/sec Loss 2.2197 LearningRate 0.0295 Epoch: 9 Global Step: 152590 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:18,893-Speed 5184.61 samples/sec Loss 2.2719 LearningRate 0.0295 Epoch: 9 Global Step: 152600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:20,866-Speed 5192.77 samples/sec Loss 2.2569 LearningRate 0.0295 Epoch: 9 Global Step: 152610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:22,839-Speed 5191.02 samples/sec Loss 2.2294 LearningRate 0.0295 Epoch: 9 Global Step: 152620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:24,824-Speed 5158.52 samples/sec Loss 2.2713 LearningRate 0.0295 Epoch: 9 Global Step: 152630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:26,804-Speed 5175.19 samples/sec Loss 2.2299 LearningRate 0.0295 Epoch: 9 Global Step: 152640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:28,781-Speed 5181.96 samples/sec Loss 2.1970 LearningRate 0.0295 Epoch: 9 Global Step: 152650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:30,747-Speed 5210.95 samples/sec Loss 2.2811 LearningRate 0.0295 Epoch: 9 Global Step: 152660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:32,711-Speed 5213.53 samples/sec Loss 2.2554 LearningRate 0.0294 Epoch: 9 Global Step: 152670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:34,682-Speed 5197.23 samples/sec Loss 2.1764 LearningRate 0.0294 Epoch: 9 
Global Step: 152680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:36,652-Speed 5199.60 samples/sec Loss 2.2299 LearningRate 0.0294 Epoch: 9 Global Step: 152690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:38,621-Speed 5203.73 samples/sec Loss 2.2536 LearningRate 0.0294 Epoch: 9 Global Step: 152700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:40,588-Speed 5206.98 samples/sec Loss 2.2565 LearningRate 0.0294 Epoch: 9 Global Step: 152710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:42,567-Speed 5175.26 samples/sec Loss 2.3110 LearningRate 0.0294 Epoch: 9 Global Step: 152720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:44,540-Speed 5191.65 samples/sec Loss 2.2793 LearningRate 0.0294 Epoch: 9 Global Step: 152730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:46,531-Speed 5145.24 samples/sec Loss 2.2419 LearningRate 0.0294 Epoch: 9 Global Step: 152740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:48,516-Speed 5161.69 samples/sec Loss 2.2753 LearningRate 0.0294 Epoch: 9 Global Step: 152750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:50,494-Speed 5178.89 samples/sec Loss 2.3226 LearningRate 0.0294 Epoch: 9 Global Step: 152760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:52,476-Speed 5168.30 samples/sec Loss 2.3180 LearningRate 0.0294 Epoch: 9 Global Step: 152770 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:54,442-Speed 5208.51 samples/sec Loss 2.2653 LearningRate 0.0294 Epoch: 9 Global Step: 152780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:56,406-Speed 5216.57 samples/sec Loss 2.2945 LearningRate 0.0294 Epoch: 9 Global Step: 152790 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:30:58,376-Speed 5198.32 samples/sec Loss 2.1991 LearningRate 0.0294 Epoch: 9 Global Step: 152800 Fp16 Grad 
Scale: 131072 Required: 12 hours Training: 2022-04-11 09:31:00,362-Speed 5158.37 samples/sec Loss 2.2718 LearningRate 0.0294 Epoch: 9 Global Step: 152810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:31:02,334-Speed 5195.98 samples/sec Loss 2.2409 LearningRate 0.0294 Epoch: 9 Global Step: 152820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:31:04,323-Speed 5148.59 samples/sec Loss 2.1979 LearningRate 0.0294 Epoch: 9 Global Step: 152830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:31:06,292-Speed 5201.45 samples/sec Loss 2.2508 LearningRate 0.0294 Epoch: 9 Global Step: 152840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:31:08,265-Speed 5193.98 samples/sec Loss 2.2507 LearningRate 0.0294 Epoch: 9 Global Step: 152850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:31:10,229-Speed 5214.61 samples/sec Loss 2.2403 LearningRate 0.0294 Epoch: 9 Global Step: 152860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:31:12,194-Speed 5212.63 samples/sec Loss 2.1673 LearningRate 0.0294 Epoch: 9 Global Step: 152870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:31:14,166-Speed 5195.95 samples/sec Loss 2.2039 LearningRate 0.0294 Epoch: 9 Global Step: 152880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:31:16,134-Speed 5205.20 samples/sec Loss 2.2479 LearningRate 0.0294 Epoch: 9 Global Step: 152890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:31:18,101-Speed 5205.95 samples/sec Loss 2.2437 LearningRate 0.0294 Epoch: 9 Global Step: 152900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:31:20,076-Speed 5185.84 samples/sec Loss 2.1915 LearningRate 0.0294 Epoch: 9 Global Step: 152910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:31:22,050-Speed 5191.46 samples/sec Loss 2.2589 LearningRate 0.0294 Epoch: 9 Global Step: 152920 Fp16 Grad Scale: 65536 Required: 12 hours 
Training: 2022-04-11 09:31:24,020-Speed 5198.77 samples/sec Loss 2.2580 LearningRate 0.0294 Epoch: 9 Global Step: 152930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:31:25,991-Speed 5196.01 samples/sec Loss 2.3151 LearningRate 0.0294 Epoch: 9 Global Step: 152940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:31:27,996-Speed 5110.15 samples/sec Loss 2.2394 LearningRate 0.0294 Epoch: 9 Global Step: 152950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:31:29,969-Speed 5191.37 samples/sec Loss 2.2504 LearningRate 0.0294 Epoch: 9 Global Step: 152960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:31:31,939-Speed 5200.20 samples/sec Loss 2.2365 LearningRate 0.0294 Epoch: 9 Global Step: 152970 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:31:33,910-Speed 5196.77 samples/sec Loss 2.2230 LearningRate 0.0293 Epoch: 9 Global Step: 152980 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:31:35,897-Speed 5154.63 samples/sec Loss 2.2806 LearningRate 0.0293 Epoch: 9 Global Step: 152990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:31:37,866-Speed 5203.36 samples/sec Loss 2.2680 LearningRate 0.0293 Epoch: 9 Global Step: 153000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:31:39,834-Speed 5204.66 samples/sec Loss 2.3017 LearningRate 0.0293 Epoch: 9 Global Step: 153010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:31:41,800-Speed 5209.71 samples/sec Loss 2.2307 LearningRate 0.0293 Epoch: 9 Global Step: 153020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:31:43,770-Speed 5198.78 samples/sec Loss 2.3035 LearningRate 0.0293 Epoch: 9 Global Step: 153030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:31:45,745-Speed 5187.19 samples/sec Loss 2.3289 LearningRate 0.0293 Epoch: 9 Global Step: 153040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 
09:31:47,730-Speed 5161.26 samples/sec Loss 2.2868 LearningRate 0.0293 Epoch: 9 Global Step: 153050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:31:49,724-Speed 5136.44 samples/sec Loss 2.1807 LearningRate 0.0293 Epoch: 9 Global Step: 153060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:31:51,698-Speed 5191.65 samples/sec Loss 2.1954 LearningRate 0.0293 Epoch: 9 Global Step: 153070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:31:53,668-Speed 5199.29 samples/sec Loss 2.3188 LearningRate 0.0293 Epoch: 9 Global Step: 153080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:31:55,633-Speed 5211.67 samples/sec Loss 2.2411 LearningRate 0.0293 Epoch: 9 Global Step: 153090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:31:57,611-Speed 5180.06 samples/sec Loss 2.2849 LearningRate 0.0293 Epoch: 9 Global Step: 153100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:31:59,580-Speed 5201.87 samples/sec Loss 2.2369 LearningRate 0.0293 Epoch: 9 Global Step: 153110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:32:01,575-Speed 5133.26 samples/sec Loss 2.2693 LearningRate 0.0293 Epoch: 9 Global Step: 153120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:32:03,546-Speed 5198.84 samples/sec Loss 2.2863 LearningRate 0.0293 Epoch: 9 Global Step: 153130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:32:05,506-Speed 5223.75 samples/sec Loss 2.2865 LearningRate 0.0293 Epoch: 9 Global Step: 153140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:07,495-Speed 5151.30 samples/sec Loss 2.2636 LearningRate 0.0293 Epoch: 9 Global Step: 153150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:09,463-Speed 5206.31 samples/sec Loss 2.2775 LearningRate 0.0293 Epoch: 9 Global Step: 153160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:11,439-Speed 5182.48 samples/sec 
Loss 2.2855 LearningRate 0.0293 Epoch: 9 Global Step: 153170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:13,407-Speed 5206.29 samples/sec Loss 2.2672 LearningRate 0.0293 Epoch: 9 Global Step: 153180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:15,397-Speed 5145.95 samples/sec Loss 2.2793 LearningRate 0.0293 Epoch: 9 Global Step: 153190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:17,370-Speed 5192.17 samples/sec Loss 2.2699 LearningRate 0.0293 Epoch: 9 Global Step: 153200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:19,338-Speed 5205.91 samples/sec Loss 2.2933 LearningRate 0.0293 Epoch: 9 Global Step: 153210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:21,311-Speed 5191.10 samples/sec Loss 2.3154 LearningRate 0.0293 Epoch: 9 Global Step: 153220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:23,289-Speed 5178.73 samples/sec Loss 2.3058 LearningRate 0.0293 Epoch: 9 Global Step: 153230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:25,255-Speed 5210.41 samples/sec Loss 2.2913 LearningRate 0.0293 Epoch: 9 Global Step: 153240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:32:27,221-Speed 5209.94 samples/sec Loss 2.3334 LearningRate 0.0293 Epoch: 9 Global Step: 153250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:29,208-Speed 5155.51 samples/sec Loss 2.3316 LearningRate 0.0293 Epoch: 9 Global Step: 153260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:31,185-Speed 5180.40 samples/sec Loss 2.2376 LearningRate 0.0293 Epoch: 9 Global Step: 153270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:33,160-Speed 5188.61 samples/sec Loss 2.2680 LearningRate 0.0292 Epoch: 9 Global Step: 153280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:35,130-Speed 5199.98 samples/sec Loss 2.2817 LearningRate 0.0292 Epoch: 9 
Global Step: 153290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:37,105-Speed 5185.13 samples/sec Loss 2.2464 LearningRate 0.0292 Epoch: 9 Global Step: 153300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:39,076-Speed 5198.23 samples/sec Loss 2.2996 LearningRate 0.0292 Epoch: 9 Global Step: 153310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:41,048-Speed 5192.56 samples/sec Loss 2.2900 LearningRate 0.0292 Epoch: 9 Global Step: 153320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:43,021-Speed 5192.85 samples/sec Loss 2.2769 LearningRate 0.0292 Epoch: 9 Global Step: 153330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:44,993-Speed 5194.60 samples/sec Loss 2.2312 LearningRate 0.0292 Epoch: 9 Global Step: 153340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:46,968-Speed 5186.12 samples/sec Loss 2.3377 LearningRate 0.0292 Epoch: 9 Global Step: 153350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:32:48,968-Speed 5122.78 samples/sec Loss 2.2463 LearningRate 0.0292 Epoch: 9 Global Step: 153360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:32:50,967-Speed 5123.57 samples/sec Loss 2.2727 LearningRate 0.0292 Epoch: 9 Global Step: 153370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:32:52,935-Speed 5206.37 samples/sec Loss 2.3105 LearningRate 0.0292 Epoch: 9 Global Step: 153380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:54,902-Speed 5207.51 samples/sec Loss 2.2401 LearningRate 0.0292 Epoch: 9 Global Step: 153390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:56,874-Speed 5193.88 samples/sec Loss 2.2892 LearningRate 0.0292 Epoch: 9 Global Step: 153400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:32:58,846-Speed 5194.49 samples/sec Loss 2.2287 LearningRate 0.0292 Epoch: 9 Global Step: 153410 Fp16 Grad Scale: 
65536 Required: 12 hours Training: 2022-04-11 09:33:00,809-Speed 5217.01 samples/sec Loss 2.2164 LearningRate 0.0292 Epoch: 9 Global Step: 153420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:33:02,793-Speed 5163.93 samples/sec Loss 2.3169 LearningRate 0.0292 Epoch: 9 Global Step: 153430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:33:04,765-Speed 5195.41 samples/sec Loss 2.3135 LearningRate 0.0292 Epoch: 9 Global Step: 153440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:33:06,753-Speed 5154.34 samples/sec Loss 2.2727 LearningRate 0.0292 Epoch: 9 Global Step: 153450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:33:08,720-Speed 5206.72 samples/sec Loss 2.2757 LearningRate 0.0292 Epoch: 9 Global Step: 153460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:33:10,689-Speed 5201.79 samples/sec Loss 2.3392 LearningRate 0.0292 Epoch: 9 Global Step: 153470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:33:12,677-Speed 5151.94 samples/sec Loss 2.2821 LearningRate 0.0292 Epoch: 9 Global Step: 153480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:33:14,656-Speed 5176.94 samples/sec Loss 2.3099 LearningRate 0.0292 Epoch: 9 Global Step: 153490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:33:16,658-Speed 5115.92 samples/sec Loss 2.2389 LearningRate 0.0292 Epoch: 9 Global Step: 153500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:33:18,641-Speed 5166.41 samples/sec Loss 2.3203 LearningRate 0.0292 Epoch: 9 Global Step: 153510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:33:20,625-Speed 5161.41 samples/sec Loss 2.2448 LearningRate 0.0292 Epoch: 9 Global Step: 153520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:33:22,598-Speed 5192.66 samples/sec Loss 2.2545 LearningRate 0.0292 Epoch: 9 Global Step: 153530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 
2022-04-11 09:33:24,581-Speed 5166.75 samples/sec Loss 2.3108 LearningRate 0.0292 Epoch: 9 Global Step: 153540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:33:26,565-Speed 5162.84 samples/sec Loss 2.3361 LearningRate 0.0292 Epoch: 9 Global Step: 153550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:33:28,535-Speed 5199.25 samples/sec Loss 2.2769 LearningRate 0.0292 Epoch: 9 Global Step: 153560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:33:30,512-Speed 5183.76 samples/sec Loss 2.2446 LearningRate 0.0292 Epoch: 9 Global Step: 153570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:33:32,483-Speed 5196.27 samples/sec Loss 2.2969 LearningRate 0.0292 Epoch: 9 Global Step: 153580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:33:34,449-Speed 5209.00 samples/sec Loss 2.2711 LearningRate 0.0291 Epoch: 9 Global Step: 153590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:33:36,423-Speed 5189.26 samples/sec Loss 2.2578 LearningRate 0.0291 Epoch: 9 Global Step: 153600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:33:38,400-Speed 5183.56 samples/sec Loss 2.2948 LearningRate 0.0291 Epoch: 9 Global Step: 153610 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:33:40,388-Speed 5150.59 samples/sec Loss 2.2167 LearningRate 0.0291 Epoch: 9 Global Step: 153620 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:33:42,384-Speed 5134.06 samples/sec Loss 2.3569 LearningRate 0.0291 Epoch: 9 Global Step: 153630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:33:44,352-Speed 5202.52 samples/sec Loss 2.3824 LearningRate 0.0291 Epoch: 9 Global Step: 153640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:33:46,322-Speed 5200.98 samples/sec Loss 2.3151 LearningRate 0.0291 Epoch: 9 Global Step: 153650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:33:48,299-Speed 5181.13 
samples/sec Loss 2.3106 LearningRate 0.0291 Epoch: 9 Global Step: 153660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:33:50,295-Speed 5133.00 samples/sec Loss 2.2767 LearningRate 0.0291 Epoch: 9 Global Step: 153670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:33:52,301-Speed 5106.28 samples/sec Loss 2.2540 LearningRate 0.0291 Epoch: 9 Global Step: 153680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:33:54,283-Speed 5166.68 samples/sec Loss 2.2966 LearningRate 0.0291 Epoch: 9 Global Step: 153690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:33:56,259-Speed 5185.40 samples/sec Loss 2.3420 LearningRate 0.0291 Epoch: 9 Global Step: 153700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:33:58,244-Speed 5158.49 samples/sec Loss 2.3404 LearningRate 0.0291 Epoch: 9 Global Step: 153710 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:00,230-Speed 5158.03 samples/sec Loss 2.2787 LearningRate 0.0291 Epoch: 9 Global Step: 153720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:02,206-Speed 5183.79 samples/sec Loss 2.3189 LearningRate 0.0291 Epoch: 9 Global Step: 153730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:04,227-Speed 5068.73 samples/sec Loss 2.3178 LearningRate 0.0291 Epoch: 9 Global Step: 153740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:06,198-Speed 5199.26 samples/sec Loss 2.2720 LearningRate 0.0291 Epoch: 9 Global Step: 153750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:08,165-Speed 5206.57 samples/sec Loss 2.2349 LearningRate 0.0291 Epoch: 9 Global Step: 153760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:10,145-Speed 5173.50 samples/sec Loss 2.2512 LearningRate 0.0291 Epoch: 9 Global Step: 153770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:12,120-Speed 5185.80 samples/sec Loss 2.2410 LearningRate 0.0291 
Epoch: 9 Global Step: 153780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:14,091-Speed 5197.05 samples/sec Loss 2.2862 LearningRate 0.0291 Epoch: 9 Global Step: 153790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:16,094-Speed 5114.63 samples/sec Loss 2.3437 LearningRate 0.0291 Epoch: 9 Global Step: 153800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:18,071-Speed 5181.57 samples/sec Loss 2.3568 LearningRate 0.0291 Epoch: 9 Global Step: 153810 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:34:20,043-Speed 5193.97 samples/sec Loss 2.2889 LearningRate 0.0291 Epoch: 9 Global Step: 153820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:34:22,026-Speed 5166.89 samples/sec Loss 2.2541 LearningRate 0.0291 Epoch: 9 Global Step: 153830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:34:24,009-Speed 5165.80 samples/sec Loss 2.2745 LearningRate 0.0291 Epoch: 9 Global Step: 153840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:34:25,979-Speed 5198.61 samples/sec Loss 2.2844 LearningRate 0.0291 Epoch: 9 Global Step: 153850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:34:27,966-Speed 5156.45 samples/sec Loss 2.2971 LearningRate 0.0291 Epoch: 9 Global Step: 153860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:34:29,944-Speed 5178.76 samples/sec Loss 2.2788 LearningRate 0.0291 Epoch: 9 Global Step: 153870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:34:31,914-Speed 5199.19 samples/sec Loss 2.3178 LearningRate 0.0291 Epoch: 9 Global Step: 153880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:34:33,909-Speed 5134.92 samples/sec Loss 2.3157 LearningRate 0.0291 Epoch: 9 Global Step: 153890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:34:35,889-Speed 5172.58 samples/sec Loss 2.3531 LearningRate 0.0290 Epoch: 9 Global Step: 153900 Fp16 
Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:37,878-Speed 5150.09 samples/sec Loss 2.3046 LearningRate 0.0290 Epoch: 9 Global Step: 153910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:39,872-Speed 5137.82 samples/sec Loss 2.3378 LearningRate 0.0290 Epoch: 9 Global Step: 153920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:41,862-Speed 5145.75 samples/sec Loss 2.2531 LearningRate 0.0290 Epoch: 9 Global Step: 153930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:43,838-Speed 5185.53 samples/sec Loss 2.3124 LearningRate 0.0290 Epoch: 9 Global Step: 153940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:45,810-Speed 5195.45 samples/sec Loss 2.2825 LearningRate 0.0290 Epoch: 9 Global Step: 153950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:47,780-Speed 5197.79 samples/sec Loss 2.2909 LearningRate 0.0290 Epoch: 9 Global Step: 153960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:49,762-Speed 5168.36 samples/sec Loss 2.2699 LearningRate 0.0290 Epoch: 9 Global Step: 153970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:51,755-Speed 5141.55 samples/sec Loss 2.2123 LearningRate 0.0290 Epoch: 9 Global Step: 153980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:53,732-Speed 5180.77 samples/sec Loss 2.3055 LearningRate 0.0290 Epoch: 9 Global Step: 153990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:34:55,694-Speed 5221.02 samples/sec Loss 2.2513 LearningRate 0.0290 Epoch: 9 Global Step: 154000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:35:22,284-[lfw][154000]XNorm: 22.042320 Training: 2022-04-11 09:35:22,284-[lfw][154000]Accuracy-Flip: 0.99783+-0.00269 Training: 2022-04-11 09:35:22,285-[lfw][154000]Accuracy-Highest: 0.99833 Training: 2022-04-11 09:35:52,865-[cfp_fp][154000]XNorm: 20.811043 Training: 2022-04-11 
09:35:52,866-[cfp_fp][154000]Accuracy-Flip: 0.98229+-0.00558 Training: 2022-04-11 09:35:52,866-[cfp_fp][154000]Accuracy-Highest: 0.98443 Training: 2022-04-11 09:36:19,270-[agedb_30][154000]XNorm: 22.259916 Training: 2022-04-11 09:36:19,271-[agedb_30][154000]Accuracy-Flip: 0.98033+-0.00785 Training: 2022-04-11 09:36:19,271-[agedb_30][154000]Accuracy-Highest: 0.98150 Training: 2022-04-11 09:36:21,262-Speed 119.67 samples/sec Loss 2.3511 LearningRate 0.0290 Epoch: 9 Global Step: 154010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:36:23,225-Speed 5217.83 samples/sec Loss 2.3190 LearningRate 0.0290 Epoch: 9 Global Step: 154020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:36:25,225-Speed 5121.28 samples/sec Loss 2.2969 LearningRate 0.0290 Epoch: 9 Global Step: 154030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:36:27,187-Speed 5220.93 samples/sec Loss 2.3163 LearningRate 0.0290 Epoch: 9 Global Step: 154040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:36:29,170-Speed 5166.49 samples/sec Loss 2.4078 LearningRate 0.0290 Epoch: 9 Global Step: 154050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:36:31,132-Speed 5221.85 samples/sec Loss 2.2857 LearningRate 0.0290 Epoch: 9 Global Step: 154060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:36:33,099-Speed 5205.39 samples/sec Loss 2.3039 LearningRate 0.0290 Epoch: 9 Global Step: 154070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:36:35,074-Speed 5188.50 samples/sec Loss 2.2892 LearningRate 0.0290 Epoch: 9 Global Step: 154080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:36:37,040-Speed 5209.46 samples/sec Loss 2.3243 LearningRate 0.0290 Epoch: 9 Global Step: 154090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:36:39,006-Speed 5209.94 samples/sec Loss 2.3025 LearningRate 0.0290 Epoch: 9 Global Step: 154100 Fp16 Grad Scale: 32768 Required: 12 hours 
Training: 2022-04-11 09:36:40,989-Speed 5165.59 samples/sec Loss 2.3278 LearningRate 0.0290 Epoch: 9 Global Step: 154110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:36:42,954-Speed 5214.14 samples/sec Loss 2.3267 LearningRate 0.0290 Epoch: 9 Global Step: 154120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:36:44,924-Speed 5200.12 samples/sec Loss 2.3551 LearningRate 0.0290 Epoch: 9 Global Step: 154130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:36:46,890-Speed 5208.46 samples/sec Loss 2.3133 LearningRate 0.0290 Epoch: 9 Global Step: 154140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:36:48,859-Speed 5205.63 samples/sec Loss 2.2908 LearningRate 0.0290 Epoch: 9 Global Step: 154150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:36:50,833-Speed 5188.28 samples/sec Loss 2.2189 LearningRate 0.0290 Epoch: 9 Global Step: 154160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:36:52,813-Speed 5174.24 samples/sec Loss 2.2666 LearningRate 0.0290 Epoch: 9 Global Step: 154170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:36:54,777-Speed 5214.40 samples/sec Loss 2.3567 LearningRate 0.0290 Epoch: 9 Global Step: 154180 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:36:56,742-Speed 5212.28 samples/sec Loss 2.3044 LearningRate 0.0290 Epoch: 9 Global Step: 154190 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:36:58,735-Speed 5141.52 samples/sec Loss 2.2783 LearningRate 0.0290 Epoch: 9 Global Step: 154200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:00,721-Speed 5156.83 samples/sec Loss 2.2863 LearningRate 0.0289 Epoch: 9 Global Step: 154210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:02,704-Speed 5167.24 samples/sec Loss 2.2713 LearningRate 0.0289 Epoch: 9 Global Step: 154220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:04,673-Speed 
5202.06 samples/sec Loss 2.3152 LearningRate 0.0289 Epoch: 9 Global Step: 154230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:06,639-Speed 5209.43 samples/sec Loss 2.3689 LearningRate 0.0289 Epoch: 9 Global Step: 154240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:08,610-Speed 5197.69 samples/sec Loss 2.3152 LearningRate 0.0289 Epoch: 9 Global Step: 154250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:10,586-Speed 5183.93 samples/sec Loss 2.2773 LearningRate 0.0289 Epoch: 9 Global Step: 154260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:12,555-Speed 5201.12 samples/sec Loss 2.3285 LearningRate 0.0289 Epoch: 9 Global Step: 154270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:14,530-Speed 5186.26 samples/sec Loss 2.2960 LearningRate 0.0289 Epoch: 9 Global Step: 154280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:16,500-Speed 5200.99 samples/sec Loss 2.2512 LearningRate 0.0289 Epoch: 9 Global Step: 154290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:18,480-Speed 5172.76 samples/sec Loss 2.3495 LearningRate 0.0289 Epoch: 9 Global Step: 154300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:37:20,471-Speed 5146.02 samples/sec Loss 2.3364 LearningRate 0.0289 Epoch: 9 Global Step: 154310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:37:22,444-Speed 5191.88 samples/sec Loss 2.2931 LearningRate 0.0289 Epoch: 9 Global Step: 154320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:37:24,417-Speed 5192.28 samples/sec Loss 2.3128 LearningRate 0.0289 Epoch: 9 Global Step: 154330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:37:26,392-Speed 5185.62 samples/sec Loss 2.2575 LearningRate 0.0289 Epoch: 9 Global Step: 154340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:37:28,364-Speed 5196.14 samples/sec Loss 2.3136 
LearningRate 0.0289 Epoch: 9 Global Step: 154350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:37:30,332-Speed 5203.64 samples/sec Loss 2.2829 LearningRate 0.0289 Epoch: 9 Global Step: 154360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:37:32,301-Speed 5202.88 samples/sec Loss 2.3217 LearningRate 0.0289 Epoch: 9 Global Step: 154370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:37:34,261-Speed 5226.57 samples/sec Loss 2.2766 LearningRate 0.0289 Epoch: 9 Global Step: 154380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:36,234-Speed 5190.24 samples/sec Loss 2.3696 LearningRate 0.0289 Epoch: 9 Global Step: 154390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:38,222-Speed 5155.54 samples/sec Loss 2.2696 LearningRate 0.0289 Epoch: 9 Global Step: 154400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:40,199-Speed 5182.04 samples/sec Loss 2.3330 LearningRate 0.0289 Epoch: 9 Global Step: 154410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:42,171-Speed 5193.48 samples/sec Loss 2.2945 LearningRate 0.0289 Epoch: 9 Global Step: 154420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:44,143-Speed 5194.08 samples/sec Loss 2.3303 LearningRate 0.0289 Epoch: 9 Global Step: 154430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:46,115-Speed 5194.75 samples/sec Loss 2.3164 LearningRate 0.0289 Epoch: 9 Global Step: 154440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:48,087-Speed 5195.38 samples/sec Loss 2.2709 LearningRate 0.0289 Epoch: 9 Global Step: 154450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:50,062-Speed 5186.22 samples/sec Loss 2.3115 LearningRate 0.0289 Epoch: 9 Global Step: 154460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:52,033-Speed 5197.96 samples/sec Loss 2.3308 LearningRate 0.0289 Epoch: 9 Global 
Step: 154470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:37:54,017-Speed 5162.46 samples/sec Loss 2.2819 LearningRate 0.0289 Epoch: 9 Global Step: 154480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:37:56,013-Speed 5131.61 samples/sec Loss 2.2856 LearningRate 0.0289 Epoch: 9 Global Step: 154490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:37:57,996-Speed 5164.55 samples/sec Loss 2.3186 LearningRate 0.0289 Epoch: 9 Global Step: 154500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:37:59,989-Speed 5139.35 samples/sec Loss 2.3233 LearningRate 0.0289 Epoch: 9 Global Step: 154510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:38:01,978-Speed 5150.57 samples/sec Loss 2.3325 LearningRate 0.0288 Epoch: 9 Global Step: 154520 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:38:03,950-Speed 5194.86 samples/sec Loss 2.2569 LearningRate 0.0288 Epoch: 9 Global Step: 154530 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:38:05,921-Speed 5196.54 samples/sec Loss 2.3460 LearningRate 0.0288 Epoch: 9 Global Step: 154540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:38:07,895-Speed 5194.01 samples/sec Loss 2.3121 LearningRate 0.0288 Epoch: 9 Global Step: 154550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:38:09,862-Speed 5205.15 samples/sec Loss 2.2910 LearningRate 0.0288 Epoch: 9 Global Step: 154560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:38:11,833-Speed 5198.28 samples/sec Loss 2.3323 LearningRate 0.0288 Epoch: 9 Global Step: 154570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:38:13,814-Speed 5170.14 samples/sec Loss 2.3921 LearningRate 0.0288 Epoch: 9 Global Step: 154580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:38:15,788-Speed 5189.60 samples/sec Loss 2.3470 LearningRate 0.0288 Epoch: 9 Global Step: 154590 Fp16 Grad Scale: 
65536 Required: 12 hours Training: 2022-04-11 09:38:17,763-Speed 5185.24 samples/sec Loss 2.3333 LearningRate 0.0288 Epoch: 9 Global Step: 154600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:38:19,737-Speed 5189.22 samples/sec Loss 2.2978 LearningRate 0.0288 Epoch: 9 Global Step: 154610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:38:21,706-Speed 5202.53 samples/sec Loss 2.3258 LearningRate 0.0288 Epoch: 9 Global Step: 154620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:38:23,681-Speed 5188.48 samples/sec Loss 2.3593 LearningRate 0.0288 Epoch: 9 Global Step: 154630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:38:25,674-Speed 5138.42 samples/sec Loss 2.3457 LearningRate 0.0288 Epoch: 9 Global Step: 154640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:38:27,660-Speed 5158.52 samples/sec Loss 2.3182 LearningRate 0.0288 Epoch: 9 Global Step: 154650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:38:29,640-Speed 5175.12 samples/sec Loss 2.3606 LearningRate 0.0288 Epoch: 9 Global Step: 154660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:38:31,611-Speed 5197.12 samples/sec Loss 2.3901 LearningRate 0.0288 Epoch: 9 Global Step: 154670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:38:33,582-Speed 5197.49 samples/sec Loss 2.3117 LearningRate 0.0288 Epoch: 9 Global Step: 154680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:38:35,584-Speed 5114.22 samples/sec Loss 2.3790 LearningRate 0.0288 Epoch: 9 Global Step: 154690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:38:37,560-Speed 5186.65 samples/sec Loss 2.3311 LearningRate 0.0288 Epoch: 9 Global Step: 154700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:38:39,533-Speed 5191.05 samples/sec Loss 2.3604 LearningRate 0.0288 Epoch: 9 Global Step: 154710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 
2022-04-11 09:38:41,501-Speed 5204.01 samples/sec Loss 2.3487 LearningRate 0.0288 Epoch: 9 Global Step: 154720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:38:43,475-Speed 5188.52 samples/sec Loss 2.2914 LearningRate 0.0288 Epoch: 9 Global Step: 154730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:38:45,446-Speed 5199.49 samples/sec Loss 2.3303 LearningRate 0.0288 Epoch: 9 Global Step: 154740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:38:47,438-Speed 5142.48 samples/sec Loss 2.3929 LearningRate 0.0288 Epoch: 9 Global Step: 154750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:38:49,404-Speed 5207.87 samples/sec Loss 2.3554 LearningRate 0.0288 Epoch: 9 Global Step: 154760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:38:51,375-Speed 5199.39 samples/sec Loss 2.2796 LearningRate 0.0288 Epoch: 9 Global Step: 154770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:38:53,357-Speed 5167.21 samples/sec Loss 2.3583 LearningRate 0.0288 Epoch: 9 Global Step: 154780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:38:55,329-Speed 5194.93 samples/sec Loss 2.4048 LearningRate 0.0288 Epoch: 9 Global Step: 154790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:38:57,322-Speed 5139.48 samples/sec Loss 2.3246 LearningRate 0.0288 Epoch: 9 Global Step: 154800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:38:59,305-Speed 5166.66 samples/sec Loss 2.3389 LearningRate 0.0288 Epoch: 9 Global Step: 154810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:01,310-Speed 5109.02 samples/sec Loss 2.3338 LearningRate 0.0288 Epoch: 9 Global Step: 154820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:03,278-Speed 5203.72 samples/sec Loss 2.3356 LearningRate 0.0287 Epoch: 9 Global Step: 154830 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:39:05,251-Speed 5192.87 
samples/sec Loss 2.3228 LearningRate 0.0287 Epoch: 9 Global Step: 154840 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:39:07,222-Speed 5197.56 samples/sec Loss 2.3629 LearningRate 0.0287 Epoch: 9 Global Step: 154850 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:39:09,192-Speed 5198.43 samples/sec Loss 2.2842 LearningRate 0.0287 Epoch: 9 Global Step: 154860 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:39:11,162-Speed 5200.94 samples/sec Loss 2.3542 LearningRate 0.0287 Epoch: 9 Global Step: 154870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:39:13,139-Speed 5180.44 samples/sec Loss 2.3290 LearningRate 0.0287 Epoch: 9 Global Step: 154880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:39:15,120-Speed 5172.07 samples/sec Loss 2.3365 LearningRate 0.0287 Epoch: 9 Global Step: 154890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:39:17,102-Speed 5166.61 samples/sec Loss 2.4009 LearningRate 0.0287 Epoch: 9 Global Step: 154900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:39:19,085-Speed 5165.46 samples/sec Loss 2.3283 LearningRate 0.0287 Epoch: 9 Global Step: 154910 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:39:21,080-Speed 5134.77 samples/sec Loss 2.3549 LearningRate 0.0287 Epoch: 9 Global Step: 154920 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:39:23,070-Speed 5148.65 samples/sec Loss 2.3389 LearningRate 0.0287 Epoch: 9 Global Step: 154930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:25,050-Speed 5172.62 samples/sec Loss 2.3277 LearningRate 0.0287 Epoch: 9 Global Step: 154940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:27,031-Speed 5171.17 samples/sec Loss 2.4338 LearningRate 0.0287 Epoch: 9 Global Step: 154950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:29,007-Speed 5182.84 samples/sec Loss 2.3510 LearningRate 0.0287 
Epoch: 9 Global Step: 154960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:30,982-Speed 5186.68 samples/sec Loss 2.2675 LearningRate 0.0287 Epoch: 9 Global Step: 154970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:32,954-Speed 5194.06 samples/sec Loss 2.3081 LearningRate 0.0287 Epoch: 9 Global Step: 154980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:34,938-Speed 5163.80 samples/sec Loss 2.3583 LearningRate 0.0287 Epoch: 9 Global Step: 154990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:36,913-Speed 5186.88 samples/sec Loss 2.4004 LearningRate 0.0287 Epoch: 9 Global Step: 155000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:38,891-Speed 5177.31 samples/sec Loss 2.3587 LearningRate 0.0287 Epoch: 9 Global Step: 155010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:40,872-Speed 5172.84 samples/sec Loss 2.3239 LearningRate 0.0287 Epoch: 9 Global Step: 155020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:42,841-Speed 5200.30 samples/sec Loss 2.3046 LearningRate 0.0287 Epoch: 9 Global Step: 155030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:44,830-Speed 5151.09 samples/sec Loss 2.3411 LearningRate 0.0287 Epoch: 9 Global Step: 155040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:46,806-Speed 5184.01 samples/sec Loss 2.4199 LearningRate 0.0287 Epoch: 9 Global Step: 155050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:48,796-Speed 5148.95 samples/sec Loss 2.3844 LearningRate 0.0287 Epoch: 9 Global Step: 155060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:50,769-Speed 5191.60 samples/sec Loss 2.3062 LearningRate 0.0287 Epoch: 9 Global Step: 155070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:52,739-Speed 5198.00 samples/sec Loss 2.3298 LearningRate 0.0287 Epoch: 9 Global Step: 155080 Fp16 Grad 
Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:54,721-Speed 5170.11 samples/sec Loss 2.3550 LearningRate 0.0287 Epoch: 9 Global Step: 155090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:56,690-Speed 5201.69 samples/sec Loss 2.3620 LearningRate 0.0287 Epoch: 9 Global Step: 155100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:39:58,670-Speed 5171.87 samples/sec Loss 2.3282 LearningRate 0.0287 Epoch: 9 Global Step: 155110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:00,646-Speed 5186.56 samples/sec Loss 2.3323 LearningRate 0.0287 Epoch: 9 Global Step: 155120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:02,640-Speed 5135.80 samples/sec Loss 2.4159 LearningRate 0.0287 Epoch: 9 Global Step: 155130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:40:04,627-Speed 5154.60 samples/sec Loss 2.4337 LearningRate 0.0287 Epoch: 9 Global Step: 155140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:40:06,611-Speed 5164.20 samples/sec Loss 2.3631 LearningRate 0.0286 Epoch: 9 Global Step: 155150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:40:08,583-Speed 5194.13 samples/sec Loss 2.3500 LearningRate 0.0286 Epoch: 9 Global Step: 155160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:10,574-Speed 5146.25 samples/sec Loss 2.2901 LearningRate 0.0286 Epoch: 9 Global Step: 155170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:12,561-Speed 5153.93 samples/sec Loss 2.3650 LearningRate 0.0286 Epoch: 9 Global Step: 155180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:14,542-Speed 5172.07 samples/sec Loss 2.3014 LearningRate 0.0286 Epoch: 9 Global Step: 155190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:16,548-Speed 5105.82 samples/sec Loss 2.3671 LearningRate 0.0286 Epoch: 9 Global Step: 155200 Fp16 Grad Scale: 65536 Required: 12 hours 
Training: 2022-04-11 09:40:18,530-Speed 5168.91 samples/sec Loss 2.4057 LearningRate 0.0286 Epoch: 9 Global Step: 155210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:20,502-Speed 5193.56 samples/sec Loss 2.2917 LearningRate 0.0286 Epoch: 9 Global Step: 155220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:22,483-Speed 5172.26 samples/sec Loss 2.3480 LearningRate 0.0286 Epoch: 9 Global Step: 155230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:24,465-Speed 5171.08 samples/sec Loss 2.3247 LearningRate 0.0286 Epoch: 9 Global Step: 155240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:26,442-Speed 5180.83 samples/sec Loss 2.2680 LearningRate 0.0286 Epoch: 9 Global Step: 155250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:28,414-Speed 5193.94 samples/sec Loss 2.3877 LearningRate 0.0286 Epoch: 9 Global Step: 155260 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:40:30,400-Speed 5156.91 samples/sec Loss 2.3932 LearningRate 0.0286 Epoch: 9 Global Step: 155270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:40:32,374-Speed 5189.89 samples/sec Loss 2.3028 LearningRate 0.0286 Epoch: 9 Global Step: 155280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:40:34,345-Speed 5196.98 samples/sec Loss 2.3795 LearningRate 0.0286 Epoch: 9 Global Step: 155290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:40:36,322-Speed 5179.82 samples/sec Loss 2.3277 LearningRate 0.0286 Epoch: 9 Global Step: 155300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:38,323-Speed 5120.59 samples/sec Loss 2.4032 LearningRate 0.0286 Epoch: 9 Global Step: 155310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:40,315-Speed 5142.65 samples/sec Loss 2.3458 LearningRate 0.0286 Epoch: 9 Global Step: 155320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:42,292-Speed 
5182.94 samples/sec Loss 2.3306 LearningRate 0.0286 Epoch: 9 Global Step: 155330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:44,273-Speed 5169.03 samples/sec Loss 2.4140 LearningRate 0.0286 Epoch: 9 Global Step: 155340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:46,244-Speed 5197.56 samples/sec Loss 2.3369 LearningRate 0.0286 Epoch: 9 Global Step: 155350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:48,229-Speed 5160.79 samples/sec Loss 2.3529 LearningRate 0.0286 Epoch: 9 Global Step: 155360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:50,209-Speed 5173.48 samples/sec Loss 2.3710 LearningRate 0.0286 Epoch: 9 Global Step: 155370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:52,178-Speed 5200.95 samples/sec Loss 2.3269 LearningRate 0.0286 Epoch: 9 Global Step: 155380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:54,154-Speed 5184.37 samples/sec Loss 2.3747 LearningRate 0.0286 Epoch: 9 Global Step: 155390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:40:56,126-Speed 5194.16 samples/sec Loss 2.2658 LearningRate 0.0286 Epoch: 9 Global Step: 155400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:40:58,103-Speed 5180.70 samples/sec Loss 2.4312 LearningRate 0.0286 Epoch: 9 Global Step: 155410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:00,080-Speed 5181.85 samples/sec Loss 2.3691 LearningRate 0.0286 Epoch: 9 Global Step: 155420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:02,049-Speed 5203.12 samples/sec Loss 2.3500 LearningRate 0.0286 Epoch: 9 Global Step: 155430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:04,022-Speed 5193.20 samples/sec Loss 2.3073 LearningRate 0.0286 Epoch: 9 Global Step: 155440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:05,994-Speed 5194.20 samples/sec Loss 2.3636 
LearningRate 0.0286 Epoch: 9 Global Step: 155450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:07,967-Speed 5190.62 samples/sec Loss 2.3608 LearningRate 0.0285 Epoch: 9 Global Step: 155460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:09,936-Speed 5202.15 samples/sec Loss 2.3554 LearningRate 0.0285 Epoch: 9 Global Step: 155470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:11,916-Speed 5175.23 samples/sec Loss 2.3649 LearningRate 0.0285 Epoch: 9 Global Step: 155480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:13,888-Speed 5192.40 samples/sec Loss 2.4492 LearningRate 0.0285 Epoch: 9 Global Step: 155490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:15,867-Speed 5175.53 samples/sec Loss 2.3794 LearningRate 0.0285 Epoch: 9 Global Step: 155500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:17,836-Speed 5204.33 samples/sec Loss 2.3119 LearningRate 0.0285 Epoch: 9 Global Step: 155510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:19,826-Speed 5146.98 samples/sec Loss 2.3750 LearningRate 0.0285 Epoch: 9 Global Step: 155520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:21,809-Speed 5166.36 samples/sec Loss 2.3168 LearningRate 0.0285 Epoch: 9 Global Step: 155530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:23,798-Speed 5151.09 samples/sec Loss 2.3461 LearningRate 0.0285 Epoch: 9 Global Step: 155540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:25,770-Speed 5192.24 samples/sec Loss 2.2932 LearningRate 0.0285 Epoch: 9 Global Step: 155550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:27,755-Speed 5160.30 samples/sec Loss 2.2765 LearningRate 0.0285 Epoch: 9 Global Step: 155560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:29,744-Speed 5151.67 samples/sec Loss 2.3390 LearningRate 0.0285 Epoch: 9 Global Step: 
155570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:31,723-Speed 5175.90 samples/sec Loss 2.3893 LearningRate 0.0285 Epoch: 9 Global Step: 155580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:33,721-Speed 5125.55 samples/sec Loss 2.3523 LearningRate 0.0285 Epoch: 9 Global Step: 155590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:35,721-Speed 5122.47 samples/sec Loss 2.3848 LearningRate 0.0285 Epoch: 9 Global Step: 155600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:37,745-Speed 5061.18 samples/sec Loss 2.3432 LearningRate 0.0285 Epoch: 9 Global Step: 155610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:41:39,727-Speed 5168.10 samples/sec Loss 2.3938 LearningRate 0.0285 Epoch: 9 Global Step: 155620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:41:41,699-Speed 5195.60 samples/sec Loss 2.3353 LearningRate 0.0285 Epoch: 9 Global Step: 155630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:41:43,661-Speed 5218.40 samples/sec Loss 2.3578 LearningRate 0.0285 Epoch: 9 Global Step: 155640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:45,635-Speed 5191.03 samples/sec Loss 2.3584 LearningRate 0.0285 Epoch: 9 Global Step: 155650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:47,612-Speed 5179.89 samples/sec Loss 2.4525 LearningRate 0.0285 Epoch: 9 Global Step: 155660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:49,591-Speed 5175.22 samples/sec Loss 2.3916 LearningRate 0.0285 Epoch: 9 Global Step: 155670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:51,587-Speed 5142.74 samples/sec Loss 2.3506 LearningRate 0.0285 Epoch: 9 Global Step: 155680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:53,562-Speed 5187.27 samples/sec Loss 2.3577 LearningRate 0.0285 Epoch: 9 Global Step: 155690 Fp16 Grad Scale: 65536 Required: 
12 hours Training: 2022-04-11 09:41:55,547-Speed 5160.41 samples/sec Loss 2.3331 LearningRate 0.0285 Epoch: 9 Global Step: 155700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:57,531-Speed 5162.19 samples/sec Loss 2.3059 LearningRate 0.0285 Epoch: 9 Global Step: 155710 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:41:59,519-Speed 5153.85 samples/sec Loss 2.3233 LearningRate 0.0285 Epoch: 9 Global Step: 155720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:01,502-Speed 5165.24 samples/sec Loss 2.3399 LearningRate 0.0285 Epoch: 9 Global Step: 155730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:03,483-Speed 5169.46 samples/sec Loss 2.3555 LearningRate 0.0285 Epoch: 9 Global Step: 155740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:42:05,469-Speed 5159.06 samples/sec Loss 2.3364 LearningRate 0.0285 Epoch: 9 Global Step: 155750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:42:07,441-Speed 5194.44 samples/sec Loss 2.3617 LearningRate 0.0285 Epoch: 9 Global Step: 155760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:09,418-Speed 5182.29 samples/sec Loss 2.2749 LearningRate 0.0284 Epoch: 9 Global Step: 155770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:11,397-Speed 5175.51 samples/sec Loss 2.3609 LearningRate 0.0284 Epoch: 9 Global Step: 155780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:13,374-Speed 5181.17 samples/sec Loss 2.3125 LearningRate 0.0284 Epoch: 9 Global Step: 155790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:15,347-Speed 5190.93 samples/sec Loss 2.4238 LearningRate 0.0284 Epoch: 9 Global Step: 155800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:17,326-Speed 5175.39 samples/sec Loss 2.3706 LearningRate 0.0284 Epoch: 9 Global Step: 155810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 
09:42:19,300-Speed 5189.79 samples/sec Loss 2.4011 LearningRate 0.0284 Epoch: 9 Global Step: 155820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:21,281-Speed 5169.49 samples/sec Loss 2.4672 LearningRate 0.0284 Epoch: 9 Global Step: 155830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:23,253-Speed 5195.52 samples/sec Loss 2.3167 LearningRate 0.0284 Epoch: 9 Global Step: 155840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:25,230-Speed 5182.66 samples/sec Loss 2.3852 LearningRate 0.0284 Epoch: 9 Global Step: 155850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:27,202-Speed 5192.96 samples/sec Loss 2.3799 LearningRate 0.0284 Epoch: 9 Global Step: 155860 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:42:29,180-Speed 5179.46 samples/sec Loss 2.2987 LearningRate 0.0284 Epoch: 9 Global Step: 155870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:42:31,150-Speed 5199.97 samples/sec Loss 2.3162 LearningRate 0.0284 Epoch: 9 Global Step: 155880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:42:33,126-Speed 5184.20 samples/sec Loss 2.3237 LearningRate 0.0284 Epoch: 9 Global Step: 155890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:42:35,102-Speed 5182.98 samples/sec Loss 2.4115 LearningRate 0.0284 Epoch: 9 Global Step: 155900 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:42:37,067-Speed 5213.22 samples/sec Loss 2.4197 LearningRate 0.0284 Epoch: 9 Global Step: 155910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:39,079-Speed 5092.11 samples/sec Loss 2.4025 LearningRate 0.0284 Epoch: 9 Global Step: 155920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:41,059-Speed 5173.40 samples/sec Loss 2.3109 LearningRate 0.0284 Epoch: 9 Global Step: 155930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:43,029-Speed 5200.16 samples/sec 
Loss 2.3286 LearningRate 0.0284 Epoch: 9 Global Step: 155940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:45,012-Speed 5163.86 samples/sec Loss 2.3630 LearningRate 0.0284 Epoch: 9 Global Step: 155950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:47,012-Speed 5123.15 samples/sec Loss 2.3651 LearningRate 0.0284 Epoch: 9 Global Step: 155960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:48,983-Speed 5194.69 samples/sec Loss 2.4467 LearningRate 0.0284 Epoch: 9 Global Step: 155970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:50,968-Speed 5162.13 samples/sec Loss 2.3032 LearningRate 0.0284 Epoch: 9 Global Step: 155980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:52,945-Speed 5182.14 samples/sec Loss 2.3665 LearningRate 0.0284 Epoch: 9 Global Step: 155990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:42:54,917-Speed 5192.25 samples/sec Loss 2.3494 LearningRate 0.0284 Epoch: 9 Global Step: 156000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:43:21,523-[lfw][156000]XNorm: 21.982915 Training: 2022-04-11 09:43:21,524-[lfw][156000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-04-11 09:43:21,524-[lfw][156000]Accuracy-Highest: 0.99833 Training: 2022-04-11 09:43:52,273-[cfp_fp][156000]XNorm: 20.918581 Training: 2022-04-11 09:43:52,274-[cfp_fp][156000]Accuracy-Flip: 0.98429+-0.00499 Training: 2022-04-11 09:43:52,274-[cfp_fp][156000]Accuracy-Highest: 0.98443 Training: 2022-04-11 09:44:18,817-[agedb_30][156000]XNorm: 22.247673 Training: 2022-04-11 09:44:18,817-[agedb_30][156000]Accuracy-Flip: 0.98167+-0.00745 Training: 2022-04-11 09:44:18,818-[agedb_30][156000]Accuracy-Highest: 0.98167 Training: 2022-04-11 09:44:20,808-Speed 119.22 samples/sec Loss 2.3743 LearningRate 0.0284 Epoch: 9 Global Step: 156010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:44:22,780-Speed 5194.69 samples/sec Loss 2.3486 
LearningRate 0.0284 Epoch: 9 Global Step: 156020 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:44:24,748-Speed 5203.49 samples/sec Loss 2.3350 LearningRate 0.0284 Epoch: 9 Global Step: 156030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:44:26,728-Speed 5173.51 samples/sec Loss 2.3268 LearningRate 0.0284 Epoch: 9 Global Step: 156040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:44:28,692-Speed 5217.65 samples/sec Loss 2.4291 LearningRate 0.0284 Epoch: 9 Global Step: 156050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:44:30,658-Speed 5210.88 samples/sec Loss 2.3549 LearningRate 0.0284 Epoch: 9 Global Step: 156060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:44:32,624-Speed 5210.09 samples/sec Loss 2.3330 LearningRate 0.0284 Epoch: 9 Global Step: 156070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:44:34,593-Speed 5200.56 samples/sec Loss 2.4122 LearningRate 0.0283 Epoch: 9 Global Step: 156080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:44:36,575-Speed 5178.35 samples/sec Loss 2.3295 LearningRate 0.0283 Epoch: 9 Global Step: 156090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:44:38,540-Speed 5212.48 samples/sec Loss 2.3950 LearningRate 0.0283 Epoch: 9 Global Step: 156100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:44:40,507-Speed 5205.12 samples/sec Loss 2.3850 LearningRate 0.0283 Epoch: 9 Global Step: 156110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:44:42,474-Speed 5207.46 samples/sec Loss 2.3489 LearningRate 0.0283 Epoch: 9 Global Step: 156120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:44:44,443-Speed 5203.44 samples/sec Loss 2.4300 LearningRate 0.0283 Epoch: 9 Global Step: 156130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:44:46,418-Speed 5187.31 samples/sec Loss 2.3932 LearningRate 0.0283 Epoch: 9 Global 
Step: 156140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:44:48,386-Speed 5204.07 samples/sec Loss 2.3537 LearningRate 0.0283 Epoch: 9 Global Step: 156150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:44:50,365-Speed 5176.40 samples/sec Loss 2.3581 LearningRate 0.0283 Epoch: 9 Global Step: 156160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:44:52,345-Speed 5172.30 samples/sec Loss 2.3869 LearningRate 0.0283 Epoch: 9 Global Step: 156170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:44:54,362-Speed 5080.76 samples/sec Loss 2.3673 LearningRate 0.0283 Epoch: 9 Global Step: 156180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:44:56,328-Speed 5210.36 samples/sec Loss 2.3736 LearningRate 0.0283 Epoch: 9 Global Step: 156190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:44:58,297-Speed 5201.40 samples/sec Loss 2.3627 LearningRate 0.0283 Epoch: 9 Global Step: 156200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:00,279-Speed 5167.32 samples/sec Loss 2.3931 LearningRate 0.0283 Epoch: 9 Global Step: 156210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:02,257-Speed 5178.98 samples/sec Loss 2.4281 LearningRate 0.0283 Epoch: 9 Global Step: 156220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:04,241-Speed 5162.57 samples/sec Loss 2.3942 LearningRate 0.0283 Epoch: 9 Global Step: 156230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:06,213-Speed 5196.77 samples/sec Loss 2.3873 LearningRate 0.0283 Epoch: 9 Global Step: 156240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:08,221-Speed 5100.77 samples/sec Loss 2.4113 LearningRate 0.0283 Epoch: 9 Global Step: 156250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:10,190-Speed 5201.42 samples/sec Loss 2.3948 LearningRate 0.0283 Epoch: 9 Global Step: 156260 Fp16 Grad Scale: 65536 
Required: 12 hours Training: 2022-04-11 09:45:12,178-Speed 5152.23 samples/sec Loss 2.3619 LearningRate 0.0283 Epoch: 9 Global Step: 156270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:14,154-Speed 5186.15 samples/sec Loss 2.4083 LearningRate 0.0283 Epoch: 9 Global Step: 156280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:16,125-Speed 5194.58 samples/sec Loss 2.3550 LearningRate 0.0283 Epoch: 9 Global Step: 156290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:45:18,102-Speed 5182.74 samples/sec Loss 2.3803 LearningRate 0.0283 Epoch: 9 Global Step: 156300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:20,079-Speed 5180.19 samples/sec Loss 2.3781 LearningRate 0.0283 Epoch: 9 Global Step: 156310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:22,057-Speed 5179.56 samples/sec Loss 2.3570 LearningRate 0.0283 Epoch: 9 Global Step: 156320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:24,059-Speed 5116.56 samples/sec Loss 2.4255 LearningRate 0.0283 Epoch: 9 Global Step: 156330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:26,049-Speed 5146.43 samples/sec Loss 2.3614 LearningRate 0.0283 Epoch: 9 Global Step: 156340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:28,041-Speed 5143.73 samples/sec Loss 2.3933 LearningRate 0.0283 Epoch: 9 Global Step: 156350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:30,046-Speed 5109.44 samples/sec Loss 2.3798 LearningRate 0.0283 Epoch: 9 Global Step: 156360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:32,029-Speed 5165.12 samples/sec Loss 2.3116 LearningRate 0.0283 Epoch: 9 Global Step: 156370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:34,012-Speed 5166.69 samples/sec Loss 2.4273 LearningRate 0.0283 Epoch: 9 Global Step: 156380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 
09:45:35,996-Speed 5161.03 samples/sec Loss 2.4110 LearningRate 0.0283 Epoch: 9 Global Step: 156390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:37,989-Speed 5141.23 samples/sec Loss 2.4109 LearningRate 0.0282 Epoch: 9 Global Step: 156400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:45:39,980-Speed 5143.22 samples/sec Loss 2.3281 LearningRate 0.0282 Epoch: 9 Global Step: 156410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:45:41,961-Speed 5171.82 samples/sec Loss 2.3844 LearningRate 0.0282 Epoch: 9 Global Step: 156420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:45:43,934-Speed 5190.13 samples/sec Loss 2.3483 LearningRate 0.0282 Epoch: 9 Global Step: 156430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:45:45,920-Speed 5158.47 samples/sec Loss 2.3277 LearningRate 0.0282 Epoch: 9 Global Step: 156440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:45:47,894-Speed 5189.61 samples/sec Loss 2.3580 LearningRate 0.0282 Epoch: 9 Global Step: 156450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:45:49,906-Speed 5090.88 samples/sec Loss 2.3604 LearningRate 0.0282 Epoch: 9 Global Step: 156460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:51,890-Speed 5162.35 samples/sec Loss 2.3617 LearningRate 0.0282 Epoch: 9 Global Step: 156470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:53,861-Speed 5197.26 samples/sec Loss 2.3162 LearningRate 0.0282 Epoch: 9 Global Step: 156480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:55,841-Speed 5174.53 samples/sec Loss 2.3461 LearningRate 0.0282 Epoch: 9 Global Step: 156490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:57,819-Speed 5178.78 samples/sec Loss 2.3660 LearningRate 0.0282 Epoch: 9 Global Step: 156500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:45:59,810-Speed 5145.27 
samples/sec Loss 2.3403 LearningRate 0.0282 Epoch: 9 Global Step: 156510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:01,796-Speed 5157.42 samples/sec Loss 2.3954 LearningRate 0.0282 Epoch: 9 Global Step: 156520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:03,807-Speed 5092.00 samples/sec Loss 2.3485 LearningRate 0.0282 Epoch: 9 Global Step: 156530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:05,781-Speed 5191.80 samples/sec Loss 2.4047 LearningRate 0.0282 Epoch: 9 Global Step: 156540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:07,757-Speed 5182.19 samples/sec Loss 2.3622 LearningRate 0.0282 Epoch: 9 Global Step: 156550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:09,740-Speed 5166.32 samples/sec Loss 2.3156 LearningRate 0.0282 Epoch: 9 Global Step: 156560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:11,726-Speed 5155.98 samples/sec Loss 2.4367 LearningRate 0.0282 Epoch: 9 Global Step: 156570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:13,726-Speed 5123.78 samples/sec Loss 2.3447 LearningRate 0.0282 Epoch: 9 Global Step: 156580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:15,695-Speed 5202.86 samples/sec Loss 2.4584 LearningRate 0.0282 Epoch: 9 Global Step: 156590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:17,678-Speed 5165.33 samples/sec Loss 2.3569 LearningRate 0.0282 Epoch: 9 Global Step: 156600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:19,656-Speed 5177.92 samples/sec Loss 2.3802 LearningRate 0.0282 Epoch: 9 Global Step: 156610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:21,646-Speed 5148.06 samples/sec Loss 2.3631 LearningRate 0.0282 Epoch: 9 Global Step: 156620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:23,619-Speed 5190.05 samples/sec Loss 2.4000 LearningRate 0.0282 
Epoch: 9 Global Step: 156630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:25,591-Speed 5195.74 samples/sec Loss 2.3985 LearningRate 0.0282 Epoch: 9 Global Step: 156640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:27,564-Speed 5192.16 samples/sec Loss 2.3115 LearningRate 0.0282 Epoch: 9 Global Step: 156650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:29,530-Speed 5211.01 samples/sec Loss 2.4093 LearningRate 0.0282 Epoch: 9 Global Step: 156660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:31,497-Speed 5206.75 samples/sec Loss 2.4589 LearningRate 0.0282 Epoch: 9 Global Step: 156670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:46:33,491-Speed 5138.17 samples/sec Loss 2.3130 LearningRate 0.0282 Epoch: 9 Global Step: 156680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:46:35,465-Speed 5189.31 samples/sec Loss 2.3814 LearningRate 0.0282 Epoch: 9 Global Step: 156690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:46:37,436-Speed 5196.01 samples/sec Loss 2.4360 LearningRate 0.0282 Epoch: 9 Global Step: 156700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:46:39,408-Speed 5194.43 samples/sec Loss 2.3901 LearningRate 0.0281 Epoch: 9 Global Step: 156710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:46:41,381-Speed 5191.90 samples/sec Loss 2.3801 LearningRate 0.0281 Epoch: 9 Global Step: 156720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:46:43,350-Speed 5202.07 samples/sec Loss 2.3695 LearningRate 0.0281 Epoch: 9 Global Step: 156730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:46:45,322-Speed 5194.45 samples/sec Loss 2.3860 LearningRate 0.0281 Epoch: 9 Global Step: 156740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:46:47,293-Speed 5195.62 samples/sec Loss 2.4185 LearningRate 0.0281 Epoch: 9 Global Step: 156750 Fp16 Grad 
Scale: 32768 Required: 12 hours Training: 2022-04-11 09:46:49,268-Speed 5189.30 samples/sec Loss 2.3149 LearningRate 0.0281 Epoch: 9 Global Step: 156760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:46:51,257-Speed 5148.50 samples/sec Loss 2.4209 LearningRate 0.0281 Epoch: 9 Global Step: 156770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:53,232-Speed 5187.06 samples/sec Loss 2.3937 LearningRate 0.0281 Epoch: 9 Global Step: 156780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:55,201-Speed 5201.78 samples/sec Loss 2.3863 LearningRate 0.0281 Epoch: 9 Global Step: 156790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:57,173-Speed 5194.64 samples/sec Loss 2.4180 LearningRate 0.0281 Epoch: 9 Global Step: 156800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:46:59,164-Speed 5146.21 samples/sec Loss 2.4621 LearningRate 0.0281 Epoch: 9 Global Step: 156810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:01,144-Speed 5173.93 samples/sec Loss 2.3512 LearningRate 0.0281 Epoch: 9 Global Step: 156820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:03,128-Speed 5161.35 samples/sec Loss 2.4515 LearningRate 0.0281 Epoch: 9 Global Step: 156830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:05,100-Speed 5194.05 samples/sec Loss 2.3699 LearningRate 0.0281 Epoch: 9 Global Step: 156840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:07,085-Speed 5160.75 samples/sec Loss 2.2935 LearningRate 0.0281 Epoch: 9 Global Step: 156850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:09,079-Speed 5157.11 samples/sec Loss 2.3541 LearningRate 0.0281 Epoch: 9 Global Step: 156860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:11,073-Speed 5134.79 samples/sec Loss 2.3957 LearningRate 0.0281 Epoch: 9 Global Step: 156870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 
2022-04-11 09:47:13,067-Speed 5138.65 samples/sec Loss 2.4437 LearningRate 0.0281 Epoch: 9 Global Step: 156880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:47:15,043-Speed 5183.76 samples/sec Loss 2.3787 LearningRate 0.0281 Epoch: 9 Global Step: 156890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:17,021-Speed 5178.52 samples/sec Loss 2.3926 LearningRate 0.0281 Epoch: 9 Global Step: 156900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:18,991-Speed 5200.26 samples/sec Loss 2.4145 LearningRate 0.0281 Epoch: 9 Global Step: 156910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:20,962-Speed 5197.21 samples/sec Loss 2.3887 LearningRate 0.0281 Epoch: 9 Global Step: 156920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:22,933-Speed 5195.65 samples/sec Loss 2.2803 LearningRate 0.0281 Epoch: 9 Global Step: 156930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:24,916-Speed 5166.02 samples/sec Loss 2.4121 LearningRate 0.0281 Epoch: 9 Global Step: 156940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:26,902-Speed 5157.98 samples/sec Loss 2.3273 LearningRate 0.0281 Epoch: 9 Global Step: 156950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:28,881-Speed 5176.60 samples/sec Loss 2.3541 LearningRate 0.0281 Epoch: 9 Global Step: 156960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:30,849-Speed 5205.52 samples/sec Loss 2.4190 LearningRate 0.0281 Epoch: 9 Global Step: 156970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:32,820-Speed 5196.27 samples/sec Loss 2.3495 LearningRate 0.0281 Epoch: 9 Global Step: 156980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:34,811-Speed 5142.88 samples/sec Loss 2.3498 LearningRate 0.0281 Epoch: 9 Global Step: 156990 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:47:36,799-Speed 5155.60 
samples/sec Loss 2.4700 LearningRate 0.0281 Epoch: 9 Global Step: 157000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:38,780-Speed 5171.49 samples/sec Loss 2.3980 LearningRate 0.0281 Epoch: 9 Global Step: 157010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:40,761-Speed 5169.75 samples/sec Loss 2.2899 LearningRate 0.0281 Epoch: 9 Global Step: 157020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:42,746-Speed 5159.73 samples/sec Loss 2.3446 LearningRate 0.0280 Epoch: 9 Global Step: 157030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:44,718-Speed 5194.58 samples/sec Loss 2.3602 LearningRate 0.0280 Epoch: 9 Global Step: 157040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:46,704-Speed 5157.41 samples/sec Loss 2.3530 LearningRate 0.0280 Epoch: 9 Global Step: 157050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:48,743-Speed 5023.26 samples/sec Loss 2.4134 LearningRate 0.0280 Epoch: 9 Global Step: 157060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:50,778-Speed 5033.06 samples/sec Loss 2.4078 LearningRate 0.0280 Epoch: 9 Global Step: 157070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:52,746-Speed 5205.88 samples/sec Loss 2.4070 LearningRate 0.0280 Epoch: 9 Global Step: 157080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:54,733-Speed 5155.20 samples/sec Loss 2.4103 LearningRate 0.0280 Epoch: 9 Global Step: 157090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:47:56,740-Speed 5106.15 samples/sec Loss 2.4128 LearningRate 0.0280 Epoch: 9 Global Step: 157100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:47:58,723-Speed 5166.50 samples/sec Loss 2.4334 LearningRate 0.0280 Epoch: 9 Global Step: 157110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:48:00,705-Speed 5168.18 samples/sec Loss 2.4673 LearningRate 
0.0280 Epoch: 9 Global Step: 157120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:48:02,684-Speed 5176.28 samples/sec Loss 2.4117 LearningRate 0.0280 Epoch: 9 Global Step: 157130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:04,662-Speed 5177.09 samples/sec Loss 2.3943 LearningRate 0.0280 Epoch: 9 Global Step: 157140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:06,636-Speed 5188.87 samples/sec Loss 2.4370 LearningRate 0.0280 Epoch: 9 Global Step: 157150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:08,607-Speed 5197.53 samples/sec Loss 2.4186 LearningRate 0.0280 Epoch: 9 Global Step: 157160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:10,599-Speed 5142.24 samples/sec Loss 2.3896 LearningRate 0.0280 Epoch: 9 Global Step: 157170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:12,580-Speed 5171.73 samples/sec Loss 2.3883 LearningRate 0.0280 Epoch: 9 Global Step: 157180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:14,551-Speed 5195.69 samples/sec Loss 2.4326 LearningRate 0.0280 Epoch: 9 Global Step: 157190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:16,528-Speed 5181.69 samples/sec Loss 2.3611 LearningRate 0.0280 Epoch: 9 Global Step: 157200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:18,512-Speed 5164.25 samples/sec Loss 2.3313 LearningRate 0.0280 Epoch: 9 Global Step: 157210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:20,492-Speed 5171.56 samples/sec Loss 2.3191 LearningRate 0.0280 Epoch: 9 Global Step: 157220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:22,468-Speed 5184.51 samples/sec Loss 2.4175 LearningRate 0.0280 Epoch: 9 Global Step: 157230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:48:24,467-Speed 5124.88 samples/sec Loss 2.3986 LearningRate 0.0280 Epoch: 9 Global Step: 157240 Fp16 
Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:26,458-Speed 5144.45 samples/sec Loss 2.3882 LearningRate 0.0280 Epoch: 9 Global Step: 157250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:28,433-Speed 5187.95 samples/sec Loss 2.3510 LearningRate 0.0280 Epoch: 9 Global Step: 157260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:30,408-Speed 5184.26 samples/sec Loss 2.3883 LearningRate 0.0280 Epoch: 9 Global Step: 157270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:32,381-Speed 5191.69 samples/sec Loss 2.4330 LearningRate 0.0280 Epoch: 9 Global Step: 157280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:34,362-Speed 5171.21 samples/sec Loss 2.4050 LearningRate 0.0280 Epoch: 9 Global Step: 157290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:36,353-Speed 5146.29 samples/sec Loss 2.3659 LearningRate 0.0280 Epoch: 9 Global Step: 157300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:38,332-Speed 5175.23 samples/sec Loss 2.4243 LearningRate 0.0280 Epoch: 9 Global Step: 157310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:40,325-Speed 5139.76 samples/sec Loss 2.3499 LearningRate 0.0280 Epoch: 9 Global Step: 157320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:42,306-Speed 5170.91 samples/sec Loss 2.3755 LearningRate 0.0280 Epoch: 9 Global Step: 157330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:48:44,280-Speed 5191.18 samples/sec Loss 2.3964 LearningRate 0.0279 Epoch: 9 Global Step: 157340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:48:46,287-Speed 5101.86 samples/sec Loss 2.3667 LearningRate 0.0279 Epoch: 9 Global Step: 157350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:48:48,272-Speed 5160.36 samples/sec Loss 2.3753 LearningRate 0.0279 Epoch: 9 Global Step: 157360 Fp16 Grad Scale: 131072 Required: 12 hours 
Training: 2022-04-11 09:48:50,270-Speed 5128.00 samples/sec Loss 2.3838 LearningRate 0.0279 Epoch: 9 Global Step: 157370 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:48:52,242-Speed 5192.05 samples/sec Loss 2.3776 LearningRate 0.0279 Epoch: 9 Global Step: 157380 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:48:54,228-Speed 5159.26 samples/sec Loss 2.4128 LearningRate 0.0279 Epoch: 9 Global Step: 157390 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:48:56,208-Speed 5173.70 samples/sec Loss 2.3374 LearningRate 0.0279 Epoch: 9 Global Step: 157400 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:48:58,195-Speed 5153.98 samples/sec Loss 2.3968 LearningRate 0.0279 Epoch: 9 Global Step: 157410 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:49:00,198-Speed 5113.86 samples/sec Loss 2.3989 LearningRate 0.0279 Epoch: 9 Global Step: 157420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:49:02,176-Speed 5180.69 samples/sec Loss 2.3913 LearningRate 0.0279 Epoch: 9 Global Step: 157430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:49:04,138-Speed 5219.68 samples/sec Loss 2.3932 LearningRate 0.0279 Epoch: 9 Global Step: 157440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:06,117-Speed 5175.78 samples/sec Loss 2.3953 LearningRate 0.0279 Epoch: 9 Global Step: 157450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:08,095-Speed 5180.14 samples/sec Loss 2.4225 LearningRate 0.0279 Epoch: 9 Global Step: 157460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:10,068-Speed 5191.72 samples/sec Loss 2.3871 LearningRate 0.0279 Epoch: 9 Global Step: 157470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:12,045-Speed 5183.29 samples/sec Loss 2.4494 LearningRate 0.0279 Epoch: 9 Global Step: 157480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 
09:49:14,031-Speed 5158.31 samples/sec Loss 2.3441 LearningRate 0.0279 Epoch: 9 Global Step: 157490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:16,013-Speed 5167.22 samples/sec Loss 2.3646 LearningRate 0.0279 Epoch: 9 Global Step: 157500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:18,003-Speed 5146.81 samples/sec Loss 2.3270 LearningRate 0.0279 Epoch: 9 Global Step: 157510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:19,977-Speed 5190.20 samples/sec Loss 2.3651 LearningRate 0.0279 Epoch: 9 Global Step: 157520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:21,972-Speed 5135.78 samples/sec Loss 2.3649 LearningRate 0.0279 Epoch: 9 Global Step: 157530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:23,947-Speed 5186.05 samples/sec Loss 2.3777 LearningRate 0.0279 Epoch: 9 Global Step: 157540 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:49:25,921-Speed 5190.47 samples/sec Loss 2.4041 LearningRate 0.0279 Epoch: 9 Global Step: 157550 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:49:27,896-Speed 5186.07 samples/sec Loss 2.4426 LearningRate 0.0279 Epoch: 9 Global Step: 157560 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:49:29,869-Speed 5192.19 samples/sec Loss 2.3958 LearningRate 0.0279 Epoch: 9 Global Step: 157570 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:49:31,836-Speed 5207.10 samples/sec Loss 2.4055 LearningRate 0.0279 Epoch: 9 Global Step: 157580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:33,811-Speed 5191.43 samples/sec Loss 2.4868 LearningRate 0.0279 Epoch: 9 Global Step: 157590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:35,799-Speed 5154.89 samples/sec Loss 2.3962 LearningRate 0.0279 Epoch: 9 Global Step: 157600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:37,776-Speed 5179.81 samples/sec 
Loss 2.4611 LearningRate 0.0279 Epoch: 9 Global Step: 157610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:39,754-Speed 5179.39 samples/sec Loss 2.3595 LearningRate 0.0279 Epoch: 9 Global Step: 157620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:41,739-Speed 5161.71 samples/sec Loss 2.4473 LearningRate 0.0279 Epoch: 9 Global Step: 157630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:43,714-Speed 5186.26 samples/sec Loss 2.3907 LearningRate 0.0279 Epoch: 9 Global Step: 157640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:45,698-Speed 5162.74 samples/sec Loss 2.3275 LearningRate 0.0279 Epoch: 9 Global Step: 157650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:47,680-Speed 5166.37 samples/sec Loss 2.3761 LearningRate 0.0278 Epoch: 9 Global Step: 157660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:49,656-Speed 5185.70 samples/sec Loss 2.3983 LearningRate 0.0278 Epoch: 9 Global Step: 157670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:49:51,641-Speed 5159.29 samples/sec Loss 2.4222 LearningRate 0.0278 Epoch: 9 Global Step: 157680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:49:53,614-Speed 5192.74 samples/sec Loss 2.4157 LearningRate 0.0278 Epoch: 9 Global Step: 157690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:49:55,585-Speed 5196.16 samples/sec Loss 2.4177 LearningRate 0.0278 Epoch: 9 Global Step: 157700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:49:57,557-Speed 5194.94 samples/sec Loss 2.3849 LearningRate 0.0278 Epoch: 9 Global Step: 157710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:49:59,531-Speed 5189.81 samples/sec Loss 2.4189 LearningRate 0.0278 Epoch: 9 Global Step: 157720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:50:01,521-Speed 5147.72 samples/sec Loss 2.4010 LearningRate 0.0278 Epoch: 
9 Global Step: 157730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:50:03,503-Speed 5167.80 samples/sec Loss 2.4423 LearningRate 0.0278 Epoch: 9 Global Step: 157740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:50:05,488-Speed 5161.35 samples/sec Loss 2.3812 LearningRate 0.0278 Epoch: 9 Global Step: 157750 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:50:07,462-Speed 5187.40 samples/sec Loss 2.4593 LearningRate 0.0278 Epoch: 9 Global Step: 157760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:50:09,441-Speed 5176.75 samples/sec Loss 2.3573 LearningRate 0.0278 Epoch: 9 Global Step: 157770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:50:11,430-Speed 5148.93 samples/sec Loss 2.4900 LearningRate 0.0278 Epoch: 9 Global Step: 157780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:50:13,407-Speed 5183.02 samples/sec Loss 2.4289 LearningRate 0.0278 Epoch: 9 Global Step: 157790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:50:15,384-Speed 5180.53 samples/sec Loss 2.4483 LearningRate 0.0278 Epoch: 9 Global Step: 157800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:50:17,374-Speed 5148.62 samples/sec Loss 2.4046 LearningRate 0.0278 Epoch: 9 Global Step: 157810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:50:19,357-Speed 5165.16 samples/sec Loss 2.3836 LearningRate 0.0278 Epoch: 9 Global Step: 157820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:50:21,347-Speed 5147.72 samples/sec Loss 2.4164 LearningRate 0.0278 Epoch: 9 Global Step: 157830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:50:23,320-Speed 5190.51 samples/sec Loss 2.4253 LearningRate 0.0278 Epoch: 9 Global Step: 157840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:50:25,298-Speed 5178.82 samples/sec Loss 2.3694 LearningRate 0.0278 Epoch: 9 Global Step: 157850 Fp16 Grad Scale: 
65536 Required: 12 hours Training: 2022-04-11 09:50:27,286-Speed 5151.73 samples/sec Loss 2.4870 LearningRate 0.0278 Epoch: 9 Global Step: 157860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:50:29,265-Speed 5177.36 samples/sec Loss 2.3573 LearningRate 0.0278 Epoch: 9 Global Step: 157870 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:50:31,241-Speed 5184.39 samples/sec Loss 2.4079 LearningRate 0.0278 Epoch: 9 Global Step: 157880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:50:33,241-Speed 5121.34 samples/sec Loss 2.4157 LearningRate 0.0278 Epoch: 9 Global Step: 157890 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:50:35,245-Speed 5109.98 samples/sec Loss 2.4092 LearningRate 0.0278 Epoch: 9 Global Step: 157900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:50:37,231-Speed 5157.61 samples/sec Loss 2.3532 LearningRate 0.0278 Epoch: 9 Global Step: 157910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:50:39,213-Speed 5168.88 samples/sec Loss 2.4448 LearningRate 0.0278 Epoch: 9 Global Step: 157920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:50:41,206-Speed 5142.58 samples/sec Loss 2.3999 LearningRate 0.0278 Epoch: 9 Global Step: 157930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:50:43,181-Speed 5184.17 samples/sec Loss 2.3699 LearningRate 0.0278 Epoch: 9 Global Step: 157940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:50:45,154-Speed 5192.34 samples/sec Loss 2.4567 LearningRate 0.0278 Epoch: 9 Global Step: 157950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:50:47,149-Speed 5133.49 samples/sec Loss 2.3638 LearningRate 0.0278 Epoch: 9 Global Step: 157960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:50:49,123-Speed 5189.16 samples/sec Loss 2.3233 LearningRate 0.0277 Epoch: 9 Global Step: 157970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 
2022-04-11 09:50:51,110-Speed 5155.68 samples/sec Loss 2.4296 LearningRate 0.0277 Epoch: 9 Global Step: 157980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:50:53,097-Speed 5154.36 samples/sec Loss 2.3598 LearningRate 0.0277 Epoch: 9 Global Step: 157990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:50:55,071-Speed 5189.25 samples/sec Loss 2.3738 LearningRate 0.0277 Epoch: 9 Global Step: 158000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:51:21,483-[lfw][158000]XNorm: 23.359644 Training: 2022-04-11 09:51:21,484-[lfw][158000]Accuracy-Flip: 0.99783+-0.00279 Training: 2022-04-11 09:51:21,484-[lfw][158000]Accuracy-Highest: 0.99833 Training: 2022-04-11 09:51:52,129-[cfp_fp][158000]XNorm: 21.565683 Training: 2022-04-11 09:51:52,130-[cfp_fp][158000]Accuracy-Flip: 0.98443+-0.00562 Training: 2022-04-11 09:51:52,130-[cfp_fp][158000]Accuracy-Highest: 0.98443 Training: 2022-04-11 09:52:18,664-[agedb_30][158000]XNorm: 23.571141 Training: 2022-04-11 09:52:18,664-[agedb_30][158000]Accuracy-Flip: 0.98000+-0.00775 Training: 2022-04-11 09:52:18,665-[agedb_30][158000]Accuracy-Highest: 0.98167 Training: 2022-04-11 09:52:20,655-Speed 119.65 samples/sec Loss 2.4429 LearningRate 0.0277 Epoch: 9 Global Step: 158010 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:52:22,616-Speed 5224.13 samples/sec Loss 2.3862 LearningRate 0.0277 Epoch: 9 Global Step: 158020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:52:24,590-Speed 5187.81 samples/sec Loss 2.4191 LearningRate 0.0277 Epoch: 9 Global Step: 158030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:52:26,558-Speed 5204.50 samples/sec Loss 2.3689 LearningRate 0.0277 Epoch: 9 Global Step: 158040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:52:28,533-Speed 5188.40 samples/sec Loss 2.4130 LearningRate 0.0277 Epoch: 9 Global Step: 158050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 
09:52:30,504-Speed 5195.73 samples/sec Loss 2.4010 LearningRate 0.0277 Epoch: 9 Global Step: 158060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:52:32,469-Speed 5212.47 samples/sec Loss 2.3855 LearningRate 0.0277 Epoch: 9 Global Step: 158070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:52:34,462-Speed 5142.57 samples/sec Loss 2.3595 LearningRate 0.0277 Epoch: 9 Global Step: 158080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:52:36,434-Speed 5196.05 samples/sec Loss 2.3443 LearningRate 0.0277 Epoch: 9 Global Step: 158090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:52:38,410-Speed 5183.43 samples/sec Loss 2.3211 LearningRate 0.0277 Epoch: 9 Global Step: 158100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:52:40,379-Speed 5199.93 samples/sec Loss 2.3916 LearningRate 0.0277 Epoch: 9 Global Step: 158110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:52:42,359-Speed 5174.68 samples/sec Loss 2.3832 LearningRate 0.0277 Epoch: 9 Global Step: 158120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:52:44,327-Speed 5204.02 samples/sec Loss 2.3450 LearningRate 0.0277 Epoch: 9 Global Step: 158130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:52:46,309-Speed 5170.69 samples/sec Loss 2.4070 LearningRate 0.0277 Epoch: 9 Global Step: 158140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:52:48,286-Speed 5180.72 samples/sec Loss 2.4133 LearningRate 0.0277 Epoch: 9 Global Step: 158150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:52:50,267-Speed 5171.25 samples/sec Loss 2.3869 LearningRate 0.0277 Epoch: 9 Global Step: 158160 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:52:52,232-Speed 5212.50 samples/sec Loss 2.3767 LearningRate 0.0277 Epoch: 9 Global Step: 158170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:52:54,203-Speed 5196.18 samples/sec 
Loss 2.4053 LearningRate 0.0277 Epoch: 9 Global Step: 158180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:52:56,171-Speed 5204.89 samples/sec Loss 2.4680 LearningRate 0.0277 Epoch: 9 Global Step: 158190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:52:58,156-Speed 5160.98 samples/sec Loss 2.3734 LearningRate 0.0277 Epoch: 9 Global Step: 158200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:00,136-Speed 5173.50 samples/sec Loss 2.4103 LearningRate 0.0277 Epoch: 9 Global Step: 158210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:02,111-Speed 5187.16 samples/sec Loss 2.3721 LearningRate 0.0277 Epoch: 9 Global Step: 158220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:04,097-Speed 5155.92 samples/sec Loss 2.4066 LearningRate 0.0277 Epoch: 9 Global Step: 158230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:06,074-Speed 5181.67 samples/sec Loss 2.4112 LearningRate 0.0277 Epoch: 9 Global Step: 158240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:08,061-Speed 5155.74 samples/sec Loss 2.3782 LearningRate 0.0277 Epoch: 9 Global Step: 158250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:10,048-Speed 5156.75 samples/sec Loss 2.3764 LearningRate 0.0277 Epoch: 9 Global Step: 158260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:12,029-Speed 5170.77 samples/sec Loss 2.4945 LearningRate 0.0277 Epoch: 9 Global Step: 158270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:53:14,002-Speed 5191.09 samples/sec Loss 2.3870 LearningRate 0.0277 Epoch: 9 Global Step: 158280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:53:15,972-Speed 5199.68 samples/sec Loss 2.3936 LearningRate 0.0276 Epoch: 9 Global Step: 158290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:53:17,936-Speed 5216.30 samples/sec Loss 2.4082 LearningRate 0.0276 Epoch: 9 
Global Step: 158300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:19,908-Speed 5194.49 samples/sec Loss 2.3901 LearningRate 0.0276 Epoch: 9 Global Step: 158310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:21,890-Speed 5167.08 samples/sec Loss 2.4174 LearningRate 0.0276 Epoch: 9 Global Step: 158320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:23,865-Speed 5184.87 samples/sec Loss 2.3717 LearningRate 0.0276 Epoch: 9 Global Step: 158330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:25,853-Speed 5152.94 samples/sec Loss 2.4623 LearningRate 0.0276 Epoch: 9 Global Step: 158340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:27,843-Speed 5148.80 samples/sec Loss 2.3736 LearningRate 0.0276 Epoch: 9 Global Step: 158350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:29,814-Speed 5197.10 samples/sec Loss 2.4261 LearningRate 0.0276 Epoch: 9 Global Step: 158360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:31,805-Speed 5145.12 samples/sec Loss 2.4177 LearningRate 0.0276 Epoch: 9 Global Step: 158370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:33,787-Speed 5168.33 samples/sec Loss 2.3283 LearningRate 0.0276 Epoch: 9 Global Step: 158380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:35,776-Speed 5151.09 samples/sec Loss 2.3548 LearningRate 0.0276 Epoch: 9 Global Step: 158390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:37,741-Speed 5212.31 samples/sec Loss 2.4596 LearningRate 0.0276 Epoch: 9 Global Step: 158400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:39,723-Speed 5168.63 samples/sec Loss 2.4541 LearningRate 0.0276 Epoch: 9 Global Step: 158410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:41,696-Speed 5192.13 samples/sec Loss 2.3988 LearningRate 0.0276 Epoch: 9 Global Step: 158420 Fp16 Grad Scale: 65536 
Required: 12 hours Training: 2022-04-11 09:53:43,666-Speed 5199.23 samples/sec Loss 2.4059 LearningRate 0.0276 Epoch: 9 Global Step: 158430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:45,638-Speed 5192.81 samples/sec Loss 2.4178 LearningRate 0.0276 Epoch: 9 Global Step: 158440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:47,616-Speed 5180.60 samples/sec Loss 2.4337 LearningRate 0.0276 Epoch: 9 Global Step: 158450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:49,615-Speed 5124.25 samples/sec Loss 2.3998 LearningRate 0.0276 Epoch: 9 Global Step: 158460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:51,584-Speed 5202.57 samples/sec Loss 2.4592 LearningRate 0.0276 Epoch: 9 Global Step: 158470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:53,560-Speed 5184.27 samples/sec Loss 2.4156 LearningRate 0.0276 Epoch: 9 Global Step: 158480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:55,531-Speed 5195.56 samples/sec Loss 2.4874 LearningRate 0.0276 Epoch: 9 Global Step: 158490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:53:57,499-Speed 5206.44 samples/sec Loss 2.4372 LearningRate 0.0276 Epoch: 9 Global Step: 158500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:53:59,476-Speed 5179.18 samples/sec Loss 2.4197 LearningRate 0.0276 Epoch: 9 Global Step: 158510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:01,461-Speed 5160.85 samples/sec Loss 2.4503 LearningRate 0.0276 Epoch: 9 Global Step: 158520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:03,436-Speed 5186.89 samples/sec Loss 2.4056 LearningRate 0.0276 Epoch: 9 Global Step: 158530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:05,420-Speed 5162.20 samples/sec Loss 2.4622 LearningRate 0.0276 Epoch: 9 Global Step: 158540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 
09:54:07,421-Speed 5120.95 samples/sec Loss 2.3827 LearningRate 0.0276 Epoch: 9 Global Step: 158550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:09,398-Speed 5180.44 samples/sec Loss 2.4140 LearningRate 0.0276 Epoch: 9 Global Step: 158560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:11,399-Speed 5120.53 samples/sec Loss 2.3972 LearningRate 0.0276 Epoch: 9 Global Step: 158570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:13,398-Speed 5123.47 samples/sec Loss 2.4000 LearningRate 0.0276 Epoch: 9 Global Step: 158580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:15,384-Speed 5159.20 samples/sec Loss 2.3804 LearningRate 0.0276 Epoch: 9 Global Step: 158590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:17,353-Speed 5201.07 samples/sec Loss 2.3974 LearningRate 0.0276 Epoch: 9 Global Step: 158600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:19,323-Speed 5198.90 samples/sec Loss 2.3675 LearningRate 0.0275 Epoch: 9 Global Step: 158610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:54:21,284-Speed 5223.89 samples/sec Loss 2.3931 LearningRate 0.0275 Epoch: 9 Global Step: 158620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:23,275-Speed 5145.96 samples/sec Loss 2.3480 LearningRate 0.0275 Epoch: 9 Global Step: 158630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:25,242-Speed 5208.11 samples/sec Loss 2.3458 LearningRate 0.0275 Epoch: 9 Global Step: 158640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:27,212-Speed 5199.41 samples/sec Loss 2.3910 LearningRate 0.0275 Epoch: 9 Global Step: 158650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:29,186-Speed 5190.08 samples/sec Loss 2.3746 LearningRate 0.0275 Epoch: 9 Global Step: 158660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:31,153-Speed 5205.86 samples/sec 
Loss 2.4899 LearningRate 0.0275 Epoch: 9 Global Step: 158670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:33,141-Speed 5153.65 samples/sec Loss 2.4119 LearningRate 0.0275 Epoch: 9 Global Step: 158680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:35,115-Speed 5190.10 samples/sec Loss 2.3671 LearningRate 0.0275 Epoch: 9 Global Step: 158690 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:37,093-Speed 5177.39 samples/sec Loss 2.4326 LearningRate 0.0275 Epoch: 9 Global Step: 158700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:39,067-Speed 5188.43 samples/sec Loss 2.4667 LearningRate 0.0275 Epoch: 9 Global Step: 158710 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:41,040-Speed 5191.08 samples/sec Loss 2.3528 LearningRate 0.0275 Epoch: 9 Global Step: 158720 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:54:43,029-Speed 5151.94 samples/sec Loss 2.4241 LearningRate 0.0275 Epoch: 9 Global Step: 158730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:54:45,014-Speed 5159.55 samples/sec Loss 2.3981 LearningRate 0.0275 Epoch: 9 Global Step: 158740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:47,034-Speed 5071.02 samples/sec Loss 2.4032 LearningRate 0.0275 Epoch: 9 Global Step: 158750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:49,011-Speed 5180.37 samples/sec Loss 2.4519 LearningRate 0.0275 Epoch: 9 Global Step: 158760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:50,981-Speed 5200.06 samples/sec Loss 2.3918 LearningRate 0.0275 Epoch: 9 Global Step: 158770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:54:52,948-Speed 5210.06 samples/sec Loss 2.4325 LearningRate 0.0275 Epoch: 9 Global Step: 158780 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:54:54,919-Speed 5197.02 samples/sec Loss 2.3812 LearningRate 0.0275 Epoch: 9 
Global Step: 158790 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:54:56,900-Speed 5170.47 samples/sec Loss 2.3940 LearningRate 0.0275 Epoch: 9 Global Step: 158800 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:54:58,873-Speed 5189.54 samples/sec Loss 2.4212 LearningRate 0.0275 Epoch: 9 Global Step: 158810 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:55:00,862-Speed 5150.19 samples/sec Loss 2.4857 LearningRate 0.0275 Epoch: 9 Global Step: 158820 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:55:02,843-Speed 5171.19 samples/sec Loss 2.4449 LearningRate 0.0275 Epoch: 9 Global Step: 158830 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:55:04,824-Speed 5172.09 samples/sec Loss 2.4412 LearningRate 0.0275 Epoch: 9 Global Step: 158840 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:55:06,793-Speed 5201.69 samples/sec Loss 2.3320 LearningRate 0.0275 Epoch: 9 Global Step: 158850 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:55:08,777-Speed 5163.56 samples/sec Loss 2.4726 LearningRate 0.0275 Epoch: 9 Global Step: 158860 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:55:10,757-Speed 5172.42 samples/sec Loss 2.3901 LearningRate 0.0275 Epoch: 9 Global Step: 158870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:55:12,736-Speed 5175.83 samples/sec Loss 2.4455 LearningRate 0.0275 Epoch: 9 Global Step: 158880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:55:14,726-Speed 5149.26 samples/sec Loss 2.3514 LearningRate 0.0275 Epoch: 9 Global Step: 158890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:55:16,691-Speed 5213.55 samples/sec Loss 2.4563 LearningRate 0.0275 Epoch: 9 Global Step: 158900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:55:18,664-Speed 5190.05 samples/sec Loss 2.3992 LearningRate 0.0275 Epoch: 9 Global Step: 158910 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-11 09:55:20,633-Speed 5202.33 samples/sec Loss 2.4252 LearningRate 0.0275 Epoch: 9 Global Step: 158920 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:55:22,614-Speed 5170.58 samples/sec Loss 2.3596 LearningRate 0.0274 Epoch: 9 Global Step: 158930 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:55:24,589-Speed 5187.06 samples/sec Loss 2.3792 LearningRate 0.0274 Epoch: 9 Global Step: 158940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:55:26,557-Speed 5205.42 samples/sec Loss 2.4635 LearningRate 0.0274 Epoch: 9 Global Step: 158950 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:55:28,527-Speed 5200.04 samples/sec Loss 2.3917 LearningRate 0.0274 Epoch: 9 Global Step: 158960 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:55:30,510-Speed 5163.40 samples/sec Loss 2.3886 LearningRate 0.0274 Epoch: 9 Global Step: 158970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:55:32,495-Speed 5161.32 samples/sec Loss 2.4559 LearningRate 0.0274 Epoch: 9 Global Step: 158980 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:55:34,468-Speed 5193.68 samples/sec Loss 2.4212 LearningRate 0.0274 Epoch: 9 Global Step: 158990 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:55:36,445-Speed 5181.12 samples/sec Loss 2.3806 LearningRate 0.0274 Epoch: 9 Global Step: 159000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:55:38,432-Speed 5153.56 samples/sec Loss 2.4032 LearningRate 0.0274 Epoch: 9 Global Step: 159010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:55:40,401-Speed 5204.05 samples/sec Loss 2.4028 LearningRate 0.0274 Epoch: 9 Global Step: 159020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:55:42,371-Speed 5199.15 samples/sec Loss 2.3734 LearningRate 0.0274 Epoch: 9 Global Step: 159030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 
09:55:44,347-Speed 5183.51 samples/sec Loss 2.4361 LearningRate 0.0274 Epoch: 9 Global Step: 159040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:55:46,340-Speed 5140.86 samples/sec Loss 2.3718 LearningRate 0.0274 Epoch: 9 Global Step: 159050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:55:48,325-Speed 5159.69 samples/sec Loss 2.4397 LearningRate 0.0274 Epoch: 9 Global Step: 159060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:55:50,316-Speed 5143.55 samples/sec Loss 2.3900 LearningRate 0.0274 Epoch: 9 Global Step: 159070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:55:52,307-Speed 5146.25 samples/sec Loss 2.3648 LearningRate 0.0274 Epoch: 9 Global Step: 159080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:55:54,301-Speed 5136.59 samples/sec Loss 2.4268 LearningRate 0.0274 Epoch: 9 Global Step: 159090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:55:56,271-Speed 5201.21 samples/sec Loss 2.4465 LearningRate 0.0274 Epoch: 9 Global Step: 159100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:55:58,264-Speed 5139.93 samples/sec Loss 2.4264 LearningRate 0.0274 Epoch: 9 Global Step: 159110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:56:00,238-Speed 5187.66 samples/sec Loss 2.3804 LearningRate 0.0274 Epoch: 9 Global Step: 159120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:56:02,219-Speed 5171.87 samples/sec Loss 2.4794 LearningRate 0.0274 Epoch: 9 Global Step: 159130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:56:04,195-Speed 5183.83 samples/sec Loss 2.3864 LearningRate 0.0274 Epoch: 9 Global Step: 159140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:56:06,170-Speed 5185.76 samples/sec Loss 2.4120 LearningRate 0.0274 Epoch: 9 Global Step: 159150 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:56:08,143-Speed 5192.91 
samples/sec Loss 2.3902 LearningRate 0.0274 Epoch: 9 Global Step: 159160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:10,130-Speed 5154.66 samples/sec Loss 2.4134 LearningRate 0.0274 Epoch: 9 Global Step: 159170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:12,119-Speed 5147.92 samples/sec Loss 2.4121 LearningRate 0.0274 Epoch: 9 Global Step: 159180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:14,097-Speed 5180.22 samples/sec Loss 2.4017 LearningRate 0.0274 Epoch: 9 Global Step: 159190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:16,096-Speed 5124.81 samples/sec Loss 2.3260 LearningRate 0.0274 Epoch: 9 Global Step: 159200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:18,069-Speed 5192.81 samples/sec Loss 2.3925 LearningRate 0.0274 Epoch: 9 Global Step: 159210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:20,043-Speed 5188.98 samples/sec Loss 2.4066 LearningRate 0.0274 Epoch: 9 Global Step: 159220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:22,034-Speed 5143.98 samples/sec Loss 2.5038 LearningRate 0.0274 Epoch: 9 Global Step: 159230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:24,017-Speed 5164.97 samples/sec Loss 2.3497 LearningRate 0.0274 Epoch: 9 Global Step: 159240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:25,992-Speed 5187.24 samples/sec Loss 2.4441 LearningRate 0.0273 Epoch: 9 Global Step: 159250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:27,960-Speed 5204.90 samples/sec Loss 2.4035 LearningRate 0.0273 Epoch: 9 Global Step: 159260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:29,934-Speed 5189.54 samples/sec Loss 2.4378 LearningRate 0.0273 Epoch: 9 Global Step: 159270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:31,931-Speed 5129.44 samples/sec Loss 2.4296 LearningRate 0.0273 
Epoch: 9 Global Step: 159280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:33,912-Speed 5171.97 samples/sec Loss 2.4049 LearningRate 0.0273 Epoch: 9 Global Step: 159290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:35,888-Speed 5184.29 samples/sec Loss 2.4589 LearningRate 0.0273 Epoch: 9 Global Step: 159300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:37,865-Speed 5179.98 samples/sec Loss 2.5120 LearningRate 0.0273 Epoch: 9 Global Step: 159310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:39,839-Speed 5189.55 samples/sec Loss 2.4215 LearningRate 0.0273 Epoch: 9 Global Step: 159320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:41,811-Speed 5194.23 samples/sec Loss 2.4566 LearningRate 0.0273 Epoch: 9 Global Step: 159330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:43,799-Speed 5153.04 samples/sec Loss 2.3637 LearningRate 0.0273 Epoch: 9 Global Step: 159340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:45,768-Speed 5201.80 samples/sec Loss 2.4854 LearningRate 0.0273 Epoch: 9 Global Step: 159350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:47,751-Speed 5165.67 samples/sec Loss 2.4310 LearningRate 0.0273 Epoch: 9 Global Step: 159360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:56:49,717-Speed 5209.23 samples/sec Loss 2.3708 LearningRate 0.0273 Epoch: 9 Global Step: 159370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:51,718-Speed 5120.60 samples/sec Loss 2.4319 LearningRate 0.0273 Epoch: 9 Global Step: 159380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:53,686-Speed 5205.01 samples/sec Loss 2.4019 LearningRate 0.0273 Epoch: 9 Global Step: 159390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:55,659-Speed 5191.98 samples/sec Loss 2.4014 LearningRate 0.0273 Epoch: 9 Global Step: 159400 Fp16 Grad 
Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:57,633-Speed 5190.22 samples/sec Loss 2.4053 LearningRate 0.0273 Epoch: 9 Global Step: 159410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:56:59,606-Speed 5191.49 samples/sec Loss 2.3919 LearningRate 0.0273 Epoch: 9 Global Step: 159420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:57:01,577-Speed 5194.96 samples/sec Loss 2.4301 LearningRate 0.0273 Epoch: 9 Global Step: 159430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:57:03,551-Speed 5188.98 samples/sec Loss 2.4481 LearningRate 0.0273 Epoch: 9 Global Step: 159440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:57:05,524-Speed 5192.50 samples/sec Loss 2.3647 LearningRate 0.0273 Epoch: 9 Global Step: 159450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:57:07,501-Speed 5181.22 samples/sec Loss 2.3606 LearningRate 0.0273 Epoch: 9 Global Step: 159460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:57:09,486-Speed 5161.61 samples/sec Loss 2.3487 LearningRate 0.0273 Epoch: 9 Global Step: 159470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:57:11,472-Speed 5155.68 samples/sec Loss 2.4427 LearningRate 0.0273 Epoch: 9 Global Step: 159480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:57:13,449-Speed 5181.93 samples/sec Loss 2.3784 LearningRate 0.0273 Epoch: 9 Global Step: 159490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:57:15,435-Speed 5159.15 samples/sec Loss 2.4590 LearningRate 0.0273 Epoch: 9 Global Step: 159500 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:57:17,411-Speed 5185.85 samples/sec Loss 2.4360 LearningRate 0.0273 Epoch: 9 Global Step: 159510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:57:19,393-Speed 5169.06 samples/sec Loss 2.3416 LearningRate 0.0273 Epoch: 9 Global Step: 159520 Fp16 Grad Scale: 65536 Required: 12 hours 
Training: 2022-04-11 09:57:21,377-Speed 5161.85 samples/sec Loss 2.4186 LearningRate 0.0273 Epoch: 9 Global Step: 159530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:57:23,352-Speed 5187.69 samples/sec Loss 2.4429 LearningRate 0.0273 Epoch: 9 Global Step: 159540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:57:25,331-Speed 5174.67 samples/sec Loss 2.4028 LearningRate 0.0273 Epoch: 9 Global Step: 159550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:57:27,305-Speed 5188.39 samples/sec Loss 2.4569 LearningRate 0.0273 Epoch: 9 Global Step: 159560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:57:29,288-Speed 5167.48 samples/sec Loss 2.4049 LearningRate 0.0272 Epoch: 9 Global Step: 159570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:57:31,278-Speed 5147.31 samples/sec Loss 2.4244 LearningRate 0.0272 Epoch: 9 Global Step: 159580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:57:33,253-Speed 5184.30 samples/sec Loss 2.4819 LearningRate 0.0272 Epoch: 9 Global Step: 159590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:57:35,224-Speed 5197.49 samples/sec Loss 2.4504 LearningRate 0.0272 Epoch: 9 Global Step: 159600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:57:37,199-Speed 5187.80 samples/sec Loss 2.4376 LearningRate 0.0272 Epoch: 9 Global Step: 159610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:57:39,173-Speed 5188.23 samples/sec Loss 2.3504 LearningRate 0.0272 Epoch: 9 Global Step: 159620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:57:41,145-Speed 5195.96 samples/sec Loss 2.4153 LearningRate 0.0272 Epoch: 9 Global Step: 159630 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:57:43,118-Speed 5192.03 samples/sec Loss 2.3931 LearningRate 0.0272 Epoch: 9 Global Step: 159640 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:57:45,093-Speed 
5186.87 samples/sec Loss 2.4502 LearningRate 0.0272 Epoch: 9 Global Step: 159650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:57:47,071-Speed 5177.59 samples/sec Loss 2.4293 LearningRate 0.0272 Epoch: 9 Global Step: 159660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:57:49,046-Speed 5187.11 samples/sec Loss 2.3977 LearningRate 0.0272 Epoch: 9 Global Step: 159670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:57:51,041-Speed 5132.88 samples/sec Loss 2.4709 LearningRate 0.0272 Epoch: 9 Global Step: 159680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:57:53,017-Speed 5186.28 samples/sec Loss 2.4446 LearningRate 0.0272 Epoch: 9 Global Step: 159690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:57:55,000-Speed 5164.42 samples/sec Loss 2.3908 LearningRate 0.0272 Epoch: 9 Global Step: 159700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:57:56,974-Speed 5187.91 samples/sec Loss 2.4127 LearningRate 0.0272 Epoch: 9 Global Step: 159710 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:57:58,942-Speed 5206.03 samples/sec Loss 2.4811 LearningRate 0.0272 Epoch: 9 Global Step: 159720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:00,918-Speed 5184.02 samples/sec Loss 2.4075 LearningRate 0.0272 Epoch: 9 Global Step: 159730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:02,923-Speed 5108.47 samples/sec Loss 2.3707 LearningRate 0.0272 Epoch: 9 Global Step: 159740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:04,899-Speed 5185.92 samples/sec Loss 2.3320 LearningRate 0.0272 Epoch: 9 Global Step: 159750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:06,872-Speed 5191.90 samples/sec Loss 2.4885 LearningRate 0.0272 Epoch: 9 Global Step: 159760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:08,848-Speed 5184.04 samples/sec Loss 2.3485 
LearningRate 0.0272 Epoch: 9 Global Step: 159770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:10,837-Speed 5149.28 samples/sec Loss 2.4469 LearningRate 0.0272 Epoch: 9 Global Step: 159780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:12,810-Speed 5191.32 samples/sec Loss 2.4580 LearningRate 0.0272 Epoch: 9 Global Step: 159790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:14,787-Speed 5182.95 samples/sec Loss 2.3674 LearningRate 0.0272 Epoch: 9 Global Step: 159800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:16,773-Speed 5157.92 samples/sec Loss 2.3705 LearningRate 0.0272 Epoch: 9 Global Step: 159810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:18,760-Speed 5154.62 samples/sec Loss 2.4495 LearningRate 0.0272 Epoch: 9 Global Step: 159820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:58:20,730-Speed 5201.22 samples/sec Loss 2.4246 LearningRate 0.0272 Epoch: 9 Global Step: 159830 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:58:22,719-Speed 5150.53 samples/sec Loss 2.3709 LearningRate 0.0272 Epoch: 9 Global Step: 159840 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:58:24,692-Speed 5190.30 samples/sec Loss 2.4634 LearningRate 0.0272 Epoch: 9 Global Step: 159850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:58:26,683-Speed 5145.05 samples/sec Loss 2.3610 LearningRate 0.0272 Epoch: 9 Global Step: 159860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:28,657-Speed 5189.31 samples/sec Loss 2.5373 LearningRate 0.0272 Epoch: 9 Global Step: 159870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:30,633-Speed 5183.33 samples/sec Loss 2.4124 LearningRate 0.0272 Epoch: 9 Global Step: 159880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:32,610-Speed 5179.73 samples/sec Loss 2.4351 LearningRate 0.0271 Epoch: 9 Global 
Step: 159890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:34,584-Speed 5190.35 samples/sec Loss 2.4672 LearningRate 0.0271 Epoch: 9 Global Step: 159900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:36,577-Speed 5139.77 samples/sec Loss 2.4555 LearningRate 0.0271 Epoch: 9 Global Step: 159910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:38,552-Speed 5187.16 samples/sec Loss 2.4206 LearningRate 0.0271 Epoch: 9 Global Step: 159920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:40,525-Speed 5190.37 samples/sec Loss 2.3509 LearningRate 0.0271 Epoch: 9 Global Step: 159930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:42,503-Speed 5180.41 samples/sec Loss 2.3637 LearningRate 0.0271 Epoch: 9 Global Step: 159940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:44,478-Speed 5187.19 samples/sec Loss 2.3935 LearningRate 0.0271 Epoch: 9 Global Step: 159950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:46,458-Speed 5171.89 samples/sec Loss 2.3997 LearningRate 0.0271 Epoch: 9 Global Step: 159960 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 09:58:48,436-Speed 5179.09 samples/sec Loss 2.4349 LearningRate 0.0271 Epoch: 9 Global Step: 159970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 09:58:50,405-Speed 5202.75 samples/sec Loss 2.3414 LearningRate 0.0271 Epoch: 9 Global Step: 159980 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:58:52,383-Speed 5179.14 samples/sec Loss 2.3655 LearningRate 0.0271 Epoch: 9 Global Step: 159990 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:58:54,364-Speed 5170.40 samples/sec Loss 2.4372 LearningRate 0.0271 Epoch: 9 Global Step: 160000 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 09:59:20,985-[lfw][160000]XNorm: 22.783834 Training: 2022-04-11 09:59:20,985-[lfw][160000]Accuracy-Flip: 0.99800+-0.00245 
Training: 2022-04-11 09:59:20,986-[lfw][160000]Accuracy-Highest: 0.99833 Training: 2022-04-11 09:59:51,642-[cfp_fp][160000]XNorm: 20.621144 Training: 2022-04-11 09:59:51,643-[cfp_fp][160000]Accuracy-Flip: 0.98529+-0.00515 Training: 2022-04-11 09:59:51,643-[cfp_fp][160000]Accuracy-Highest: 0.98529 Training: 2022-04-11 10:00:18,074-[agedb_30][160000]XNorm: 22.597258 Training: 2022-04-11 10:00:18,074-[agedb_30][160000]Accuracy-Flip: 0.98067+-0.00629 Training: 2022-04-11 10:00:18,075-[agedb_30][160000]Accuracy-Highest: 0.98167 Training: 2022-04-11 10:00:20,050-Speed 119.51 samples/sec Loss 2.4551 LearningRate 0.0271 Epoch: 9 Global Step: 160010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:00:22,013-Speed 5219.72 samples/sec Loss 2.3957 LearningRate 0.0271 Epoch: 9 Global Step: 160020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:00:23,976-Speed 5217.40 samples/sec Loss 2.4443 LearningRate 0.0271 Epoch: 9 Global Step: 160030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:00:25,955-Speed 5176.28 samples/sec Loss 2.3995 LearningRate 0.0271 Epoch: 9 Global Step: 160040 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:00:27,920-Speed 5220.97 samples/sec Loss 2.3778 LearningRate 0.0271 Epoch: 9 Global Step: 160050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:00:29,885-Speed 5215.60 samples/sec Loss 2.3878 LearningRate 0.0271 Epoch: 9 Global Step: 160060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:00:31,846-Speed 5222.33 samples/sec Loss 2.4424 LearningRate 0.0271 Epoch: 9 Global Step: 160070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:00:33,815-Speed 5201.43 samples/sec Loss 2.4796 LearningRate 0.0271 Epoch: 9 Global Step: 160080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:00:35,792-Speed 5181.25 samples/sec Loss 2.4826 LearningRate 0.0271 Epoch: 9 Global Step: 160090 Fp16 Grad Scale: 65536 Required: 12 hours 
Training: 2022-04-11 10:00:37,756-Speed 5216.14 samples/sec Loss 2.4203 LearningRate 0.0271 Epoch: 9 Global Step: 160100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:00:39,733-Speed 5180.02 samples/sec Loss 2.3811 LearningRate 0.0271 Epoch: 9 Global Step: 160110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:00:41,720-Speed 5156.90 samples/sec Loss 2.3997 LearningRate 0.0271 Epoch: 9 Global Step: 160120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:00:43,697-Speed 5179.74 samples/sec Loss 2.3606 LearningRate 0.0271 Epoch: 9 Global Step: 160130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:00:45,673-Speed 5184.30 samples/sec Loss 2.3937 LearningRate 0.0271 Epoch: 9 Global Step: 160140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:00:47,674-Speed 5119.25 samples/sec Loss 2.4129 LearningRate 0.0271 Epoch: 9 Global Step: 160150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:00:49,664-Speed 5148.20 samples/sec Loss 2.4168 LearningRate 0.0271 Epoch: 9 Global Step: 160160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:00:51,649-Speed 5158.63 samples/sec Loss 2.3974 LearningRate 0.0271 Epoch: 9 Global Step: 160170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:00:53,625-Speed 5186.10 samples/sec Loss 2.4409 LearningRate 0.0271 Epoch: 9 Global Step: 160180 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:00:55,597-Speed 5194.78 samples/sec Loss 2.4098 LearningRate 0.0271 Epoch: 9 Global Step: 160190 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:00:57,600-Speed 5114.67 samples/sec Loss 2.4280 LearningRate 0.0271 Epoch: 9 Global Step: 160200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:00:59,581-Speed 5169.37 samples/sec Loss 2.3288 LearningRate 0.0270 Epoch: 9 Global Step: 160210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:01:01,559-Speed 
5179.09 samples/sec Loss 2.4450 LearningRate 0.0270 Epoch: 9 Global Step: 160220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:03,528-Speed 5201.85 samples/sec Loss 2.5130 LearningRate 0.0270 Epoch: 9 Global Step: 160230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:05,496-Speed 5205.74 samples/sec Loss 2.4668 LearningRate 0.0270 Epoch: 9 Global Step: 160240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:07,464-Speed 5204.91 samples/sec Loss 2.4118 LearningRate 0.0270 Epoch: 9 Global Step: 160250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:09,444-Speed 5172.88 samples/sec Loss 2.4130 LearningRate 0.0270 Epoch: 9 Global Step: 160260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:11,431-Speed 5155.40 samples/sec Loss 2.4275 LearningRate 0.0270 Epoch: 9 Global Step: 160270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:13,411-Speed 5173.08 samples/sec Loss 2.4542 LearningRate 0.0270 Epoch: 9 Global Step: 160280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:15,396-Speed 5162.25 samples/sec Loss 2.3627 LearningRate 0.0270 Epoch: 9 Global Step: 160290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:17,365-Speed 5199.98 samples/sec Loss 2.3684 LearningRate 0.0270 Epoch: 9 Global Step: 160300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:19,331-Speed 5212.03 samples/sec Loss 2.4020 LearningRate 0.0270 Epoch: 9 Global Step: 160310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:21,291-Speed 5224.07 samples/sec Loss 2.4364 LearningRate 0.0270 Epoch: 9 Global Step: 160320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:23,264-Speed 5192.31 samples/sec Loss 2.4703 LearningRate 0.0270 Epoch: 9 Global Step: 160330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:25,234-Speed 5199.39 samples/sec Loss 2.4411 
LearningRate 0.0270 Epoch: 9 Global Step: 160340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:27,202-Speed 5205.62 samples/sec Loss 2.4282 LearningRate 0.0270 Epoch: 9 Global Step: 160350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:29,185-Speed 5166.31 samples/sec Loss 2.4474 LearningRate 0.0270 Epoch: 9 Global Step: 160360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:31,153-Speed 5203.53 samples/sec Loss 2.4271 LearningRate 0.0270 Epoch: 9 Global Step: 160370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:33,140-Speed 5156.22 samples/sec Loss 2.4015 LearningRate 0.0270 Epoch: 9 Global Step: 160380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:35,115-Speed 5188.35 samples/sec Loss 2.4101 LearningRate 0.0270 Epoch: 9 Global Step: 160390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:37,085-Speed 5198.24 samples/sec Loss 2.3958 LearningRate 0.0270 Epoch: 9 Global Step: 160400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:39,055-Speed 5200.95 samples/sec Loss 2.4517 LearningRate 0.0270 Epoch: 9 Global Step: 160410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:41,022-Speed 5206.97 samples/sec Loss 2.4423 LearningRate 0.0270 Epoch: 9 Global Step: 160420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:01:42,985-Speed 5216.77 samples/sec Loss 2.3922 LearningRate 0.0270 Epoch: 9 Global Step: 160430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:01:44,953-Speed 5206.60 samples/sec Loss 2.4626 LearningRate 0.0270 Epoch: 9 Global Step: 160440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:01:46,928-Speed 5185.14 samples/sec Loss 2.3074 LearningRate 0.0270 Epoch: 9 Global Step: 160450 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:01:48,892-Speed 5216.57 samples/sec Loss 2.4064 LearningRate 0.0270 Epoch: 9 Global Step: 
160460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:01:50,876-Speed 5162.55 samples/sec Loss 2.4533 LearningRate 0.0270 Epoch: 9 Global Step: 160470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:01:52,843-Speed 5208.09 samples/sec Loss 2.3854 LearningRate 0.0270 Epoch: 9 Global Step: 160480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:01:54,815-Speed 5194.27 samples/sec Loss 2.3600 LearningRate 0.0270 Epoch: 9 Global Step: 160490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:01:56,785-Speed 5199.61 samples/sec Loss 2.4125 LearningRate 0.0270 Epoch: 9 Global Step: 160500 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:01:58,761-Speed 5184.42 samples/sec Loss 2.4154 LearningRate 0.0270 Epoch: 9 Global Step: 160510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:02:00,752-Speed 5145.37 samples/sec Loss 2.3808 LearningRate 0.0270 Epoch: 9 Global Step: 160520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:02:02,740-Speed 5152.50 samples/sec Loss 2.3987 LearningRate 0.0269 Epoch: 9 Global Step: 160530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:02:04,714-Speed 5189.18 samples/sec Loss 2.4826 LearningRate 0.0269 Epoch: 9 Global Step: 160540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:02:06,681-Speed 5206.35 samples/sec Loss 2.4348 LearningRate 0.0269 Epoch: 9 Global Step: 160550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:02:08,654-Speed 5192.48 samples/sec Loss 2.3810 LearningRate 0.0269 Epoch: 9 Global Step: 160560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:02:10,621-Speed 5208.15 samples/sec Loss 2.4344 LearningRate 0.0269 Epoch: 9 Global Step: 160570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:02:12,587-Speed 5208.53 samples/sec Loss 2.3896 LearningRate 0.0269 Epoch: 9 Global Step: 160580 Fp16 Grad Scale: 65536 Required: 12 
hours Training: 2022-04-11 10:02:14,566-Speed 5177.99 samples/sec Loss 2.4276 LearningRate 0.0269 Epoch: 9 Global Step: 160590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:02:16,560-Speed 5138.22 samples/sec Loss 2.3220 LearningRate 0.0269 Epoch: 9 Global Step: 160600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:02:18,527-Speed 5207.98 samples/sec Loss 2.4869 LearningRate 0.0269 Epoch: 9 Global Step: 160610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:02:20,517-Speed 5145.74 samples/sec Loss 2.3961 LearningRate 0.0269 Epoch: 9 Global Step: 160620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:02:22,506-Speed 5150.53 samples/sec Loss 2.4844 LearningRate 0.0269 Epoch: 9 Global Step: 160630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:02:24,470-Speed 5214.37 samples/sec Loss 2.4180 LearningRate 0.0269 Epoch: 9 Global Step: 160640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:02:26,447-Speed 5181.40 samples/sec Loss 2.4666 LearningRate 0.0269 Epoch: 9 Global Step: 160650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:02:28,418-Speed 5198.23 samples/sec Loss 2.4451 LearningRate 0.0269 Epoch: 9 Global Step: 160660 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:02:30,392-Speed 5187.30 samples/sec Loss 2.4378 LearningRate 0.0269 Epoch: 9 Global Step: 160670 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:02:32,359-Speed 5209.60 samples/sec Loss 2.4501 LearningRate 0.0269 Epoch: 9 Global Step: 160680 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:02:34,326-Speed 5208.03 samples/sec Loss 2.3574 LearningRate 0.0269 Epoch: 9 Global Step: 160690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:02:36,289-Speed 5216.24 samples/sec Loss 2.3817 LearningRate 0.0269 Epoch: 9 Global Step: 160700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 
10:02:38,274-Speed 5161.09 samples/sec Loss 2.3876 LearningRate 0.0269 Epoch: 9 Global Step: 160710 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:02:40,235-Speed 5225.17 samples/sec Loss 2.4338 LearningRate 0.0269 Epoch: 9 Global Step: 160720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:02:42,205-Speed 5200.00 samples/sec Loss 2.4375 LearningRate 0.0269 Epoch: 9 Global Step: 160730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:02:44,175-Speed 5199.61 samples/sec Loss 2.3934 LearningRate 0.0269 Epoch: 9 Global Step: 160740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:02:46,155-Speed 5172.75 samples/sec Loss 2.4505 LearningRate 0.0269 Epoch: 9 Global Step: 160750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:02:48,131-Speed 5183.63 samples/sec Loss 2.4761 LearningRate 0.0269 Epoch: 9 Global Step: 160760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:02:50,100-Speed 5201.70 samples/sec Loss 2.3921 LearningRate 0.0269 Epoch: 9 Global Step: 160770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:02:52,078-Speed 5179.10 samples/sec Loss 2.3920 LearningRate 0.0269 Epoch: 9 Global Step: 160780 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:02:54,043-Speed 5212.35 samples/sec Loss 2.4741 LearningRate 0.0269 Epoch: 9 Global Step: 160790 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:02:56,013-Speed 5199.97 samples/sec Loss 2.3687 LearningRate 0.0269 Epoch: 9 Global Step: 160800 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:02:57,980-Speed 5208.57 samples/sec Loss 2.3868 LearningRate 0.0269 Epoch: 9 Global Step: 160810 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:02:59,948-Speed 5206.25 samples/sec Loss 2.3762 LearningRate 0.0269 Epoch: 9 Global Step: 160820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:01,916-Speed 5203.40 samples/sec Loss 
2.4208 LearningRate 0.0269 Epoch: 9 Global Step: 160830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:03,885-Speed 5204.11 samples/sec Loss 2.2924 LearningRate 0.0269 Epoch: 9 Global Step: 160840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:05,864-Speed 5174.82 samples/sec Loss 2.3567 LearningRate 0.0268 Epoch: 9 Global Step: 160850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:07,834-Speed 5199.40 samples/sec Loss 2.4027 LearningRate 0.0268 Epoch: 9 Global Step: 160860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:09,804-Speed 5198.93 samples/sec Loss 2.4631 LearningRate 0.0268 Epoch: 9 Global Step: 160870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:11,779-Speed 5186.52 samples/sec Loss 2.3928 LearningRate 0.0268 Epoch: 9 Global Step: 160880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:13,750-Speed 5198.33 samples/sec Loss 2.4764 LearningRate 0.0268 Epoch: 9 Global Step: 160890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:15,732-Speed 5177.69 samples/sec Loss 2.4628 LearningRate 0.0268 Epoch: 9 Global Step: 160900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:17,714-Speed 5167.05 samples/sec Loss 2.4920 LearningRate 0.0268 Epoch: 9 Global Step: 160910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:19,692-Speed 5180.49 samples/sec Loss 2.4327 LearningRate 0.0268 Epoch: 9 Global Step: 160920 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:03:21,660-Speed 5204.36 samples/sec Loss 2.3585 LearningRate 0.0268 Epoch: 9 Global Step: 160930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:23,639-Speed 5176.21 samples/sec Loss 2.3652 LearningRate 0.0268 Epoch: 9 Global Step: 160940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:25,609-Speed 5198.82 samples/sec Loss 2.4242 LearningRate 0.0268 Epoch: 9 Global 
Step: 160950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:27,584-Speed 5185.53 samples/sec Loss 2.4152 LearningRate 0.0268 Epoch: 9 Global Step: 160960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:29,557-Speed 5192.75 samples/sec Loss 2.4618 LearningRate 0.0268 Epoch: 9 Global Step: 160970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:31,532-Speed 5185.79 samples/sec Loss 2.3071 LearningRate 0.0268 Epoch: 9 Global Step: 160980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:33,510-Speed 5179.26 samples/sec Loss 2.3692 LearningRate 0.0268 Epoch: 9 Global Step: 160990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:35,482-Speed 5195.14 samples/sec Loss 2.4556 LearningRate 0.0268 Epoch: 9 Global Step: 161000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:37,448-Speed 5208.17 samples/sec Loss 2.3277 LearningRate 0.0268 Epoch: 9 Global Step: 161010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:39,422-Speed 5190.42 samples/sec Loss 2.4593 LearningRate 0.0268 Epoch: 9 Global Step: 161020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:03:41,405-Speed 5166.73 samples/sec Loss 2.5035 LearningRate 0.0268 Epoch: 9 Global Step: 161030 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:03:43,375-Speed 5200.28 samples/sec Loss 2.4502 LearningRate 0.0268 Epoch: 9 Global Step: 161040 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:03:45,346-Speed 5196.40 samples/sec Loss 2.4246 LearningRate 0.0268 Epoch: 9 Global Step: 161050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:03:47,326-Speed 5173.09 samples/sec Loss 2.3517 LearningRate 0.0268 Epoch: 9 Global Step: 161060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:03:49,297-Speed 5196.94 samples/sec Loss 2.4248 LearningRate 0.0268 Epoch: 9 Global Step: 161070 Fp16 Grad Scale: 131072 
Required: 12 hours Training: 2022-04-11 10:03:51,271-Speed 5189.44 samples/sec Loss 2.3779 LearningRate 0.0268 Epoch: 9 Global Step: 161080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:03:53,238-Speed 5206.12 samples/sec Loss 2.4163 LearningRate 0.0268 Epoch: 9 Global Step: 161090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:03:55,215-Speed 5184.82 samples/sec Loss 2.4413 LearningRate 0.0268 Epoch: 9 Global Step: 161100 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:03:57,190-Speed 5185.38 samples/sec Loss 2.4031 LearningRate 0.0268 Epoch: 9 Global Step: 161110 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:03:59,168-Speed 5179.51 samples/sec Loss 2.4385 LearningRate 0.0268 Epoch: 9 Global Step: 161120 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:04:01,138-Speed 5198.82 samples/sec Loss 2.4150 LearningRate 0.0268 Epoch: 9 Global Step: 161130 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:04:03,105-Speed 5207.39 samples/sec Loss 2.4281 LearningRate 0.0268 Epoch: 9 Global Step: 161140 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:04:05,067-Speed 5222.15 samples/sec Loss 2.4891 LearningRate 0.0268 Epoch: 9 Global Step: 161150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:07,037-Speed 5199.22 samples/sec Loss 2.3737 LearningRate 0.0268 Epoch: 9 Global Step: 161160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:09,003-Speed 5210.51 samples/sec Loss 2.4424 LearningRate 0.0267 Epoch: 9 Global Step: 161170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:10,978-Speed 5185.61 samples/sec Loss 2.4896 LearningRate 0.0267 Epoch: 9 Global Step: 161180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:12,955-Speed 5181.86 samples/sec Loss 2.3821 LearningRate 0.0267 Epoch: 9 Global Step: 161190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 
2022-04-11 10:04:14,930-Speed 5186.38 samples/sec Loss 2.4410 LearningRate 0.0267 Epoch: 9 Global Step: 161200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:16,917-Speed 5154.72 samples/sec Loss 2.4893 LearningRate 0.0267 Epoch: 9 Global Step: 161210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:18,893-Speed 5183.89 samples/sec Loss 2.3550 LearningRate 0.0267 Epoch: 9 Global Step: 161220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:20,862-Speed 5201.35 samples/sec Loss 2.3889 LearningRate 0.0267 Epoch: 9 Global Step: 161230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:22,844-Speed 5169.86 samples/sec Loss 2.4323 LearningRate 0.0267 Epoch: 9 Global Step: 161240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:24,814-Speed 5199.63 samples/sec Loss 2.4309 LearningRate 0.0267 Epoch: 9 Global Step: 161250 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:04:26,794-Speed 5175.59 samples/sec Loss 2.3596 LearningRate 0.0267 Epoch: 9 Global Step: 161260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:28,769-Speed 5186.93 samples/sec Loss 2.4124 LearningRate 0.0267 Epoch: 9 Global Step: 161270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:30,743-Speed 5189.13 samples/sec Loss 2.4439 LearningRate 0.0267 Epoch: 9 Global Step: 161280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:32,712-Speed 5200.53 samples/sec Loss 2.4271 LearningRate 0.0267 Epoch: 9 Global Step: 161290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:34,681-Speed 5203.20 samples/sec Loss 2.4572 LearningRate 0.0267 Epoch: 9 Global Step: 161300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:36,651-Speed 5198.39 samples/sec Loss 2.4083 LearningRate 0.0267 Epoch: 9 Global Step: 161310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:38,629-Speed 5179.65 
samples/sec Loss 2.3937 LearningRate 0.0267 Epoch: 9 Global Step: 161320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:40,612-Speed 5165.75 samples/sec Loss 2.4729 LearningRate 0.0267 Epoch: 9 Global Step: 161330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:42,586-Speed 5188.39 samples/sec Loss 2.4287 LearningRate 0.0267 Epoch: 9 Global Step: 161340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:44,555-Speed 5204.03 samples/sec Loss 2.4171 LearningRate 0.0267 Epoch: 9 Global Step: 161350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:46,533-Speed 5177.80 samples/sec Loss 2.4211 LearningRate 0.0267 Epoch: 9 Global Step: 161360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:04:48,514-Speed 5171.23 samples/sec Loss 2.4074 LearningRate 0.0267 Epoch: 9 Global Step: 161370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:50,493-Speed 5174.75 samples/sec Loss 2.4787 LearningRate 0.0267 Epoch: 9 Global Step: 161380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:52,483-Speed 5148.71 samples/sec Loss 2.4089 LearningRate 0.0267 Epoch: 9 Global Step: 161390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:54,470-Speed 5153.86 samples/sec Loss 2.4217 LearningRate 0.0267 Epoch: 9 Global Step: 161400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:56,447-Speed 5182.84 samples/sec Loss 2.3153 LearningRate 0.0267 Epoch: 9 Global Step: 161410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:04:58,418-Speed 5196.32 samples/sec Loss 2.4133 LearningRate 0.0267 Epoch: 9 Global Step: 161420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:00,420-Speed 5115.82 samples/sec Loss 2.4923 LearningRate 0.0267 Epoch: 9 Global Step: 161430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:02,421-Speed 5120.38 samples/sec Loss 2.4535 LearningRate 
0.0267 Epoch: 9 Global Step: 161440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:04,402-Speed 5171.76 samples/sec Loss 2.3592 LearningRate 0.0267 Epoch: 9 Global Step: 161450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:06,383-Speed 5168.79 samples/sec Loss 2.4267 LearningRate 0.0267 Epoch: 9 Global Step: 161460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:08,358-Speed 5186.93 samples/sec Loss 2.4197 LearningRate 0.0267 Epoch: 9 Global Step: 161470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:05:10,323-Speed 5214.77 samples/sec Loss 2.4249 LearningRate 0.0267 Epoch: 9 Global Step: 161480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:12,297-Speed 5188.54 samples/sec Loss 2.3925 LearningRate 0.0266 Epoch: 9 Global Step: 161490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:14,282-Speed 5159.99 samples/sec Loss 2.3474 LearningRate 0.0266 Epoch: 9 Global Step: 161500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:16,292-Speed 5095.12 samples/sec Loss 2.4348 LearningRate 0.0266 Epoch: 9 Global Step: 161510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:18,269-Speed 5182.21 samples/sec Loss 2.3730 LearningRate 0.0266 Epoch: 9 Global Step: 161520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:20,255-Speed 5157.32 samples/sec Loss 2.4317 LearningRate 0.0266 Epoch: 9 Global Step: 161530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:22,244-Speed 5151.50 samples/sec Loss 2.4534 LearningRate 0.0266 Epoch: 9 Global Step: 161540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:24,230-Speed 5157.67 samples/sec Loss 2.4187 LearningRate 0.0266 Epoch: 9 Global Step: 161550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:26,212-Speed 5166.58 samples/sec Loss 2.4714 LearningRate 0.0266 Epoch: 9 Global Step: 161560 Fp16 
Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:28,192-Speed 5173.53 samples/sec Loss 2.3976 LearningRate 0.0266 Epoch: 9 Global Step: 161570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:30,162-Speed 5200.82 samples/sec Loss 2.4403 LearningRate 0.0266 Epoch: 9 Global Step: 161580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:05:32,129-Speed 5207.57 samples/sec Loss 2.4610 LearningRate 0.0266 Epoch: 9 Global Step: 161590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:34,100-Speed 5197.68 samples/sec Loss 2.4177 LearningRate 0.0266 Epoch: 9 Global Step: 161600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:36,092-Speed 5141.64 samples/sec Loss 2.4196 LearningRate 0.0266 Epoch: 9 Global Step: 161610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:38,069-Speed 5182.46 samples/sec Loss 2.4219 LearningRate 0.0266 Epoch: 9 Global Step: 161620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:40,050-Speed 5170.80 samples/sec Loss 2.4424 LearningRate 0.0266 Epoch: 9 Global Step: 161630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:42,032-Speed 5165.59 samples/sec Loss 2.4423 LearningRate 0.0266 Epoch: 9 Global Step: 161640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:44,011-Speed 5178.20 samples/sec Loss 2.3957 LearningRate 0.0266 Epoch: 9 Global Step: 161650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:45,998-Speed 5154.15 samples/sec Loss 2.4504 LearningRate 0.0266 Epoch: 9 Global Step: 161660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:47,993-Speed 5135.57 samples/sec Loss 2.3794 LearningRate 0.0266 Epoch: 9 Global Step: 161670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:49,969-Speed 5184.30 samples/sec Loss 2.3684 LearningRate 0.0266 Epoch: 9 Global Step: 161680 Fp16 Grad Scale: 65536 Required: 12 hours 
Training: 2022-04-11 10:05:51,947-Speed 5178.25 samples/sec Loss 2.4100 LearningRate 0.0266 Epoch: 9 Global Step: 161690 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:05:53,922-Speed 5186.56 samples/sec Loss 2.3863 LearningRate 0.0266 Epoch: 9 Global Step: 161700 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:05:55,886-Speed 5214.91 samples/sec Loss 2.4183 LearningRate 0.0266 Epoch: 9 Global Step: 161710 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:57,868-Speed 5168.17 samples/sec Loss 2.3654 LearningRate 0.0266 Epoch: 9 Global Step: 161720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:05:59,845-Speed 5182.65 samples/sec Loss 2.4352 LearningRate 0.0266 Epoch: 9 Global Step: 161730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:06:01,821-Speed 5183.09 samples/sec Loss 2.4161 LearningRate 0.0266 Epoch: 9 Global Step: 161740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:06:03,805-Speed 5162.14 samples/sec Loss 2.4729 LearningRate 0.0266 Epoch: 9 Global Step: 161750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:06:05,814-Speed 5099.69 samples/sec Loss 2.4125 LearningRate 0.0266 Epoch: 9 Global Step: 161760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:06:07,783-Speed 5202.53 samples/sec Loss 2.3900 LearningRate 0.0266 Epoch: 9 Global Step: 161770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:06:09,747-Speed 5216.00 samples/sec Loss 2.3695 LearningRate 0.0266 Epoch: 9 Global Step: 161780 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:06:11,721-Speed 5187.88 samples/sec Loss 2.3757 LearningRate 0.0266 Epoch: 9 Global Step: 161790 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:06:13,705-Speed 5162.96 samples/sec Loss 2.4166 LearningRate 0.0266 Epoch: 9 Global Step: 161800 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:06:15,720-Speed 
5083.38 samples/sec Loss 2.4024 LearningRate 0.0266 Epoch: 9 Global Step: 161810 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:06:17,691-Speed 5198.74 samples/sec Loss 2.4649 LearningRate 0.0265 Epoch: 9 Global Step: 161820 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:06:19,664-Speed 5190.81 samples/sec Loss 2.4086 LearningRate 0.0265 Epoch: 9 Global Step: 161830 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:06:21,638-Speed 5189.37 samples/sec Loss 2.4615 LearningRate 0.0265 Epoch: 9 Global Step: 161840 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:06:23,643-Speed 5107.72 samples/sec Loss 2.4779 LearningRate 0.0265 Epoch: 9 Global Step: 161850 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:06:25,619-Speed 5185.76 samples/sec Loss 2.4147 LearningRate 0.0265 Epoch: 9 Global Step: 161860 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:06:27,651-Speed 5040.99 samples/sec Loss 2.4248 LearningRate 0.0265 Epoch: 9 Global Step: 161870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:06:29,636-Speed 5159.98 samples/sec Loss 2.4506 LearningRate 0.0265 Epoch: 9 Global Step: 161880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:06:31,609-Speed 5192.77 samples/sec Loss 2.4322 LearningRate 0.0265 Epoch: 9 Global Step: 161890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:06:33,606-Speed 5129.97 samples/sec Loss 2.4075 LearningRate 0.0265 Epoch: 9 Global Step: 161900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:06:35,600-Speed 5136.01 samples/sec Loss 2.4525 LearningRate 0.0265 Epoch: 9 Global Step: 161910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:06:37,583-Speed 5166.11 samples/sec Loss 2.3981 LearningRate 0.0265 Epoch: 9 Global Step: 161920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:06:39,574-Speed 5143.92 samples/sec Loss 2.4022 
LearningRate 0.0265 Epoch: 9 Global Step: 161930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:06:41,546-Speed 5193.05 samples/sec Loss 2.4496 LearningRate 0.0265 Epoch: 9 Global Step: 161940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:06:43,516-Speed 5199.55 samples/sec Loss 2.4360 LearningRate 0.0265 Epoch: 9 Global Step: 161950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:06:45,507-Speed 5144.58 samples/sec Loss 2.4584 LearningRate 0.0265 Epoch: 9 Global Step: 161960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:06:47,482-Speed 5188.75 samples/sec Loss 2.3772 LearningRate 0.0265 Epoch: 9 Global Step: 161970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:06:49,448-Speed 5210.17 samples/sec Loss 2.4294 LearningRate 0.0265 Epoch: 9 Global Step: 161980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:06:51,419-Speed 5197.42 samples/sec Loss 2.5045 LearningRate 0.0265 Epoch: 9 Global Step: 161990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:06:53,391-Speed 5194.54 samples/sec Loss 2.4333 LearningRate 0.0265 Epoch: 9 Global Step: 162000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:07:20,057-[lfw][162000]XNorm: 21.836133 Training: 2022-04-11 10:07:20,058-[lfw][162000]Accuracy-Flip: 0.99750+-0.00239 Training: 2022-04-11 10:07:20,058-[lfw][162000]Accuracy-Highest: 0.99833 Training: 2022-04-11 10:07:50,792-[cfp_fp][162000]XNorm: 20.621309 Training: 2022-04-11 10:07:50,792-[cfp_fp][162000]Accuracy-Flip: 0.98500+-0.00614 Training: 2022-04-11 10:07:50,793-[cfp_fp][162000]Accuracy-Highest: 0.98529 Training: 2022-04-11 10:08:17,490-[agedb_30][162000]XNorm: 21.659581 Training: 2022-04-11 10:08:17,490-[agedb_30][162000]Accuracy-Flip: 0.98050+-0.00691 Training: 2022-04-11 10:08:17,491-[agedb_30][162000]Accuracy-Highest: 0.98167 Training: 2022-04-11 10:08:19,468-Speed 118.96 samples/sec Loss 2.3911 LearningRate 0.0265 
Epoch: 9 Global Step: 162010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:08:21,443-Speed 5187.94 samples/sec Loss 2.3645 LearningRate 0.0265 Epoch: 9 Global Step: 162020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:08:23,408-Speed 5213.24 samples/sec Loss 2.4443 LearningRate 0.0265 Epoch: 9 Global Step: 162030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:08:25,379-Speed 5194.81 samples/sec Loss 2.4118 LearningRate 0.0265 Epoch: 9 Global Step: 162040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:08:27,341-Speed 5223.51 samples/sec Loss 2.4486 LearningRate 0.0265 Epoch: 9 Global Step: 162050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:08:29,300-Speed 5226.15 samples/sec Loss 2.4339 LearningRate 0.0265 Epoch: 9 Global Step: 162060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:08:31,262-Speed 5223.04 samples/sec Loss 2.4151 LearningRate 0.0265 Epoch: 9 Global Step: 162070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:08:33,227-Speed 5210.52 samples/sec Loss 2.5284 LearningRate 0.0265 Epoch: 9 Global Step: 162080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:08:35,220-Speed 5141.83 samples/sec Loss 2.4114 LearningRate 0.0265 Epoch: 9 Global Step: 162090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:08:37,186-Speed 5208.80 samples/sec Loss 2.3771 LearningRate 0.0265 Epoch: 9 Global Step: 162100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:08:39,150-Speed 5216.81 samples/sec Loss 2.3492 LearningRate 0.0265 Epoch: 9 Global Step: 162110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:08:41,120-Speed 5200.81 samples/sec Loss 2.4585 LearningRate 0.0265 Epoch: 9 Global Step: 162120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:08:43,101-Speed 5169.53 samples/sec Loss 2.3952 LearningRate 0.0265 Epoch: 9 Global Step: 162130 Fp16 Grad 
Scale: 32768 Required: 12 hours Training: 2022-04-11 10:08:45,094-Speed 5140.66 samples/sec Loss 2.3839 LearningRate 0.0264 Epoch: 9 Global Step: 162140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:08:47,076-Speed 5167.39 samples/sec Loss 2.4196 LearningRate 0.0264 Epoch: 9 Global Step: 162150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:08:49,049-Speed 5191.24 samples/sec Loss 2.4211 LearningRate 0.0264 Epoch: 9 Global Step: 162160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-11 10:08:51,038-Speed 5150.73 samples/sec Loss 2.4207 LearningRate 0.0264 Epoch: 9 Global Step: 162170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:08:53,045-Speed 5103.31 samples/sec Loss 2.4277 LearningRate 0.0264 Epoch: 9 Global Step: 162180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:08:55,035-Speed 5148.66 samples/sec Loss 2.3952 LearningRate 0.0264 Epoch: 9 Global Step: 162190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:08:57,017-Speed 5168.57 samples/sec Loss 2.3631 LearningRate 0.0264 Epoch: 9 Global Step: 162200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:08:58,992-Speed 5186.52 samples/sec Loss 2.4367 LearningRate 0.0264 Epoch: 9 Global Step: 162210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:09:00,998-Speed 5106.51 samples/sec Loss 2.4601 LearningRate 0.0264 Epoch: 9 Global Step: 162220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:09:02,971-Speed 5191.06 samples/sec Loss 2.4541 LearningRate 0.0264 Epoch: 9 Global Step: 162230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:09:04,966-Speed 5135.21 samples/sec Loss 2.3789 LearningRate 0.0264 Epoch: 9 Global Step: 162240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:09:06,937-Speed 5196.53 samples/sec Loss 2.4558 LearningRate 0.0264 Epoch: 9 Global Step: 162250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 
2022-04-11 10:09:08,912-Speed 5186.84 samples/sec Loss 2.4528 LearningRate 0.0264 Epoch: 9 Global Step: 162260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:09:10,887-Speed 5186.07 samples/sec Loss 2.4037 LearningRate 0.0264 Epoch: 9 Global Step: 162270 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:09:12,858-Speed 5195.91 samples/sec Loss 2.4242 LearningRate 0.0264 Epoch: 9 Global Step: 162280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:09:14,846-Speed 5153.71 samples/sec Loss 2.4537 LearningRate 0.0264 Epoch: 9 Global Step: 162290 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:09:16,829-Speed 5165.90 samples/sec Loss 2.3562 LearningRate 0.0264 Epoch: 9 Global Step: 162300 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:09:18,841-Speed 5091.58 samples/sec Loss 2.3416 LearningRate 0.0264 Epoch: 9 Global Step: 162310 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:09:20,808-Speed 5207.44 samples/sec Loss 2.4204 LearningRate 0.0264 Epoch: 9 Global Step: 162320 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:09:22,784-Speed 5182.82 samples/sec Loss 2.4820 LearningRate 0.0264 Epoch: 9 Global Step: 162330 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:09:24,758-Speed 5190.84 samples/sec Loss 2.4093 LearningRate 0.0264 Epoch: 9 Global Step: 162340 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:09:26,727-Speed 5202.14 samples/sec Loss 2.3706 LearningRate 0.0264 Epoch: 9 Global Step: 162350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:09:28,708-Speed 5169.33 samples/sec Loss 2.3893 LearningRate 0.0264 Epoch: 9 Global Step: 162360 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:09:30,661-Speed 5245.69 samples/sec Loss 2.4134 LearningRate 0.0264 Epoch: 9 Global Step: 162370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:09:32,646-Speed 
5159.83 samples/sec Loss 2.3782 LearningRate 0.0264 Epoch: 9 Global Step: 162380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:09:34,634-Speed 5152.48 samples/sec Loss 2.4134 LearningRate 0.0264 Epoch: 9 Global Step: 162390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:09:36,619-Speed 5161.22 samples/sec Loss 2.3948 LearningRate 0.0264 Epoch: 9 Global Step: 162400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:09:38,585-Speed 5210.76 samples/sec Loss 2.4078 LearningRate 0.0264 Epoch: 9 Global Step: 162410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:09:40,556-Speed 5197.98 samples/sec Loss 2.4312 LearningRate 0.0264 Epoch: 9 Global Step: 162420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:09:42,520-Speed 5213.24 samples/sec Loss 2.4434 LearningRate 0.0264 Epoch: 9 Global Step: 162430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:09:44,486-Speed 5211.63 samples/sec Loss 2.4258 LearningRate 0.0264 Epoch: 9 Global Step: 162440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:09:46,455-Speed 5202.22 samples/sec Loss 2.3972 LearningRate 0.0264 Epoch: 9 Global Step: 162450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:09:48,451-Speed 5131.21 samples/sec Loss 2.4735 LearningRate 0.0264 Epoch: 9 Global Step: 162460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:09:50,427-Speed 5185.52 samples/sec Loss 2.4255 LearningRate 0.0263 Epoch: 9 Global Step: 162470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:09:52,398-Speed 5195.41 samples/sec Loss 2.4522 LearningRate 0.0263 Epoch: 9 Global Step: 162480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-11 10:09:54,374-Speed 5183.53 samples/sec Loss 2.4354 LearningRate 0.0263 Epoch: 9 Global Step: 162490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:09:56,357-Speed 5166.25 samples/sec Loss 2.4575 
LearningRate 0.0263 Epoch: 9 Global Step: 162500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:09:58,348-Speed 5145.61 samples/sec Loss 2.4248 LearningRate 0.0263 Epoch: 9 Global Step: 162510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:10:00,356-Speed 5100.56 samples/sec Loss 2.4181 LearningRate 0.0263 Epoch: 9 Global Step: 162520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:10:02,321-Speed 5215.16 samples/sec Loss 2.3833 LearningRate 0.0263 Epoch: 9 Global Step: 162530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:10:04,296-Speed 5185.32 samples/sec Loss 2.4568 LearningRate 0.0263 Epoch: 9 Global Step: 162540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:10:06,262-Speed 5209.28 samples/sec Loss 2.4043 LearningRate 0.0263 Epoch: 9 Global Step: 162550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:10:08,225-Speed 5218.54 samples/sec Loss 2.4087 LearningRate 0.0263 Epoch: 9 Global Step: 162560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:10:10,190-Speed 5213.69 samples/sec Loss 2.4313 LearningRate 0.0263 Epoch: 9 Global Step: 162570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-11 10:10:12,154-Speed 5214.18 samples/sec Loss 2.3860 LearningRate 0.0263 Epoch: 9 Global Step: 162580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:10:14,120-Speed 5211.30 samples/sec Loss 2.4197 LearningRate 0.0263 Epoch: 9 Global Step: 162590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:10:16,089-Speed 5202.86 samples/sec Loss 2.3710 LearningRate 0.0263 Epoch: 9 Global Step: 162600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:10:18,071-Speed 5168.67 samples/sec Loss 2.3606 LearningRate 0.0263 Epoch: 9 Global Step: 162610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:10:20,035-Speed 5214.30 samples/sec Loss 2.4330 LearningRate 0.0263 Epoch: 9 Global Step: 
162620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:10:21,999-Speed 5217.37 samples/sec Loss 2.4351 LearningRate 0.0263 Epoch: 9 Global Step: 162630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:10:23,969-Speed 5200.44 samples/sec Loss 2.4396 LearningRate 0.0263 Epoch: 9 Global Step: 162640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:10:25,933-Speed 5213.76 samples/sec Loss 2.4345 LearningRate 0.0263 Epoch: 9 Global Step: 162650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:10:27,898-Speed 5212.16 samples/sec Loss 2.4781 LearningRate 0.0263 Epoch: 9 Global Step: 162660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:10:29,865-Speed 5209.14 samples/sec Loss 2.4067 LearningRate 0.0263 Epoch: 9 Global Step: 162670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:10:31,828-Speed 5217.95 samples/sec Loss 2.4967 LearningRate 0.0263 Epoch: 9 Global Step: 162680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:10:33,806-Speed 5178.53 samples/sec Loss 2.4796 LearningRate 0.0263 Epoch: 9 Global Step: 162690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:10:35,792-Speed 5157.63 samples/sec Loss 2.4072 LearningRate 0.0263 Epoch: 9 Global Step: 162700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:10:37,770-Speed 5179.55 samples/sec Loss 2.3953 LearningRate 0.0263 Epoch: 9 Global Step: 162710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:10:39,744-Speed 5187.95 samples/sec Loss 2.4803 LearningRate 0.0263 Epoch: 9 Global Step: 162720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:10:41,725-Speed 5171.50 samples/sec Loss 2.4599 LearningRate 0.0263 Epoch: 9 Global Step: 162730 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:10:43,692-Speed 5207.71 samples/sec Loss 2.4752 LearningRate 0.0263 Epoch: 9 Global Step: 162740 Fp16 Grad Scale: 32768 Required: 11 
hours Training: 2022-04-11 10:10:45,673-Speed 5170.59 samples/sec Loss 2.3796 LearningRate 0.0263 Epoch: 9 Global Step: 162750 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:10:47,657-Speed 5162.21 samples/sec Loss 2.3667 LearningRate 0.0263 Epoch: 9 Global Step: 162760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:10:49,624-Speed 5208.79 samples/sec Loss 2.3943 LearningRate 0.0263 Epoch: 9 Global Step: 162770 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:10:51,600-Speed 5183.72 samples/sec Loss 2.3808 LearningRate 0.0263 Epoch: 9 Global Step: 162780 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:10:53,575-Speed 5186.91 samples/sec Loss 2.3580 LearningRate 0.0262 Epoch: 9 Global Step: 162790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:10:55,550-Speed 5186.14 samples/sec Loss 2.4316 LearningRate 0.0262 Epoch: 9 Global Step: 162800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:10:57,529-Speed 5176.51 samples/sec Loss 2.3941 LearningRate 0.0262 Epoch: 9 Global Step: 162810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:10:59,488-Speed 5228.53 samples/sec Loss 2.4475 LearningRate 0.0262 Epoch: 9 Global Step: 162820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:11:01,476-Speed 5153.38 samples/sec Loss 2.4395 LearningRate 0.0262 Epoch: 9 Global Step: 162830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:11:03,451-Speed 5186.02 samples/sec Loss 2.4951 LearningRate 0.0262 Epoch: 9 Global Step: 162840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:11:05,425-Speed 5188.76 samples/sec Loss 2.4496 LearningRate 0.0262 Epoch: 9 Global Step: 162850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:11:07,393-Speed 5205.01 samples/sec Loss 2.4160 LearningRate 0.0262 Epoch: 9 Global Step: 162860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 
10:11:09,387-Speed 5137.30 samples/sec Loss 2.4036 LearningRate 0.0262 Epoch: 9 Global Step: 162870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:11:11,363-Speed 5182.95 samples/sec Loss 2.5261 LearningRate 0.0262 Epoch: 9 Global Step: 162880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:11:13,337-Speed 5191.03 samples/sec Loss 2.4142 LearningRate 0.0262 Epoch: 9 Global Step: 162890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:11:15,321-Speed 5163.51 samples/sec Loss 2.4256 LearningRate 0.0262 Epoch: 9 Global Step: 162900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:11:17,300-Speed 5175.38 samples/sec Loss 2.3459 LearningRate 0.0262 Epoch: 9 Global Step: 162910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:11:19,269-Speed 5203.34 samples/sec Loss 2.4593 LearningRate 0.0262 Epoch: 9 Global Step: 162920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:11:21,233-Speed 5215.86 samples/sec Loss 2.4066 LearningRate 0.0262 Epoch: 9 Global Step: 162930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:11:23,202-Speed 5203.02 samples/sec Loss 2.4627 LearningRate 0.0262 Epoch: 9 Global Step: 162940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:11:25,171-Speed 5202.11 samples/sec Loss 2.4078 LearningRate 0.0262 Epoch: 9 Global Step: 162950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:11:27,144-Speed 5191.36 samples/sec Loss 2.4560 LearningRate 0.0262 Epoch: 9 Global Step: 162960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:11:29,108-Speed 5214.70 samples/sec Loss 2.3601 LearningRate 0.0262 Epoch: 9 Global Step: 162970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:11:31,075-Speed 5208.93 samples/sec Loss 2.4258 LearningRate 0.0262 Epoch: 9 Global Step: 162980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:11:33,047-Speed 5192.97 samples/sec Loss 
2.4467 LearningRate 0.0262 Epoch: 9 Global Step: 162990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:11:35,019-Speed 5195.56 samples/sec Loss 2.4198 LearningRate 0.0262 Epoch: 9 Global Step: 163000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:11:37,005-Speed 5155.97 samples/sec Loss 2.4874 LearningRate 0.0262 Epoch: 9 Global Step: 163010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:11:38,977-Speed 5194.45 samples/sec Loss 2.4208 LearningRate 0.0262 Epoch: 9 Global Step: 163020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:11:40,946-Speed 5203.42 samples/sec Loss 2.3626 LearningRate 0.0262 Epoch: 9 Global Step: 163030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:11:42,913-Speed 5207.01 samples/sec Loss 2.4011 LearningRate 0.0262 Epoch: 9 Global Step: 163040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:11:44,880-Speed 5208.20 samples/sec Loss 2.4284 LearningRate 0.0262 Epoch: 9 Global Step: 163050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:11:46,866-Speed 5159.14 samples/sec Loss 2.4020 LearningRate 0.0262 Epoch: 9 Global Step: 163060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:11:48,840-Speed 5188.57 samples/sec Loss 2.4287 LearningRate 0.0262 Epoch: 9 Global Step: 163070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:11:50,810-Speed 5198.20 samples/sec Loss 2.4085 LearningRate 0.0262 Epoch: 9 Global Step: 163080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:11:52,780-Speed 5199.24 samples/sec Loss 2.4083 LearningRate 0.0262 Epoch: 9 Global Step: 163090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:11:54,749-Speed 5203.18 samples/sec Loss 2.4208 LearningRate 0.0262 Epoch: 9 Global Step: 163100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:11:56,716-Speed 5206.58 samples/sec Loss 2.3997 LearningRate 0.0262 Epoch: 9 
Global Step: 163110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:11:58,706-Speed 5147.95 samples/sec Loss 2.4379 LearningRate 0.0261 Epoch: 9 Global Step: 163120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:12:00,674-Speed 5205.30 samples/sec Loss 2.4358 LearningRate 0.0261 Epoch: 9 Global Step: 163130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:12:02,639-Speed 5213.26 samples/sec Loss 2.4767 LearningRate 0.0261 Epoch: 9 Global Step: 163140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:12:04,606-Speed 5207.87 samples/sec Loss 2.4065 LearningRate 0.0261 Epoch: 9 Global Step: 163150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:12:06,573-Speed 5208.41 samples/sec Loss 2.3456 LearningRate 0.0261 Epoch: 9 Global Step: 163160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:12:08,539-Speed 5208.65 samples/sec Loss 2.4333 LearningRate 0.0261 Epoch: 9 Global Step: 163170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:12:10,514-Speed 5187.45 samples/sec Loss 2.3888 LearningRate 0.0261 Epoch: 9 Global Step: 163180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:12:12,492-Speed 5178.21 samples/sec Loss 2.3943 LearningRate 0.0261 Epoch: 9 Global Step: 163190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:12:14,466-Speed 5189.12 samples/sec Loss 2.4665 LearningRate 0.0261 Epoch: 9 Global Step: 163200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:12:16,439-Speed 5191.74 samples/sec Loss 2.4789 LearningRate 0.0261 Epoch: 9 Global Step: 163210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:12:18,414-Speed 5186.23 samples/sec Loss 2.5115 LearningRate 0.0261 Epoch: 9 Global Step: 163220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:12:20,392-Speed 5181.16 samples/sec Loss 2.4542 LearningRate 0.0261 Epoch: 9 Global Step: 163230 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-04-11 10:12:22,362-Speed 5198.47 samples/sec Loss 2.3839 LearningRate 0.0261 Epoch: 9 Global Step: 163240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:12:24,332-Speed 5199.27 samples/sec Loss 2.4052 LearningRate 0.0261 Epoch: 9 Global Step: 163250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:12:26,299-Speed 5209.15 samples/sec Loss 2.3706 LearningRate 0.0261 Epoch: 9 Global Step: 163260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:12:28,268-Speed 5201.82 samples/sec Loss 2.3800 LearningRate 0.0261 Epoch: 9 Global Step: 163270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:12:30,240-Speed 5195.05 samples/sec Loss 2.4388 LearningRate 0.0261 Epoch: 9 Global Step: 163280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:12:32,210-Speed 5198.46 samples/sec Loss 2.3950 LearningRate 0.0261 Epoch: 9 Global Step: 163290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:12:34,187-Speed 5182.59 samples/sec Loss 2.4561 LearningRate 0.0261 Epoch: 9 Global Step: 163300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:12:36,152-Speed 5210.45 samples/sec Loss 2.4400 LearningRate 0.0261 Epoch: 9 Global Step: 163310 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:12:38,122-Speed 5201.68 samples/sec Loss 2.4721 LearningRate 0.0261 Epoch: 9 Global Step: 163320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:12:40,090-Speed 5204.69 samples/sec Loss 2.4177 LearningRate 0.0261 Epoch: 9 Global Step: 163330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:12:42,061-Speed 5197.89 samples/sec Loss 2.4807 LearningRate 0.0261 Epoch: 9 Global Step: 163340 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:12:44,068-Speed 5103.51 samples/sec Loss 2.4094 LearningRate 0.0261 Epoch: 9 Global Step: 163350 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 
10:12:46,044-Speed 5182.61 samples/sec Loss 2.3979 LearningRate 0.0261 Epoch: 9 Global Step: 163360 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:12:48,016-Speed 5194.95 samples/sec Loss 2.3649 LearningRate 0.0261 Epoch: 9 Global Step: 163370 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:12:49,984-Speed 5204.94 samples/sec Loss 2.3848 LearningRate 0.0261 Epoch: 9 Global Step: 163380 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:12:51,958-Speed 5190.10 samples/sec Loss 2.4129 LearningRate 0.0261 Epoch: 9 Global Step: 163390 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:12:53,938-Speed 5173.77 samples/sec Loss 2.4251 LearningRate 0.0261 Epoch: 9 Global Step: 163400 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:12:55,905-Speed 5207.49 samples/sec Loss 2.3940 LearningRate 0.0261 Epoch: 9 Global Step: 163410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:12:57,884-Speed 5173.90 samples/sec Loss 2.3356 LearningRate 0.0261 Epoch: 9 Global Step: 163420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:12:59,878-Speed 5137.53 samples/sec Loss 2.4064 LearningRate 0.0261 Epoch: 9 Global Step: 163430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:13:01,864-Speed 5158.45 samples/sec Loss 2.3973 LearningRate 0.0261 Epoch: 9 Global Step: 163440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:13:03,848-Speed 5164.00 samples/sec Loss 2.4216 LearningRate 0.0260 Epoch: 9 Global Step: 163450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:13:05,829-Speed 5170.45 samples/sec Loss 2.4200 LearningRate 0.0260 Epoch: 9 Global Step: 163460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:13:07,800-Speed 5197.41 samples/sec Loss 2.3366 LearningRate 0.0260 Epoch: 9 Global Step: 163470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:13:09,768-Speed 5205.97 samples/sec Loss 
2.4580 LearningRate 0.0260 Epoch: 9 Global Step: 163480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:13:11,744-Speed 5181.54 samples/sec Loss 2.4474 LearningRate 0.0260 Epoch: 9 Global Step: 163490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:13:13,711-Speed 5207.35 samples/sec Loss 2.4056 LearningRate 0.0260 Epoch: 9 Global Step: 163500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:13:15,690-Speed 5178.76 samples/sec Loss 2.4236 LearningRate 0.0260 Epoch: 9 Global Step: 163510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:13:17,673-Speed 5164.62 samples/sec Loss 2.4506 LearningRate 0.0260 Epoch: 9 Global Step: 163520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:13:19,650-Speed 5181.70 samples/sec Loss 2.4595 LearningRate 0.0260 Epoch: 9 Global Step: 163530 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:13:21,634-Speed 5162.27 samples/sec Loss 2.3754 LearningRate 0.0260 Epoch: 9 Global Step: 163540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:13:23,618-Speed 5163.95 samples/sec Loss 2.4665 LearningRate 0.0260 Epoch: 9 Global Step: 163550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:13:25,588-Speed 5200.40 samples/sec Loss 2.4037 LearningRate 0.0260 Epoch: 9 Global Step: 163560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:13:27,569-Speed 5169.48 samples/sec Loss 2.3843 LearningRate 0.0260 Epoch: 9 Global Step: 163570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:13:29,542-Speed 5192.96 samples/sec Loss 2.3341 LearningRate 0.0260 Epoch: 9 Global Step: 163580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:13:31,512-Speed 5199.72 samples/sec Loss 2.3952 LearningRate 0.0260 Epoch: 9 Global Step: 163590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:13:33,483-Speed 5195.94 samples/sec Loss 2.4284 LearningRate 0.0260 Epoch: 
9 Global Step: 163600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:13:35,466-Speed 5164.97 samples/sec Loss 2.4403 LearningRate 0.0260 Epoch: 9 Global Step: 163610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:13:37,436-Speed 5199.55 samples/sec Loss 2.4458 LearningRate 0.0260 Epoch: 9 Global Step: 163620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:13:39,420-Speed 5162.85 samples/sec Loss 2.3815 LearningRate 0.0260 Epoch: 9 Global Step: 163630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:13:41,389-Speed 5203.44 samples/sec Loss 2.3684 LearningRate 0.0260 Epoch: 9 Global Step: 163640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:13:43,353-Speed 5214.32 samples/sec Loss 2.3281 LearningRate 0.0260 Epoch: 9 Global Step: 163650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:13:45,327-Speed 5191.92 samples/sec Loss 2.4510 LearningRate 0.0260 Epoch: 9 Global Step: 163660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:13:47,301-Speed 5187.72 samples/sec Loss 2.4897 LearningRate 0.0260 Epoch: 9 Global Step: 163670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:13:49,290-Speed 5149.87 samples/sec Loss 2.3925 LearningRate 0.0260 Epoch: 9 Global Step: 163680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:13:51,260-Speed 5199.37 samples/sec Loss 2.3749 LearningRate 0.0260 Epoch: 9 Global Step: 163690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:13:53,233-Speed 5191.32 samples/sec Loss 2.4460 LearningRate 0.0260 Epoch: 9 Global Step: 163700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:13:55,205-Speed 5195.81 samples/sec Loss 2.3711 LearningRate 0.0260 Epoch: 9 Global Step: 163710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:13:57,176-Speed 5197.10 samples/sec Loss 2.3081 LearningRate 0.0260 Epoch: 9 Global Step: 163720 Fp16 Grad Scale: 
65536 Required: 11 hours Training: 2022-04-11 10:13:59,151-Speed 5185.68 samples/sec Loss 2.4183 LearningRate 0.0260 Epoch: 9 Global Step: 163730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:14:01,129-Speed 5180.10 samples/sec Loss 2.3983 LearningRate 0.0260 Epoch: 9 Global Step: 163740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:14:03,111-Speed 5166.22 samples/sec Loss 2.3683 LearningRate 0.0260 Epoch: 9 Global Step: 163750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:14:05,095-Speed 5164.10 samples/sec Loss 2.4067 LearningRate 0.0260 Epoch: 9 Global Step: 163760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:14:07,069-Speed 5189.31 samples/sec Loss 2.3950 LearningRate 0.0259 Epoch: 9 Global Step: 163770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:14:09,058-Speed 5150.16 samples/sec Loss 2.3746 LearningRate 0.0259 Epoch: 9 Global Step: 163780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:14:11,036-Speed 5178.72 samples/sec Loss 2.3849 LearningRate 0.0259 Epoch: 9 Global Step: 163790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:14:13,015-Speed 5175.10 samples/sec Loss 2.4462 LearningRate 0.0259 Epoch: 9 Global Step: 163800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:14:14,998-Speed 5165.88 samples/sec Loss 2.3898 LearningRate 0.0259 Epoch: 9 Global Step: 163810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:14:16,990-Speed 5142.82 samples/sec Loss 2.3429 LearningRate 0.0259 Epoch: 9 Global Step: 163820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:14:18,968-Speed 5178.42 samples/sec Loss 2.4653 LearningRate 0.0259 Epoch: 9 Global Step: 163830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:14:20,944-Speed 5183.38 samples/sec Loss 2.4250 LearningRate 0.0259 Epoch: 9 Global Step: 163840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 
2022-04-11 10:14:22,921-Speed 5181.11 samples/sec Loss 2.3513 LearningRate 0.0259 Epoch: 9 Global Step: 163850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:14:24,928-Speed 5104.98 samples/sec Loss 2.4778 LearningRate 0.0259 Epoch: 9 Global Step: 163860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:14:26,899-Speed 5196.80 samples/sec Loss 2.3850 LearningRate 0.0259 Epoch: 9 Global Step: 163870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:14:28,868-Speed 5203.74 samples/sec Loss 2.4152 LearningRate 0.0259 Epoch: 9 Global Step: 163880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:14:30,836-Speed 5204.10 samples/sec Loss 2.4002 LearningRate 0.0259 Epoch: 9 Global Step: 163890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:14:32,805-Speed 5202.78 samples/sec Loss 2.4627 LearningRate 0.0259 Epoch: 9 Global Step: 163900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:14:34,791-Speed 5157.18 samples/sec Loss 2.3802 LearningRate 0.0259 Epoch: 9 Global Step: 163910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:14:36,766-Speed 5186.72 samples/sec Loss 2.4514 LearningRate 0.0259 Epoch: 9 Global Step: 163920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:14:38,744-Speed 5178.46 samples/sec Loss 2.3825 LearningRate 0.0259 Epoch: 9 Global Step: 163930 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:14:40,729-Speed 5160.42 samples/sec Loss 2.4234 LearningRate 0.0259 Epoch: 9 Global Step: 163940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:14:42,718-Speed 5150.58 samples/sec Loss 2.3583 LearningRate 0.0259 Epoch: 9 Global Step: 163950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:14:44,691-Speed 5191.89 samples/sec Loss 2.4111 LearningRate 0.0259 Epoch: 9 Global Step: 163960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:14:46,678-Speed 5154.24 
samples/sec Loss 2.4571 LearningRate 0.0259 Epoch: 9 Global Step: 163970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:14:48,651-Speed 5192.13 samples/sec Loss 2.4499 LearningRate 0.0259 Epoch: 9 Global Step: 163980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:14:50,629-Speed 5181.05 samples/sec Loss 2.4863 LearningRate 0.0259 Epoch: 9 Global Step: 163990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:14:52,605-Speed 5181.83 samples/sec Loss 2.4183 LearningRate 0.0259 Epoch: 9 Global Step: 164000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:15:19,316-[lfw][164000]XNorm: 22.292755 Training: 2022-04-11 10:15:19,317-[lfw][164000]Accuracy-Flip: 0.99733+-0.00351 Training: 2022-04-11 10:15:19,317-[lfw][164000]Accuracy-Highest: 0.99833 Training: 2022-04-11 10:15:50,099-[cfp_fp][164000]XNorm: 21.098416 Training: 2022-04-11 10:15:50,100-[cfp_fp][164000]Accuracy-Flip: 0.98557+-0.00471 Training: 2022-04-11 10:15:50,100-[cfp_fp][164000]Accuracy-Highest: 0.98557 Training: 2022-04-11 10:16:16,634-[agedb_30][164000]XNorm: 22.442575 Training: 2022-04-11 10:16:16,635-[agedb_30][164000]Accuracy-Flip: 0.98050+-0.00654 Training: 2022-04-11 10:16:16,635-[agedb_30][164000]Accuracy-Highest: 0.98167 Training: 2022-04-11 10:16:18,605-Speed 119.07 samples/sec Loss 2.4918 LearningRate 0.0259 Epoch: 9 Global Step: 164010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:16:20,568-Speed 5219.17 samples/sec Loss 2.4428 LearningRate 0.0259 Epoch: 9 Global Step: 164020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:16:22,528-Speed 5225.99 samples/sec Loss 2.4472 LearningRate 0.0259 Epoch: 9 Global Step: 164030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:16:24,485-Speed 5235.15 samples/sec Loss 2.3811 LearningRate 0.0259 Epoch: 9 Global Step: 164040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:16:26,453-Speed 5204.58 samples/sec Loss 2.4328 
LearningRate 0.0259 Epoch: 9 Global Step: 164050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:16:28,413-Speed 5226.23 samples/sec Loss 2.4073 LearningRate 0.0259 Epoch: 9 Global Step: 164060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:16:30,372-Speed 5228.31 samples/sec Loss 2.4031 LearningRate 0.0259 Epoch: 9 Global Step: 164070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:16:32,337-Speed 5212.44 samples/sec Loss 2.3516 LearningRate 0.0259 Epoch: 9 Global Step: 164080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:16:34,326-Speed 5151.08 samples/sec Loss 2.3984 LearningRate 0.0259 Epoch: 9 Global Step: 164090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:16:36,286-Speed 5224.83 samples/sec Loss 2.3924 LearningRate 0.0258 Epoch: 9 Global Step: 164100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:16:38,253-Speed 5209.65 samples/sec Loss 2.4153 LearningRate 0.0258 Epoch: 9 Global Step: 164110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:16:40,215-Speed 5219.18 samples/sec Loss 2.4020 LearningRate 0.0258 Epoch: 9 Global Step: 164120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:16:42,175-Speed 5227.35 samples/sec Loss 2.3582 LearningRate 0.0258 Epoch: 9 Global Step: 164130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:16:44,133-Speed 5230.41 samples/sec Loss 2.2973 LearningRate 0.0258 Epoch: 9 Global Step: 164140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:16:46,096-Speed 5218.41 samples/sec Loss 2.3971 LearningRate 0.0258 Epoch: 9 Global Step: 164150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:16:48,058-Speed 5220.80 samples/sec Loss 2.4305 LearningRate 0.0258 Epoch: 9 Global Step: 164160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:16:50,042-Speed 5163.28 samples/sec Loss 2.5009 LearningRate 0.0258 Epoch: 9 Global 
Step: 164170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:16:52,015-Speed 5191.72 samples/sec Loss 2.3363 LearningRate 0.0258 Epoch: 9 Global Step: 164180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:16:53,976-Speed 5222.95 samples/sec Loss 2.4477 LearningRate 0.0258 Epoch: 9 Global Step: 164190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:16:55,944-Speed 5204.86 samples/sec Loss 2.3848 LearningRate 0.0258 Epoch: 9 Global Step: 164200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:16:57,915-Speed 5199.06 samples/sec Loss 2.4763 LearningRate 0.0258 Epoch: 9 Global Step: 164210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:16:59,879-Speed 5215.19 samples/sec Loss 2.3930 LearningRate 0.0258 Epoch: 9 Global Step: 164220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:17:01,842-Speed 5218.69 samples/sec Loss 2.3810 LearningRate 0.0258 Epoch: 9 Global Step: 164230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:17:03,826-Speed 5163.34 samples/sec Loss 2.4218 LearningRate 0.0258 Epoch: 9 Global Step: 164240 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:17:05,803-Speed 5180.58 samples/sec Loss 2.3808 LearningRate 0.0258 Epoch: 9 Global Step: 164250 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:17:07,765-Speed 5220.39 samples/sec Loss 2.4460 LearningRate 0.0258 Epoch: 9 Global Step: 164260 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:17:09,728-Speed 5219.25 samples/sec Loss 2.4043 LearningRate 0.0258 Epoch: 9 Global Step: 164270 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:17:11,690-Speed 5220.90 samples/sec Loss 2.3773 LearningRate 0.0258 Epoch: 9 Global Step: 164280 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:17:13,659-Speed 5201.67 samples/sec Loss 2.4336 LearningRate 0.0258 Epoch: 9 Global Step: 164290 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-04-11 10:17:15,636-Speed 5179.76 samples/sec Loss 2.3958 LearningRate 0.0258 Epoch: 9 Global Step: 164300 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:17:17,605-Speed 5203.84 samples/sec Loss 2.4510 LearningRate 0.0258 Epoch: 9 Global Step: 164310 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:17:19,582-Speed 5180.38 samples/sec Loss 2.3778 LearningRate 0.0258 Epoch: 9 Global Step: 164320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:17:21,552-Speed 5201.81 samples/sec Loss 2.4663 LearningRate 0.0258 Epoch: 9 Global Step: 164330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:17:23,523-Speed 5196.79 samples/sec Loss 2.4397 LearningRate 0.0258 Epoch: 9 Global Step: 164340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:17:25,490-Speed 5206.64 samples/sec Loss 2.4348 LearningRate 0.0258 Epoch: 9 Global Step: 164350 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:17:27,458-Speed 5206.47 samples/sec Loss 2.4466 LearningRate 0.0258 Epoch: 9 Global Step: 164360 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:17:29,457-Speed 5123.79 samples/sec Loss 2.4039 LearningRate 0.0258 Epoch: 9 Global Step: 164370 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:17:31,418-Speed 5223.48 samples/sec Loss 2.4203 LearningRate 0.0258 Epoch: 9 Global Step: 164380 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:17:33,380-Speed 5221.71 samples/sec Loss 2.4107 LearningRate 0.0258 Epoch: 9 Global Step: 164390 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:17:35,346-Speed 5208.74 samples/sec Loss 2.5166 LearningRate 0.0258 Epoch: 9 Global Step: 164400 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:17:37,306-Speed 5225.56 samples/sec Loss 2.5040 LearningRate 0.0258 Epoch: 9 Global Step: 164410 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 
10:17:39,296-Speed 5147.36 samples/sec Loss 2.4095 LearningRate 0.0258 Epoch: 9 Global Step: 164420 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:17:41,301-Speed 5110.23 samples/sec Loss 2.3935 LearningRate 0.0257 Epoch: 9 Global Step: 164430 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:17:43,272-Speed 5196.27 samples/sec Loss 2.4674 LearningRate 0.0257 Epoch: 9 Global Step: 164440 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:17:45,232-Speed 5226.21 samples/sec Loss 2.4632 LearningRate 0.0257 Epoch: 9 Global Step: 164450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:17:47,212-Speed 5174.15 samples/sec Loss 2.4205 LearningRate 0.0257 Epoch: 9 Global Step: 164460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:17:49,197-Speed 5160.05 samples/sec Loss 2.3963 LearningRate 0.0257 Epoch: 9 Global Step: 164470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:17:51,158-Speed 5223.39 samples/sec Loss 2.3929 LearningRate 0.0257 Epoch: 9 Global Step: 164480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:17:53,120-Speed 5219.22 samples/sec Loss 2.4261 LearningRate 0.0257 Epoch: 9 Global Step: 164490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:17:55,086-Speed 5211.19 samples/sec Loss 2.4308 LearningRate 0.0257 Epoch: 9 Global Step: 164500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:17:57,071-Speed 5160.30 samples/sec Loss 2.3974 LearningRate 0.0257 Epoch: 9 Global Step: 164510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:17:59,055-Speed 5162.75 samples/sec Loss 2.3775 LearningRate 0.0257 Epoch: 9 Global Step: 164520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:01,039-Speed 5163.16 samples/sec Loss 2.4070 LearningRate 0.0257 Epoch: 9 Global Step: 164530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:03,003-Speed 5218.22 samples/sec Loss 
2.4080 LearningRate 0.0257 Epoch: 9 Global Step: 164540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:04,962-Speed 5228.39 samples/sec Loss 2.3957 LearningRate 0.0257 Epoch: 9 Global Step: 164550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:18:06,921-Speed 5227.90 samples/sec Loss 2.4567 LearningRate 0.0257 Epoch: 9 Global Step: 164560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:18:08,877-Speed 5235.90 samples/sec Loss 2.3306 LearningRate 0.0257 Epoch: 9 Global Step: 164570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:10,855-Speed 5180.14 samples/sec Loss 2.4570 LearningRate 0.0257 Epoch: 9 Global Step: 164580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:12,822-Speed 5206.08 samples/sec Loss 2.4221 LearningRate 0.0257 Epoch: 9 Global Step: 164590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:14,788-Speed 5211.19 samples/sec Loss 2.4240 LearningRate 0.0257 Epoch: 9 Global Step: 164600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:16,748-Speed 5226.71 samples/sec Loss 2.3715 LearningRate 0.0257 Epoch: 9 Global Step: 164610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:18,722-Speed 5189.15 samples/sec Loss 2.4015 LearningRate 0.0257 Epoch: 9 Global Step: 164620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:20,704-Speed 5168.09 samples/sec Loss 2.3354 LearningRate 0.0257 Epoch: 9 Global Step: 164630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:22,690-Speed 5157.14 samples/sec Loss 2.4442 LearningRate 0.0257 Epoch: 9 Global Step: 164640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:24,660-Speed 5202.54 samples/sec Loss 2.3800 LearningRate 0.0257 Epoch: 9 Global Step: 164650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:26,630-Speed 5198.34 samples/sec Loss 2.3933 LearningRate 0.0257 Epoch: 9 
Global Step: 164660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:28,603-Speed 5192.29 samples/sec Loss 2.4753 LearningRate 0.0257 Epoch: 9 Global Step: 164670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:18:30,574-Speed 5194.92 samples/sec Loss 2.3915 LearningRate 0.0257 Epoch: 9 Global Step: 164680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:18:32,532-Speed 5234.05 samples/sec Loss 2.3802 LearningRate 0.0257 Epoch: 9 Global Step: 164690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:34,496-Speed 5214.22 samples/sec Loss 2.3805 LearningRate 0.0257 Epoch: 9 Global Step: 164700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:36,466-Speed 5198.41 samples/sec Loss 2.4098 LearningRate 0.0257 Epoch: 9 Global Step: 164710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:38,427-Speed 5223.57 samples/sec Loss 2.3866 LearningRate 0.0257 Epoch: 9 Global Step: 164720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:40,401-Speed 5190.86 samples/sec Loss 2.4000 LearningRate 0.0257 Epoch: 9 Global Step: 164730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:42,364-Speed 5216.61 samples/sec Loss 2.3759 LearningRate 0.0257 Epoch: 9 Global Step: 164740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:44,340-Speed 5184.75 samples/sec Loss 2.4501 LearningRate 0.0257 Epoch: 9 Global Step: 164750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:46,318-Speed 5179.21 samples/sec Loss 2.4122 LearningRate 0.0256 Epoch: 9 Global Step: 164760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:18:48,288-Speed 5201.18 samples/sec Loss 2.4434 LearningRate 0.0256 Epoch: 9 Global Step: 164770 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:18:50,257-Speed 5200.43 samples/sec Loss 2.4266 LearningRate 0.0256 Epoch: 9 Global Step: 164780 Fp16 Grad Scale: 
32768 Required: 11 hours Training: 2022-04-11 10:18:52,229-Speed 5194.29 samples/sec Loss 2.3916 LearningRate 0.0256 Epoch: 9 Global Step: 164790 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:18:54,202-Speed 5196.07 samples/sec Loss 2.4436 LearningRate 0.0256 Epoch: 9 Global Step: 164800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:18:56,924-Speed 3761.65 samples/sec Loss 2.3688 LearningRate 0.0256 Epoch: 9 Global Step: 164810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:18:58,887-Speed 5217.82 samples/sec Loss 2.4282 LearningRate 0.0256 Epoch: 9 Global Step: 164820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:19:00,852-Speed 5213.34 samples/sec Loss 2.3778 LearningRate 0.0256 Epoch: 9 Global Step: 164830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:19:02,835-Speed 5165.68 samples/sec Loss 2.4099 LearningRate 0.0256 Epoch: 9 Global Step: 164840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:19:04,801-Speed 5210.97 samples/sec Loss 2.4316 LearningRate 0.0256 Epoch: 9 Global Step: 164850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:19:06,778-Speed 5180.75 samples/sec Loss 2.4149 LearningRate 0.0256 Epoch: 9 Global Step: 164860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:19:08,743-Speed 5212.03 samples/sec Loss 2.4687 LearningRate 0.0256 Epoch: 9 Global Step: 164870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:19:10,721-Speed 5179.72 samples/sec Loss 2.4002 LearningRate 0.0256 Epoch: 9 Global Step: 164880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:19:12,688-Speed 5207.82 samples/sec Loss 2.3151 LearningRate 0.0256 Epoch: 9 Global Step: 164890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:19:14,665-Speed 5183.17 samples/sec Loss 2.4541 LearningRate 0.0256 Epoch: 9 Global Step: 164900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 
2022-04-11 10:19:16,643-Speed 5176.72 samples/sec Loss 2.3562 LearningRate 0.0256 Epoch: 9 Global Step: 164910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:19:18,612-Speed 5203.69 samples/sec Loss 2.3685 LearningRate 0.0256 Epoch: 9 Global Step: 164920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:19:20,584-Speed 5192.44 samples/sec Loss 2.3535 LearningRate 0.0256 Epoch: 9 Global Step: 164930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:19:22,548-Speed 5215.25 samples/sec Loss 2.4278 LearningRate 0.0256 Epoch: 9 Global Step: 164940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:19:24,516-Speed 5206.80 samples/sec Loss 2.3885 LearningRate 0.0256 Epoch: 9 Global Step: 164950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:19:26,495-Speed 5174.66 samples/sec Loss 2.3409 LearningRate 0.0256 Epoch: 9 Global Step: 164960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:19:28,449-Speed 5243.03 samples/sec Loss 2.3451 LearningRate 0.0256 Epoch: 9 Global Step: 164970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:19:30,406-Speed 5234.25 samples/sec Loss 2.4047 LearningRate 0.0256 Epoch: 9 Global Step: 164980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:19:32,370-Speed 5215.96 samples/sec Loss 2.3673 LearningRate 0.0256 Epoch: 9 Global Step: 164990 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:19:34,338-Speed 5206.40 samples/sec Loss 2.3990 LearningRate 0.0256 Epoch: 9 Global Step: 165000 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:19:36,304-Speed 5209.90 samples/sec Loss 2.4454 LearningRate 0.0256 Epoch: 9 Global Step: 165010 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:19:38,266-Speed 5218.86 samples/sec Loss 2.3749 LearningRate 0.0256 Epoch: 9 Global Step: 165020 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:19:40,232-Speed 5210.33 
samples/sec Loss 2.3731 LearningRate 0.0256 Epoch: 9 Global Step: 165030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:19:42,196-Speed 5217.13 samples/sec Loss 2.4353 LearningRate 0.0256 Epoch: 9 Global Step: 165040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:19:44,159-Speed 5216.60 samples/sec Loss 2.3737 LearningRate 0.0256 Epoch: 9 Global Step: 165050 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:19:46,136-Speed 5182.32 samples/sec Loss 2.4491 LearningRate 0.0256 Epoch: 9 Global Step: 165060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:19:48,096-Speed 5225.52 samples/sec Loss 2.4331 LearningRate 0.0256 Epoch: 9 Global Step: 165070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:19:50,084-Speed 5152.80 samples/sec Loss 2.4239 LearningRate 0.0256 Epoch: 9 Global Step: 165080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:19:52,080-Speed 5131.65 samples/sec Loss 2.4323 LearningRate 0.0255 Epoch: 9 Global Step: 165090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:19:54,047-Speed 5208.01 samples/sec Loss 2.4533 LearningRate 0.0255 Epoch: 9 Global Step: 165100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:19:56,010-Speed 5220.44 samples/sec Loss 2.4198 LearningRate 0.0255 Epoch: 9 Global Step: 165110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:19:57,972-Speed 5219.70 samples/sec Loss 2.3685 LearningRate 0.0255 Epoch: 9 Global Step: 165120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:19:59,939-Speed 5207.70 samples/sec Loss 2.4309 LearningRate 0.0255 Epoch: 9 Global Step: 165130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:20:01,902-Speed 5219.31 samples/sec Loss 2.3860 LearningRate 0.0255 Epoch: 9 Global Step: 165140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:20:03,896-Speed 5136.05 samples/sec Loss 2.4101 LearningRate 0.0255 
Epoch: 9 Global Step: 165150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:20:05,857-Speed 5224.34 samples/sec Loss 2.3615 LearningRate 0.0255 Epoch: 9 Global Step: 165160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:20:07,830-Speed 5191.35 samples/sec Loss 2.4291 LearningRate 0.0255 Epoch: 9 Global Step: 165170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:20:09,806-Speed 5184.12 samples/sec Loss 2.4378 LearningRate 0.0255 Epoch: 9 Global Step: 165180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:20:11,787-Speed 5169.97 samples/sec Loss 2.3843 LearningRate 0.0255 Epoch: 9 Global Step: 165190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:20:13,758-Speed 5196.89 samples/sec Loss 2.4106 LearningRate 0.0255 Epoch: 9 Global Step: 165200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:20:15,731-Speed 5193.67 samples/sec Loss 2.4070 LearningRate 0.0255 Epoch: 9 Global Step: 165210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:20:17,708-Speed 5179.60 samples/sec Loss 2.4222 LearningRate 0.0255 Epoch: 9 Global Step: 165220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:20:19,672-Speed 5215.23 samples/sec Loss 2.4166 LearningRate 0.0255 Epoch: 9 Global Step: 165230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:20:21,634-Speed 5220.36 samples/sec Loss 2.3892 LearningRate 0.0255 Epoch: 9 Global Step: 165240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:20:23,605-Speed 5198.70 samples/sec Loss 2.4389 LearningRate 0.0255 Epoch: 9 Global Step: 165250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:20:25,567-Speed 5220.54 samples/sec Loss 2.3865 LearningRate 0.0255 Epoch: 9 Global Step: 165260 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:20:27,530-Speed 5217.53 samples/sec Loss 2.4182 LearningRate 0.0255 Epoch: 9 Global Step: 165270 Fp16 Grad 
Scale: 32768 Required: 11 hours Training: 2022-04-11 10:20:29,500-Speed 5200.49 samples/sec Loss 2.3999 LearningRate 0.0255 Epoch: 9 Global Step: 165280 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:20:31,473-Speed 5191.55 samples/sec Loss 2.4086 LearningRate 0.0255 Epoch: 9 Global Step: 165290 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:20:33,453-Speed 5173.05 samples/sec Loss 2.4058 LearningRate 0.0255 Epoch: 9 Global Step: 165300 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:20:35,450-Speed 5129.36 samples/sec Loss 2.4358 LearningRate 0.0255 Epoch: 9 Global Step: 165310 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:20:37,444-Speed 5138.13 samples/sec Loss 2.3834 LearningRate 0.0255 Epoch: 9 Global Step: 165320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:20:39,433-Speed 5149.63 samples/sec Loss 2.4203 LearningRate 0.0255 Epoch: 9 Global Step: 165330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:20:41,412-Speed 5176.95 samples/sec Loss 2.4262 LearningRate 0.0255 Epoch: 9 Global Step: 165340 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:20:43,376-Speed 5215.35 samples/sec Loss 2.4058 LearningRate 0.0255 Epoch: 9 Global Step: 165350 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:20:45,343-Speed 5206.22 samples/sec Loss 2.4269 LearningRate 0.0255 Epoch: 9 Global Step: 165360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:20:47,327-Speed 5164.63 samples/sec Loss 2.4608 LearningRate 0.0255 Epoch: 9 Global Step: 165370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:20:49,301-Speed 5186.54 samples/sec Loss 2.4425 LearningRate 0.0255 Epoch: 9 Global Step: 165380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:20:51,266-Speed 5213.93 samples/sec Loss 2.3612 LearningRate 0.0255 Epoch: 9 Global Step: 165390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 
2022-04-11 10:20:53,244-Speed 5178.84 samples/sec Loss 2.3184 LearningRate 0.0255 Epoch: 9 Global Step: 165400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:20:55,206-Speed 5220.97 samples/sec Loss 2.4122 LearningRate 0.0255 Epoch: 9 Global Step: 165410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:20:57,179-Speed 5191.45 samples/sec Loss 2.3662 LearningRate 0.0254 Epoch: 9 Global Step: 165420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:20:59,157-Speed 5179.51 samples/sec Loss 2.5041 LearningRate 0.0254 Epoch: 9 Global Step: 165430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:01,126-Speed 5202.28 samples/sec Loss 2.3080 LearningRate 0.0254 Epoch: 9 Global Step: 165440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:03,099-Speed 5192.71 samples/sec Loss 2.4257 LearningRate 0.0254 Epoch: 9 Global Step: 165450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:05,068-Speed 5201.46 samples/sec Loss 2.3452 LearningRate 0.0254 Epoch: 9 Global Step: 165460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:07,035-Speed 5206.54 samples/sec Loss 2.4132 LearningRate 0.0254 Epoch: 9 Global Step: 165470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:09,002-Speed 5208.95 samples/sec Loss 2.3835 LearningRate 0.0254 Epoch: 9 Global Step: 165480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:10,995-Speed 5138.23 samples/sec Loss 2.4015 LearningRate 0.0254 Epoch: 9 Global Step: 165490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:12,969-Speed 5190.09 samples/sec Loss 2.3711 LearningRate 0.0254 Epoch: 9 Global Step: 165500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:14,952-Speed 5165.78 samples/sec Loss 2.3159 LearningRate 0.0254 Epoch: 9 Global Step: 165510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:16,948-Speed 5133.29 
samples/sec Loss 2.3385 LearningRate 0.0254 Epoch: 9 Global Step: 165520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:18,916-Speed 5204.13 samples/sec Loss 2.3630 LearningRate 0.0254 Epoch: 9 Global Step: 165530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:20,883-Speed 5207.58 samples/sec Loss 2.4619 LearningRate 0.0254 Epoch: 9 Global Step: 165540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:22,849-Speed 5210.42 samples/sec Loss 2.4325 LearningRate 0.0254 Epoch: 9 Global Step: 165550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:24,823-Speed 5190.05 samples/sec Loss 2.4060 LearningRate 0.0254 Epoch: 9 Global Step: 165560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:21:26,796-Speed 5191.20 samples/sec Loss 2.3779 LearningRate 0.0254 Epoch: 9 Global Step: 165570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:28,765-Speed 5203.40 samples/sec Loss 2.3402 LearningRate 0.0254 Epoch: 9 Global Step: 165580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:30,744-Speed 5176.06 samples/sec Loss 2.3219 LearningRate 0.0254 Epoch: 9 Global Step: 165590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:32,714-Speed 5197.99 samples/sec Loss 2.4061 LearningRate 0.0254 Epoch: 9 Global Step: 165600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:34,679-Speed 5214.48 samples/sec Loss 2.4459 LearningRate 0.0254 Epoch: 9 Global Step: 165610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:36,653-Speed 5188.64 samples/sec Loss 2.3900 LearningRate 0.0254 Epoch: 9 Global Step: 165620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:38,681-Speed 5050.48 samples/sec Loss 2.4560 LearningRate 0.0254 Epoch: 9 Global Step: 165630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:40,647-Speed 5209.72 samples/sec Loss 2.3736 LearningRate 
0.0254 Epoch: 9 Global Step: 165640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:42,616-Speed 5202.70 samples/sec Loss 2.3888 LearningRate 0.0254 Epoch: 9 Global Step: 165650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:44,585-Speed 5204.15 samples/sec Loss 2.3326 LearningRate 0.0254 Epoch: 9 Global Step: 165660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:46,549-Speed 5215.83 samples/sec Loss 2.4187 LearningRate 0.0254 Epoch: 9 Global Step: 165670 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:21:48,527-Speed 5178.77 samples/sec Loss 2.3777 LearningRate 0.0254 Epoch: 9 Global Step: 165680 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:21:50,492-Speed 5210.44 samples/sec Loss 2.3965 LearningRate 0.0254 Epoch: 9 Global Step: 165690 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:21:52,456-Speed 5217.55 samples/sec Loss 2.3448 LearningRate 0.0254 Epoch: 9 Global Step: 165700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:54,423-Speed 5208.13 samples/sec Loss 2.4014 LearningRate 0.0254 Epoch: 9 Global Step: 165710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:56,409-Speed 5157.95 samples/sec Loss 2.3320 LearningRate 0.0254 Epoch: 9 Global Step: 165720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:21:58,381-Speed 5193.11 samples/sec Loss 2.4220 LearningRate 0.0254 Epoch: 9 Global Step: 165730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:22:00,353-Speed 5196.38 samples/sec Loss 2.3594 LearningRate 0.0254 Epoch: 9 Global Step: 165740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:22:02,339-Speed 5158.03 samples/sec Loss 2.3403 LearningRate 0.0253 Epoch: 9 Global Step: 165750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:22:04,314-Speed 5186.53 samples/sec Loss 2.3607 LearningRate 0.0253 Epoch: 9 Global Step: 165760 Fp16 
Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:22:06,280-Speed 5209.65 samples/sec Loss 2.3907 LearningRate 0.0253 Epoch: 9 Global Step: 165770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:22:08,246-Speed 5210.51 samples/sec Loss 2.3805 LearningRate 0.0253 Epoch: 9 Global Step: 165780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:22:10,214-Speed 5204.50 samples/sec Loss 2.4075 LearningRate 0.0253 Epoch: 9 Global Step: 165790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:22:12,188-Speed 5189.02 samples/sec Loss 2.3851 LearningRate 0.0253 Epoch: 9 Global Step: 165800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:22:14,182-Speed 5135.49 samples/sec Loss 2.4673 LearningRate 0.0253 Epoch: 9 Global Step: 165810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:22:16,153-Speed 5199.27 samples/sec Loss 2.3957 LearningRate 0.0253 Epoch: 9 Global Step: 165820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:22:18,117-Speed 5213.68 samples/sec Loss 2.4017 LearningRate 0.0253 Epoch: 9 Global Step: 165830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:22:20,078-Speed 5224.73 samples/sec Loss 2.4144 LearningRate 0.0253 Epoch: 9 Global Step: 165840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:22:22,042-Speed 5216.11 samples/sec Loss 2.3879 LearningRate 0.0253 Epoch: 9 Global Step: 165850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:22:24,020-Speed 5179.63 samples/sec Loss 2.4126 LearningRate 0.0253 Epoch: 9 Global Step: 165860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:22:26,002-Speed 5166.32 samples/sec Loss 2.4200 LearningRate 0.0253 Epoch: 9 Global Step: 165870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:22:27,971-Speed 5203.50 samples/sec Loss 2.3499 LearningRate 0.0253 Epoch: 9 Global Step: 165880 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-04-11 10:22:29,937-Speed 5209.31 samples/sec Loss 2.3917 LearningRate 0.0253 Epoch: 9 Global Step: 165890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:22:31,907-Speed 5199.81 samples/sec Loss 2.3774 LearningRate 0.0253 Epoch: 9 Global Step: 165900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:22:33,888-Speed 5170.85 samples/sec Loss 2.3743 LearningRate 0.0253 Epoch: 9 Global Step: 165910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:22:35,864-Speed 5182.82 samples/sec Loss 2.3920 LearningRate 0.0253 Epoch: 9 Global Step: 165920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:22:37,866-Speed 5117.63 samples/sec Loss 2.4006 LearningRate 0.0253 Epoch: 9 Global Step: 165930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:22:39,856-Speed 5145.99 samples/sec Loss 2.4155 LearningRate 0.0253 Epoch: 9 Global Step: 165940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:22:41,837-Speed 5173.75 samples/sec Loss 2.4628 LearningRate 0.0253 Epoch: 9 Global Step: 165950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:22:43,806-Speed 5201.24 samples/sec Loss 2.4372 LearningRate 0.0253 Epoch: 9 Global Step: 165960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:22:45,780-Speed 5190.62 samples/sec Loss 2.2898 LearningRate 0.0253 Epoch: 9 Global Step: 165970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:22:47,766-Speed 5157.18 samples/sec Loss 2.4279 LearningRate 0.0253 Epoch: 9 Global Step: 165980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:22:49,732-Speed 5208.11 samples/sec Loss 2.3963 LearningRate 0.0253 Epoch: 9 Global Step: 165990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:22:51,715-Speed 5167.72 samples/sec Loss 2.4100 LearningRate 0.0253 Epoch: 9 Global Step: 166000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 
10:23:18,338-[lfw][166000]XNorm: 23.007523 Training: 2022-04-11 10:23:18,339-[lfw][166000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 10:23:18,339-[lfw][166000]Accuracy-Highest: 0.99833 Training: 2022-04-11 10:23:49,126-[cfp_fp][166000]XNorm: 21.410519 Training: 2022-04-11 10:23:49,127-[cfp_fp][166000]Accuracy-Flip: 0.98329+-0.00438 Training: 2022-04-11 10:23:49,127-[cfp_fp][166000]Accuracy-Highest: 0.98557 Training: 2022-04-11 10:24:15,595-[agedb_30][166000]XNorm: 22.935029 Training: 2022-04-11 10:24:15,596-[agedb_30][166000]Accuracy-Flip: 0.98033+-0.00792 Training: 2022-04-11 10:24:15,596-[agedb_30][166000]Accuracy-Highest: 0.98167 Training: 2022-04-11 10:24:17,583-Speed 119.25 samples/sec Loss 2.4007 LearningRate 0.0253 Epoch: 9 Global Step: 166010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:24:19,549-Speed 5210.25 samples/sec Loss 2.3882 LearningRate 0.0253 Epoch: 9 Global Step: 166020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:24:21,526-Speed 5179.52 samples/sec Loss 2.3725 LearningRate 0.0253 Epoch: 9 Global Step: 166030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:24:23,522-Speed 5132.85 samples/sec Loss 2.3537 LearningRate 0.0253 Epoch: 9 Global Step: 166040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:24:25,498-Speed 5184.27 samples/sec Loss 2.4380 LearningRate 0.0253 Epoch: 9 Global Step: 166050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:24:27,473-Speed 5186.16 samples/sec Loss 2.3206 LearningRate 0.0253 Epoch: 9 Global Step: 166060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:24:29,449-Speed 5183.31 samples/sec Loss 2.3526 LearningRate 0.0253 Epoch: 9 Global Step: 166070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:24:31,415-Speed 5210.44 samples/sec Loss 2.3986 LearningRate 0.0252 Epoch: 9 Global Step: 166080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:24:33,390-Speed 
5187.77 samples/sec Loss 2.3408 LearningRate 0.0252 Epoch: 9 Global Step: 166090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:24:35,347-Speed 5235.57 samples/sec Loss 2.4240 LearningRate 0.0252 Epoch: 9 Global Step: 166100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:24:37,314-Speed 5205.13 samples/sec Loss 2.4039 LearningRate 0.0252 Epoch: 9 Global Step: 166110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:24:39,293-Speed 5178.05 samples/sec Loss 2.4098 LearningRate 0.0252 Epoch: 9 Global Step: 166120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:24:41,264-Speed 5194.82 samples/sec Loss 2.3446 LearningRate 0.0252 Epoch: 9 Global Step: 166130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:24:43,240-Speed 5185.50 samples/sec Loss 2.4010 LearningRate 0.0252 Epoch: 9 Global Step: 166140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:24:45,216-Speed 5183.08 samples/sec Loss 2.3575 LearningRate 0.0252 Epoch: 9 Global Step: 166150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:24:47,210-Speed 5137.76 samples/sec Loss 2.4117 LearningRate 0.0252 Epoch: 9 Global Step: 166160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:24:49,216-Speed 5105.33 samples/sec Loss 2.4636 LearningRate 0.0252 Epoch: 9 Global Step: 166170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:24:51,215-Speed 5125.68 samples/sec Loss 2.3940 LearningRate 0.0252 Epoch: 9 Global Step: 166180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:24:53,188-Speed 5191.00 samples/sec Loss 2.4013 LearningRate 0.0252 Epoch: 9 Global Step: 166190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:24:55,161-Speed 5192.86 samples/sec Loss 2.3498 LearningRate 0.0252 Epoch: 9 Global Step: 166200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:24:57,132-Speed 5195.17 samples/sec Loss 2.4124 
LearningRate 0.0252 Epoch: 9 Global Step: 166210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:24:59,112-Speed 5175.09 samples/sec Loss 2.3777 LearningRate 0.0252 Epoch: 9 Global Step: 166220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:01,088-Speed 5182.82 samples/sec Loss 2.4331 LearningRate 0.0252 Epoch: 9 Global Step: 166230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:03,085-Speed 5129.77 samples/sec Loss 2.3768 LearningRate 0.0252 Epoch: 9 Global Step: 166240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:05,074-Speed 5150.72 samples/sec Loss 2.3906 LearningRate 0.0252 Epoch: 9 Global Step: 166250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:07,061-Speed 5153.35 samples/sec Loss 2.3454 LearningRate 0.0252 Epoch: 9 Global Step: 166260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:09,049-Speed 5153.49 samples/sec Loss 2.4372 LearningRate 0.0252 Epoch: 9 Global Step: 166270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:11,043-Speed 5138.63 samples/sec Loss 2.4309 LearningRate 0.0252 Epoch: 9 Global Step: 166280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:13,032-Speed 5148.46 samples/sec Loss 2.4369 LearningRate 0.0252 Epoch: 9 Global Step: 166290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:15,022-Speed 5148.84 samples/sec Loss 2.3501 LearningRate 0.0252 Epoch: 9 Global Step: 166300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:16,996-Speed 5188.72 samples/sec Loss 2.3585 LearningRate 0.0252 Epoch: 9 Global Step: 166310 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:25:18,974-Speed 5179.43 samples/sec Loss 2.4185 LearningRate 0.0252 Epoch: 9 Global Step: 166320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:25:20,948-Speed 5187.55 samples/sec Loss 2.4457 LearningRate 0.0252 Epoch: 9 Global Step: 
166330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:25:22,924-Speed 5184.44 samples/sec Loss 2.4012 LearningRate 0.0252 Epoch: 9 Global Step: 166340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:24,912-Speed 5153.75 samples/sec Loss 2.3900 LearningRate 0.0252 Epoch: 9 Global Step: 166350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:26,896-Speed 5161.49 samples/sec Loss 2.4298 LearningRate 0.0252 Epoch: 9 Global Step: 166360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:28,886-Speed 5149.29 samples/sec Loss 2.3495 LearningRate 0.0252 Epoch: 9 Global Step: 166370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:30,873-Speed 5155.34 samples/sec Loss 2.3105 LearningRate 0.0252 Epoch: 9 Global Step: 166380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:32,846-Speed 5191.50 samples/sec Loss 2.4155 LearningRate 0.0252 Epoch: 9 Global Step: 166390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:34,820-Speed 5189.65 samples/sec Loss 2.4438 LearningRate 0.0252 Epoch: 9 Global Step: 166400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:36,809-Speed 5149.33 samples/sec Loss 2.3247 LearningRate 0.0252 Epoch: 9 Global Step: 166410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:38,784-Speed 5187.75 samples/sec Loss 2.3656 LearningRate 0.0251 Epoch: 9 Global Step: 166420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:40,773-Speed 5148.59 samples/sec Loss 2.4375 LearningRate 0.0251 Epoch: 9 Global Step: 166430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:42,758-Speed 5161.17 samples/sec Loss 2.4458 LearningRate 0.0251 Epoch: 9 Global Step: 166440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:25:44,729-Speed 5195.93 samples/sec Loss 2.4365 LearningRate 0.0251 Epoch: 9 Global Step: 166450 Fp16 Grad Scale: 65536 Required: 
11 hours Training: 2022-04-11 10:25:46,715-Speed 5157.28 samples/sec Loss 2.3500 LearningRate 0.0251 Epoch: 9 Global Step: 166460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:48,701-Speed 5159.49 samples/sec Loss 2.3684 LearningRate 0.0251 Epoch: 9 Global Step: 166470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:50,673-Speed 5193.40 samples/sec Loss 2.4582 LearningRate 0.0251 Epoch: 9 Global Step: 166480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:52,656-Speed 5167.05 samples/sec Loss 2.3923 LearningRate 0.0251 Epoch: 9 Global Step: 166490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:54,625-Speed 5202.79 samples/sec Loss 2.4137 LearningRate 0.0251 Epoch: 9 Global Step: 166500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:56,594-Speed 5202.93 samples/sec Loss 2.3682 LearningRate 0.0251 Epoch: 9 Global Step: 166510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:25:58,562-Speed 5202.68 samples/sec Loss 2.3751 LearningRate 0.0251 Epoch: 9 Global Step: 166520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:00,535-Speed 5192.08 samples/sec Loss 2.3987 LearningRate 0.0251 Epoch: 9 Global Step: 166530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:02,517-Speed 5168.46 samples/sec Loss 2.3784 LearningRate 0.0251 Epoch: 9 Global Step: 166540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:04,485-Speed 5206.11 samples/sec Loss 2.3651 LearningRate 0.0251 Epoch: 9 Global Step: 166550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:26:06,446-Speed 5222.71 samples/sec Loss 2.3536 LearningRate 0.0251 Epoch: 9 Global Step: 166560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:08,426-Speed 5172.76 samples/sec Loss 2.4303 LearningRate 0.0251 Epoch: 9 Global Step: 166570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 
10:26:10,427-Speed 5120.71 samples/sec Loss 2.4367 LearningRate 0.0251 Epoch: 9 Global Step: 166580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:12,410-Speed 5163.35 samples/sec Loss 2.4152 LearningRate 0.0251 Epoch: 9 Global Step: 166590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:14,394-Speed 5163.05 samples/sec Loss 2.3313 LearningRate 0.0251 Epoch: 9 Global Step: 166600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:16,352-Speed 5234.89 samples/sec Loss 2.3648 LearningRate 0.0251 Epoch: 9 Global Step: 166610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:26:18,337-Speed 5159.78 samples/sec Loss 2.3098 LearningRate 0.0251 Epoch: 9 Global Step: 166620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:26:20,302-Speed 5212.45 samples/sec Loss 2.4127 LearningRate 0.0251 Epoch: 9 Global Step: 166630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:26:22,273-Speed 5196.40 samples/sec Loss 2.3843 LearningRate 0.0251 Epoch: 9 Global Step: 166640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:26:24,238-Speed 5213.12 samples/sec Loss 2.3925 LearningRate 0.0251 Epoch: 9 Global Step: 166650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:26:26,212-Speed 5189.69 samples/sec Loss 2.2870 LearningRate 0.0251 Epoch: 9 Global Step: 166660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:26:28,197-Speed 5160.64 samples/sec Loss 2.3657 LearningRate 0.0251 Epoch: 9 Global Step: 166670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:26:30,176-Speed 5174.05 samples/sec Loss 2.4031 LearningRate 0.0251 Epoch: 9 Global Step: 166680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:26:32,156-Speed 5174.49 samples/sec Loss 2.4219 LearningRate 0.0251 Epoch: 9 Global Step: 166690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:26:34,128-Speed 5196.59 samples/sec Loss 
2.3908 LearningRate 0.0251 Epoch: 9 Global Step: 166700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:26:36,124-Speed 5130.67 samples/sec Loss 2.2839 LearningRate 0.0251 Epoch: 9 Global Step: 166710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:38,117-Speed 5138.79 samples/sec Loss 2.4214 LearningRate 0.0251 Epoch: 9 Global Step: 166720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:40,083-Speed 5211.43 samples/sec Loss 2.4293 LearningRate 0.0251 Epoch: 9 Global Step: 166730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:42,046-Speed 5216.57 samples/sec Loss 2.3673 LearningRate 0.0251 Epoch: 9 Global Step: 166740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:44,012-Speed 5209.63 samples/sec Loss 2.4169 LearningRate 0.0250 Epoch: 9 Global Step: 166750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:45,984-Speed 5196.30 samples/sec Loss 2.4141 LearningRate 0.0250 Epoch: 9 Global Step: 166760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:47,964-Speed 5174.19 samples/sec Loss 2.4220 LearningRate 0.0250 Epoch: 9 Global Step: 166770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:49,933-Speed 5201.74 samples/sec Loss 2.3358 LearningRate 0.0250 Epoch: 9 Global Step: 166780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:51,902-Speed 5203.40 samples/sec Loss 2.3835 LearningRate 0.0250 Epoch: 9 Global Step: 166790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:53,880-Speed 5178.45 samples/sec Loss 2.4196 LearningRate 0.0250 Epoch: 9 Global Step: 166800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:55,838-Speed 5230.08 samples/sec Loss 2.3146 LearningRate 0.0250 Epoch: 9 Global Step: 166810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:57,803-Speed 5212.72 samples/sec Loss 2.3298 LearningRate 0.0250 Epoch: 9 Global 
Step: 166820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:26:59,780-Speed 5181.02 samples/sec Loss 2.3308 LearningRate 0.0250 Epoch: 9 Global Step: 166830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:27:01,781-Speed 5120.73 samples/sec Loss 2.3952 LearningRate 0.0250 Epoch: 9 Global Step: 166840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:27:03,761-Speed 5173.38 samples/sec Loss 2.3573 LearningRate 0.0250 Epoch: 9 Global Step: 166850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:27:05,750-Speed 5150.41 samples/sec Loss 2.4446 LearningRate 0.0250 Epoch: 9 Global Step: 166860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:27:07,733-Speed 5166.07 samples/sec Loss 2.4206 LearningRate 0.0250 Epoch: 9 Global Step: 166870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:27:09,704-Speed 5197.08 samples/sec Loss 2.3457 LearningRate 0.0250 Epoch: 9 Global Step: 166880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:27:11,671-Speed 5206.04 samples/sec Loss 2.3500 LearningRate 0.0250 Epoch: 9 Global Step: 166890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:27:13,891-Speed 4614.99 samples/sec Loss 2.4004 LearningRate 0.0250 Epoch: 9 Global Step: 166900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:27:15,836-Speed 5265.62 samples/sec Loss 2.3496 LearningRate 0.0250 Epoch: 9 Global Step: 166910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:27:46,677-Speed 332.04 samples/sec Loss 1.8247 LearningRate 0.0250 Epoch: 10 Global Step: 166920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:27:48,731-Speed 4987.69 samples/sec Loss 1.7948 LearningRate 0.0250 Epoch: 10 Global Step: 166930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:27:50,730-Speed 5124.93 samples/sec Loss 1.8101 LearningRate 0.0250 Epoch: 10 Global Step: 166940 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-04-11 10:27:52,692-Speed 5221.00 samples/sec Loss 1.8304 LearningRate 0.0250 Epoch: 10 Global Step: 166950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:27:55,133-Speed 4195.70 samples/sec Loss 1.7946 LearningRate 0.0250 Epoch: 10 Global Step: 166960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:27:57,104-Speed 5199.24 samples/sec Loss 1.8397 LearningRate 0.0250 Epoch: 10 Global Step: 166970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:27:59,080-Speed 5182.38 samples/sec Loss 1.8489 LearningRate 0.0250 Epoch: 10 Global Step: 166980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:28:01,094-Speed 5088.87 samples/sec Loss 1.8050 LearningRate 0.0250 Epoch: 10 Global Step: 166990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:28:03,083-Speed 5149.66 samples/sec Loss 1.7904 LearningRate 0.0250 Epoch: 10 Global Step: 167000 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:28:05,074-Speed 5143.39 samples/sec Loss 1.7568 LearningRate 0.0250 Epoch: 10 Global Step: 167010 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:28:07,049-Speed 5189.54 samples/sec Loss 1.8455 LearningRate 0.0250 Epoch: 10 Global Step: 167020 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:28:09,041-Speed 5141.50 samples/sec Loss 1.8297 LearningRate 0.0250 Epoch: 10 Global Step: 167030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:28:11,027-Speed 5156.25 samples/sec Loss 1.8038 LearningRate 0.0250 Epoch: 10 Global Step: 167040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:28:13,015-Speed 5154.72 samples/sec Loss 1.8320 LearningRate 0.0250 Epoch: 10 Global Step: 167050 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:28:14,995-Speed 5174.37 samples/sec Loss 1.7757 LearningRate 0.0250 Epoch: 10 Global Step: 167060 Fp16 Grad Scale: 32768 Required: 11 hours 
Training: 2022-04-11 10:28:16,991-Speed 5131.03 samples/sec Loss 1.7863 LearningRate 0.0250 Epoch: 10 Global Step: 167070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:28:18,955-Speed 5214.75 samples/sec Loss 1.8255 LearningRate 0.0249 Epoch: 10 Global Step: 167080 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:28:20,921-Speed 5211.29 samples/sec Loss 1.8374 LearningRate 0.0249 Epoch: 10 Global Step: 167090 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:28:22,893-Speed 5194.80 samples/sec Loss 1.8986 LearningRate 0.0249 Epoch: 10 Global Step: 167100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:28:24,864-Speed 5195.97 samples/sec Loss 1.8486 LearningRate 0.0249 Epoch: 10 Global Step: 167110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:28:26,874-Speed 5096.49 samples/sec Loss 1.8798 LearningRate 0.0249 Epoch: 10 Global Step: 167120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:28:28,865-Speed 5144.11 samples/sec Loss 1.8675 LearningRate 0.0249 Epoch: 10 Global Step: 167130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:28:30,849-Speed 5162.55 samples/sec Loss 1.8317 LearningRate 0.0249 Epoch: 10 Global Step: 167140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:28:32,819-Speed 5202.57 samples/sec Loss 1.7810 LearningRate 0.0249 Epoch: 10 Global Step: 167150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:28:34,798-Speed 5176.75 samples/sec Loss 1.8341 LearningRate 0.0249 Epoch: 10 Global Step: 167160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:28:36,779-Speed 5170.47 samples/sec Loss 1.8277 LearningRate 0.0249 Epoch: 10 Global Step: 167170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:28:38,758-Speed 5176.28 samples/sec Loss 1.8260 LearningRate 0.0249 Epoch: 10 Global Step: 167180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 
10:28:40,760-Speed 5116.16 samples/sec Loss 1.7749 LearningRate 0.0249 Epoch: 10 Global Step: 167190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:28:42,730-Speed 5201.07 samples/sec Loss 1.8237 LearningRate 0.0249 Epoch: 10 Global Step: 167200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:28:44,727-Speed 5127.33 samples/sec Loss 1.8185 LearningRate 0.0249 Epoch: 10 Global Step: 167210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:28:46,718-Speed 5148.08 samples/sec Loss 1.8221 LearningRate 0.0249 Epoch: 10 Global Step: 167220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:28:48,701-Speed 5164.65 samples/sec Loss 1.8083 LearningRate 0.0249 Epoch: 10 Global Step: 167230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:28:50,815-Speed 4845.18 samples/sec Loss 1.7898 LearningRate 0.0249 Epoch: 10 Global Step: 167240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:28:52,796-Speed 5171.01 samples/sec Loss 1.8884 LearningRate 0.0249 Epoch: 10 Global Step: 167250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:28:54,776-Speed 5172.31 samples/sec Loss 1.8017 LearningRate 0.0249 Epoch: 10 Global Step: 167260 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:28:56,759-Speed 5167.07 samples/sec Loss 1.8625 LearningRate 0.0249 Epoch: 10 Global Step: 167270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:28:58,743-Speed 5163.53 samples/sec Loss 1.8304 LearningRate 0.0249 Epoch: 10 Global Step: 167280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:29:00,735-Speed 5142.18 samples/sec Loss 1.8415 LearningRate 0.0249 Epoch: 10 Global Step: 167290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:29:02,702-Speed 5206.72 samples/sec Loss 1.8418 LearningRate 0.0249 Epoch: 10 Global Step: 167300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:29:04,711-Speed 
5100.31 samples/sec Loss 1.7988 LearningRate 0.0249 Epoch: 10 Global Step: 167310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:29:06,682-Speed 5196.27 samples/sec Loss 1.8608 LearningRate 0.0249 Epoch: 10 Global Step: 167320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:29:08,656-Speed 5190.02 samples/sec Loss 1.8380 LearningRate 0.0249 Epoch: 10 Global Step: 167330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:29:10,628-Speed 5195.02 samples/sec Loss 1.8366 LearningRate 0.0249 Epoch: 10 Global Step: 167340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:29:12,594-Speed 5210.70 samples/sec Loss 1.8309 LearningRate 0.0249 Epoch: 10 Global Step: 167350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:29:14,563-Speed 5201.03 samples/sec Loss 1.8264 LearningRate 0.0249 Epoch: 10 Global Step: 167360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:29:16,562-Speed 5126.04 samples/sec Loss 1.8406 LearningRate 0.0249 Epoch: 10 Global Step: 167370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:29:18,526-Speed 5215.36 samples/sec Loss 1.8467 LearningRate 0.0249 Epoch: 10 Global Step: 167380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:29:20,492-Speed 5209.40 samples/sec Loss 1.8307 LearningRate 0.0249 Epoch: 10 Global Step: 167390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:29:22,463-Speed 5196.25 samples/sec Loss 1.8559 LearningRate 0.0249 Epoch: 10 Global Step: 167400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:29:24,443-Speed 5174.72 samples/sec Loss 1.8225 LearningRate 0.0249 Epoch: 10 Global Step: 167410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:29:26,412-Speed 5203.96 samples/sec Loss 1.8106 LearningRate 0.0248 Epoch: 10 Global Step: 167420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:29:28,383-Speed 5195.70 samples/sec Loss 
1.8884 LearningRate 0.0248 Epoch: 10 Global Step: 167430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:29:30,384-Speed 5119.37 samples/sec Loss 1.8514 LearningRate 0.0248 Epoch: 10 Global Step: 167440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:29:32,375-Speed 5146.51 samples/sec Loss 1.8847 LearningRate 0.0248 Epoch: 10 Global Step: 167450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:29:34,359-Speed 5162.32 samples/sec Loss 1.8755 LearningRate 0.0248 Epoch: 10 Global Step: 167460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:29:36,327-Speed 5205.09 samples/sec Loss 1.8319 LearningRate 0.0248 Epoch: 10 Global Step: 167470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:29:38,292-Speed 5213.74 samples/sec Loss 1.7936 LearningRate 0.0248 Epoch: 10 Global Step: 167480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:29:40,268-Speed 5183.98 samples/sec Loss 1.8064 LearningRate 0.0248 Epoch: 10 Global Step: 167490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:29:42,254-Speed 5157.99 samples/sec Loss 1.8199 LearningRate 0.0248 Epoch: 10 Global Step: 167500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:29:44,216-Speed 5222.52 samples/sec Loss 1.8920 LearningRate 0.0248 Epoch: 10 Global Step: 167510 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:29:46,207-Speed 5143.76 samples/sec Loss 1.9029 LearningRate 0.0248 Epoch: 10 Global Step: 167520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:29:48,742-Speed 4041.52 samples/sec Loss 1.8833 LearningRate 0.0248 Epoch: 10 Global Step: 167530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:29:50,731-Speed 5150.22 samples/sec Loss 1.9038 LearningRate 0.0248 Epoch: 10 Global Step: 167540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:29:52,703-Speed 5194.85 samples/sec Loss 1.8663 LearningRate 
0.0248 Epoch: 10 Global Step: 167550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:29:54,667-Speed 5213.78 samples/sec Loss 1.8849 LearningRate 0.0248 Epoch: 10 Global Step: 167560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:29:56,649-Speed 5170.11 samples/sec Loss 1.8631 LearningRate 0.0248 Epoch: 10 Global Step: 167570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:29:58,628-Speed 5174.38 samples/sec Loss 1.8177 LearningRate 0.0248 Epoch: 10 Global Step: 167580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:30:00,592-Speed 5216.50 samples/sec Loss 1.8755 LearningRate 0.0248 Epoch: 10 Global Step: 167590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:30:02,576-Speed 5162.68 samples/sec Loss 1.8594 LearningRate 0.0248 Epoch: 10 Global Step: 167600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:30:04,554-Speed 5179.40 samples/sec Loss 1.8267 LearningRate 0.0248 Epoch: 10 Global Step: 167610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:30:06,520-Speed 5209.24 samples/sec Loss 1.9235 LearningRate 0.0248 Epoch: 10 Global Step: 167620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:30:08,490-Speed 5201.47 samples/sec Loss 1.8492 LearningRate 0.0248 Epoch: 10 Global Step: 167630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:30:10,477-Speed 5153.58 samples/sec Loss 1.8642 LearningRate 0.0248 Epoch: 10 Global Step: 167640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:30:12,455-Speed 5179.26 samples/sec Loss 1.8789 LearningRate 0.0248 Epoch: 10 Global Step: 167650 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:30:14,422-Speed 5206.73 samples/sec Loss 1.8431 LearningRate 0.0248 Epoch: 10 Global Step: 167660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:30:16,384-Speed 5220.40 samples/sec Loss 1.9009 LearningRate 0.0248 Epoch: 10 Global 
Step: 167670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:30:18,380-Speed 5134.32 samples/sec Loss 1.8796 LearningRate 0.0248 Epoch: 10 Global Step: 167680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:30:20,345-Speed 5212.22 samples/sec Loss 1.8779 LearningRate 0.0248 Epoch: 10 Global Step: 167690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:30:22,318-Speed 5191.16 samples/sec Loss 1.8961 LearningRate 0.0248 Epoch: 10 Global Step: 167700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:30:24,316-Speed 5126.06 samples/sec Loss 1.9180 LearningRate 0.0248 Epoch: 10 Global Step: 167710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:30:26,300-Speed 5165.46 samples/sec Loss 1.8600 LearningRate 0.0248 Epoch: 10 Global Step: 167720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:30:28,320-Speed 5069.77 samples/sec Loss 1.8580 LearningRate 0.0248 Epoch: 10 Global Step: 167730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:30:30,286-Speed 5211.50 samples/sec Loss 1.8077 LearningRate 0.0248 Epoch: 10 Global Step: 167740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:30:32,252-Speed 5210.41 samples/sec Loss 1.8527 LearningRate 0.0247 Epoch: 10 Global Step: 167750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:30:34,220-Speed 5205.31 samples/sec Loss 1.8945 LearningRate 0.0247 Epoch: 10 Global Step: 167760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:30:36,194-Speed 5187.77 samples/sec Loss 1.9064 LearningRate 0.0247 Epoch: 10 Global Step: 167770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:30:38,170-Speed 5183.05 samples/sec Loss 1.8828 LearningRate 0.0247 Epoch: 10 Global Step: 167780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:30:40,144-Speed 5189.57 samples/sec Loss 1.8465 LearningRate 0.0247 Epoch: 10 Global Step: 167790 Fp16 Grad Scale: 
65536 Required: 11 hours Training: 2022-04-11 10:30:42,123-Speed 5175.87 samples/sec Loss 1.8664 LearningRate 0.0247 Epoch: 10 Global Step: 167800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:30:44,087-Speed 5216.04 samples/sec Loss 1.8681 LearningRate 0.0247 Epoch: 10 Global Step: 167810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:30:46,072-Speed 5162.34 samples/sec Loss 1.8508 LearningRate 0.0247 Epoch: 10 Global Step: 167820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:30:48,052-Speed 5174.02 samples/sec Loss 1.9071 LearningRate 0.0247 Epoch: 10 Global Step: 167830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:30:50,021-Speed 5201.10 samples/sec Loss 1.8771 LearningRate 0.0247 Epoch: 10 Global Step: 167840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:30:51,995-Speed 5189.87 samples/sec Loss 1.9374 LearningRate 0.0247 Epoch: 10 Global Step: 167850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:30:53,973-Speed 5178.80 samples/sec Loss 1.8897 LearningRate 0.0247 Epoch: 10 Global Step: 167860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:30:55,972-Speed 5124.13 samples/sec Loss 1.8955 LearningRate 0.0247 Epoch: 10 Global Step: 167870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:30:57,955-Speed 5164.42 samples/sec Loss 1.8727 LearningRate 0.0247 Epoch: 10 Global Step: 167880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:30:59,932-Speed 5181.79 samples/sec Loss 1.9197 LearningRate 0.0247 Epoch: 10 Global Step: 167890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:31:01,926-Speed 5136.95 samples/sec Loss 1.8630 LearningRate 0.0247 Epoch: 10 Global Step: 167900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:31:03,898-Speed 5194.45 samples/sec Loss 1.9012 LearningRate 0.0247 Epoch: 10 Global Step: 167910 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-04-11 10:31:05,889-Speed 5145.74 samples/sec Loss 1.9404 LearningRate 0.0247 Epoch: 10 Global Step: 167920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:31:07,856-Speed 5208.00 samples/sec Loss 1.8834 LearningRate 0.0247 Epoch: 10 Global Step: 167930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:31:09,835-Speed 5175.07 samples/sec Loss 1.8843 LearningRate 0.0247 Epoch: 10 Global Step: 167940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:31:11,808-Speed 5192.62 samples/sec Loss 1.9102 LearningRate 0.0247 Epoch: 10 Global Step: 167950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:31:13,795-Speed 5153.20 samples/sec Loss 1.8464 LearningRate 0.0247 Epoch: 10 Global Step: 167960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:31:15,784-Speed 5151.86 samples/sec Loss 1.8919 LearningRate 0.0247 Epoch: 10 Global Step: 167970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:31:17,768-Speed 5161.50 samples/sec Loss 1.9021 LearningRate 0.0247 Epoch: 10 Global Step: 167980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:31:19,763-Speed 5135.36 samples/sec Loss 1.8682 LearningRate 0.0247 Epoch: 10 Global Step: 167990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:31:21,766-Speed 5115.96 samples/sec Loss 1.8835 LearningRate 0.0247 Epoch: 10 Global Step: 168000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:31:48,386-[lfw][168000]XNorm: 23.877025 Training: 2022-04-11 10:31:48,386-[lfw][168000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-04-11 10:31:48,387-[lfw][168000]Accuracy-Highest: 0.99833 Training: 2022-04-11 10:32:19,146-[cfp_fp][168000]XNorm: 22.139400 Training: 2022-04-11 10:32:19,147-[cfp_fp][168000]Accuracy-Flip: 0.98514+-0.00554 Training: 2022-04-11 10:32:19,147-[cfp_fp][168000]Accuracy-Highest: 0.98557 Training: 2022-04-11 10:32:45,693-[agedb_30][168000]XNorm: 23.850678 Training: 
2022-04-11 10:32:45,693-[agedb_30][168000]Accuracy-Flip: 0.97950+-0.00785 Training: 2022-04-11 10:32:45,694-[agedb_30][168000]Accuracy-Highest: 0.98167 Training: 2022-04-11 10:32:47,680-Speed 119.19 samples/sec Loss 1.8592 LearningRate 0.0247 Epoch: 10 Global Step: 168010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:32:49,653-Speed 5192.55 samples/sec Loss 1.8976 LearningRate 0.0247 Epoch: 10 Global Step: 168020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:32:51,623-Speed 5199.36 samples/sec Loss 1.8412 LearningRate 0.0247 Epoch: 10 Global Step: 168030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:32:53,585-Speed 5220.60 samples/sec Loss 1.9051 LearningRate 0.0247 Epoch: 10 Global Step: 168040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:32:55,548-Speed 5217.96 samples/sec Loss 1.9312 LearningRate 0.0247 Epoch: 10 Global Step: 168050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:32:57,520-Speed 5194.05 samples/sec Loss 1.8777 LearningRate 0.0247 Epoch: 10 Global Step: 168060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:32:59,491-Speed 5196.31 samples/sec Loss 1.8811 LearningRate 0.0247 Epoch: 10 Global Step: 168070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:01,454-Speed 5219.87 samples/sec Loss 1.8991 LearningRate 0.0247 Epoch: 10 Global Step: 168080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:03,417-Speed 5218.17 samples/sec Loss 1.8560 LearningRate 0.0246 Epoch: 10 Global Step: 168090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:05,389-Speed 5193.01 samples/sec Loss 1.8592 LearningRate 0.0246 Epoch: 10 Global Step: 168100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:07,351-Speed 5221.48 samples/sec Loss 1.8481 LearningRate 0.0246 Epoch: 10 Global Step: 168110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:09,324-Speed 
5191.35 samples/sec Loss 1.8451 LearningRate 0.0246 Epoch: 10 Global Step: 168120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:11,314-Speed 5149.47 samples/sec Loss 1.9174 LearningRate 0.0246 Epoch: 10 Global Step: 168130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:13,279-Speed 5213.12 samples/sec Loss 1.8954 LearningRate 0.0246 Epoch: 10 Global Step: 168140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:15,245-Speed 5207.65 samples/sec Loss 1.8719 LearningRate 0.0246 Epoch: 10 Global Step: 168150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:17,261-Speed 5081.68 samples/sec Loss 1.8394 LearningRate 0.0246 Epoch: 10 Global Step: 168160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:33:19,229-Speed 5205.03 samples/sec Loss 1.8808 LearningRate 0.0246 Epoch: 10 Global Step: 168170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:33:21,202-Speed 5192.68 samples/sec Loss 1.8793 LearningRate 0.0246 Epoch: 10 Global Step: 168180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:33:23,177-Speed 5187.02 samples/sec Loss 1.9157 LearningRate 0.0246 Epoch: 10 Global Step: 168190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:25,154-Speed 5181.06 samples/sec Loss 1.9027 LearningRate 0.0246 Epoch: 10 Global Step: 168200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:27,141-Speed 5155.21 samples/sec Loss 1.8678 LearningRate 0.0246 Epoch: 10 Global Step: 168210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:29,155-Speed 5085.55 samples/sec Loss 1.9014 LearningRate 0.0246 Epoch: 10 Global Step: 168220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:31,127-Speed 5194.36 samples/sec Loss 1.9183 LearningRate 0.0246 Epoch: 10 Global Step: 168230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:33,095-Speed 5205.83 samples/sec Loss 
1.9337 LearningRate 0.0246 Epoch: 10 Global Step: 168240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:35,063-Speed 5205.18 samples/sec Loss 1.9158 LearningRate 0.0246 Epoch: 10 Global Step: 168250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:37,037-Speed 5189.93 samples/sec Loss 1.9047 LearningRate 0.0246 Epoch: 10 Global Step: 168260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:39,010-Speed 5189.43 samples/sec Loss 1.8353 LearningRate 0.0246 Epoch: 10 Global Step: 168270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:40,978-Speed 5206.00 samples/sec Loss 1.9311 LearningRate 0.0246 Epoch: 10 Global Step: 168280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:42,940-Speed 5220.20 samples/sec Loss 1.8742 LearningRate 0.0246 Epoch: 10 Global Step: 168290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:44,912-Speed 5195.88 samples/sec Loss 1.9015 LearningRate 0.0246 Epoch: 10 Global Step: 168300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:46,884-Speed 5193.12 samples/sec Loss 1.8647 LearningRate 0.0246 Epoch: 10 Global Step: 168310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:48,858-Speed 5190.80 samples/sec Loss 1.8777 LearningRate 0.0246 Epoch: 10 Global Step: 168320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:50,835-Speed 5180.31 samples/sec Loss 1.8847 LearningRate 0.0246 Epoch: 10 Global Step: 168330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:52,829-Speed 5137.93 samples/sec Loss 1.9345 LearningRate 0.0246 Epoch: 10 Global Step: 168340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:54,810-Speed 5168.97 samples/sec Loss 1.9441 LearningRate 0.0246 Epoch: 10 Global Step: 168350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:56,783-Speed 5193.48 samples/sec Loss 1.8973 LearningRate 0.0246 
Epoch: 10 Global Step: 168360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:33:58,777-Speed 5136.54 samples/sec Loss 1.9434 LearningRate 0.0246 Epoch: 10 Global Step: 168370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:00,752-Speed 5186.68 samples/sec Loss 1.8800 LearningRate 0.0246 Epoch: 10 Global Step: 168380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:02,741-Speed 5150.15 samples/sec Loss 1.9795 LearningRate 0.0246 Epoch: 10 Global Step: 168390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:34:04,715-Speed 5188.99 samples/sec Loss 1.8668 LearningRate 0.0246 Epoch: 10 Global Step: 168400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:06,694-Speed 5178.37 samples/sec Loss 1.8863 LearningRate 0.0246 Epoch: 10 Global Step: 168410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:08,663-Speed 5201.72 samples/sec Loss 1.8770 LearningRate 0.0245 Epoch: 10 Global Step: 168420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:10,647-Speed 5162.19 samples/sec Loss 1.9579 LearningRate 0.0245 Epoch: 10 Global Step: 168430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:12,636-Speed 5150.26 samples/sec Loss 1.9394 LearningRate 0.0245 Epoch: 10 Global Step: 168440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:14,600-Speed 5216.66 samples/sec Loss 1.9269 LearningRate 0.0245 Epoch: 10 Global Step: 168450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:16,569-Speed 5201.96 samples/sec Loss 1.9252 LearningRate 0.0245 Epoch: 10 Global Step: 168460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:18,543-Speed 5187.93 samples/sec Loss 1.9596 LearningRate 0.0245 Epoch: 10 Global Step: 168470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:20,509-Speed 5212.13 samples/sec Loss 1.9774 LearningRate 0.0245 Epoch: 10 Global Step: 168480 
Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:22,488-Speed 5174.01 samples/sec Loss 1.9221 LearningRate 0.0245 Epoch: 10 Global Step: 168490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:24,478-Speed 5147.78 samples/sec Loss 1.9495 LearningRate 0.0245 Epoch: 10 Global Step: 168500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:34:26,446-Speed 5207.45 samples/sec Loss 1.9550 LearningRate 0.0245 Epoch: 10 Global Step: 168510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:28,417-Speed 5195.71 samples/sec Loss 1.9411 LearningRate 0.0245 Epoch: 10 Global Step: 168520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:30,384-Speed 5208.18 samples/sec Loss 1.9477 LearningRate 0.0245 Epoch: 10 Global Step: 168530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:32,353-Speed 5202.05 samples/sec Loss 1.9429 LearningRate 0.0245 Epoch: 10 Global Step: 168540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:34,336-Speed 5165.12 samples/sec Loss 1.9939 LearningRate 0.0245 Epoch: 10 Global Step: 168550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:36,304-Speed 5204.55 samples/sec Loss 1.9568 LearningRate 0.0245 Epoch: 10 Global Step: 168560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:38,287-Speed 5166.05 samples/sec Loss 1.9457 LearningRate 0.0245 Epoch: 10 Global Step: 168570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:40,264-Speed 5181.63 samples/sec Loss 1.8557 LearningRate 0.0245 Epoch: 10 Global Step: 168580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:42,229-Speed 5212.16 samples/sec Loss 1.8817 LearningRate 0.0245 Epoch: 10 Global Step: 168590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:34:44,197-Speed 5205.70 samples/sec Loss 1.9237 LearningRate 0.0245 Epoch: 10 Global Step: 168600 Fp16 Grad Scale: 65536 
Required: 11 hours Training: 2022-04-11 10:34:46,187-Speed 5148.71 samples/sec Loss 1.9268 LearningRate 0.0245 Epoch: 10 Global Step: 168610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:34:48,159-Speed 5194.18 samples/sec Loss 1.9609 LearningRate 0.0245 Epoch: 10 Global Step: 168620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:34:50,125-Speed 5209.48 samples/sec Loss 1.9271 LearningRate 0.0245 Epoch: 10 Global Step: 168630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:34:52,100-Speed 5185.40 samples/sec Loss 1.9309 LearningRate 0.0245 Epoch: 10 Global Step: 168640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:34:54,067-Speed 5208.58 samples/sec Loss 1.9639 LearningRate 0.0245 Epoch: 10 Global Step: 168650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:34:56,042-Speed 5186.30 samples/sec Loss 1.9148 LearningRate 0.0245 Epoch: 10 Global Step: 168660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:34:58,019-Speed 5180.29 samples/sec Loss 1.9016 LearningRate 0.0245 Epoch: 10 Global Step: 168670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:35:00,012-Speed 5141.84 samples/sec Loss 1.9032 LearningRate 0.0245 Epoch: 10 Global Step: 168680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:35:02,013-Speed 5118.51 samples/sec Loss 1.9958 LearningRate 0.0245 Epoch: 10 Global Step: 168690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:35:03,998-Speed 5162.26 samples/sec Loss 1.9349 LearningRate 0.0245 Epoch: 10 Global Step: 168700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:35:05,968-Speed 5199.02 samples/sec Loss 1.9708 LearningRate 0.0245 Epoch: 10 Global Step: 168710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:35:07,933-Speed 5210.83 samples/sec Loss 1.9319 LearningRate 0.0245 Epoch: 10 Global Step: 168720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 
2022-04-11 10:35:09,913-Speed 5174.40 samples/sec Loss 1.9000 LearningRate 0.0245 Epoch: 10 Global Step: 168730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:35:11,880-Speed 5208.35 samples/sec Loss 1.9377 LearningRate 0.0245 Epoch: 10 Global Step: 168740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:35:13,857-Speed 5180.08 samples/sec Loss 1.9494 LearningRate 0.0245 Epoch: 10 Global Step: 168750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:35:15,844-Speed 5156.38 samples/sec Loss 2.0203 LearningRate 0.0244 Epoch: 10 Global Step: 168760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:35:17,824-Speed 5171.68 samples/sec Loss 1.9533 LearningRate 0.0244 Epoch: 10 Global Step: 168770 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:35:19,789-Speed 5212.77 samples/sec Loss 1.9261 LearningRate 0.0244 Epoch: 10 Global Step: 168780 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:35:21,755-Speed 5211.14 samples/sec Loss 1.9221 LearningRate 0.0244 Epoch: 10 Global Step: 168790 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:35:23,724-Speed 5204.25 samples/sec Loss 1.9902 LearningRate 0.0244 Epoch: 10 Global Step: 168800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:35:25,692-Speed 5204.92 samples/sec Loss 1.9378 LearningRate 0.0244 Epoch: 10 Global Step: 168810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:35:27,664-Speed 5194.32 samples/sec Loss 1.9529 LearningRate 0.0244 Epoch: 10 Global Step: 168820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:35:29,643-Speed 5175.22 samples/sec Loss 1.9299 LearningRate 0.0244 Epoch: 10 Global Step: 168830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:35:31,609-Speed 5210.08 samples/sec Loss 1.9307 LearningRate 0.0244 Epoch: 10 Global Step: 168840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:35:33,583-Speed 
5188.83 samples/sec Loss 1.9257 LearningRate 0.0244 Epoch: 10 Global Step: 168850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:35:35,557-Speed 5189.16 samples/sec Loss 1.9272 LearningRate 0.0244 Epoch: 10 Global Step: 168860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:35:37,542-Speed 5159.53 samples/sec Loss 1.9147 LearningRate 0.0244 Epoch: 10 Global Step: 168870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:35:39,515-Speed 5194.33 samples/sec Loss 1.9788 LearningRate 0.0244 Epoch: 10 Global Step: 168880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:35:41,500-Speed 5159.68 samples/sec Loss 1.9652 LearningRate 0.0244 Epoch: 10 Global Step: 168890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:35:43,468-Speed 5204.44 samples/sec Loss 1.9246 LearningRate 0.0244 Epoch: 10 Global Step: 168900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:35:45,441-Speed 5191.29 samples/sec Loss 1.8961 LearningRate 0.0244 Epoch: 10 Global Step: 168910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:35:47,418-Speed 5183.15 samples/sec Loss 1.9806 LearningRate 0.0244 Epoch: 10 Global Step: 168920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:35:49,385-Speed 5206.09 samples/sec Loss 1.9716 LearningRate 0.0244 Epoch: 10 Global Step: 168930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:35:51,353-Speed 5204.72 samples/sec Loss 1.9923 LearningRate 0.0244 Epoch: 10 Global Step: 168940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:35:53,317-Speed 5216.20 samples/sec Loss 1.9686 LearningRate 0.0244 Epoch: 10 Global Step: 168950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:35:55,283-Speed 5209.82 samples/sec Loss 1.9348 LearningRate 0.0244 Epoch: 10 Global Step: 168960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:35:57,250-Speed 5208.47 samples/sec Loss 
1.9119 LearningRate 0.0244 Epoch: 10 Global Step: 168970 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:35:59,218-Speed 5206.91 samples/sec Loss 1.9735 LearningRate 0.0244 Epoch: 10 Global Step: 168980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:36:01,196-Speed 5176.30 samples/sec Loss 1.9647 LearningRate 0.0244 Epoch: 10 Global Step: 168990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:36:03,175-Speed 5176.62 samples/sec Loss 1.9643 LearningRate 0.0244 Epoch: 10 Global Step: 169000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:36:05,145-Speed 5201.70 samples/sec Loss 1.9727 LearningRate 0.0244 Epoch: 10 Global Step: 169010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:36:07,113-Speed 5203.09 samples/sec Loss 1.9048 LearningRate 0.0244 Epoch: 10 Global Step: 169020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:36:09,089-Speed 5184.48 samples/sec Loss 1.9913 LearningRate 0.0244 Epoch: 10 Global Step: 169030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:36:11,076-Speed 5154.18 samples/sec Loss 1.9243 LearningRate 0.0244 Epoch: 10 Global Step: 169040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:36:13,060-Speed 5163.77 samples/sec Loss 1.9470 LearningRate 0.0244 Epoch: 10 Global Step: 169050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:36:15,026-Speed 5211.20 samples/sec Loss 1.9320 LearningRate 0.0244 Epoch: 10 Global Step: 169060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:16,996-Speed 5199.51 samples/sec Loss 1.9919 LearningRate 0.0244 Epoch: 10 Global Step: 169070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:18,990-Speed 5137.76 samples/sec Loss 1.9869 LearningRate 0.0244 Epoch: 10 Global Step: 169080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:20,958-Speed 5203.16 samples/sec Loss 1.9634 LearningRate 
0.0244 Epoch: 10 Global Step: 169090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:22,944-Speed 5157.63 samples/sec Loss 2.0003 LearningRate 0.0243 Epoch: 10 Global Step: 169100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:24,924-Speed 5174.04 samples/sec Loss 2.0124 LearningRate 0.0243 Epoch: 10 Global Step: 169110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:26,901-Speed 5182.76 samples/sec Loss 2.0443 LearningRate 0.0243 Epoch: 10 Global Step: 169120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:28,873-Speed 5192.30 samples/sec Loss 1.9936 LearningRate 0.0243 Epoch: 10 Global Step: 169130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:30,847-Speed 5191.41 samples/sec Loss 1.9554 LearningRate 0.0243 Epoch: 10 Global Step: 169140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:32,819-Speed 5194.76 samples/sec Loss 2.0358 LearningRate 0.0243 Epoch: 10 Global Step: 169150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:34,800-Speed 5169.56 samples/sec Loss 1.9800 LearningRate 0.0243 Epoch: 10 Global Step: 169160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:36:36,787-Speed 5154.75 samples/sec Loss 1.9199 LearningRate 0.0243 Epoch: 10 Global Step: 169170 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:36:38,771-Speed 5163.63 samples/sec Loss 1.9907 LearningRate 0.0243 Epoch: 10 Global Step: 169180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:40,774-Speed 5113.06 samples/sec Loss 1.9904 LearningRate 0.0243 Epoch: 10 Global Step: 169190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:42,753-Speed 5176.01 samples/sec Loss 1.9623 LearningRate 0.0243 Epoch: 10 Global Step: 169200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:44,726-Speed 5191.22 samples/sec Loss 1.9678 LearningRate 0.0243 Epoch: 10 Global Step: 
169210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:46,711-Speed 5160.83 samples/sec Loss 1.9573 LearningRate 0.0243 Epoch: 10 Global Step: 169220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:48,684-Speed 5191.18 samples/sec Loss 1.9432 LearningRate 0.0243 Epoch: 10 Global Step: 169230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:50,667-Speed 5166.64 samples/sec Loss 2.0004 LearningRate 0.0243 Epoch: 10 Global Step: 169240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:52,665-Speed 5128.23 samples/sec Loss 2.0192 LearningRate 0.0243 Epoch: 10 Global Step: 169250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:54,632-Speed 5206.83 samples/sec Loss 1.9438 LearningRate 0.0243 Epoch: 10 Global Step: 169260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:56,601-Speed 5202.69 samples/sec Loss 1.9572 LearningRate 0.0243 Epoch: 10 Global Step: 169270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:36:58,573-Speed 5194.85 samples/sec Loss 1.9595 LearningRate 0.0243 Epoch: 10 Global Step: 169280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:37:00,538-Speed 5212.69 samples/sec Loss 1.9909 LearningRate 0.0243 Epoch: 10 Global Step: 169290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:02,527-Speed 5148.88 samples/sec Loss 1.9342 LearningRate 0.0243 Epoch: 10 Global Step: 169300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:04,505-Speed 5179.87 samples/sec Loss 1.9541 LearningRate 0.0243 Epoch: 10 Global Step: 169310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:06,478-Speed 5192.64 samples/sec Loss 1.9473 LearningRate 0.0243 Epoch: 10 Global Step: 169320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:08,464-Speed 5157.09 samples/sec Loss 2.0064 LearningRate 0.0243 Epoch: 10 Global Step: 169330 Fp16 Grad Scale: 65536 
Required: 11 hours Training: 2022-04-11 10:37:10,444-Speed 5172.91 samples/sec Loss 1.9753 LearningRate 0.0243 Epoch: 10 Global Step: 169340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:12,413-Speed 5202.94 samples/sec Loss 2.0574 LearningRate 0.0243 Epoch: 10 Global Step: 169350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:14,379-Speed 5209.53 samples/sec Loss 2.0044 LearningRate 0.0243 Epoch: 10 Global Step: 169360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:16,371-Speed 5142.81 samples/sec Loss 2.0139 LearningRate 0.0243 Epoch: 10 Global Step: 169370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:18,353-Speed 5169.98 samples/sec Loss 2.0216 LearningRate 0.0243 Epoch: 10 Global Step: 169380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:20,323-Speed 5198.09 samples/sec Loss 1.9905 LearningRate 0.0243 Epoch: 10 Global Step: 169390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:37:22,298-Speed 5186.95 samples/sec Loss 1.9952 LearningRate 0.0243 Epoch: 10 Global Step: 169400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:24,291-Speed 5139.65 samples/sec Loss 1.9508 LearningRate 0.0243 Epoch: 10 Global Step: 169410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:26,263-Speed 5193.87 samples/sec Loss 1.9763 LearningRate 0.0243 Epoch: 10 Global Step: 169420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:28,236-Speed 5191.47 samples/sec Loss 1.9950 LearningRate 0.0243 Epoch: 10 Global Step: 169430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:30,206-Speed 5200.98 samples/sec Loss 1.9872 LearningRate 0.0242 Epoch: 10 Global Step: 169440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:32,211-Speed 5109.56 samples/sec Loss 1.9677 LearningRate 0.0242 Epoch: 10 Global Step: 169450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 
2022-04-11 10:37:34,180-Speed 5202.25 samples/sec Loss 1.9469 LearningRate 0.0242 Epoch: 10 Global Step: 169460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:36,156-Speed 5184.02 samples/sec Loss 1.9981 LearningRate 0.0242 Epoch: 10 Global Step: 169470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:38,136-Speed 5173.26 samples/sec Loss 1.9383 LearningRate 0.0242 Epoch: 10 Global Step: 169480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:40,105-Speed 5202.21 samples/sec Loss 1.9662 LearningRate 0.0242 Epoch: 10 Global Step: 169490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:42,093-Speed 5152.74 samples/sec Loss 2.0516 LearningRate 0.0242 Epoch: 10 Global Step: 169500 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:37:44,060-Speed 5205.86 samples/sec Loss 1.9444 LearningRate 0.0242 Epoch: 10 Global Step: 169510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:46,041-Speed 5170.98 samples/sec Loss 1.9839 LearningRate 0.0242 Epoch: 10 Global Step: 169520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:48,024-Speed 5166.91 samples/sec Loss 1.9725 LearningRate 0.0242 Epoch: 10 Global Step: 169530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:50,000-Speed 5182.90 samples/sec Loss 1.9972 LearningRate 0.0242 Epoch: 10 Global Step: 169540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:37:51,982-Speed 5169.87 samples/sec Loss 1.9718 LearningRate 0.0242 Epoch: 10 Global Step: 169550 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:37:53,969-Speed 5155.00 samples/sec Loss 1.9531 LearningRate 0.0242 Epoch: 10 Global Step: 169560 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:37:55,952-Speed 5165.90 samples/sec Loss 1.9233 LearningRate 0.0242 Epoch: 10 Global Step: 169570 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:37:57,944-Speed 
5142.55 samples/sec Loss 1.9819 LearningRate 0.0242 Epoch: 10 Global Step: 169580 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:37:59,927-Speed 5166.96 samples/sec Loss 1.9851 LearningRate 0.0242 Epoch: 10 Global Step: 169590 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:38:01,907-Speed 5171.17 samples/sec Loss 2.0254 LearningRate 0.0242 Epoch: 10 Global Step: 169600 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:38:03,910-Speed 5115.89 samples/sec Loss 2.0193 LearningRate 0.0242 Epoch: 10 Global Step: 169610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:38:05,887-Speed 5181.46 samples/sec Loss 2.0291 LearningRate 0.0242 Epoch: 10 Global Step: 169620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:38:07,860-Speed 5192.67 samples/sec Loss 1.9725 LearningRate 0.0242 Epoch: 10 Global Step: 169630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:38:13,736-Speed 1743.10 samples/sec Loss 1.9565 LearningRate 0.0242 Epoch: 10 Global Step: 169640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:38:15,736-Speed 5120.59 samples/sec Loss 1.9939 LearningRate 0.0242 Epoch: 10 Global Step: 169650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:38:17,717-Speed 5175.60 samples/sec Loss 1.9987 LearningRate 0.0242 Epoch: 10 Global Step: 169660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:38:19,680-Speed 5218.04 samples/sec Loss 1.9804 LearningRate 0.0242 Epoch: 10 Global Step: 169670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:38:21,646-Speed 5208.60 samples/sec Loss 1.9737 LearningRate 0.0242 Epoch: 10 Global Step: 169680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:38:23,621-Speed 5187.71 samples/sec Loss 2.0458 LearningRate 0.0242 Epoch: 10 Global Step: 169690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:38:25,597-Speed 5184.00 samples/sec Loss 2.0071 
LearningRate 0.0242 Epoch: 10 Global Step: 169700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:38:27,570-Speed 5191.48 samples/sec Loss 2.0065 LearningRate 0.0242 Epoch: 10 Global Step: 169710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:38:29,562-Speed 5141.44 samples/sec Loss 1.9988 LearningRate 0.0242 Epoch: 10 Global Step: 169720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:38:31,543-Speed 5169.98 samples/sec Loss 2.0035 LearningRate 0.0242 Epoch: 10 Global Step: 169730 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:38:33,528-Speed 5162.40 samples/sec Loss 1.9843 LearningRate 0.0242 Epoch: 10 Global Step: 169740 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:38:35,528-Speed 5122.67 samples/sec Loss 1.9915 LearningRate 0.0242 Epoch: 10 Global Step: 169750 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:38:37,508-Speed 5172.94 samples/sec Loss 1.9851 LearningRate 0.0242 Epoch: 10 Global Step: 169760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:38:39,481-Speed 5190.19 samples/sec Loss 1.9518 LearningRate 0.0242 Epoch: 10 Global Step: 169770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:38:41,463-Speed 5168.29 samples/sec Loss 1.9770 LearningRate 0.0241 Epoch: 10 Global Step: 169780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:38:43,436-Speed 5192.57 samples/sec Loss 1.9955 LearningRate 0.0241 Epoch: 10 Global Step: 169790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:38:45,431-Speed 5132.62 samples/sec Loss 1.9985 LearningRate 0.0241 Epoch: 10 Global Step: 169800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:38:47,477-Speed 5008.71 samples/sec Loss 2.0223 LearningRate 0.0241 Epoch: 10 Global Step: 169810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:38:49,505-Speed 5050.36 samples/sec Loss 2.0159 LearningRate 0.0241 Epoch: 10 
Global Step: 169820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:38:51,532-Speed 5052.92 samples/sec Loss 2.0229 LearningRate 0.0241 Epoch: 10 Global Step: 169830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:38:53,535-Speed 5115.13 samples/sec Loss 2.0486 LearningRate 0.0241 Epoch: 10 Global Step: 169840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:38:55,523-Speed 5152.62 samples/sec Loss 1.9762 LearningRate 0.0241 Epoch: 10 Global Step: 169850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:38:57,504-Speed 5170.46 samples/sec Loss 2.0011 LearningRate 0.0241 Epoch: 10 Global Step: 169860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:38:59,513-Speed 5097.59 samples/sec Loss 2.0541 LearningRate 0.0241 Epoch: 10 Global Step: 169870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:39:01,489-Speed 5183.87 samples/sec Loss 1.9876 LearningRate 0.0241 Epoch: 10 Global Step: 169880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:39:03,487-Speed 5129.89 samples/sec Loss 2.0699 LearningRate 0.0241 Epoch: 10 Global Step: 169890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:39:05,475-Speed 5154.34 samples/sec Loss 2.0646 LearningRate 0.0241 Epoch: 10 Global Step: 169900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:39:07,446-Speed 5196.63 samples/sec Loss 2.0253 LearningRate 0.0241 Epoch: 10 Global Step: 169910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:39:09,440-Speed 5138.25 samples/sec Loss 2.0126 LearningRate 0.0241 Epoch: 10 Global Step: 169920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:39:11,417-Speed 5180.83 samples/sec Loss 2.0141 LearningRate 0.0241 Epoch: 10 Global Step: 169930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:39:13,407-Speed 5147.83 samples/sec Loss 2.0353 LearningRate 0.0241 Epoch: 10 Global Step: 169940 Fp16 Grad 
Scale: 65536 Required: 11 hours Training: 2022-04-11 10:39:15,390-Speed 5165.94 samples/sec Loss 1.9971 LearningRate 0.0241 Epoch: 10 Global Step: 169950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:39:17,381-Speed 5143.38 samples/sec Loss 2.0163 LearningRate 0.0241 Epoch: 10 Global Step: 169960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:39:19,362-Speed 5171.66 samples/sec Loss 1.9913 LearningRate 0.0241 Epoch: 10 Global Step: 169970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:39:21,330-Speed 5203.82 samples/sec Loss 2.0010 LearningRate 0.0241 Epoch: 10 Global Step: 169980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:39:23,305-Speed 5188.51 samples/sec Loss 1.9783 LearningRate 0.0241 Epoch: 10 Global Step: 169990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:39:25,292-Speed 5153.57 samples/sec Loss 2.0089 LearningRate 0.0241 Epoch: 10 Global Step: 170000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:39:52,010-[lfw][170000]XNorm: 22.468053 Training: 2022-04-11 10:39:52,010-[lfw][170000]Accuracy-Flip: 0.99767+-0.00260 Training: 2022-04-11 10:39:52,011-[lfw][170000]Accuracy-Highest: 0.99833 Training: 2022-04-11 10:40:22,827-[cfp_fp][170000]XNorm: 21.200211 Training: 2022-04-11 10:40:22,828-[cfp_fp][170000]Accuracy-Flip: 0.98571+-0.00531 Training: 2022-04-11 10:40:22,828-[cfp_fp][170000]Accuracy-Highest: 0.98571 Training: 2022-04-11 10:40:49,335-[agedb_30][170000]XNorm: 22.486466 Training: 2022-04-11 10:40:49,335-[agedb_30][170000]Accuracy-Flip: 0.98117+-0.00820 Training: 2022-04-11 10:40:49,336-[agedb_30][170000]Accuracy-Highest: 0.98167 Training: 2022-04-11 10:40:51,368-Speed 118.97 samples/sec Loss 2.0516 LearningRate 0.0241 Epoch: 10 Global Step: 170010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:40:53,348-Speed 5175.52 samples/sec Loss 1.9786 LearningRate 0.0241 Epoch: 10 Global Step: 170020 Fp16 Grad Scale: 
131072 Required: 11 hours Training: 2022-04-11 10:40:55,307-Speed 5227.27 samples/sec Loss 2.0235 LearningRate 0.0241 Epoch: 10 Global Step: 170030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:40:57,268-Speed 5225.15 samples/sec Loss 2.0597 LearningRate 0.0241 Epoch: 10 Global Step: 170040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:40:59,244-Speed 5184.81 samples/sec Loss 2.0103 LearningRate 0.0241 Epoch: 10 Global Step: 170050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:41:01,210-Speed 5211.38 samples/sec Loss 2.0410 LearningRate 0.0241 Epoch: 10 Global Step: 170060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:03,175-Speed 5211.68 samples/sec Loss 2.0444 LearningRate 0.0241 Epoch: 10 Global Step: 170070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:05,150-Speed 5187.89 samples/sec Loss 2.0330 LearningRate 0.0241 Epoch: 10 Global Step: 170080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:07,117-Speed 5206.18 samples/sec Loss 2.0179 LearningRate 0.0241 Epoch: 10 Global Step: 170090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:09,085-Speed 5205.22 samples/sec Loss 2.0340 LearningRate 0.0241 Epoch: 10 Global Step: 170100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:11,081-Speed 5132.27 samples/sec Loss 1.9973 LearningRate 0.0241 Epoch: 10 Global Step: 170110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:13,077-Speed 5133.49 samples/sec Loss 2.0643 LearningRate 0.0240 Epoch: 10 Global Step: 170120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:15,066-Speed 5148.35 samples/sec Loss 1.9773 LearningRate 0.0240 Epoch: 10 Global Step: 170130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:17,092-Speed 5057.56 samples/sec Loss 1.9939 LearningRate 0.0240 Epoch: 10 Global Step: 170140 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-04-11 10:41:19,060-Speed 5204.97 samples/sec Loss 2.0651 LearningRate 0.0240 Epoch: 10 Global Step: 170150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:21,042-Speed 5166.21 samples/sec Loss 2.0648 LearningRate 0.0240 Epoch: 10 Global Step: 170160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:41:23,029-Speed 5156.50 samples/sec Loss 1.9792 LearningRate 0.0240 Epoch: 10 Global Step: 170170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:25,025-Speed 5131.56 samples/sec Loss 1.9821 LearningRate 0.0240 Epoch: 10 Global Step: 170180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:27,016-Speed 5146.32 samples/sec Loss 2.0662 LearningRate 0.0240 Epoch: 10 Global Step: 170190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:29,009-Speed 5137.65 samples/sec Loss 2.0351 LearningRate 0.0240 Epoch: 10 Global Step: 170200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:30,993-Speed 5164.35 samples/sec Loss 1.9979 LearningRate 0.0240 Epoch: 10 Global Step: 170210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:32,960-Speed 5206.10 samples/sec Loss 2.0424 LearningRate 0.0240 Epoch: 10 Global Step: 170220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:34,949-Speed 5151.48 samples/sec Loss 2.0036 LearningRate 0.0240 Epoch: 10 Global Step: 170230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:36,942-Speed 5141.07 samples/sec Loss 1.9634 LearningRate 0.0240 Epoch: 10 Global Step: 170240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:38,935-Speed 5139.69 samples/sec Loss 1.9693 LearningRate 0.0240 Epoch: 10 Global Step: 170250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:40,930-Speed 5133.71 samples/sec Loss 2.0139 LearningRate 0.0240 Epoch: 10 Global Step: 170260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 
10:41:42,898-Speed 5204.91 samples/sec Loss 1.9181 LearningRate 0.0240 Epoch: 10 Global Step: 170270 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:41:44,877-Speed 5173.96 samples/sec Loss 2.0451 LearningRate 0.0240 Epoch: 10 Global Step: 170280 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:41:46,879-Speed 5117.53 samples/sec Loss 2.0195 LearningRate 0.0240 Epoch: 10 Global Step: 170290 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:41:48,865-Speed 5157.88 samples/sec Loss 1.9664 LearningRate 0.0240 Epoch: 10 Global Step: 170300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:50,854-Speed 5150.35 samples/sec Loss 2.0198 LearningRate 0.0240 Epoch: 10 Global Step: 170310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:52,818-Speed 5214.51 samples/sec Loss 1.9886 LearningRate 0.0240 Epoch: 10 Global Step: 170320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:54,788-Speed 5204.34 samples/sec Loss 2.0516 LearningRate 0.0240 Epoch: 10 Global Step: 170330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:56,761-Speed 5192.32 samples/sec Loss 1.9925 LearningRate 0.0240 Epoch: 10 Global Step: 170340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:41:58,743-Speed 5169.01 samples/sec Loss 2.0857 LearningRate 0.0240 Epoch: 10 Global Step: 170350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:42:00,717-Speed 5188.27 samples/sec Loss 2.0707 LearningRate 0.0240 Epoch: 10 Global Step: 170360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:42:02,703-Speed 5159.05 samples/sec Loss 2.0414 LearningRate 0.0240 Epoch: 10 Global Step: 170370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:42:04,682-Speed 5175.65 samples/sec Loss 2.0343 LearningRate 0.0240 Epoch: 10 Global Step: 170380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:42:06,658-Speed 5182.85 
samples/sec Loss 2.0936 LearningRate 0.0240 Epoch: 10 Global Step: 170390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:42:08,628-Speed 5200.44 samples/sec Loss 2.0215 LearningRate 0.0240 Epoch: 10 Global Step: 170400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:42:10,595-Speed 5207.68 samples/sec Loss 2.0196 LearningRate 0.0240 Epoch: 10 Global Step: 170410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:42:12,561-Speed 5210.44 samples/sec Loss 2.0658 LearningRate 0.0240 Epoch: 10 Global Step: 170420 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:42:14,547-Speed 5157.84 samples/sec Loss 1.9779 LearningRate 0.0240 Epoch: 10 Global Step: 170430 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:42:16,524-Speed 5180.56 samples/sec Loss 2.0556 LearningRate 0.0240 Epoch: 10 Global Step: 170440 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:42:18,498-Speed 5190.78 samples/sec Loss 2.0078 LearningRate 0.0240 Epoch: 10 Global Step: 170450 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:42:20,479-Speed 5169.49 samples/sec Loss 2.0237 LearningRate 0.0239 Epoch: 10 Global Step: 170460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:42:22,444-Speed 5214.53 samples/sec Loss 1.9689 LearningRate 0.0239 Epoch: 10 Global Step: 170470 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:42:24,424-Speed 5171.96 samples/sec Loss 2.0525 LearningRate 0.0239 Epoch: 10 Global Step: 170480 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:42:26,406-Speed 5169.55 samples/sec Loss 2.0506 LearningRate 0.0239 Epoch: 10 Global Step: 170490 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:42:28,374-Speed 5205.47 samples/sec Loss 2.0311 LearningRate 0.0239 Epoch: 10 Global Step: 170500 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:42:30,340-Speed 5207.93 samples/sec Loss 2.0550 
LearningRate 0.0239 Epoch: 10 Global Step: 170510 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:42:32,336-Speed 5133.22 samples/sec Loss 2.0884 LearningRate 0.0239 Epoch: 10 Global Step: 170520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:42:34,315-Speed 5176.39 samples/sec Loss 2.0332 LearningRate 0.0239 Epoch: 10 Global Step: 170530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:42:36,296-Speed 5171.56 samples/sec Loss 2.0290 LearningRate 0.0239 Epoch: 10 Global Step: 170540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:42:38,276-Speed 5174.06 samples/sec Loss 1.9866 LearningRate 0.0239 Epoch: 10 Global Step: 170550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:42:40,251-Speed 5184.45 samples/sec Loss 2.0271 LearningRate 0.0239 Epoch: 10 Global Step: 170560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:42:42,220-Speed 5204.31 samples/sec Loss 2.0128 LearningRate 0.0239 Epoch: 10 Global Step: 170570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:42:44,198-Speed 5176.62 samples/sec Loss 1.9881 LearningRate 0.0239 Epoch: 10 Global Step: 170580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:42:46,194-Speed 5132.86 samples/sec Loss 2.0874 LearningRate 0.0239 Epoch: 10 Global Step: 170590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:42:48,185-Speed 5145.18 samples/sec Loss 2.0399 LearningRate 0.0239 Epoch: 10 Global Step: 170600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:42:50,177-Speed 5140.90 samples/sec Loss 2.0231 LearningRate 0.0239 Epoch: 10 Global Step: 170610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:42:52,180-Speed 5115.00 samples/sec Loss 2.1203 LearningRate 0.0239 Epoch: 10 Global Step: 170620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:42:54,212-Speed 5041.89 samples/sec Loss 2.0510 LearningRate 0.0239 Epoch: 10 
Global Step: 170630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:42:56,180-Speed 5205.18 samples/sec Loss 1.9970 LearningRate 0.0239 Epoch: 10 Global Step: 170640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:42:58,155-Speed 5185.62 samples/sec Loss 2.0863 LearningRate 0.0239 Epoch: 10 Global Step: 170650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:43:00,148-Speed 5141.92 samples/sec Loss 2.0231 LearningRate 0.0239 Epoch: 10 Global Step: 170660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:43:02,219-Speed 4944.38 samples/sec Loss 2.1112 LearningRate 0.0239 Epoch: 10 Global Step: 170670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:43:04,204-Speed 5162.24 samples/sec Loss 2.0753 LearningRate 0.0239 Epoch: 10 Global Step: 170680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:43:06,224-Speed 5068.67 samples/sec Loss 2.1710 LearningRate 0.0239 Epoch: 10 Global Step: 170690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:43:08,190-Speed 5212.51 samples/sec Loss 2.0361 LearningRate 0.0239 Epoch: 10 Global Step: 170700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:43:10,172-Speed 5167.41 samples/sec Loss 2.0395 LearningRate 0.0239 Epoch: 10 Global Step: 170710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:43:12,187-Speed 5083.53 samples/sec Loss 2.0739 LearningRate 0.0239 Epoch: 10 Global Step: 170720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:43:14,184-Speed 5130.43 samples/sec Loss 2.0356 LearningRate 0.0239 Epoch: 10 Global Step: 170730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:43:16,174-Speed 5147.71 samples/sec Loss 2.0658 LearningRate 0.0239 Epoch: 10 Global Step: 170740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:43:18,148-Speed 5188.28 samples/sec Loss 2.0093 LearningRate 0.0239 Epoch: 10 Global Step: 170750 Fp16 Grad 
Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:20,117-Speed 5203.12 samples/sec Loss 2.0142 LearningRate 0.0239 Epoch: 10 Global Step: 170760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:22,099-Speed 5168.48 samples/sec Loss 2.0121 LearningRate 0.0239 Epoch: 10 Global Step: 170770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:24,092-Speed 5138.47 samples/sec Loss 2.0598 LearningRate 0.0239 Epoch: 10 Global Step: 170780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:26,074-Speed 5169.16 samples/sec Loss 2.0132 LearningRate 0.0239 Epoch: 10 Global Step: 170790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:28,046-Speed 5193.71 samples/sec Loss 2.0149 LearningRate 0.0238 Epoch: 10 Global Step: 170800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:30,036-Speed 5147.85 samples/sec Loss 2.0618 LearningRate 0.0238 Epoch: 10 Global Step: 170810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:32,001-Speed 5212.58 samples/sec Loss 2.0314 LearningRate 0.0238 Epoch: 10 Global Step: 170820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:33,981-Speed 5173.44 samples/sec Loss 1.9888 LearningRate 0.0238 Epoch: 10 Global Step: 170830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:35,956-Speed 5186.70 samples/sec Loss 2.0389 LearningRate 0.0238 Epoch: 10 Global Step: 170840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:37,927-Speed 5199.07 samples/sec Loss 2.0090 LearningRate 0.0238 Epoch: 10 Global Step: 170850 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 10:43:39,906-Speed 5174.07 samples/sec Loss 2.0368 LearningRate 0.0238 Epoch: 10 Global Step: 170860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:41,897-Speed 5146.53 samples/sec Loss 2.0148 LearningRate 0.0238 Epoch: 10 Global Step: 170870 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-04-11 10:43:43,876-Speed 5174.77 samples/sec Loss 2.0019 LearningRate 0.0238 Epoch: 10 Global Step: 170880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:45,874-Speed 5126.44 samples/sec Loss 2.0107 LearningRate 0.0238 Epoch: 10 Global Step: 170890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:47,863-Speed 5149.29 samples/sec Loss 2.0821 LearningRate 0.0238 Epoch: 10 Global Step: 170900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:49,850-Speed 5155.57 samples/sec Loss 2.0551 LearningRate 0.0238 Epoch: 10 Global Step: 170910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:51,828-Speed 5177.69 samples/sec Loss 2.0490 LearningRate 0.0238 Epoch: 10 Global Step: 170920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:53,815-Speed 5158.10 samples/sec Loss 2.0833 LearningRate 0.0238 Epoch: 10 Global Step: 170930 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:55,806-Speed 5144.82 samples/sec Loss 1.9918 LearningRate 0.0238 Epoch: 10 Global Step: 170940 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:57,793-Speed 5155.71 samples/sec Loss 2.0376 LearningRate 0.0238 Epoch: 10 Global Step: 170950 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:43:59,770-Speed 5179.48 samples/sec Loss 1.9980 LearningRate 0.0238 Epoch: 10 Global Step: 170960 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:44:01,747-Speed 5180.64 samples/sec Loss 2.0677 LearningRate 0.0238 Epoch: 10 Global Step: 170970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:44:03,769-Speed 5068.49 samples/sec Loss 2.0351 LearningRate 0.0238 Epoch: 10 Global Step: 170980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:05,750-Speed 5170.90 samples/sec Loss 2.0380 LearningRate 0.0238 Epoch: 10 Global Step: 170990 Fp16 Grad Scale: 32768 Required: 11 hours 
Training: 2022-04-11 10:44:07,721-Speed 5195.94 samples/sec Loss 2.0492 LearningRate 0.0238 Epoch: 10 Global Step: 171000 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:09,712-Speed 5146.14 samples/sec Loss 2.1014 LearningRate 0.0238 Epoch: 10 Global Step: 171010 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:11,688-Speed 5183.12 samples/sec Loss 2.0515 LearningRate 0.0238 Epoch: 10 Global Step: 171020 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:13,671-Speed 5167.36 samples/sec Loss 2.0674 LearningRate 0.0238 Epoch: 10 Global Step: 171030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:15,655-Speed 5163.09 samples/sec Loss 1.9675 LearningRate 0.0238 Epoch: 10 Global Step: 171040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:17,639-Speed 5161.98 samples/sec Loss 2.0584 LearningRate 0.0238 Epoch: 10 Global Step: 171050 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:19,611-Speed 5194.38 samples/sec Loss 2.0420 LearningRate 0.0238 Epoch: 10 Global Step: 171060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:21,591-Speed 5173.53 samples/sec Loss 2.1116 LearningRate 0.0238 Epoch: 10 Global Step: 171070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:23,564-Speed 5192.24 samples/sec Loss 2.0639 LearningRate 0.0238 Epoch: 10 Global Step: 171080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:44:25,567-Speed 5113.36 samples/sec Loss 2.0253 LearningRate 0.0238 Epoch: 10 Global Step: 171090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:44:27,604-Speed 5027.88 samples/sec Loss 2.0250 LearningRate 0.0238 Epoch: 10 Global Step: 171100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:44:29,586-Speed 5170.28 samples/sec Loss 2.0585 LearningRate 0.0238 Epoch: 10 Global Step: 171110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 
10:44:31,572-Speed 5157.85 samples/sec Loss 2.1182 LearningRate 0.0238 Epoch: 10 Global Step: 171120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:44:33,543-Speed 5197.59 samples/sec Loss 2.0681 LearningRate 0.0238 Epoch: 10 Global Step: 171130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:44:35,530-Speed 5155.49 samples/sec Loss 2.0282 LearningRate 0.0237 Epoch: 10 Global Step: 171140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:44:37,507-Speed 5181.55 samples/sec Loss 2.0590 LearningRate 0.0237 Epoch: 10 Global Step: 171150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:39,492-Speed 5160.26 samples/sec Loss 2.0554 LearningRate 0.0237 Epoch: 10 Global Step: 171160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:41,465-Speed 5189.53 samples/sec Loss 2.0887 LearningRate 0.0237 Epoch: 10 Global Step: 171170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:43,444-Speed 5178.46 samples/sec Loss 2.0768 LearningRate 0.0237 Epoch: 10 Global Step: 171180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:45,461-Speed 5077.47 samples/sec Loss 2.0754 LearningRate 0.0237 Epoch: 10 Global Step: 171190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:47,450-Speed 5149.13 samples/sec Loss 2.0014 LearningRate 0.0237 Epoch: 10 Global Step: 171200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:49,430-Speed 5174.97 samples/sec Loss 2.0773 LearningRate 0.0237 Epoch: 10 Global Step: 171210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:51,405-Speed 5186.96 samples/sec Loss 2.1445 LearningRate 0.0237 Epoch: 10 Global Step: 171220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:53,372-Speed 5206.38 samples/sec Loss 2.1132 LearningRate 0.0237 Epoch: 10 Global Step: 171230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:55,361-Speed 5155.56 
samples/sec Loss 2.0962 LearningRate 0.0237 Epoch: 10 Global Step: 171240 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:44:57,375-Speed 5087.43 samples/sec Loss 2.0391 LearningRate 0.0237 Epoch: 10 Global Step: 171250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:44:59,354-Speed 5181.61 samples/sec Loss 2.0408 LearningRate 0.0237 Epoch: 10 Global Step: 171260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:45:01,338-Speed 5163.58 samples/sec Loss 2.0870 LearningRate 0.0237 Epoch: 10 Global Step: 171270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:45:03,314-Speed 5183.64 samples/sec Loss 2.0744 LearningRate 0.0237 Epoch: 10 Global Step: 171280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:45:05,333-Speed 5074.98 samples/sec Loss 2.1101 LearningRate 0.0237 Epoch: 10 Global Step: 171290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:45:07,308-Speed 5185.72 samples/sec Loss 1.9673 LearningRate 0.0237 Epoch: 10 Global Step: 171300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:45:09,307-Speed 5126.09 samples/sec Loss 2.0634 LearningRate 0.0237 Epoch: 10 Global Step: 171310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:45:11,294-Speed 5155.25 samples/sec Loss 2.0659 LearningRate 0.0237 Epoch: 10 Global Step: 171320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:45:13,285-Speed 5144.29 samples/sec Loss 2.0691 LearningRate 0.0237 Epoch: 10 Global Step: 171330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:45:15,319-Speed 5036.85 samples/sec Loss 2.0909 LearningRate 0.0237 Epoch: 10 Global Step: 171340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:45:17,305-Speed 5159.57 samples/sec Loss 2.0386 LearningRate 0.0237 Epoch: 10 Global Step: 171350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:45:19,306-Speed 5119.31 samples/sec Loss 2.0730 
LearningRate 0.0237 Epoch: 10 Global Step: 171360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:45:21,288-Speed 5169.57 samples/sec Loss 2.0629 LearningRate 0.0237 Epoch: 10 Global Step: 171370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:45:23,287-Speed 5124.77 samples/sec Loss 2.0630 LearningRate 0.0237 Epoch: 10 Global Step: 171380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:45:25,291-Speed 5113.11 samples/sec Loss 2.0411 LearningRate 0.0237 Epoch: 10 Global Step: 171390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:45:27,316-Speed 5059.31 samples/sec Loss 2.0247 LearningRate 0.0237 Epoch: 10 Global Step: 171400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:45:29,293-Speed 5181.35 samples/sec Loss 2.1396 LearningRate 0.0237 Epoch: 10 Global Step: 171410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:45:31,277-Speed 5163.58 samples/sec Loss 2.0661 LearningRate 0.0237 Epoch: 10 Global Step: 171420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:45:33,257-Speed 5172.59 samples/sec Loss 2.0094 LearningRate 0.0237 Epoch: 10 Global Step: 171430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:45:35,281-Speed 5061.69 samples/sec Loss 2.0584 LearningRate 0.0237 Epoch: 10 Global Step: 171440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:45:37,265-Speed 5163.47 samples/sec Loss 2.0420 LearningRate 0.0237 Epoch: 10 Global Step: 171450 Fp16 Grad Scale: 262144 Required: 11 hours Training: 2022-04-11 10:45:39,241-Speed 5182.67 samples/sec Loss 2.0973 LearningRate 0.0237 Epoch: 10 Global Step: 171460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:45:41,293-Speed 4992.31 samples/sec Loss 2.0532 LearningRate 0.0237 Epoch: 10 Global Step: 171470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:45:43,262-Speed 5202.12 samples/sec Loss 2.0380 LearningRate 0.0236 
Epoch: 10 Global Step: 171480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:45:45,237-Speed 5186.73 samples/sec Loss 2.0546 LearningRate 0.0236 Epoch: 10 Global Step: 171490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:45:47,241-Speed 5112.34 samples/sec Loss 2.0931 LearningRate 0.0236 Epoch: 10 Global Step: 171500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:45:49,229-Speed 5153.84 samples/sec Loss 2.0537 LearningRate 0.0236 Epoch: 10 Global Step: 171510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:45:51,203-Speed 5187.53 samples/sec Loss 2.0491 LearningRate 0.0236 Epoch: 10 Global Step: 171520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:45:53,172-Speed 5203.43 samples/sec Loss 2.0140 LearningRate 0.0236 Epoch: 10 Global Step: 171530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:45:55,144-Speed 5195.24 samples/sec Loss 2.0328 LearningRate 0.0236 Epoch: 10 Global Step: 171540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:45:57,126-Speed 5168.35 samples/sec Loss 2.0369 LearningRate 0.0236 Epoch: 10 Global Step: 171550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:45:59,128-Speed 5115.26 samples/sec Loss 2.0559 LearningRate 0.0236 Epoch: 10 Global Step: 171560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:01,114-Speed 5158.59 samples/sec Loss 2.0308 LearningRate 0.0236 Epoch: 10 Global Step: 171570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:03,117-Speed 5117.55 samples/sec Loss 2.0570 LearningRate 0.0236 Epoch: 10 Global Step: 171580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:05,091-Speed 5188.17 samples/sec Loss 2.1129 LearningRate 0.0236 Epoch: 10 Global Step: 171590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:07,083-Speed 5144.05 samples/sec Loss 2.0836 LearningRate 0.0236 Epoch: 10 Global Step: 171600 
Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:46:09,077-Speed 5135.67 samples/sec Loss 2.0395 LearningRate 0.0236 Epoch: 10 Global Step: 171610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:11,078-Speed 5120.06 samples/sec Loss 2.0017 LearningRate 0.0236 Epoch: 10 Global Step: 171620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:13,102-Speed 5060.14 samples/sec Loss 2.0241 LearningRate 0.0236 Epoch: 10 Global Step: 171630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:15,082-Speed 5178.34 samples/sec Loss 2.0505 LearningRate 0.0236 Epoch: 10 Global Step: 171640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:17,081-Speed 5124.70 samples/sec Loss 2.0064 LearningRate 0.0236 Epoch: 10 Global Step: 171650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:19,062-Speed 5170.27 samples/sec Loss 2.0284 LearningRate 0.0236 Epoch: 10 Global Step: 171660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:21,058-Speed 5130.16 samples/sec Loss 2.0444 LearningRate 0.0236 Epoch: 10 Global Step: 171670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:23,042-Speed 5165.23 samples/sec Loss 2.1081 LearningRate 0.0236 Epoch: 10 Global Step: 171680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:25,020-Speed 5177.76 samples/sec Loss 2.1050 LearningRate 0.0236 Epoch: 10 Global Step: 171690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:27,055-Speed 5036.29 samples/sec Loss 2.0739 LearningRate 0.0236 Epoch: 10 Global Step: 171700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:29,074-Speed 5073.29 samples/sec Loss 2.0514 LearningRate 0.0236 Epoch: 10 Global Step: 171710 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:46:31,090-Speed 5081.00 samples/sec Loss 2.0944 LearningRate 0.0236 Epoch: 10 Global Step: 171720 Fp16 Grad Scale: 65536 
Required: 11 hours Training: 2022-04-11 10:46:33,068-Speed 5178.84 samples/sec Loss 2.0505 LearningRate 0.0236 Epoch: 10 Global Step: 171730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:35,067-Speed 5125.85 samples/sec Loss 2.0851 LearningRate 0.0236 Epoch: 10 Global Step: 171740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:37,054-Speed 5154.14 samples/sec Loss 2.0797 LearningRate 0.0236 Epoch: 10 Global Step: 171750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:39,048-Speed 5137.91 samples/sec Loss 2.0057 LearningRate 0.0236 Epoch: 10 Global Step: 171760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:41,063-Speed 5084.04 samples/sec Loss 2.0928 LearningRate 0.0236 Epoch: 10 Global Step: 171770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:43,052-Speed 5148.84 samples/sec Loss 2.0930 LearningRate 0.0236 Epoch: 10 Global Step: 171780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:45,047-Speed 5135.89 samples/sec Loss 2.0890 LearningRate 0.0236 Epoch: 10 Global Step: 171790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:47,062-Speed 5083.40 samples/sec Loss 2.1075 LearningRate 0.0236 Epoch: 10 Global Step: 171800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:49,083-Speed 5067.26 samples/sec Loss 2.1243 LearningRate 0.0236 Epoch: 10 Global Step: 171810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:46:51,067-Speed 5163.67 samples/sec Loss 2.0731 LearningRate 0.0236 Epoch: 10 Global Step: 171820 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:46:53,065-Speed 5126.86 samples/sec Loss 2.0913 LearningRate 0.0235 Epoch: 10 Global Step: 171830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:46:55,044-Speed 5174.65 samples/sec Loss 2.1240 LearningRate 0.0235 Epoch: 10 Global Step: 171840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 
2022-04-11 10:46:57,030-Speed 5157.88 samples/sec Loss 2.0844 LearningRate 0.0235 Epoch: 10 Global Step: 171850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:46:59,022-Speed 5143.80 samples/sec Loss 2.0484 LearningRate 0.0235 Epoch: 10 Global Step: 171860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:47:01,093-Speed 4946.49 samples/sec Loss 2.0759 LearningRate 0.0235 Epoch: 10 Global Step: 171870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:47:03,079-Speed 5156.98 samples/sec Loss 2.1071 LearningRate 0.0235 Epoch: 10 Global Step: 171880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:47:05,046-Speed 5208.91 samples/sec Loss 2.1355 LearningRate 0.0235 Epoch: 10 Global Step: 171890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:47:07,031-Speed 5161.00 samples/sec Loss 2.1169 LearningRate 0.0235 Epoch: 10 Global Step: 171900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:47:09,020-Speed 5148.43 samples/sec Loss 2.0890 LearningRate 0.0235 Epoch: 10 Global Step: 171910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:47:10,991-Speed 5197.53 samples/sec Loss 2.1417 LearningRate 0.0235 Epoch: 10 Global Step: 171920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:47:12,988-Speed 5130.65 samples/sec Loss 2.1658 LearningRate 0.0235 Epoch: 10 Global Step: 171930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:47:15,018-Speed 5045.65 samples/sec Loss 2.1058 LearningRate 0.0235 Epoch: 10 Global Step: 171940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:47:17,013-Speed 5134.49 samples/sec Loss 2.0736 LearningRate 0.0235 Epoch: 10 Global Step: 171950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:47:18,994-Speed 5172.07 samples/sec Loss 2.0374 LearningRate 0.0235 Epoch: 10 Global Step: 171960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 
10:47:20,988-Speed 5135.45 samples/sec Loss 2.0967 LearningRate 0.0235 Epoch: 10 Global Step: 171970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:47:22,964-Speed 5184.29 samples/sec Loss 2.0795 LearningRate 0.0235 Epoch: 10 Global Step: 171980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:47:24,933-Speed 5202.10 samples/sec Loss 2.0776 LearningRate 0.0235 Epoch: 10 Global Step: 171990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:47:26,922-Speed 5151.56 samples/sec Loss 2.1842 LearningRate 0.0235 Epoch: 10 Global Step: 172000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:47:53,510-[lfw][172000]XNorm: 22.030115 Training: 2022-04-11 10:47:53,511-[lfw][172000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-04-11 10:47:53,511-[lfw][172000]Accuracy-Highest: 0.99833 Training: 2022-04-11 10:48:24,279-[cfp_fp][172000]XNorm: 20.958595 Training: 2022-04-11 10:48:24,279-[cfp_fp][172000]Accuracy-Flip: 0.98471+-0.00582 Training: 2022-04-11 10:48:24,280-[cfp_fp][172000]Accuracy-Highest: 0.98571 Training: 2022-04-11 10:48:50,819-[agedb_30][172000]XNorm: 21.934008 Training: 2022-04-11 10:48:50,819-[agedb_30][172000]Accuracy-Flip: 0.98017+-0.00747 Training: 2022-04-11 10:48:50,820-[agedb_30][172000]Accuracy-Highest: 0.98167 Training: 2022-04-11 10:48:52,822-Speed 119.21 samples/sec Loss 2.0719 LearningRate 0.0235 Epoch: 10 Global Step: 172010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:48:54,789-Speed 5205.67 samples/sec Loss 2.0785 LearningRate 0.0235 Epoch: 10 Global Step: 172020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:48:56,757-Speed 5206.84 samples/sec Loss 2.1203 LearningRate 0.0235 Epoch: 10 Global Step: 172030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:48:58,730-Speed 5190.17 samples/sec Loss 2.1305 LearningRate 0.0235 Epoch: 10 Global Step: 172040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 
10:49:00,721-Speed 5146.73 samples/sec Loss 2.0966 LearningRate 0.0235 Epoch: 10 Global Step: 172050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:02,717-Speed 5132.46 samples/sec Loss 2.0896 LearningRate 0.0235 Epoch: 10 Global Step: 172060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:04,691-Speed 5188.44 samples/sec Loss 2.1026 LearningRate 0.0235 Epoch: 10 Global Step: 172070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:06,665-Speed 5189.02 samples/sec Loss 2.1662 LearningRate 0.0235 Epoch: 10 Global Step: 172080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:08,643-Speed 5180.13 samples/sec Loss 2.0785 LearningRate 0.0235 Epoch: 10 Global Step: 172090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:49:10,617-Speed 5187.87 samples/sec Loss 2.0726 LearningRate 0.0235 Epoch: 10 Global Step: 172100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:49:12,585-Speed 5203.46 samples/sec Loss 2.0768 LearningRate 0.0235 Epoch: 10 Global Step: 172110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:49:14,571-Speed 5158.31 samples/sec Loss 2.0978 LearningRate 0.0235 Epoch: 10 Global Step: 172120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:16,548-Speed 5180.78 samples/sec Loss 2.0878 LearningRate 0.0235 Epoch: 10 Global Step: 172130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:18,537-Speed 5150.15 samples/sec Loss 2.1332 LearningRate 0.0235 Epoch: 10 Global Step: 172140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:20,516-Speed 5176.22 samples/sec Loss 2.1144 LearningRate 0.0235 Epoch: 10 Global Step: 172150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:22,541-Speed 5059.11 samples/sec Loss 1.9930 LearningRate 0.0235 Epoch: 10 Global Step: 172160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:24,531-Speed 5147.31 
samples/sec Loss 2.0784 LearningRate 0.0234 Epoch: 10 Global Step: 172170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:26,532-Speed 5119.63 samples/sec Loss 2.1231 LearningRate 0.0234 Epoch: 10 Global Step: 172180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:28,513-Speed 5171.50 samples/sec Loss 2.0923 LearningRate 0.0234 Epoch: 10 Global Step: 172190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:30,491-Speed 5178.48 samples/sec Loss 2.0757 LearningRate 0.0234 Epoch: 10 Global Step: 172200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:32,483-Speed 5143.18 samples/sec Loss 2.0606 LearningRate 0.0234 Epoch: 10 Global Step: 172210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:34,456-Speed 5190.77 samples/sec Loss 2.0948 LearningRate 0.0234 Epoch: 10 Global Step: 172220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:49:36,460-Speed 5112.61 samples/sec Loss 2.0926 LearningRate 0.0234 Epoch: 10 Global Step: 172230 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:49:38,451-Speed 5143.85 samples/sec Loss 2.0991 LearningRate 0.0234 Epoch: 10 Global Step: 172240 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:49:40,423-Speed 5193.70 samples/sec Loss 2.0592 LearningRate 0.0234 Epoch: 10 Global Step: 172250 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:49:42,387-Speed 5217.65 samples/sec Loss 2.0905 LearningRate 0.0234 Epoch: 10 Global Step: 172260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:44,358-Speed 5196.21 samples/sec Loss 2.0824 LearningRate 0.0234 Epoch: 10 Global Step: 172270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:46,341-Speed 5166.00 samples/sec Loss 2.0348 LearningRate 0.0234 Epoch: 10 Global Step: 172280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:48,318-Speed 5182.49 samples/sec Loss 2.1103 
LearningRate 0.0234 Epoch: 10 Global Step: 172290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:50,312-Speed 5135.13 samples/sec Loss 2.0762 LearningRate 0.0234 Epoch: 10 Global Step: 172300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:52,283-Speed 5196.62 samples/sec Loss 2.0253 LearningRate 0.0234 Epoch: 10 Global Step: 172310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:54,253-Speed 5199.18 samples/sec Loss 2.0994 LearningRate 0.0234 Epoch: 10 Global Step: 172320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:56,230-Speed 5184.18 samples/sec Loss 2.1112 LearningRate 0.0234 Epoch: 10 Global Step: 172330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:49:58,217-Speed 5156.24 samples/sec Loss 2.0257 LearningRate 0.0234 Epoch: 10 Global Step: 172340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:00,184-Speed 5207.29 samples/sec Loss 2.0638 LearningRate 0.0234 Epoch: 10 Global Step: 172350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:02,171-Speed 5155.47 samples/sec Loss 2.0745 LearningRate 0.0234 Epoch: 10 Global Step: 172360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:50:04,151-Speed 5173.57 samples/sec Loss 2.0495 LearningRate 0.0234 Epoch: 10 Global Step: 172370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:06,138-Speed 5155.54 samples/sec Loss 2.1053 LearningRate 0.0234 Epoch: 10 Global Step: 172380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:08,105-Speed 5206.79 samples/sec Loss 2.1087 LearningRate 0.0234 Epoch: 10 Global Step: 172390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:10,091-Speed 5158.07 samples/sec Loss 2.0490 LearningRate 0.0234 Epoch: 10 Global Step: 172400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:12,063-Speed 5194.69 samples/sec Loss 2.1301 LearningRate 0.0234 Epoch: 10 
Global Step: 172410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:14,064-Speed 5118.11 samples/sec Loss 2.0890 LearningRate 0.0234 Epoch: 10 Global Step: 172420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:16,036-Speed 5194.90 samples/sec Loss 2.1553 LearningRate 0.0234 Epoch: 10 Global Step: 172430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:18,014-Speed 5178.31 samples/sec Loss 2.0809 LearningRate 0.0234 Epoch: 10 Global Step: 172440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:19,994-Speed 5174.31 samples/sec Loss 2.0982 LearningRate 0.0234 Epoch: 10 Global Step: 172450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:21,977-Speed 5165.13 samples/sec Loss 2.0833 LearningRate 0.0234 Epoch: 10 Global Step: 172460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:23,947-Speed 5200.59 samples/sec Loss 2.1421 LearningRate 0.0234 Epoch: 10 Global Step: 172470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:50:25,918-Speed 5196.70 samples/sec Loss 2.0536 LearningRate 0.0234 Epoch: 10 Global Step: 172480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:50:27,913-Speed 5135.08 samples/sec Loss 2.0998 LearningRate 0.0234 Epoch: 10 Global Step: 172490 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:50:29,879-Speed 5210.03 samples/sec Loss 2.0968 LearningRate 0.0234 Epoch: 10 Global Step: 172500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:31,843-Speed 5214.06 samples/sec Loss 2.0850 LearningRate 0.0234 Epoch: 10 Global Step: 172510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:33,822-Speed 5176.85 samples/sec Loss 2.0863 LearningRate 0.0233 Epoch: 10 Global Step: 172520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:35,797-Speed 5186.79 samples/sec Loss 2.0826 LearningRate 0.0233 Epoch: 10 Global Step: 172530 Fp16 
Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:37,786-Speed 5149.97 samples/sec Loss 2.1176 LearningRate 0.0233 Epoch: 10 Global Step: 172540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:39,774-Speed 5153.06 samples/sec Loss 2.0714 LearningRate 0.0233 Epoch: 10 Global Step: 172550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:41,772-Speed 5126.73 samples/sec Loss 2.0912 LearningRate 0.0233 Epoch: 10 Global Step: 172560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:43,737-Speed 5213.07 samples/sec Loss 2.0562 LearningRate 0.0233 Epoch: 10 Global Step: 172570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:45,722-Speed 5160.74 samples/sec Loss 2.0574 LearningRate 0.0233 Epoch: 10 Global Step: 172580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:50:47,700-Speed 5178.71 samples/sec Loss 2.0895 LearningRate 0.0233 Epoch: 10 Global Step: 172590 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:50:49,709-Speed 5098.16 samples/sec Loss 2.0634 LearningRate 0.0233 Epoch: 10 Global Step: 172600 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:50:51,724-Speed 5084.58 samples/sec Loss 2.0978 LearningRate 0.0233 Epoch: 10 Global Step: 172610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:50:53,702-Speed 5176.31 samples/sec Loss 2.0772 LearningRate 0.0233 Epoch: 10 Global Step: 172620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:50:55,674-Speed 5194.73 samples/sec Loss 2.1179 LearningRate 0.0233 Epoch: 10 Global Step: 172630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:50:57,661-Speed 5155.52 samples/sec Loss 2.0976 LearningRate 0.0233 Epoch: 10 Global Step: 172640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:50:59,629-Speed 5205.94 samples/sec Loss 2.0764 LearningRate 0.0233 Epoch: 10 Global Step: 172650 Fp16 Grad Scale: 32768 Required: 11 
hours Training: 2022-04-11 10:51:01,619-Speed 5147.34 samples/sec Loss 2.0724 LearningRate 0.0233 Epoch: 10 Global Step: 172660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:51:03,599-Speed 5173.40 samples/sec Loss 2.1519 LearningRate 0.0233 Epoch: 10 Global Step: 172670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:51:05,563-Speed 5216.25 samples/sec Loss 2.0627 LearningRate 0.0233 Epoch: 10 Global Step: 172680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:51:07,532-Speed 5201.35 samples/sec Loss 2.0869 LearningRate 0.0233 Epoch: 10 Global Step: 172690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:09,502-Speed 5199.57 samples/sec Loss 2.1153 LearningRate 0.0233 Epoch: 10 Global Step: 172700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:11,467-Speed 5212.95 samples/sec Loss 2.1481 LearningRate 0.0233 Epoch: 10 Global Step: 172710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:13,466-Speed 5124.61 samples/sec Loss 2.1688 LearningRate 0.0233 Epoch: 10 Global Step: 172720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:15,441-Speed 5187.53 samples/sec Loss 2.0901 LearningRate 0.0233 Epoch: 10 Global Step: 172730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:17,415-Speed 5188.30 samples/sec Loss 2.1691 LearningRate 0.0233 Epoch: 10 Global Step: 172740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:19,395-Speed 5172.73 samples/sec Loss 2.0990 LearningRate 0.0233 Epoch: 10 Global Step: 172750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:21,375-Speed 5175.39 samples/sec Loss 2.0896 LearningRate 0.0233 Epoch: 10 Global Step: 172760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:23,360-Speed 5159.39 samples/sec Loss 2.1035 LearningRate 0.0233 Epoch: 10 Global Step: 172770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 
10:51:25,345-Speed 5160.59 samples/sec Loss 2.0622 LearningRate 0.0233 Epoch: 10 Global Step: 172780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:27,346-Speed 5120.87 samples/sec Loss 2.1070 LearningRate 0.0233 Epoch: 10 Global Step: 172790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:29,325-Speed 5175.96 samples/sec Loss 2.1560 LearningRate 0.0233 Epoch: 10 Global Step: 172800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:31,291-Speed 5208.53 samples/sec Loss 2.1135 LearningRate 0.0233 Epoch: 10 Global Step: 172810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:33,266-Speed 5188.56 samples/sec Loss 2.0917 LearningRate 0.0233 Epoch: 10 Global Step: 172820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:35,241-Speed 5184.06 samples/sec Loss 2.1608 LearningRate 0.0233 Epoch: 10 Global Step: 172830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:37,222-Speed 5172.21 samples/sec Loss 2.1288 LearningRate 0.0233 Epoch: 10 Global Step: 172840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:39,194-Speed 5192.47 samples/sec Loss 2.0902 LearningRate 0.0233 Epoch: 10 Global Step: 172850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:41,185-Speed 5145.72 samples/sec Loss 2.0955 LearningRate 0.0232 Epoch: 10 Global Step: 172860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:43,153-Speed 5205.02 samples/sec Loss 2.1167 LearningRate 0.0232 Epoch: 10 Global Step: 172870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:45,143-Speed 5147.97 samples/sec Loss 2.0837 LearningRate 0.0232 Epoch: 10 Global Step: 172880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:47,127-Speed 5163.25 samples/sec Loss 2.0585 LearningRate 0.0232 Epoch: 10 Global Step: 172890 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:51:49,106-Speed 5176.88 
samples/sec Loss 2.1284 LearningRate 0.0232 Epoch: 10 Global Step: 172900 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:51:51,084-Speed 5178.77 samples/sec Loss 2.1652 LearningRate 0.0232 Epoch: 10 Global Step: 172910 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:51:53,068-Speed 5161.10 samples/sec Loss 2.0974 LearningRate 0.0232 Epoch: 10 Global Step: 172920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:51:55,036-Speed 5205.13 samples/sec Loss 2.0854 LearningRate 0.0232 Epoch: 10 Global Step: 172930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:57,014-Speed 5181.66 samples/sec Loss 2.1283 LearningRate 0.0232 Epoch: 10 Global Step: 172940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:51:59,000-Speed 5158.30 samples/sec Loss 2.1226 LearningRate 0.0232 Epoch: 10 Global Step: 172950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:00,978-Speed 5178.79 samples/sec Loss 2.0747 LearningRate 0.0232 Epoch: 10 Global Step: 172960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:02,981-Speed 5114.34 samples/sec Loss 2.1706 LearningRate 0.0232 Epoch: 10 Global Step: 172970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:04,970-Speed 5151.40 samples/sec Loss 2.0539 LearningRate 0.0232 Epoch: 10 Global Step: 172980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:06,953-Speed 5164.88 samples/sec Loss 2.1098 LearningRate 0.0232 Epoch: 10 Global Step: 172990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:08,964-Speed 5094.81 samples/sec Loss 2.1254 LearningRate 0.0232 Epoch: 10 Global Step: 173000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:10,981-Speed 5076.87 samples/sec Loss 2.1319 LearningRate 0.0232 Epoch: 10 Global Step: 173010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:12,962-Speed 5170.95 samples/sec Loss 2.0368 
LearningRate 0.0232 Epoch: 10 Global Step: 173020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:14,933-Speed 5197.22 samples/sec Loss 2.0939 LearningRate 0.0232 Epoch: 10 Global Step: 173030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:52:16,903-Speed 5199.33 samples/sec Loss 2.1190 LearningRate 0.0232 Epoch: 10 Global Step: 173040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:18,879-Speed 5186.49 samples/sec Loss 2.0629 LearningRate 0.0232 Epoch: 10 Global Step: 173050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:20,880-Speed 5119.02 samples/sec Loss 2.0726 LearningRate 0.0232 Epoch: 10 Global Step: 173060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:22,854-Speed 5189.17 samples/sec Loss 2.0512 LearningRate 0.0232 Epoch: 10 Global Step: 173070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:24,833-Speed 5175.67 samples/sec Loss 2.0249 LearningRate 0.0232 Epoch: 10 Global Step: 173080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:26,804-Speed 5199.47 samples/sec Loss 2.1934 LearningRate 0.0232 Epoch: 10 Global Step: 173090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:28,780-Speed 5184.46 samples/sec Loss 2.0954 LearningRate 0.0232 Epoch: 10 Global Step: 173100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:30,758-Speed 5179.70 samples/sec Loss 2.1436 LearningRate 0.0232 Epoch: 10 Global Step: 173110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:32,719-Speed 5221.91 samples/sec Loss 2.1921 LearningRate 0.0232 Epoch: 10 Global Step: 173120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:52:34,693-Speed 5190.49 samples/sec Loss 2.1078 LearningRate 0.0232 Epoch: 10 Global Step: 173130 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:52:36,664-Speed 5196.67 samples/sec Loss 2.1136 LearningRate 0.0232 Epoch: 10 
Global Step: 173140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:52:38,634-Speed 5201.53 samples/sec Loss 2.0856 LearningRate 0.0232 Epoch: 10 Global Step: 173150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:52:40,604-Speed 5197.56 samples/sec Loss 2.1589 LearningRate 0.0232 Epoch: 10 Global Step: 173160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:52:42,573-Speed 5202.86 samples/sec Loss 2.0851 LearningRate 0.0232 Epoch: 10 Global Step: 173170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:52:44,540-Speed 5208.30 samples/sec Loss 2.0903 LearningRate 0.0232 Epoch: 10 Global Step: 173180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:52:46,540-Speed 5120.69 samples/sec Loss 2.0991 LearningRate 0.0232 Epoch: 10 Global Step: 173190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:52:48,517-Speed 5179.63 samples/sec Loss 2.1321 LearningRate 0.0232 Epoch: 10 Global Step: 173200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:52:50,506-Speed 5152.23 samples/sec Loss 2.1502 LearningRate 0.0231 Epoch: 10 Global Step: 173210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:52:52,476-Speed 5198.41 samples/sec Loss 2.0720 LearningRate 0.0231 Epoch: 10 Global Step: 173220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:54,446-Speed 5202.63 samples/sec Loss 2.0974 LearningRate 0.0231 Epoch: 10 Global Step: 173230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:56,437-Speed 5145.71 samples/sec Loss 2.1219 LearningRate 0.0231 Epoch: 10 Global Step: 173240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:52:58,439-Speed 5115.98 samples/sec Loss 2.1471 LearningRate 0.0231 Epoch: 10 Global Step: 173250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:00,450-Speed 5093.73 samples/sec Loss 2.0968 LearningRate 0.0231 Epoch: 10 Global Step: 173260 Fp16 Grad 
Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:02,426-Speed 5186.40 samples/sec Loss 2.0612 LearningRate 0.0231 Epoch: 10 Global Step: 173270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:04,405-Speed 5174.55 samples/sec Loss 2.1342 LearningRate 0.0231 Epoch: 10 Global Step: 173280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:06,377-Speed 5195.70 samples/sec Loss 2.1298 LearningRate 0.0231 Epoch: 10 Global Step: 173290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:08,349-Speed 5193.95 samples/sec Loss 2.0813 LearningRate 0.0231 Epoch: 10 Global Step: 173300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:10,321-Speed 5195.90 samples/sec Loss 2.1462 LearningRate 0.0231 Epoch: 10 Global Step: 173310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:12,298-Speed 5181.62 samples/sec Loss 2.1687 LearningRate 0.0231 Epoch: 10 Global Step: 173320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:53:14,269-Speed 5196.26 samples/sec Loss 2.1204 LearningRate 0.0231 Epoch: 10 Global Step: 173330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:16,267-Speed 5128.10 samples/sec Loss 2.0998 LearningRate 0.0231 Epoch: 10 Global Step: 173340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:18,246-Speed 5175.25 samples/sec Loss 2.1272 LearningRate 0.0231 Epoch: 10 Global Step: 173350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:20,240-Speed 5138.29 samples/sec Loss 2.1581 LearningRate 0.0231 Epoch: 10 Global Step: 173360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:22,212-Speed 5192.63 samples/sec Loss 2.1154 LearningRate 0.0231 Epoch: 10 Global Step: 173370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:24,194-Speed 5170.56 samples/sec Loss 2.0839 LearningRate 0.0231 Epoch: 10 Global Step: 173380 Fp16 Grad Scale: 65536 Required: 11 
hours Training: 2022-04-11 10:53:26,178-Speed 5160.86 samples/sec Loss 2.1046 LearningRate 0.0231 Epoch: 10 Global Step: 173390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:28,163-Speed 5160.12 samples/sec Loss 2.1050 LearningRate 0.0231 Epoch: 10 Global Step: 173400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:30,132-Speed 5202.20 samples/sec Loss 2.0786 LearningRate 0.0231 Epoch: 10 Global Step: 173410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:32,107-Speed 5188.59 samples/sec Loss 2.1042 LearningRate 0.0231 Epoch: 10 Global Step: 173420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:34,087-Speed 5171.83 samples/sec Loss 2.1231 LearningRate 0.0231 Epoch: 10 Global Step: 173430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:53:36,053-Speed 5210.97 samples/sec Loss 2.1304 LearningRate 0.0231 Epoch: 10 Global Step: 173440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:38,026-Speed 5192.70 samples/sec Loss 2.1091 LearningRate 0.0231 Epoch: 10 Global Step: 173450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:40,042-Speed 5087.70 samples/sec Loss 2.1171 LearningRate 0.0231 Epoch: 10 Global Step: 173460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:42,013-Speed 5196.51 samples/sec Loss 2.1627 LearningRate 0.0231 Epoch: 10 Global Step: 173470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:43,983-Speed 5197.92 samples/sec Loss 2.1803 LearningRate 0.0231 Epoch: 10 Global Step: 173480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:45,953-Speed 5199.37 samples/sec Loss 2.1477 LearningRate 0.0231 Epoch: 10 Global Step: 173490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:47,935-Speed 5168.34 samples/sec Loss 2.1197 LearningRate 0.0231 Epoch: 10 Global Step: 173500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 
10:53:49,953-Speed 5077.14 samples/sec Loss 2.0939 LearningRate 0.0231 Epoch: 10 Global Step: 173510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:51,923-Speed 5198.04 samples/sec Loss 2.1473 LearningRate 0.0231 Epoch: 10 Global Step: 173520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:53,930-Speed 5104.28 samples/sec Loss 2.1112 LearningRate 0.0231 Epoch: 10 Global Step: 173530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:53:55,903-Speed 5193.74 samples/sec Loss 2.1481 LearningRate 0.0231 Epoch: 10 Global Step: 173540 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:53:57,890-Speed 5154.86 samples/sec Loss 2.0662 LearningRate 0.0231 Epoch: 10 Global Step: 173550 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:53:59,873-Speed 5164.51 samples/sec Loss 2.0775 LearningRate 0.0230 Epoch: 10 Global Step: 173560 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:54:01,846-Speed 5194.06 samples/sec Loss 2.1288 LearningRate 0.0230 Epoch: 10 Global Step: 173570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:54:03,847-Speed 5118.58 samples/sec Loss 2.1337 LearningRate 0.0230 Epoch: 10 Global Step: 173580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:54:05,817-Speed 5198.09 samples/sec Loss 2.2335 LearningRate 0.0230 Epoch: 10 Global Step: 173590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:54:07,782-Speed 5213.44 samples/sec Loss 2.0916 LearningRate 0.0230 Epoch: 10 Global Step: 173600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:09,755-Speed 5191.73 samples/sec Loss 2.0773 LearningRate 0.0230 Epoch: 10 Global Step: 173610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:11,740-Speed 5161.06 samples/sec Loss 2.1102 LearningRate 0.0230 Epoch: 10 Global Step: 173620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:13,712-Speed 
5194.19 samples/sec Loss 2.1178 LearningRate 0.0230 Epoch: 10 Global Step: 173630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:15,685-Speed 5190.54 samples/sec Loss 2.1557 LearningRate 0.0230 Epoch: 10 Global Step: 173640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:17,666-Speed 5172.46 samples/sec Loss 2.0838 LearningRate 0.0230 Epoch: 10 Global Step: 173650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:19,636-Speed 5200.63 samples/sec Loss 2.0770 LearningRate 0.0230 Epoch: 10 Global Step: 173660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:21,608-Speed 5193.69 samples/sec Loss 2.0126 LearningRate 0.0230 Epoch: 10 Global Step: 173670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:23,581-Speed 5190.34 samples/sec Loss 2.1345 LearningRate 0.0230 Epoch: 10 Global Step: 173680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:25,553-Speed 5196.45 samples/sec Loss 2.1095 LearningRate 0.0230 Epoch: 10 Global Step: 173690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:27,531-Speed 5178.63 samples/sec Loss 2.1279 LearningRate 0.0230 Epoch: 10 Global Step: 173700 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:54:29,496-Speed 5212.08 samples/sec Loss 2.1561 LearningRate 0.0230 Epoch: 10 Global Step: 173710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:31,466-Speed 5203.29 samples/sec Loss 2.1148 LearningRate 0.0230 Epoch: 10 Global Step: 173720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:33,438-Speed 5195.42 samples/sec Loss 2.1005 LearningRate 0.0230 Epoch: 10 Global Step: 173730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:35,420-Speed 5168.30 samples/sec Loss 2.1908 LearningRate 0.0230 Epoch: 10 Global Step: 173740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:37,404-Speed 5163.63 samples/sec Loss 
2.1148 LearningRate 0.0230 Epoch: 10 Global Step: 173750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:39,383-Speed 5175.12 samples/sec Loss 2.1873 LearningRate 0.0230 Epoch: 10 Global Step: 173760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:41,369-Speed 5158.93 samples/sec Loss 2.1229 LearningRate 0.0230 Epoch: 10 Global Step: 173770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:43,362-Speed 5138.62 samples/sec Loss 2.1230 LearningRate 0.0230 Epoch: 10 Global Step: 173780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:45,347-Speed 5160.62 samples/sec Loss 2.2273 LearningRate 0.0230 Epoch: 10 Global Step: 173790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:47,344-Speed 5128.12 samples/sec Loss 2.2295 LearningRate 0.0230 Epoch: 10 Global Step: 173800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:49,322-Speed 5180.27 samples/sec Loss 2.1708 LearningRate 0.0230 Epoch: 10 Global Step: 173810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:51,301-Speed 5174.01 samples/sec Loss 2.1510 LearningRate 0.0230 Epoch: 10 Global Step: 173820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:53,275-Speed 5189.74 samples/sec Loss 2.0867 LearningRate 0.0230 Epoch: 10 Global Step: 173830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:55,245-Speed 5199.72 samples/sec Loss 2.1333 LearningRate 0.0230 Epoch: 10 Global Step: 173840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:57,216-Speed 5198.31 samples/sec Loss 2.1301 LearningRate 0.0230 Epoch: 10 Global Step: 173850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:54:59,201-Speed 5161.41 samples/sec Loss 2.1277 LearningRate 0.0230 Epoch: 10 Global Step: 173860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:55:01,192-Speed 5144.78 samples/sec Loss 2.1149 LearningRate 0.0230 
Epoch: 10 Global Step: 173870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:55:03,189-Speed 5129.97 samples/sec Loss 2.1225 LearningRate 0.0230 Epoch: 10 Global Step: 173880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:55:05,198-Speed 5096.98 samples/sec Loss 2.1067 LearningRate 0.0230 Epoch: 10 Global Step: 173890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:55:07,184-Speed 5159.19 samples/sec Loss 2.1337 LearningRate 0.0229 Epoch: 10 Global Step: 173900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:55:09,147-Speed 5217.00 samples/sec Loss 2.1229 LearningRate 0.0229 Epoch: 10 Global Step: 173910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:55:11,147-Speed 5123.38 samples/sec Loss 2.1666 LearningRate 0.0229 Epoch: 10 Global Step: 173920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:55:13,129-Speed 5165.71 samples/sec Loss 2.1456 LearningRate 0.0229 Epoch: 10 Global Step: 173930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:55:15,102-Speed 5191.97 samples/sec Loss 2.1348 LearningRate 0.0229 Epoch: 10 Global Step: 173940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:55:17,074-Speed 5194.25 samples/sec Loss 2.1539 LearningRate 0.0229 Epoch: 10 Global Step: 173950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:55:19,053-Speed 5177.44 samples/sec Loss 2.1828 LearningRate 0.0229 Epoch: 10 Global Step: 173960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:55:21,032-Speed 5175.91 samples/sec Loss 2.0911 LearningRate 0.0229 Epoch: 10 Global Step: 173970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:55:23,014-Speed 5170.56 samples/sec Loss 2.1302 LearningRate 0.0229 Epoch: 10 Global Step: 173980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:55:24,986-Speed 5194.20 samples/sec Loss 2.0797 LearningRate 0.0229 Epoch: 10 Global Step: 173990 
Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:55:26,958-Speed 5192.51 samples/sec Loss 2.1069 LearningRate 0.0229 Epoch: 10 Global Step: 174000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:55:53,515-[lfw][174000]XNorm: 22.766961 Training: 2022-04-11 10:55:53,515-[lfw][174000]Accuracy-Flip: 0.99783+-0.00289 Training: 2022-04-11 10:55:53,516-[lfw][174000]Accuracy-Highest: 0.99833 Training: 2022-04-11 10:56:24,290-[cfp_fp][174000]XNorm: 21.799399 Training: 2022-04-11 10:56:24,291-[cfp_fp][174000]Accuracy-Flip: 0.98529+-0.00676 Training: 2022-04-11 10:56:24,291-[cfp_fp][174000]Accuracy-Highest: 0.98571 Training: 2022-04-11 10:56:50,721-[agedb_30][174000]XNorm: 23.068397 Training: 2022-04-11 10:56:50,721-[agedb_30][174000]Accuracy-Flip: 0.98067+-0.00775 Training: 2022-04-11 10:56:50,722-[agedb_30][174000]Accuracy-Highest: 0.98167 Training: 2022-04-11 10:56:52,708-Speed 119.42 samples/sec Loss 2.1047 LearningRate 0.0229 Epoch: 10 Global Step: 174010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:56:54,664-Speed 5237.21 samples/sec Loss 2.1409 LearningRate 0.0229 Epoch: 10 Global Step: 174020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:56:56,634-Speed 5199.43 samples/sec Loss 2.1461 LearningRate 0.0229 Epoch: 10 Global Step: 174030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:56:58,601-Speed 5208.86 samples/sec Loss 2.1874 LearningRate 0.0229 Epoch: 10 Global Step: 174040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:57:00,567-Speed 5209.74 samples/sec Loss 2.1299 LearningRate 0.0229 Epoch: 10 Global Step: 174050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:57:02,547-Speed 5174.23 samples/sec Loss 2.0998 LearningRate 0.0229 Epoch: 10 Global Step: 174060 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:57:04,522-Speed 5186.21 samples/sec Loss 2.0893 LearningRate 0.0229 Epoch: 10 Global Step: 174070 Fp16 Grad 
Scale: 131072 Required: 11 hours Training: 2022-04-11 10:57:06,491-Speed 5202.39 samples/sec Loss 2.0663 LearningRate 0.0229 Epoch: 10 Global Step: 174080 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:57:08,462-Speed 5195.66 samples/sec Loss 2.1245 LearningRate 0.0229 Epoch: 10 Global Step: 174090 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:57:10,436-Speed 5191.15 samples/sec Loss 2.1424 LearningRate 0.0229 Epoch: 10 Global Step: 174100 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:57:12,418-Speed 5169.11 samples/sec Loss 2.1366 LearningRate 0.0229 Epoch: 10 Global Step: 174110 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:57:14,371-Speed 5243.38 samples/sec Loss 2.1690 LearningRate 0.0229 Epoch: 10 Global Step: 174120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:57:16,334-Speed 5218.91 samples/sec Loss 2.1286 LearningRate 0.0229 Epoch: 10 Global Step: 174130 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:57:18,304-Speed 5199.99 samples/sec Loss 2.1841 LearningRate 0.0229 Epoch: 10 Global Step: 174140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:57:20,269-Speed 5212.96 samples/sec Loss 2.1696 LearningRate 0.0229 Epoch: 10 Global Step: 174150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:57:22,239-Speed 5200.26 samples/sec Loss 2.1598 LearningRate 0.0229 Epoch: 10 Global Step: 174160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:57:24,225-Speed 5156.72 samples/sec Loss 2.1326 LearningRate 0.0229 Epoch: 10 Global Step: 174170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:57:26,192-Speed 5208.81 samples/sec Loss 2.1550 LearningRate 0.0229 Epoch: 10 Global Step: 174180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:57:28,221-Speed 5048.82 samples/sec Loss 2.1039 LearningRate 0.0229 Epoch: 10 Global Step: 174190 Fp16 Grad Scale: 32768 Required: 11 
hours Training: 2022-04-11 10:57:30,196-Speed 5187.13 samples/sec Loss 2.1259 LearningRate 0.0229 Epoch: 10 Global Step: 174200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:57:32,165-Speed 5202.12 samples/sec Loss 2.1299 LearningRate 0.0229 Epoch: 10 Global Step: 174210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:57:34,135-Speed 5198.99 samples/sec Loss 2.0838 LearningRate 0.0229 Epoch: 10 Global Step: 174220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:57:36,108-Speed 5192.86 samples/sec Loss 2.1330 LearningRate 0.0229 Epoch: 10 Global Step: 174230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:57:38,088-Speed 5173.30 samples/sec Loss 2.1238 LearningRate 0.0229 Epoch: 10 Global Step: 174240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:57:40,071-Speed 5164.31 samples/sec Loss 2.1919 LearningRate 0.0228 Epoch: 10 Global Step: 174250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:57:42,052-Speed 5172.78 samples/sec Loss 2.0985 LearningRate 0.0228 Epoch: 10 Global Step: 174260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:57:44,021-Speed 5201.44 samples/sec Loss 2.1624 LearningRate 0.0228 Epoch: 10 Global Step: 174270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:57:46,003-Speed 5169.24 samples/sec Loss 2.0637 LearningRate 0.0228 Epoch: 10 Global Step: 174280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:57:47,995-Speed 5141.15 samples/sec Loss 2.1276 LearningRate 0.0228 Epoch: 10 Global Step: 174290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:57:50,032-Speed 5028.51 samples/sec Loss 2.1271 LearningRate 0.0228 Epoch: 10 Global Step: 174300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:57:52,010-Speed 5178.46 samples/sec Loss 2.1409 LearningRate 0.0228 Epoch: 10 Global Step: 174310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 
10:57:53,997-Speed 5156.03 samples/sec Loss 2.0810 LearningRate 0.0228 Epoch: 10 Global Step: 174320 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:57:55,957-Speed 5225.18 samples/sec Loss 2.1333 LearningRate 0.0228 Epoch: 10 Global Step: 174330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:57:57,926-Speed 5202.78 samples/sec Loss 2.1002 LearningRate 0.0228 Epoch: 10 Global Step: 174340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:57:59,912-Speed 5158.82 samples/sec Loss 2.1509 LearningRate 0.0228 Epoch: 10 Global Step: 174350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:01,884-Speed 5194.74 samples/sec Loss 2.1117 LearningRate 0.0228 Epoch: 10 Global Step: 174360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:03,855-Speed 5197.48 samples/sec Loss 2.1508 LearningRate 0.0228 Epoch: 10 Global Step: 174370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:05,836-Speed 5171.38 samples/sec Loss 2.0911 LearningRate 0.0228 Epoch: 10 Global Step: 174380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:07,804-Speed 5204.79 samples/sec Loss 2.1164 LearningRate 0.0228 Epoch: 10 Global Step: 174390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:09,788-Speed 5163.28 samples/sec Loss 2.1238 LearningRate 0.0228 Epoch: 10 Global Step: 174400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:11,784-Speed 5130.26 samples/sec Loss 2.1333 LearningRate 0.0228 Epoch: 10 Global Step: 174410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:13,778-Speed 5138.77 samples/sec Loss 2.1508 LearningRate 0.0228 Epoch: 10 Global Step: 174420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:15,763-Speed 5159.55 samples/sec Loss 2.2073 LearningRate 0.0228 Epoch: 10 Global Step: 174430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:58:17,726-Speed 5218.06 
samples/sec Loss 2.2057 LearningRate 0.0228 Epoch: 10 Global Step: 174440 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:58:19,690-Speed 5217.17 samples/sec Loss 2.1294 LearningRate 0.0228 Epoch: 10 Global Step: 174450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:58:21,656-Speed 5210.27 samples/sec Loss 2.0619 LearningRate 0.0228 Epoch: 10 Global Step: 174460 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:58:23,632-Speed 5182.56 samples/sec Loss 2.1783 LearningRate 0.0228 Epoch: 10 Global Step: 174470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:25,613-Speed 5171.42 samples/sec Loss 2.2070 LearningRate 0.0228 Epoch: 10 Global Step: 174480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:27,580-Speed 5208.70 samples/sec Loss 2.1781 LearningRate 0.0228 Epoch: 10 Global Step: 174490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:29,553-Speed 5191.38 samples/sec Loss 2.1523 LearningRate 0.0228 Epoch: 10 Global Step: 174500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:31,517-Speed 5214.44 samples/sec Loss 2.1177 LearningRate 0.0228 Epoch: 10 Global Step: 174510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:33,482-Speed 5213.52 samples/sec Loss 2.1590 LearningRate 0.0228 Epoch: 10 Global Step: 174520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:35,469-Speed 5153.59 samples/sec Loss 2.1955 LearningRate 0.0228 Epoch: 10 Global Step: 174530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:37,441-Speed 5196.56 samples/sec Loss 2.1681 LearningRate 0.0228 Epoch: 10 Global Step: 174540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:39,441-Speed 5121.72 samples/sec Loss 2.1083 LearningRate 0.0228 Epoch: 10 Global Step: 174550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:41,411-Speed 5198.93 samples/sec Loss 2.1970 
LearningRate 0.0228 Epoch: 10 Global Step: 174560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:43,382-Speed 5198.46 samples/sec Loss 2.1511 LearningRate 0.0228 Epoch: 10 Global Step: 174570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:58:45,366-Speed 5161.90 samples/sec Loss 2.0993 LearningRate 0.0228 Epoch: 10 Global Step: 174580 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:58:47,329-Speed 5217.19 samples/sec Loss 2.0911 LearningRate 0.0228 Epoch: 10 Global Step: 174590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:49,326-Speed 5131.00 samples/sec Loss 2.1455 LearningRate 0.0227 Epoch: 10 Global Step: 174600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:51,296-Speed 5197.67 samples/sec Loss 2.1226 LearningRate 0.0227 Epoch: 10 Global Step: 174610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:53,262-Speed 5211.26 samples/sec Loss 2.1588 LearningRate 0.0227 Epoch: 10 Global Step: 174620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:55,234-Speed 5194.85 samples/sec Loss 2.1133 LearningRate 0.0227 Epoch: 10 Global Step: 174630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:57,238-Speed 5111.55 samples/sec Loss 2.1639 LearningRate 0.0227 Epoch: 10 Global Step: 174640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:58:59,228-Speed 5147.08 samples/sec Loss 2.1319 LearningRate 0.0227 Epoch: 10 Global Step: 174650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:59:01,237-Speed 5099.67 samples/sec Loss 2.1386 LearningRate 0.0227 Epoch: 10 Global Step: 174660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:59:03,217-Speed 5173.92 samples/sec Loss 2.1620 LearningRate 0.0227 Epoch: 10 Global Step: 174670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:59:05,198-Speed 5170.58 samples/sec Loss 2.1179 LearningRate 0.0227 Epoch: 10 
Global Step: 174680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:59:07,170-Speed 5193.28 samples/sec Loss 2.1215 LearningRate 0.0227 Epoch: 10 Global Step: 174690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:59:09,143-Speed 5192.68 samples/sec Loss 2.0754 LearningRate 0.0227 Epoch: 10 Global Step: 174700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:59:11,112-Speed 5202.77 samples/sec Loss 2.0917 LearningRate 0.0227 Epoch: 10 Global Step: 174710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:59:13,079-Speed 5206.02 samples/sec Loss 2.1306 LearningRate 0.0227 Epoch: 10 Global Step: 174720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:59:15,053-Speed 5191.27 samples/sec Loss 2.1380 LearningRate 0.0227 Epoch: 10 Global Step: 174730 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:59:17,030-Speed 5180.03 samples/sec Loss 2.1838 LearningRate 0.0227 Epoch: 10 Global Step: 174740 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:59:18,996-Speed 5209.14 samples/sec Loss 2.1688 LearningRate 0.0227 Epoch: 10 Global Step: 174750 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:59:20,966-Speed 5201.40 samples/sec Loss 2.1277 LearningRate 0.0227 Epoch: 10 Global Step: 174760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 10:59:22,964-Speed 5126.37 samples/sec Loss 2.1389 LearningRate 0.0227 Epoch: 10 Global Step: 174770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:59:24,975-Speed 5095.19 samples/sec Loss 2.1979 LearningRate 0.0227 Epoch: 10 Global Step: 174780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:59:26,951-Speed 5183.55 samples/sec Loss 2.1216 LearningRate 0.0227 Epoch: 10 Global Step: 174790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:59:28,932-Speed 5168.86 samples/sec Loss 2.0286 LearningRate 0.0227 Epoch: 10 Global Step: 174800 Fp16 Grad 
Scale: 65536 Required: 11 hours Training: 2022-04-11 10:59:30,896-Speed 5215.89 samples/sec Loss 2.1366 LearningRate 0.0227 Epoch: 10 Global Step: 174810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:59:32,862-Speed 5209.69 samples/sec Loss 2.1450 LearningRate 0.0227 Epoch: 10 Global Step: 174820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:59:34,834-Speed 5196.16 samples/sec Loss 2.1321 LearningRate 0.0227 Epoch: 10 Global Step: 174830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:59:36,822-Speed 5152.45 samples/sec Loss 2.1913 LearningRate 0.0227 Epoch: 10 Global Step: 174840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:59:38,821-Speed 5122.27 samples/sec Loss 2.1301 LearningRate 0.0227 Epoch: 10 Global Step: 174850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:59:40,798-Speed 5183.62 samples/sec Loss 2.2224 LearningRate 0.0227 Epoch: 10 Global Step: 174860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:59:42,776-Speed 5178.63 samples/sec Loss 2.1248 LearningRate 0.0227 Epoch: 10 Global Step: 174870 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 10:59:44,744-Speed 5204.92 samples/sec Loss 2.1729 LearningRate 0.0227 Epoch: 10 Global Step: 174880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:59:46,710-Speed 5210.29 samples/sec Loss 2.1629 LearningRate 0.0227 Epoch: 10 Global Step: 174890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:59:48,710-Speed 5121.82 samples/sec Loss 2.0963 LearningRate 0.0227 Epoch: 10 Global Step: 174900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:59:50,715-Speed 5107.00 samples/sec Loss 2.1236 LearningRate 0.0227 Epoch: 10 Global Step: 174910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:59:52,710-Speed 5134.51 samples/sec Loss 2.0590 LearningRate 0.0227 Epoch: 10 Global Step: 174920 Fp16 Grad Scale: 65536 Required: 11 
hours Training: 2022-04-11 10:59:54,705-Speed 5137.08 samples/sec Loss 2.0591 LearningRate 0.0227 Epoch: 10 Global Step: 174930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:59:56,690-Speed 5159.74 samples/sec Loss 2.1109 LearningRate 0.0227 Epoch: 10 Global Step: 174940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 10:59:58,655-Speed 5213.17 samples/sec Loss 2.1832 LearningRate 0.0226 Epoch: 10 Global Step: 174950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:00,642-Speed 5154.80 samples/sec Loss 2.1237 LearningRate 0.0226 Epoch: 10 Global Step: 174960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:02,611-Speed 5203.44 samples/sec Loss 2.1292 LearningRate 0.0226 Epoch: 10 Global Step: 174970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:04,581-Speed 5199.26 samples/sec Loss 2.1293 LearningRate 0.0226 Epoch: 10 Global Step: 174980 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:00:06,548-Speed 5207.43 samples/sec Loss 2.1361 LearningRate 0.0226 Epoch: 10 Global Step: 174990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:00:08,520-Speed 5193.98 samples/sec Loss 2.1491 LearningRate 0.0226 Epoch: 10 Global Step: 175000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:10,498-Speed 5178.95 samples/sec Loss 2.1216 LearningRate 0.0226 Epoch: 10 Global Step: 175010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:12,495-Speed 5130.93 samples/sec Loss 2.2327 LearningRate 0.0226 Epoch: 10 Global Step: 175020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:14,467-Speed 5195.22 samples/sec Loss 2.1034 LearningRate 0.0226 Epoch: 10 Global Step: 175030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:16,441-Speed 5187.13 samples/sec Loss 2.1073 LearningRate 0.0226 Epoch: 10 Global Step: 175040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 
11:00:18,406-Speed 5212.51 samples/sec Loss 2.1443 LearningRate 0.0226 Epoch: 10 Global Step: 175050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:20,372-Speed 5212.10 samples/sec Loss 2.1185 LearningRate 0.0226 Epoch: 10 Global Step: 175060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:22,349-Speed 5179.90 samples/sec Loss 2.1474 LearningRate 0.0226 Epoch: 10 Global Step: 175070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:24,353-Speed 5112.70 samples/sec Loss 2.2278 LearningRate 0.0226 Epoch: 10 Global Step: 175080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:26,339-Speed 5157.30 samples/sec Loss 2.1563 LearningRate 0.0226 Epoch: 10 Global Step: 175090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:28,311-Speed 5194.86 samples/sec Loss 2.1453 LearningRate 0.0226 Epoch: 10 Global Step: 175100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:30,280-Speed 5203.02 samples/sec Loss 2.1496 LearningRate 0.0226 Epoch: 10 Global Step: 175110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:32,246-Speed 5211.56 samples/sec Loss 2.1617 LearningRate 0.0226 Epoch: 10 Global Step: 175120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:34,216-Speed 5199.65 samples/sec Loss 2.0911 LearningRate 0.0226 Epoch: 10 Global Step: 175130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:36,193-Speed 5182.02 samples/sec Loss 2.1348 LearningRate 0.0226 Epoch: 10 Global Step: 175140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:38,174-Speed 5169.99 samples/sec Loss 2.1477 LearningRate 0.0226 Epoch: 10 Global Step: 175150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:40,140-Speed 5210.45 samples/sec Loss 2.1554 LearningRate 0.0226 Epoch: 10 Global Step: 175160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:42,105-Speed 5212.15 
samples/sec Loss 2.1243 LearningRate 0.0226 Epoch: 10 Global Step: 175170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:44,089-Speed 5164.68 samples/sec Loss 2.1862 LearningRate 0.0226 Epoch: 10 Global Step: 175180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:46,063-Speed 5188.89 samples/sec Loss 2.1957 LearningRate 0.0226 Epoch: 10 Global Step: 175190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:48,042-Speed 5175.90 samples/sec Loss 2.0812 LearningRate 0.0226 Epoch: 10 Global Step: 175200 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:00:50,013-Speed 5196.30 samples/sec Loss 2.1404 LearningRate 0.0226 Epoch: 10 Global Step: 175210 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:00:52,005-Speed 5144.30 samples/sec Loss 2.1699 LearningRate 0.0226 Epoch: 10 Global Step: 175220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:00:53,975-Speed 5201.73 samples/sec Loss 2.1452 LearningRate 0.0226 Epoch: 10 Global Step: 175230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:55,952-Speed 5180.94 samples/sec Loss 2.1477 LearningRate 0.0226 Epoch: 10 Global Step: 175240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:57,937-Speed 5160.84 samples/sec Loss 2.0880 LearningRate 0.0226 Epoch: 10 Global Step: 175250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:00:59,927-Speed 5146.03 samples/sec Loss 2.2058 LearningRate 0.0226 Epoch: 10 Global Step: 175260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:01,917-Speed 5149.58 samples/sec Loss 2.1569 LearningRate 0.0226 Epoch: 10 Global Step: 175270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:03,899-Speed 5169.40 samples/sec Loss 2.0985 LearningRate 0.0226 Epoch: 10 Global Step: 175280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:05,869-Speed 5197.56 samples/sec Loss 2.1174 
LearningRate 0.0226 Epoch: 10 Global Step: 175290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:07,837-Speed 5206.89 samples/sec Loss 2.1858 LearningRate 0.0225 Epoch: 10 Global Step: 175300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:09,805-Speed 5205.12 samples/sec Loss 2.1035 LearningRate 0.0225 Epoch: 10 Global Step: 175310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:11,783-Speed 5177.73 samples/sec Loss 2.0887 LearningRate 0.0225 Epoch: 10 Global Step: 175320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:13,760-Speed 5180.56 samples/sec Loss 2.1055 LearningRate 0.0225 Epoch: 10 Global Step: 175330 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:01:15,742-Speed 5169.03 samples/sec Loss 2.1372 LearningRate 0.0225 Epoch: 10 Global Step: 175340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:01:17,745-Speed 5113.34 samples/sec Loss 2.1447 LearningRate 0.0225 Epoch: 10 Global Step: 175350 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:01:19,723-Speed 5178.14 samples/sec Loss 2.1147 LearningRate 0.0225 Epoch: 10 Global Step: 175360 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:01:21,710-Speed 5154.90 samples/sec Loss 2.0888 LearningRate 0.0225 Epoch: 10 Global Step: 175370 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:01:23,704-Speed 5139.13 samples/sec Loss 2.1147 LearningRate 0.0225 Epoch: 10 Global Step: 175380 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:01:25,688-Speed 5163.45 samples/sec Loss 2.1341 LearningRate 0.0225 Epoch: 10 Global Step: 175390 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:01:27,701-Speed 5088.32 samples/sec Loss 2.2002 LearningRate 0.0225 Epoch: 10 Global Step: 175400 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:01:29,680-Speed 5176.49 samples/sec Loss 2.1444 LearningRate 0.0225 
Epoch: 10 Global Step: 175410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:01:31,663-Speed 5164.14 samples/sec Loss 2.1496 LearningRate 0.0225 Epoch: 10 Global Step: 175420 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:01:33,624-Speed 5224.97 samples/sec Loss 2.1832 LearningRate 0.0225 Epoch: 10 Global Step: 175430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:01:35,594-Speed 5198.69 samples/sec Loss 2.1261 LearningRate 0.0225 Epoch: 10 Global Step: 175440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:37,563-Speed 5203.81 samples/sec Loss 2.2008 LearningRate 0.0225 Epoch: 10 Global Step: 175450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:39,547-Speed 5161.95 samples/sec Loss 2.0702 LearningRate 0.0225 Epoch: 10 Global Step: 175460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:41,534-Speed 5155.52 samples/sec Loss 2.1276 LearningRate 0.0225 Epoch: 10 Global Step: 175470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:43,503-Speed 5202.67 samples/sec Loss 2.1795 LearningRate 0.0225 Epoch: 10 Global Step: 175480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:45,470-Speed 5207.96 samples/sec Loss 2.1106 LearningRate 0.0225 Epoch: 10 Global Step: 175490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:47,453-Speed 5166.18 samples/sec Loss 2.2781 LearningRate 0.0225 Epoch: 10 Global Step: 175500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:49,458-Speed 5109.27 samples/sec Loss 2.1280 LearningRate 0.0225 Epoch: 10 Global Step: 175510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:51,447-Speed 5149.47 samples/sec Loss 2.1415 LearningRate 0.0225 Epoch: 10 Global Step: 175520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:53,418-Speed 5195.16 samples/sec Loss 2.1430 LearningRate 0.0225 Epoch: 10 Global Step: 
175530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:55,395-Speed 5181.06 samples/sec Loss 2.1074 LearningRate 0.0225 Epoch: 10 Global Step: 175540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:57,370-Speed 5188.96 samples/sec Loss 2.0561 LearningRate 0.0225 Epoch: 10 Global Step: 175550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:01:59,366-Speed 5132.51 samples/sec Loss 2.1288 LearningRate 0.0225 Epoch: 10 Global Step: 175560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:01,338-Speed 5192.92 samples/sec Loss 2.1869 LearningRate 0.0225 Epoch: 10 Global Step: 175570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:03,314-Speed 5183.86 samples/sec Loss 2.1916 LearningRate 0.0225 Epoch: 10 Global Step: 175580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:05,309-Speed 5136.22 samples/sec Loss 2.1678 LearningRate 0.0225 Epoch: 10 Global Step: 175590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:07,286-Speed 5181.55 samples/sec Loss 2.1650 LearningRate 0.0225 Epoch: 10 Global Step: 175600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:09,260-Speed 5187.06 samples/sec Loss 2.1104 LearningRate 0.0225 Epoch: 10 Global Step: 175610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:11,237-Speed 5183.03 samples/sec Loss 2.0756 LearningRate 0.0225 Epoch: 10 Global Step: 175620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:13,221-Speed 5162.14 samples/sec Loss 2.1572 LearningRate 0.0225 Epoch: 10 Global Step: 175630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:15,218-Speed 5129.61 samples/sec Loss 2.1327 LearningRate 0.0225 Epoch: 10 Global Step: 175640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:02:17,199-Speed 5172.25 samples/sec Loss 2.1769 LearningRate 0.0225 Epoch: 10 Global Step: 175650 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-04-11 11:02:19,176-Speed 5179.62 samples/sec Loss 2.1108 LearningRate 0.0224 Epoch: 10 Global Step: 175660 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:02:21,154-Speed 5181.00 samples/sec Loss 2.1731 LearningRate 0.0224 Epoch: 10 Global Step: 175670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:23,127-Speed 5191.29 samples/sec Loss 2.1663 LearningRate 0.0224 Epoch: 10 Global Step: 175680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:25,106-Speed 5176.50 samples/sec Loss 2.1735 LearningRate 0.0224 Epoch: 10 Global Step: 175690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:27,125-Speed 5073.63 samples/sec Loss 2.1333 LearningRate 0.0224 Epoch: 10 Global Step: 175700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:29,141-Speed 5081.99 samples/sec Loss 2.1304 LearningRate 0.0224 Epoch: 10 Global Step: 175710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:31,111-Speed 5197.64 samples/sec Loss 2.1975 LearningRate 0.0224 Epoch: 10 Global Step: 175720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:33,086-Speed 5186.46 samples/sec Loss 2.1637 LearningRate 0.0224 Epoch: 10 Global Step: 175730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:35,055-Speed 5203.41 samples/sec Loss 2.1344 LearningRate 0.0224 Epoch: 10 Global Step: 175740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:37,040-Speed 5160.35 samples/sec Loss 2.1345 LearningRate 0.0224 Epoch: 10 Global Step: 175750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:39,029-Speed 5149.82 samples/sec Loss 2.1092 LearningRate 0.0224 Epoch: 10 Global Step: 175760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:41,001-Speed 5194.66 samples/sec Loss 2.1538 LearningRate 0.0224 Epoch: 10 Global Step: 175770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 
2022-04-11 11:02:42,980-Speed 5175.54 samples/sec Loss 2.2527 LearningRate 0.0224 Epoch: 10 Global Step: 175780 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:02:44,969-Speed 5150.02 samples/sec Loss 2.1558 LearningRate 0.0224 Epoch: 10 Global Step: 175790 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:02:46,974-Speed 5108.87 samples/sec Loss 2.1790 LearningRate 0.0224 Epoch: 10 Global Step: 175800 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:02:48,998-Speed 5060.36 samples/sec Loss 2.0695 LearningRate 0.0224 Epoch: 10 Global Step: 175810 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:02:50,966-Speed 5206.06 samples/sec Loss 2.1689 LearningRate 0.0224 Epoch: 10 Global Step: 175820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:52,946-Speed 5172.66 samples/sec Loss 2.1660 LearningRate 0.0224 Epoch: 10 Global Step: 175830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:54,915-Speed 5202.04 samples/sec Loss 2.1380 LearningRate 0.0224 Epoch: 10 Global Step: 175840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:56,898-Speed 5165.39 samples/sec Loss 2.1674 LearningRate 0.0224 Epoch: 10 Global Step: 175850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:02:58,872-Speed 5190.15 samples/sec Loss 2.1698 LearningRate 0.0224 Epoch: 10 Global Step: 175860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:03:00,853-Speed 5170.51 samples/sec Loss 2.1870 LearningRate 0.0224 Epoch: 10 Global Step: 175870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:03:02,836-Speed 5165.06 samples/sec Loss 2.1724 LearningRate 0.0224 Epoch: 10 Global Step: 175880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:03:04,838-Speed 5117.90 samples/sec Loss 2.1440 LearningRate 0.0224 Epoch: 10 Global Step: 175890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 
11:03:06,821-Speed 5166.13 samples/sec Loss 2.2081 LearningRate 0.0224 Epoch: 10 Global Step: 175900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:03:08,805-Speed 5163.04 samples/sec Loss 2.1561 LearningRate 0.0224 Epoch: 10 Global Step: 175910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:03:10,777-Speed 5195.18 samples/sec Loss 2.1281 LearningRate 0.0224 Epoch: 10 Global Step: 175920 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:03:12,742-Speed 5211.41 samples/sec Loss 2.2091 LearningRate 0.0224 Epoch: 10 Global Step: 175930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:03:14,715-Speed 5192.16 samples/sec Loss 2.1031 LearningRate 0.0224 Epoch: 10 Global Step: 175940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:03:16,697-Speed 5167.26 samples/sec Loss 2.1347 LearningRate 0.0224 Epoch: 10 Global Step: 175950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:03:18,672-Speed 5187.42 samples/sec Loss 2.1364 LearningRate 0.0224 Epoch: 10 Global Step: 175960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:03:20,656-Speed 5163.85 samples/sec Loss 2.1179 LearningRate 0.0224 Epoch: 10 Global Step: 175970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:03:22,623-Speed 5206.93 samples/sec Loss 2.1231 LearningRate 0.0224 Epoch: 10 Global Step: 175980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:03:24,614-Speed 5143.72 samples/sec Loss 2.1413 LearningRate 0.0224 Epoch: 10 Global Step: 175990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:03:26,591-Speed 5183.11 samples/sec Loss 2.2254 LearningRate 0.0224 Epoch: 10 Global Step: 176000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:03:53,301-[lfw][176000]XNorm: 24.409161 Training: 2022-04-11 11:03:53,302-[lfw][176000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-04-11 11:03:53,302-[lfw][176000]Accuracy-Highest: 0.99833 
Training: 2022-04-11 11:04:24,098-[cfp_fp][176000]XNorm: 22.737753 Training: 2022-04-11 11:04:24,099-[cfp_fp][176000]Accuracy-Flip: 0.98371+-0.00531 Training: 2022-04-11 11:04:24,099-[cfp_fp][176000]Accuracy-Highest: 0.98571 Training: 2022-04-11 11:04:50,546-[agedb_30][176000]XNorm: 24.105911 Training: 2022-04-11 11:04:50,547-[agedb_30][176000]Accuracy-Flip: 0.97933+-0.00929 Training: 2022-04-11 11:04:50,547-[agedb_30][176000]Accuracy-Highest: 0.98167 Training: 2022-04-11 11:04:52,537-Speed 119.15 samples/sec Loss 2.1497 LearningRate 0.0223 Epoch: 10 Global Step: 176010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:04:54,519-Speed 5167.25 samples/sec Loss 2.1852 LearningRate 0.0223 Epoch: 10 Global Step: 176020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:04:56,476-Speed 5235.15 samples/sec Loss 2.1691 LearningRate 0.0223 Epoch: 10 Global Step: 176030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:04:58,437-Speed 5221.84 samples/sec Loss 2.1392 LearningRate 0.0223 Epoch: 10 Global Step: 176040 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:05:00,416-Speed 5177.00 samples/sec Loss 2.2113 LearningRate 0.0223 Epoch: 10 Global Step: 176050 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:05:02,372-Speed 5238.07 samples/sec Loss 2.1494 LearningRate 0.0223 Epoch: 10 Global Step: 176060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:05:04,346-Speed 5187.98 samples/sec Loss 2.1313 LearningRate 0.0223 Epoch: 10 Global Step: 176070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:05:06,307-Speed 5223.91 samples/sec Loss 2.1459 LearningRate 0.0223 Epoch: 10 Global Step: 176080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:05:08,266-Speed 5228.97 samples/sec Loss 2.2081 LearningRate 0.0223 Epoch: 10 Global Step: 176090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:05:10,227-Speed 5222.99 samples/sec Loss 
2.1517 LearningRate 0.0223 Epoch: 10 Global Step: 176100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:05:12,185-Speed 5232.32 samples/sec Loss 2.2342 LearningRate 0.0223 Epoch: 10 Global Step: 176110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:05:14,164-Speed 5174.96 samples/sec Loss 2.1311 LearningRate 0.0223 Epoch: 10 Global Step: 176120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:05:16,156-Speed 5143.43 samples/sec Loss 2.1383 LearningRate 0.0223 Epoch: 10 Global Step: 176130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:05:18,118-Speed 5221.20 samples/sec Loss 2.0965 LearningRate 0.0223 Epoch: 10 Global Step: 176140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:05:20,092-Speed 5190.20 samples/sec Loss 2.1871 LearningRate 0.0223 Epoch: 10 Global Step: 176150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:05:22,055-Speed 5217.76 samples/sec Loss 2.1889 LearningRate 0.0223 Epoch: 10 Global Step: 176160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:05:24,035-Speed 5173.03 samples/sec Loss 2.0788 LearningRate 0.0223 Epoch: 10 Global Step: 176170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:05:26,004-Speed 5201.23 samples/sec Loss 2.1570 LearningRate 0.0223 Epoch: 10 Global Step: 176180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:05:27,985-Speed 5170.13 samples/sec Loss 2.2349 LearningRate 0.0223 Epoch: 10 Global Step: 176190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:05:29,977-Speed 5142.98 samples/sec Loss 2.1184 LearningRate 0.0223 Epoch: 10 Global Step: 176200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:05:31,944-Speed 5208.40 samples/sec Loss 2.2207 LearningRate 0.0223 Epoch: 10 Global Step: 176210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:05:33,905-Speed 5221.90 samples/sec Loss 2.1839 LearningRate 0.0223 
Epoch: 10 Global Step: 176220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:05:35,882-Speed 5183.81 samples/sec Loss 2.1954 LearningRate 0.0223 Epoch: 10 Global Step: 176230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:05:37,846-Speed 5214.93 samples/sec Loss 2.1232 LearningRate 0.0223 Epoch: 10 Global Step: 176240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:05:39,814-Speed 5204.17 samples/sec Loss 2.1991 LearningRate 0.0223 Epoch: 10 Global Step: 176250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:05:41,783-Speed 5203.03 samples/sec Loss 2.0869 LearningRate 0.0223 Epoch: 10 Global Step: 176260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:05:43,754-Speed 5196.86 samples/sec Loss 2.1366 LearningRate 0.0223 Epoch: 10 Global Step: 176270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:05:45,730-Speed 5185.60 samples/sec Loss 2.1401 LearningRate 0.0223 Epoch: 10 Global Step: 176280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:05:47,742-Speed 5089.72 samples/sec Loss 2.1191 LearningRate 0.0223 Epoch: 10 Global Step: 176290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:05:49,715-Speed 5190.96 samples/sec Loss 2.1901 LearningRate 0.0223 Epoch: 10 Global Step: 176300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:05:51,700-Speed 5161.47 samples/sec Loss 2.1146 LearningRate 0.0223 Epoch: 10 Global Step: 176310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:05:53,677-Speed 5180.98 samples/sec Loss 2.1524 LearningRate 0.0223 Epoch: 10 Global Step: 176320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:05:55,647-Speed 5199.85 samples/sec Loss 2.1366 LearningRate 0.0223 Epoch: 10 Global Step: 176330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:05:57,614-Speed 5208.26 samples/sec Loss 2.1439 LearningRate 0.0223 Epoch: 10 Global Step: 176340 
Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:05:59,579-Speed 5214.12 samples/sec Loss 2.1533 LearningRate 0.0223 Epoch: 10 Global Step: 176350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:01,565-Speed 5157.48 samples/sec Loss 2.1432 LearningRate 0.0222 Epoch: 10 Global Step: 176360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:03,533-Speed 5204.64 samples/sec Loss 2.1278 LearningRate 0.0222 Epoch: 10 Global Step: 176370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:05,500-Speed 5208.62 samples/sec Loss 2.1207 LearningRate 0.0222 Epoch: 10 Global Step: 176380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:07,480-Speed 5174.14 samples/sec Loss 2.1892 LearningRate 0.0222 Epoch: 10 Global Step: 176390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:09,444-Speed 5214.53 samples/sec Loss 2.1773 LearningRate 0.0222 Epoch: 10 Global Step: 176400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:11,417-Speed 5191.47 samples/sec Loss 2.1046 LearningRate 0.0222 Epoch: 10 Global Step: 176410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:13,400-Speed 5165.46 samples/sec Loss 2.1689 LearningRate 0.0222 Epoch: 10 Global Step: 176420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:15,405-Speed 5109.34 samples/sec Loss 2.1500 LearningRate 0.0222 Epoch: 10 Global Step: 176430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:17,387-Speed 5169.27 samples/sec Loss 2.1449 LearningRate 0.0222 Epoch: 10 Global Step: 176440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:19,360-Speed 5192.70 samples/sec Loss 2.1671 LearningRate 0.0222 Epoch: 10 Global Step: 176450 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:06:21,325-Speed 5211.92 samples/sec Loss 2.1190 LearningRate 0.0222 Epoch: 10 Global Step: 176460 Fp16 Grad Scale: 131072 
Required: 11 hours Training: 2022-04-11 11:06:23,292-Speed 5208.10 samples/sec Loss 2.1195 LearningRate 0.0222 Epoch: 10 Global Step: 176470 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:06:25,260-Speed 5205.12 samples/sec Loss 2.1579 LearningRate 0.0222 Epoch: 10 Global Step: 176480 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:06:27,222-Speed 5220.91 samples/sec Loss 2.2412 LearningRate 0.0222 Epoch: 10 Global Step: 176490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:29,189-Speed 5207.42 samples/sec Loss 2.0960 LearningRate 0.0222 Epoch: 10 Global Step: 176500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:31,149-Speed 5225.87 samples/sec Loss 2.1470 LearningRate 0.0222 Epoch: 10 Global Step: 176510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:33,115-Speed 5211.83 samples/sec Loss 2.2142 LearningRate 0.0222 Epoch: 10 Global Step: 176520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:35,089-Speed 5187.19 samples/sec Loss 2.1560 LearningRate 0.0222 Epoch: 10 Global Step: 176530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:37,105-Speed 5082.68 samples/sec Loss 2.1739 LearningRate 0.0222 Epoch: 10 Global Step: 176540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:39,084-Speed 5176.99 samples/sec Loss 2.1604 LearningRate 0.0222 Epoch: 10 Global Step: 176550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:41,055-Speed 5197.56 samples/sec Loss 2.1787 LearningRate 0.0222 Epoch: 10 Global Step: 176560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:43,016-Speed 5222.99 samples/sec Loss 2.1557 LearningRate 0.0222 Epoch: 10 Global Step: 176570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:06:45,001-Speed 5160.86 samples/sec Loss 2.1751 LearningRate 0.0222 Epoch: 10 Global Step: 176580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 
2022-04-11 11:06:46,990-Speed 5147.53 samples/sec Loss 2.1144 LearningRate 0.0222 Epoch: 10 Global Step: 176590 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:06:48,974-Speed 5163.07 samples/sec Loss 2.1541 LearningRate 0.0222 Epoch: 10 Global Step: 176600 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:06:50,947-Speed 5193.43 samples/sec Loss 2.1113 LearningRate 0.0222 Epoch: 10 Global Step: 176610 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:06:52,945-Speed 5127.26 samples/sec Loss 2.2035 LearningRate 0.0222 Epoch: 10 Global Step: 176620 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:06:54,912-Speed 5206.71 samples/sec Loss 2.1343 LearningRate 0.0222 Epoch: 10 Global Step: 176630 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:06:56,896-Speed 5162.51 samples/sec Loss 2.1900 LearningRate 0.0222 Epoch: 10 Global Step: 176640 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:06:58,853-Speed 5234.69 samples/sec Loss 2.1053 LearningRate 0.0222 Epoch: 10 Global Step: 176650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:00,839-Speed 5157.03 samples/sec Loss 2.1696 LearningRate 0.0222 Epoch: 10 Global Step: 176660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:02,820-Speed 5171.32 samples/sec Loss 2.1162 LearningRate 0.0222 Epoch: 10 Global Step: 176670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:04,837-Speed 5079.51 samples/sec Loss 2.1514 LearningRate 0.0222 Epoch: 10 Global Step: 176680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:06,814-Speed 5181.30 samples/sec Loss 2.1104 LearningRate 0.0222 Epoch: 10 Global Step: 176690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:08,786-Speed 5194.78 samples/sec Loss 2.1305 LearningRate 0.0222 Epoch: 10 Global Step: 176700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 
11:07:10,750-Speed 5215.36 samples/sec Loss 2.1054 LearningRate 0.0222 Epoch: 10 Global Step: 176710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:12,718-Speed 5204.71 samples/sec Loss 2.1519 LearningRate 0.0221 Epoch: 10 Global Step: 176720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:14,719-Speed 5117.91 samples/sec Loss 2.2106 LearningRate 0.0221 Epoch: 10 Global Step: 176730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:16,694-Speed 5186.61 samples/sec Loss 2.1261 LearningRate 0.0221 Epoch: 10 Global Step: 176740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:18,660-Speed 5210.91 samples/sec Loss 2.1540 LearningRate 0.0221 Epoch: 10 Global Step: 176750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:07:20,641-Speed 5172.93 samples/sec Loss 2.1823 LearningRate 0.0221 Epoch: 10 Global Step: 176760 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:07:22,617-Speed 5183.69 samples/sec Loss 2.1755 LearningRate 0.0221 Epoch: 10 Global Step: 176770 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:07:24,574-Speed 5233.78 samples/sec Loss 2.2180 LearningRate 0.0221 Epoch: 10 Global Step: 176780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:26,551-Speed 5180.37 samples/sec Loss 2.2043 LearningRate 0.0221 Epoch: 10 Global Step: 176790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:28,513-Speed 5219.90 samples/sec Loss 2.1475 LearningRate 0.0221 Epoch: 10 Global Step: 176800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:30,512-Speed 5125.10 samples/sec Loss 2.1241 LearningRate 0.0221 Epoch: 10 Global Step: 176810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:32,473-Speed 5223.48 samples/sec Loss 2.1554 LearningRate 0.0221 Epoch: 10 Global Step: 176820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:34,445-Speed 5195.57 
samples/sec Loss 2.1964 LearningRate 0.0221 Epoch: 10 Global Step: 176830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:36,422-Speed 5182.22 samples/sec Loss 2.1235 LearningRate 0.0221 Epoch: 10 Global Step: 176840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:38,395-Speed 5192.46 samples/sec Loss 2.1796 LearningRate 0.0221 Epoch: 10 Global Step: 176850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:40,382-Speed 5154.77 samples/sec Loss 2.1738 LearningRate 0.0221 Epoch: 10 Global Step: 176860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:42,356-Speed 5189.50 samples/sec Loss 2.1978 LearningRate 0.0221 Epoch: 10 Global Step: 176870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:44,320-Speed 5216.04 samples/sec Loss 2.1503 LearningRate 0.0221 Epoch: 10 Global Step: 176880 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:07:46,279-Speed 5228.63 samples/sec Loss 2.1133 LearningRate 0.0221 Epoch: 10 Global Step: 176890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:48,256-Speed 5179.95 samples/sec Loss 2.1545 LearningRate 0.0221 Epoch: 10 Global Step: 176900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:50,265-Speed 5101.43 samples/sec Loss 2.2039 LearningRate 0.0221 Epoch: 10 Global Step: 176910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:52,254-Speed 5149.40 samples/sec Loss 2.0789 LearningRate 0.0221 Epoch: 10 Global Step: 176920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:54,228-Speed 5189.35 samples/sec Loss 2.1949 LearningRate 0.0221 Epoch: 10 Global Step: 176930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:56,199-Speed 5194.48 samples/sec Loss 2.1420 LearningRate 0.0221 Epoch: 10 Global Step: 176940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:07:58,172-Speed 5194.27 samples/sec Loss 2.1940 
LearningRate 0.0221 Epoch: 10 Global Step: 176950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:08:00,135-Speed 5219.09 samples/sec Loss 2.1944 LearningRate 0.0221 Epoch: 10 Global Step: 176960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:08:02,129-Speed 5136.97 samples/sec Loss 2.1583 LearningRate 0.0221 Epoch: 10 Global Step: 176970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:08:04,108-Speed 5175.63 samples/sec Loss 2.1740 LearningRate 0.0221 Epoch: 10 Global Step: 176980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:08:06,075-Speed 5206.66 samples/sec Loss 2.1357 LearningRate 0.0221 Epoch: 10 Global Step: 176990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:08:08,036-Speed 5223.57 samples/sec Loss 2.1423 LearningRate 0.0221 Epoch: 10 Global Step: 177000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:08:10,009-Speed 5193.11 samples/sec Loss 2.1756 LearningRate 0.0221 Epoch: 10 Global Step: 177010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:08:11,984-Speed 5187.12 samples/sec Loss 2.2223 LearningRate 0.0221 Epoch: 10 Global Step: 177020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:08:13,978-Speed 5135.09 samples/sec Loss 2.1906 LearningRate 0.0221 Epoch: 10 Global Step: 177030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:08:15,940-Speed 5221.12 samples/sec Loss 2.1242 LearningRate 0.0221 Epoch: 10 Global Step: 177040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:08:17,923-Speed 5164.50 samples/sec Loss 2.1600 LearningRate 0.0221 Epoch: 10 Global Step: 177050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:08:19,885-Speed 5223.54 samples/sec Loss 2.1562 LearningRate 0.0221 Epoch: 10 Global Step: 177060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:08:21,855-Speed 5199.13 samples/sec Loss 2.1438 LearningRate 0.0220 Epoch: 
10 Global Step: 177070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:08:23,852-Speed 5131.07 samples/sec Loss 2.1722 LearningRate 0.0220 Epoch: 10 Global Step: 177080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:08:25,820-Speed 5202.80 samples/sec Loss 2.1285 LearningRate 0.0220 Epoch: 10 Global Step: 177090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:08:27,789-Speed 5202.92 samples/sec Loss 2.2268 LearningRate 0.0220 Epoch: 10 Global Step: 177100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:08:29,751-Speed 5221.76 samples/sec Loss 2.1293 LearningRate 0.0220 Epoch: 10 Global Step: 177110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:08:31,729-Speed 5178.41 samples/sec Loss 2.2089 LearningRate 0.0220 Epoch: 10 Global Step: 177120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:08:33,692-Speed 5217.98 samples/sec Loss 2.1287 LearningRate 0.0220 Epoch: 10 Global Step: 177130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:08:35,678-Speed 5157.49 samples/sec Loss 2.1941 LearningRate 0.0220 Epoch: 10 Global Step: 177140 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:08:37,664-Speed 5157.22 samples/sec Loss 2.1624 LearningRate 0.0220 Epoch: 10 Global Step: 177150 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:08:39,673-Speed 5100.53 samples/sec Loss 2.1648 LearningRate 0.0220 Epoch: 10 Global Step: 177160 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-11 11:08:41,646-Speed 5191.07 samples/sec Loss 2.1481 LearningRate 0.0220 Epoch: 10 Global Step: 177170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:08:43,616-Speed 5199.11 samples/sec Loss 2.2321 LearningRate 0.0220 Epoch: 10 Global Step: 177180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:08:45,592-Speed 5185.05 samples/sec Loss 2.1545 LearningRate 0.0220 Epoch: 10 Global Step: 177190 Fp16 
Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:08:47,570-Speed 5178.91 samples/sec Loss 2.1341 LearningRate 0.0220 Epoch: 10 Global Step: 177200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:08:49,550-Speed 5172.70 samples/sec Loss 2.2061 LearningRate 0.0220 Epoch: 10 Global Step: 177210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:08:51,522-Speed 5195.50 samples/sec Loss 2.1753 LearningRate 0.0220 Epoch: 10 Global Step: 177220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:08:53,502-Speed 5172.80 samples/sec Loss 2.1331 LearningRate 0.0220 Epoch: 10 Global Step: 177230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:08:55,466-Speed 5214.99 samples/sec Loss 2.1447 LearningRate 0.0220 Epoch: 10 Global Step: 177240 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:08:57,450-Speed 5163.47 samples/sec Loss 2.0981 LearningRate 0.0220 Epoch: 10 Global Step: 177250 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:08:59,432-Speed 5167.94 samples/sec Loss 2.1386 LearningRate 0.0220 Epoch: 10 Global Step: 177260 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:09:01,400-Speed 5205.05 samples/sec Loss 2.1556 LearningRate 0.0220 Epoch: 10 Global Step: 177270 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:09:03,367-Speed 5207.96 samples/sec Loss 2.1152 LearningRate 0.0220 Epoch: 10 Global Step: 177280 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:09:05,349-Speed 5170.80 samples/sec Loss 2.1084 LearningRate 0.0220 Epoch: 10 Global Step: 177290 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-11 11:09:07,312-Speed 5216.51 samples/sec Loss 2.1838 LearningRate 0.0220 Epoch: 10 Global Step: 177300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:09:09,282-Speed 5200.32 samples/sec Loss 2.1423 LearningRate 0.0220 Epoch: 10 Global Step: 177310 Fp16 Grad Scale: 65536 Required: 11 
hours Training: 2022-04-11 11:09:11,257-Speed 5186.38 samples/sec Loss 2.1409 LearningRate 0.0220 Epoch: 10 Global Step: 177320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:09:13,235-Speed 5181.92 samples/sec Loss 2.1921 LearningRate 0.0220 Epoch: 10 Global Step: 177330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:09:15,204-Speed 5202.32 samples/sec Loss 2.1983 LearningRate 0.0220 Epoch: 10 Global Step: 177340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:09:17,193-Speed 5149.62 samples/sec Loss 2.1958 LearningRate 0.0220 Epoch: 10 Global Step: 177350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:09:19,166-Speed 5192.68 samples/sec Loss 2.1758 LearningRate 0.0220 Epoch: 10 Global Step: 177360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-11 11:09:21,136-Speed 5199.89 samples/sec Loss 2.1936 LearningRate 0.0220 Epoch: 10 Global Step: 177370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:09:23,118-Speed 5167.97 samples/sec Loss 2.1404 LearningRate 0.0220 Epoch: 10 Global Step: 177380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:09:25,087-Speed 5201.46 samples/sec Loss 2.1789 LearningRate 0.0220 Epoch: 10 Global Step: 177390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:09:27,058-Speed 5196.81 samples/sec Loss 2.1922 LearningRate 0.0220 Epoch: 10 Global Step: 177400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:09:29,034-Speed 5185.80 samples/sec Loss 2.2004 LearningRate 0.0220 Epoch: 10 Global Step: 177410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:09:31,006-Speed 5194.34 samples/sec Loss 2.1943 LearningRate 0.0220 Epoch: 10 Global Step: 177420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:09:32,973-Speed 5207.28 samples/sec Loss 2.1726 LearningRate 0.0219 Epoch: 10 Global Step: 177430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 
11:09:34,956-Speed 5164.45 samples/sec Loss 2.1446 LearningRate 0.0219 Epoch: 10 Global Step: 177440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:09:36,939-Speed 5165.41 samples/sec Loss 2.2250 LearningRate 0.0219 Epoch: 10 Global Step: 177450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:09:38,906-Speed 5207.32 samples/sec Loss 2.1859 LearningRate 0.0219 Epoch: 10 Global Step: 177460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:09:40,883-Speed 5182.44 samples/sec Loss 2.1959 LearningRate 0.0219 Epoch: 10 Global Step: 177470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:09:42,870-Speed 5155.90 samples/sec Loss 2.1322 LearningRate 0.0219 Epoch: 10 Global Step: 177480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:09:44,845-Speed 5187.18 samples/sec Loss 2.1684 LearningRate 0.0219 Epoch: 10 Global Step: 177490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:09:46,836-Speed 5144.47 samples/sec Loss 2.1583 LearningRate 0.0219 Epoch: 10 Global Step: 177500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:09:48,820-Speed 5160.83 samples/sec Loss 2.1617 LearningRate 0.0219 Epoch: 10 Global Step: 177510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:09:50,826-Speed 5107.64 samples/sec Loss 2.2403 LearningRate 0.0219 Epoch: 10 Global Step: 177520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:09:52,807-Speed 5169.60 samples/sec Loss 2.1670 LearningRate 0.0219 Epoch: 10 Global Step: 177530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:09:54,778-Speed 5198.70 samples/sec Loss 2.1596 LearningRate 0.0219 Epoch: 10 Global Step: 177540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:09:56,753-Speed 5186.22 samples/sec Loss 2.1940 LearningRate 0.0219 Epoch: 10 Global Step: 177550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:09:58,730-Speed 5180.96 
samples/sec Loss 2.1763 LearningRate 0.0219 Epoch: 10 Global Step: 177560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:00,705-Speed 5184.76 samples/sec Loss 2.1646 LearningRate 0.0219 Epoch: 10 Global Step: 177570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:02,694-Speed 5151.63 samples/sec Loss 2.1582 LearningRate 0.0219 Epoch: 10 Global Step: 177580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:04,673-Speed 5176.84 samples/sec Loss 2.1456 LearningRate 0.0219 Epoch: 10 Global Step: 177590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:06,652-Speed 5175.32 samples/sec Loss 2.1826 LearningRate 0.0219 Epoch: 10 Global Step: 177600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:08,619-Speed 5208.68 samples/sec Loss 2.2053 LearningRate 0.0219 Epoch: 10 Global Step: 177610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:10,584-Speed 5211.59 samples/sec Loss 2.1992 LearningRate 0.0219 Epoch: 10 Global Step: 177620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:12,585-Speed 5121.01 samples/sec Loss 2.2150 LearningRate 0.0219 Epoch: 10 Global Step: 177630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:14,591-Speed 5105.93 samples/sec Loss 2.1543 LearningRate 0.0219 Epoch: 10 Global Step: 177640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:16,587-Speed 5132.29 samples/sec Loss 2.1745 LearningRate 0.0219 Epoch: 10 Global Step: 177650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:18,553-Speed 5211.13 samples/sec Loss 2.1597 LearningRate 0.0219 Epoch: 10 Global Step: 177660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:20,533-Speed 5172.38 samples/sec Loss 2.1426 LearningRate 0.0219 Epoch: 10 Global Step: 177670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:10:22,529-Speed 5132.70 samples/sec Loss 2.1395 
LearningRate 0.0219 Epoch: 10 Global Step: 177680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:24,506-Speed 5181.99 samples/sec Loss 2.2050 LearningRate 0.0219 Epoch: 10 Global Step: 177690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:26,553-Speed 5003.89 samples/sec Loss 2.1640 LearningRate 0.0219 Epoch: 10 Global Step: 177700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:28,535-Speed 5167.91 samples/sec Loss 2.1840 LearningRate 0.0219 Epoch: 10 Global Step: 177710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:30,505-Speed 5198.62 samples/sec Loss 2.1580 LearningRate 0.0219 Epoch: 10 Global Step: 177720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:32,470-Speed 5214.20 samples/sec Loss 2.1574 LearningRate 0.0219 Epoch: 10 Global Step: 177730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:34,457-Speed 5154.63 samples/sec Loss 2.1071 LearningRate 0.0219 Epoch: 10 Global Step: 177740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:36,448-Speed 5144.07 samples/sec Loss 2.1324 LearningRate 0.0219 Epoch: 10 Global Step: 177750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:38,433-Speed 5160.40 samples/sec Loss 2.1682 LearningRate 0.0219 Epoch: 10 Global Step: 177760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:40,400-Speed 5206.11 samples/sec Loss 2.1357 LearningRate 0.0219 Epoch: 10 Global Step: 177770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:42,362-Speed 5222.20 samples/sec Loss 2.0908 LearningRate 0.0218 Epoch: 10 Global Step: 177780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:44,342-Speed 5174.57 samples/sec Loss 2.2170 LearningRate 0.0218 Epoch: 10 Global Step: 177790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:46,356-Speed 5085.25 samples/sec Loss 2.1949 LearningRate 0.0218 Epoch: 10 
Global Step: 177800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:48,350-Speed 5136.78 samples/sec Loss 2.1821 LearningRate 0.0218 Epoch: 10 Global Step: 177810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:50,340-Speed 5149.07 samples/sec Loss 2.1906 LearningRate 0.0218 Epoch: 10 Global Step: 177820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:52,330-Speed 5145.53 samples/sec Loss 2.1847 LearningRate 0.0218 Epoch: 10 Global Step: 177830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:54,302-Speed 5194.19 samples/sec Loss 2.1311 LearningRate 0.0218 Epoch: 10 Global Step: 177840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:56,272-Speed 5200.06 samples/sec Loss 2.1192 LearningRate 0.0218 Epoch: 10 Global Step: 177850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:10:58,250-Speed 5178.36 samples/sec Loss 2.1671 LearningRate 0.0218 Epoch: 10 Global Step: 177860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:11:00,232-Speed 5169.35 samples/sec Loss 2.1247 LearningRate 0.0218 Epoch: 10 Global Step: 177870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:11:02,234-Speed 5117.07 samples/sec Loss 2.2467 LearningRate 0.0218 Epoch: 10 Global Step: 177880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:11:04,241-Speed 5103.59 samples/sec Loss 2.2062 LearningRate 0.0218 Epoch: 10 Global Step: 177890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:11:06,209-Speed 5206.72 samples/sec Loss 2.1402 LearningRate 0.0218 Epoch: 10 Global Step: 177900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:11:08,180-Speed 5196.88 samples/sec Loss 2.2055 LearningRate 0.0218 Epoch: 10 Global Step: 177910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:11:10,184-Speed 5112.15 samples/sec Loss 2.1929 LearningRate 0.0218 Epoch: 10 Global Step: 177920 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-04-11 11:11:12,153-Speed 5202.81 samples/sec Loss 2.1887 LearningRate 0.0218 Epoch: 10 Global Step: 177930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:11:14,131-Speed 5178.47 samples/sec Loss 2.1551 LearningRate 0.0218 Epoch: 10 Global Step: 177940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:11:16,126-Speed 5136.14 samples/sec Loss 2.1122 LearningRate 0.0218 Epoch: 10 Global Step: 177950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:11:18,097-Speed 5196.12 samples/sec Loss 2.1451 LearningRate 0.0218 Epoch: 10 Global Step: 177960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:11:20,077-Speed 5172.07 samples/sec Loss 2.1192 LearningRate 0.0218 Epoch: 10 Global Step: 177970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:11:22,069-Speed 5142.85 samples/sec Loss 2.1231 LearningRate 0.0218 Epoch: 10 Global Step: 177980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:11:24,058-Speed 5151.51 samples/sec Loss 2.1481 LearningRate 0.0218 Epoch: 10 Global Step: 177990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:11:26,052-Speed 5137.22 samples/sec Loss 2.1440 LearningRate 0.0218 Epoch: 10 Global Step: 178000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:11:52,656-[lfw][178000]XNorm: 23.243783 Training: 2022-04-11 11:11:52,657-[lfw][178000]Accuracy-Flip: 0.99783+-0.00269 Training: 2022-04-11 11:11:52,657-[lfw][178000]Accuracy-Highest: 0.99833 Training: 2022-04-11 11:12:23,497-[cfp_fp][178000]XNorm: 21.629580 Training: 2022-04-11 11:12:23,497-[cfp_fp][178000]Accuracy-Flip: 0.98629+-0.00444 Training: 2022-04-11 11:12:23,498-[cfp_fp][178000]Accuracy-Highest: 0.98629 Training: 2022-04-11 11:12:50,090-[agedb_30][178000]XNorm: 23.145735 Training: 2022-04-11 11:12:50,090-[agedb_30][178000]Accuracy-Flip: 0.97933+-0.00786 Training: 2022-04-11 
11:12:50,091-[agedb_30][178000]Accuracy-Highest: 0.98167 Training: 2022-04-11 11:12:52,076-Speed 119.04 samples/sec Loss 2.1573 LearningRate 0.0218 Epoch: 10 Global Step: 178010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:12:54,038-Speed 5222.24 samples/sec Loss 2.1840 LearningRate 0.0218 Epoch: 10 Global Step: 178020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:12:56,008-Speed 5197.50 samples/sec Loss 2.1833 LearningRate 0.0218 Epoch: 10 Global Step: 178030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:12:57,970-Speed 5222.85 samples/sec Loss 2.1300 LearningRate 0.0218 Epoch: 10 Global Step: 178040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:12:59,967-Speed 5128.50 samples/sec Loss 2.1562 LearningRate 0.0218 Epoch: 10 Global Step: 178050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:13:01,949-Speed 5169.67 samples/sec Loss 2.1990 LearningRate 0.0218 Epoch: 10 Global Step: 178060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:13:03,917-Speed 5204.55 samples/sec Loss 2.1042 LearningRate 0.0218 Epoch: 10 Global Step: 178070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:13:05,894-Speed 5181.23 samples/sec Loss 2.2107 LearningRate 0.0218 Epoch: 10 Global Step: 178080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:13:07,861-Speed 5206.00 samples/sec Loss 2.1999 LearningRate 0.0218 Epoch: 10 Global Step: 178090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:13:09,826-Speed 5213.05 samples/sec Loss 2.1540 LearningRate 0.0218 Epoch: 10 Global Step: 178100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:13:11,808-Speed 5167.76 samples/sec Loss 2.1519 LearningRate 0.0218 Epoch: 10 Global Step: 178110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:13:13,790-Speed 5168.67 samples/sec Loss 2.0977 LearningRate 0.0218 Epoch: 10 Global Step: 178120 Fp16 Grad 
Scale: 32768 Required: 10 hours Training: 2022-04-11 11:13:15,772-Speed 5167.97 samples/sec Loss 2.0818 LearningRate 0.0218 Epoch: 10 Global Step: 178130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:13:17,748-Speed 5185.73 samples/sec Loss 2.1528 LearningRate 0.0217 Epoch: 10 Global Step: 178140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:13:19,714-Speed 5209.86 samples/sec Loss 2.2199 LearningRate 0.0217 Epoch: 10 Global Step: 178150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:13:21,687-Speed 5191.60 samples/sec Loss 2.1335 LearningRate 0.0217 Epoch: 10 Global Step: 178160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:13:23,667-Speed 5172.71 samples/sec Loss 2.1119 LearningRate 0.0217 Epoch: 10 Global Step: 178170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:13:25,656-Speed 5151.36 samples/sec Loss 2.0885 LearningRate 0.0217 Epoch: 10 Global Step: 178180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:13:27,647-Speed 5143.99 samples/sec Loss 2.2107 LearningRate 0.0217 Epoch: 10 Global Step: 178190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:13:29,651-Speed 5111.14 samples/sec Loss 2.1482 LearningRate 0.0217 Epoch: 10 Global Step: 178200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:13:31,639-Speed 5152.48 samples/sec Loss 2.1120 LearningRate 0.0217 Epoch: 10 Global Step: 178210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:13:33,614-Speed 5186.72 samples/sec Loss 2.1340 LearningRate 0.0217 Epoch: 10 Global Step: 178220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:13:35,602-Speed 5153.17 samples/sec Loss 2.1406 LearningRate 0.0217 Epoch: 10 Global Step: 178230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:13:37,605-Speed 5112.77 samples/sec Loss 2.1068 LearningRate 0.0217 Epoch: 10 Global Step: 178240 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-04-11 11:13:39,595-Speed 5149.63 samples/sec Loss 2.1692 LearningRate 0.0217 Epoch: 10 Global Step: 178250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:13:41,575-Speed 5173.19 samples/sec Loss 2.1877 LearningRate 0.0217 Epoch: 10 Global Step: 178260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:13:43,550-Speed 5186.91 samples/sec Loss 2.1675 LearningRate 0.0217 Epoch: 10 Global Step: 178270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:13:45,536-Speed 5157.35 samples/sec Loss 2.1927 LearningRate 0.0217 Epoch: 10 Global Step: 178280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:13:47,522-Speed 5156.43 samples/sec Loss 2.1796 LearningRate 0.0217 Epoch: 10 Global Step: 178290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:13:49,501-Speed 5175.31 samples/sec Loss 2.2041 LearningRate 0.0217 Epoch: 10 Global Step: 178300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:13:51,505-Speed 5113.28 samples/sec Loss 2.1335 LearningRate 0.0217 Epoch: 10 Global Step: 178310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:13:53,491-Speed 5157.18 samples/sec Loss 2.1379 LearningRate 0.0217 Epoch: 10 Global Step: 178320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:13:55,473-Speed 5170.08 samples/sec Loss 2.1436 LearningRate 0.0217 Epoch: 10 Global Step: 178330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:13:57,453-Speed 5173.37 samples/sec Loss 2.1556 LearningRate 0.0217 Epoch: 10 Global Step: 178340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:13:59,434-Speed 5169.34 samples/sec Loss 2.1963 LearningRate 0.0217 Epoch: 10 Global Step: 178350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:14:01,418-Speed 5163.71 samples/sec Loss 2.2080 LearningRate 0.0217 Epoch: 10 Global Step: 178360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
11:14:03,399-Speed 5168.95 samples/sec Loss 2.1218 LearningRate 0.0217 Epoch: 10 Global Step: 178370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:14:05,375-Speed 5185.51 samples/sec Loss 2.1557 LearningRate 0.0217 Epoch: 10 Global Step: 178380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:14:07,357-Speed 5167.62 samples/sec Loss 2.1186 LearningRate 0.0217 Epoch: 10 Global Step: 178390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:14:09,347-Speed 5148.35 samples/sec Loss 2.1817 LearningRate 0.0217 Epoch: 10 Global Step: 178400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:14:11,333-Speed 5156.17 samples/sec Loss 2.1733 LearningRate 0.0217 Epoch: 10 Global Step: 178410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:14:13,299-Speed 5211.99 samples/sec Loss 2.1810 LearningRate 0.0217 Epoch: 10 Global Step: 178420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:14:15,274-Speed 5186.14 samples/sec Loss 2.2554 LearningRate 0.0217 Epoch: 10 Global Step: 178430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:14:17,256-Speed 5168.54 samples/sec Loss 2.1475 LearningRate 0.0217 Epoch: 10 Global Step: 178440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:14:19,227-Speed 5198.02 samples/sec Loss 2.1387 LearningRate 0.0217 Epoch: 10 Global Step: 178450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:14:21,198-Speed 5197.71 samples/sec Loss 2.1495 LearningRate 0.0217 Epoch: 10 Global Step: 178460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:14:23,175-Speed 5179.16 samples/sec Loss 2.1302 LearningRate 0.0217 Epoch: 10 Global Step: 178470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:14:25,150-Speed 5188.66 samples/sec Loss 2.1276 LearningRate 0.0217 Epoch: 10 Global Step: 178480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:14:27,136-Speed 5155.84 
samples/sec Loss 2.1362 LearningRate 0.0217 Epoch: 10 Global Step: 178490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:14:29,120-Speed 5162.81 samples/sec Loss 2.1924 LearningRate 0.0216 Epoch: 10 Global Step: 178500 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:14:31,094-Speed 5190.43 samples/sec Loss 2.1762 LearningRate 0.0216 Epoch: 10 Global Step: 178510 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:14:33,065-Speed 5198.10 samples/sec Loss 2.1400 LearningRate 0.0216 Epoch: 10 Global Step: 178520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:14:35,052-Speed 5154.37 samples/sec Loss 2.1971 LearningRate 0.0216 Epoch: 10 Global Step: 178530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:14:37,057-Speed 5110.12 samples/sec Loss 2.0869 LearningRate 0.0216 Epoch: 10 Global Step: 178540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:14:39,036-Speed 5173.75 samples/sec Loss 2.2088 LearningRate 0.0216 Epoch: 10 Global Step: 178550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:14:41,021-Speed 5160.64 samples/sec Loss 2.1932 LearningRate 0.0216 Epoch: 10 Global Step: 178560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:14:43,018-Speed 5129.33 samples/sec Loss 2.2023 LearningRate 0.0216 Epoch: 10 Global Step: 178570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:14:45,003-Speed 5160.90 samples/sec Loss 2.1180 LearningRate 0.0216 Epoch: 10 Global Step: 178580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:14:47,033-Speed 5046.90 samples/sec Loss 2.1137 LearningRate 0.0216 Epoch: 10 Global Step: 178590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:14:49,013-Speed 5173.96 samples/sec Loss 2.1699 LearningRate 0.0216 Epoch: 10 Global Step: 178600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:14:50,982-Speed 5200.61 samples/sec Loss 2.1838 
LearningRate 0.0216 Epoch: 10 Global Step: 178610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:14:52,972-Speed 5148.07 samples/sec Loss 2.1631 LearningRate 0.0216 Epoch: 10 Global Step: 178620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:14:54,953-Speed 5170.97 samples/sec Loss 2.2193 LearningRate 0.0216 Epoch: 10 Global Step: 178630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:14:56,916-Speed 5218.10 samples/sec Loss 2.2768 LearningRate 0.0216 Epoch: 10 Global Step: 178640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:14:58,887-Speed 5197.54 samples/sec Loss 2.1769 LearningRate 0.0216 Epoch: 10 Global Step: 178650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:00,856-Speed 5202.99 samples/sec Loss 2.1162 LearningRate 0.0216 Epoch: 10 Global Step: 178660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:02,820-Speed 5214.47 samples/sec Loss 2.1606 LearningRate 0.0216 Epoch: 10 Global Step: 178670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:04,788-Speed 5206.02 samples/sec Loss 2.1725 LearningRate 0.0216 Epoch: 10 Global Step: 178680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:06,761-Speed 5190.30 samples/sec Loss 2.0823 LearningRate 0.0216 Epoch: 10 Global Step: 178690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:08,731-Speed 5200.33 samples/sec Loss 2.2170 LearningRate 0.0216 Epoch: 10 Global Step: 178700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:10,709-Speed 5177.53 samples/sec Loss 2.1054 LearningRate 0.0216 Epoch: 10 Global Step: 178710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:12,686-Speed 5183.06 samples/sec Loss 2.1781 LearningRate 0.0216 Epoch: 10 Global Step: 178720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:14,655-Speed 5204.10 samples/sec Loss 2.0993 LearningRate 0.0216 Epoch: 10 
Global Step: 178730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:16,620-Speed 5212.47 samples/sec Loss 2.1300 LearningRate 0.0216 Epoch: 10 Global Step: 178740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:15:18,587-Speed 5206.08 samples/sec Loss 2.1463 LearningRate 0.0216 Epoch: 10 Global Step: 178750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:15:20,548-Speed 5224.87 samples/sec Loss 2.1341 LearningRate 0.0216 Epoch: 10 Global Step: 178760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:22,539-Speed 5144.19 samples/sec Loss 2.1432 LearningRate 0.0216 Epoch: 10 Global Step: 178770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:24,518-Speed 5176.91 samples/sec Loss 2.1913 LearningRate 0.0216 Epoch: 10 Global Step: 178780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:26,485-Speed 5207.26 samples/sec Loss 2.1566 LearningRate 0.0216 Epoch: 10 Global Step: 178790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:28,469-Speed 5161.95 samples/sec Loss 2.1132 LearningRate 0.0216 Epoch: 10 Global Step: 178800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:30,435-Speed 5209.01 samples/sec Loss 2.1377 LearningRate 0.0216 Epoch: 10 Global Step: 178810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:32,413-Speed 5179.04 samples/sec Loss 2.1729 LearningRate 0.0216 Epoch: 10 Global Step: 178820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:34,390-Speed 5183.69 samples/sec Loss 2.1372 LearningRate 0.0216 Epoch: 10 Global Step: 178830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:36,365-Speed 5185.38 samples/sec Loss 2.1394 LearningRate 0.0216 Epoch: 10 Global Step: 178840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:38,346-Speed 5170.12 samples/sec Loss 2.1998 LearningRate 0.0216 Epoch: 10 Global Step: 178850 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:40,332-Speed 5157.14 samples/sec Loss 2.1360 LearningRate 0.0215 Epoch: 10 Global Step: 178860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:42,296-Speed 5216.18 samples/sec Loss 2.2187 LearningRate 0.0215 Epoch: 10 Global Step: 178870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:44,264-Speed 5206.75 samples/sec Loss 2.1646 LearningRate 0.0215 Epoch: 10 Global Step: 178880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:46,263-Speed 5122.69 samples/sec Loss 2.2733 LearningRate 0.0215 Epoch: 10 Global Step: 178890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:48,236-Speed 5192.60 samples/sec Loss 2.1746 LearningRate 0.0215 Epoch: 10 Global Step: 178900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:50,226-Speed 5146.90 samples/sec Loss 2.1536 LearningRate 0.0215 Epoch: 10 Global Step: 178910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:52,193-Speed 5209.17 samples/sec Loss 2.1066 LearningRate 0.0215 Epoch: 10 Global Step: 178920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:54,158-Speed 5212.00 samples/sec Loss 2.2026 LearningRate 0.0215 Epoch: 10 Global Step: 178930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:56,127-Speed 5202.23 samples/sec Loss 2.1782 LearningRate 0.0215 Epoch: 10 Global Step: 178940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:15:58,126-Speed 5123.80 samples/sec Loss 2.1703 LearningRate 0.0215 Epoch: 10 Global Step: 178950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:16:00,129-Speed 5114.24 samples/sec Loss 2.2194 LearningRate 0.0215 Epoch: 10 Global Step: 178960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:16:02,134-Speed 5107.94 samples/sec Loss 2.2011 LearningRate 0.0215 Epoch: 10 Global Step: 178970 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-11 11:16:04,112-Speed 5180.26 samples/sec Loss 2.1018 LearningRate 0.0215 Epoch: 10 Global Step: 178980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:16:06,092-Speed 5174.44 samples/sec Loss 2.1284 LearningRate 0.0215 Epoch: 10 Global Step: 178990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:16:08,077-Speed 5159.38 samples/sec Loss 2.2194 LearningRate 0.0215 Epoch: 10 Global Step: 179000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:16:10,051-Speed 5190.38 samples/sec Loss 2.1782 LearningRate 0.0215 Epoch: 10 Global Step: 179010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:16:12,036-Speed 5161.01 samples/sec Loss 2.1865 LearningRate 0.0215 Epoch: 10 Global Step: 179020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:16:14,021-Speed 5160.40 samples/sec Loss 2.1526 LearningRate 0.0215 Epoch: 10 Global Step: 179030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:16:16,006-Speed 5160.30 samples/sec Loss 2.1851 LearningRate 0.0215 Epoch: 10 Global Step: 179040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:16:17,986-Speed 5172.66 samples/sec Loss 2.1147 LearningRate 0.0215 Epoch: 10 Global Step: 179050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:16:19,955-Speed 5202.83 samples/sec Loss 2.1765 LearningRate 0.0215 Epoch: 10 Global Step: 179060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:16:21,939-Speed 5164.50 samples/sec Loss 2.0969 LearningRate 0.0215 Epoch: 10 Global Step: 179070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:16:23,912-Speed 5191.01 samples/sec Loss 2.1807 LearningRate 0.0215 Epoch: 10 Global Step: 179080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:16:25,903-Speed 5144.23 samples/sec Loss 2.1849 LearningRate 0.0215 Epoch: 10 Global Step: 179090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 
2022-04-11 11:16:27,890-Speed 5155.54 samples/sec Loss 2.1174 LearningRate 0.0215 Epoch: 10 Global Step: 179100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:16:29,870-Speed 5175.38 samples/sec Loss 2.1619 LearningRate 0.0215 Epoch: 10 Global Step: 179110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:16:31,839-Speed 5201.98 samples/sec Loss 2.1943 LearningRate 0.0215 Epoch: 10 Global Step: 179120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:16:33,809-Speed 5199.46 samples/sec Loss 2.1202 LearningRate 0.0215 Epoch: 10 Global Step: 179130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:16:35,792-Speed 5165.22 samples/sec Loss 2.1957 LearningRate 0.0215 Epoch: 10 Global Step: 179140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:16:37,782-Speed 5148.30 samples/sec Loss 2.2120 LearningRate 0.0215 Epoch: 10 Global Step: 179150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:16:39,775-Speed 5138.81 samples/sec Loss 2.1812 LearningRate 0.0215 Epoch: 10 Global Step: 179160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:16:41,764-Speed 5148.52 samples/sec Loss 2.1408 LearningRate 0.0215 Epoch: 10 Global Step: 179170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:16:43,740-Speed 5185.49 samples/sec Loss 2.1760 LearningRate 0.0215 Epoch: 10 Global Step: 179180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:16:45,734-Speed 5135.51 samples/sec Loss 2.1401 LearningRate 0.0215 Epoch: 10 Global Step: 179190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:16:47,724-Speed 5149.10 samples/sec Loss 2.2340 LearningRate 0.0215 Epoch: 10 Global Step: 179200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:16:49,720-Speed 5132.53 samples/sec Loss 2.1990 LearningRate 0.0215 Epoch: 10 Global Step: 179210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
11:16:51,696-Speed 5184.29 samples/sec Loss 2.2099 LearningRate 0.0214 Epoch: 10 Global Step: 179220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:16:53,678-Speed 5168.34 samples/sec Loss 2.1537 LearningRate 0.0214 Epoch: 10 Global Step: 179230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:16:55,649-Speed 5197.23 samples/sec Loss 2.1625 LearningRate 0.0214 Epoch: 10 Global Step: 179240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:16:57,616-Speed 5206.21 samples/sec Loss 2.1267 LearningRate 0.0214 Epoch: 10 Global Step: 179250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:16:59,627-Speed 5094.14 samples/sec Loss 2.1827 LearningRate 0.0214 Epoch: 10 Global Step: 179260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:17:01,616-Speed 5150.40 samples/sec Loss 2.1079 LearningRate 0.0214 Epoch: 10 Global Step: 179270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:17:03,606-Speed 5146.95 samples/sec Loss 2.1279 LearningRate 0.0214 Epoch: 10 Global Step: 179280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:17:05,590-Speed 5162.89 samples/sec Loss 2.1664 LearningRate 0.0214 Epoch: 10 Global Step: 179290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:17:07,573-Speed 5165.65 samples/sec Loss 2.1587 LearningRate 0.0214 Epoch: 10 Global Step: 179300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:17:09,532-Speed 5229.37 samples/sec Loss 2.1600 LearningRate 0.0214 Epoch: 10 Global Step: 179310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:17:11,501-Speed 5205.71 samples/sec Loss 2.1317 LearningRate 0.0214 Epoch: 10 Global Step: 179320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:17:13,467-Speed 5208.75 samples/sec Loss 2.1498 LearningRate 0.0214 Epoch: 10 Global Step: 179330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:17:15,447-Speed 5174.88 
samples/sec Loss 2.2028 LearningRate 0.0214 Epoch: 10 Global Step: 179340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:17:17,432-Speed 5159.89 samples/sec Loss 2.1858 LearningRate 0.0214 Epoch: 10 Global Step: 179350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:17:19,416-Speed 5163.28 samples/sec Loss 2.1059 LearningRate 0.0214 Epoch: 10 Global Step: 179360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:17:21,384-Speed 5204.53 samples/sec Loss 2.1447 LearningRate 0.0214 Epoch: 10 Global Step: 179370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:17:23,354-Speed 5201.18 samples/sec Loss 2.2149 LearningRate 0.0214 Epoch: 10 Global Step: 179380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:17:25,341-Speed 5155.63 samples/sec Loss 2.2127 LearningRate 0.0214 Epoch: 10 Global Step: 179390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:17:27,329-Speed 5151.31 samples/sec Loss 2.2415 LearningRate 0.0214 Epoch: 10 Global Step: 179400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:17:29,340-Speed 5094.45 samples/sec Loss 2.2194 LearningRate 0.0214 Epoch: 10 Global Step: 179410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:17:31,321-Speed 5170.84 samples/sec Loss 2.2382 LearningRate 0.0214 Epoch: 10 Global Step: 179420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:17:33,290-Speed 5203.00 samples/sec Loss 2.2046 LearningRate 0.0214 Epoch: 10 Global Step: 179430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:17:35,263-Speed 5191.06 samples/sec Loss 2.1910 LearningRate 0.0214 Epoch: 10 Global Step: 179440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:17:37,281-Speed 5076.40 samples/sec Loss 2.1277 LearningRate 0.0214 Epoch: 10 Global Step: 179450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:17:39,254-Speed 5192.59 samples/sec Loss 2.1511 
LearningRate 0.0214 Epoch: 10 Global Step: 179460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:17:41,235-Speed 5169.98 samples/sec Loss 2.1659 LearningRate 0.0214 Epoch: 10 Global Step: 179470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:17:43,209-Speed 5188.56 samples/sec Loss 2.1181 LearningRate 0.0214 Epoch: 10 Global Step: 179480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:17:45,176-Speed 5208.17 samples/sec Loss 2.1401 LearningRate 0.0214 Epoch: 10 Global Step: 179490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:17:47,144-Speed 5204.33 samples/sec Loss 2.1225 LearningRate 0.0214 Epoch: 10 Global Step: 179500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:17:49,118-Speed 5189.68 samples/sec Loss 2.1131 LearningRate 0.0214 Epoch: 10 Global Step: 179510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:17:51,085-Speed 5209.62 samples/sec Loss 2.1983 LearningRate 0.0214 Epoch: 10 Global Step: 179520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:17:53,063-Speed 5176.62 samples/sec Loss 2.2443 LearningRate 0.0214 Epoch: 10 Global Step: 179530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:17:55,032-Speed 5204.53 samples/sec Loss 2.1592 LearningRate 0.0214 Epoch: 10 Global Step: 179540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:17:56,999-Speed 5205.36 samples/sec Loss 2.1544 LearningRate 0.0214 Epoch: 10 Global Step: 179550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:17:58,975-Speed 5184.61 samples/sec Loss 2.1004 LearningRate 0.0214 Epoch: 10 Global Step: 179560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:00,968-Speed 5141.18 samples/sec Loss 2.1323 LearningRate 0.0214 Epoch: 10 Global Step: 179570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:02,964-Speed 5130.12 samples/sec Loss 2.1289 LearningRate 0.0213 Epoch: 10 
Global Step: 179580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:04,935-Speed 5197.25 samples/sec Loss 2.1520 LearningRate 0.0213 Epoch: 10 Global Step: 179590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:06,906-Speed 5199.87 samples/sec Loss 2.1116 LearningRate 0.0213 Epoch: 10 Global Step: 179600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:08,891-Speed 5160.35 samples/sec Loss 2.1247 LearningRate 0.0213 Epoch: 10 Global Step: 179610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:10,893-Speed 5115.08 samples/sec Loss 2.1960 LearningRate 0.0213 Epoch: 10 Global Step: 179620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:18:12,885-Speed 5141.64 samples/sec Loss 2.2051 LearningRate 0.0213 Epoch: 10 Global Step: 179630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:18:14,859-Speed 5191.50 samples/sec Loss 2.1846 LearningRate 0.0213 Epoch: 10 Global Step: 179640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:18:16,832-Speed 5191.28 samples/sec Loss 2.2029 LearningRate 0.0213 Epoch: 10 Global Step: 179650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:18,816-Speed 5162.43 samples/sec Loss 2.1878 LearningRate 0.0213 Epoch: 10 Global Step: 179660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:20,806-Speed 5146.53 samples/sec Loss 2.2125 LearningRate 0.0213 Epoch: 10 Global Step: 179670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:22,785-Speed 5177.35 samples/sec Loss 2.1910 LearningRate 0.0213 Epoch: 10 Global Step: 179680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:24,758-Speed 5190.58 samples/sec Loss 2.1889 LearningRate 0.0213 Epoch: 10 Global Step: 179690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:26,740-Speed 5170.90 samples/sec Loss 2.2053 LearningRate 0.0213 Epoch: 10 Global Step: 179700 Fp16 
Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:28,717-Speed 5180.60 samples/sec Loss 2.1278 LearningRate 0.0213 Epoch: 10 Global Step: 179710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:30,686-Speed 5201.63 samples/sec Loss 2.1706 LearningRate 0.0213 Epoch: 10 Global Step: 179720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:32,652-Speed 5209.48 samples/sec Loss 2.2262 LearningRate 0.0213 Epoch: 10 Global Step: 179730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:34,625-Speed 5192.52 samples/sec Loss 2.1946 LearningRate 0.0213 Epoch: 10 Global Step: 179740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:36,610-Speed 5160.50 samples/sec Loss 2.1660 LearningRate 0.0213 Epoch: 10 Global Step: 179750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:18:38,581-Speed 5195.43 samples/sec Loss 2.1453 LearningRate 0.0213 Epoch: 10 Global Step: 179760 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:18:40,548-Speed 5209.45 samples/sec Loss 2.1793 LearningRate 0.0213 Epoch: 10 Global Step: 179770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:42,531-Speed 5163.28 samples/sec Loss 2.2380 LearningRate 0.0213 Epoch: 10 Global Step: 179780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:44,529-Speed 5128.20 samples/sec Loss 2.1649 LearningRate 0.0213 Epoch: 10 Global Step: 179790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:46,518-Speed 5150.73 samples/sec Loss 2.1892 LearningRate 0.0213 Epoch: 10 Global Step: 179800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:48,508-Speed 5149.28 samples/sec Loss 2.2493 LearningRate 0.0213 Epoch: 10 Global Step: 179810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:18:50,489-Speed 5168.81 samples/sec Loss 2.1585 LearningRate 0.0213 Epoch: 10 Global Step: 179820 Fp16 Grad Scale: 32768 Required: 
10 hours Training: 2022-04-11 11:18:52,476-Speed 5154.52 samples/sec Loss 2.1710 LearningRate 0.0213 Epoch: 10 Global Step: 179830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:18:54,463-Speed 5156.73 samples/sec Loss 2.1686 LearningRate 0.0213 Epoch: 10 Global Step: 179840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:18:56,441-Speed 5177.72 samples/sec Loss 2.1948 LearningRate 0.0213 Epoch: 10 Global Step: 179850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:18:58,436-Speed 5136.34 samples/sec Loss 2.1360 LearningRate 0.0213 Epoch: 10 Global Step: 179860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:19:00,431-Speed 5134.04 samples/sec Loss 2.2123 LearningRate 0.0213 Epoch: 10 Global Step: 179870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:19:02,429-Speed 5125.42 samples/sec Loss 2.2311 LearningRate 0.0213 Epoch: 10 Global Step: 179880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:19:04,443-Speed 5087.16 samples/sec Loss 2.1161 LearningRate 0.0213 Epoch: 10 Global Step: 179890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:19:06,412-Speed 5201.95 samples/sec Loss 2.1310 LearningRate 0.0213 Epoch: 10 Global Step: 179900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:19:08,391-Speed 5176.06 samples/sec Loss 2.1798 LearningRate 0.0213 Epoch: 10 Global Step: 179910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:19:10,358-Speed 5209.53 samples/sec Loss 2.1772 LearningRate 0.0213 Epoch: 10 Global Step: 179920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:19:12,349-Speed 5143.93 samples/sec Loss 2.1466 LearningRate 0.0213 Epoch: 10 Global Step: 179930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:19:14,333-Speed 5163.51 samples/sec Loss 2.1409 LearningRate 0.0212 Epoch: 10 Global Step: 179940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
11:19:16,309-Speed 5182.38 samples/sec Loss 2.2215 LearningRate 0.0212 Epoch: 10 Global Step: 179950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:19:18,279-Speed 5201.64 samples/sec Loss 2.1929 LearningRate 0.0212 Epoch: 10 Global Step: 179960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:19:20,249-Speed 5199.67 samples/sec Loss 2.1825 LearningRate 0.0212 Epoch: 10 Global Step: 179970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:19:22,221-Speed 5194.09 samples/sec Loss 2.1573 LearningRate 0.0212 Epoch: 10 Global Step: 179980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:19:24,190-Speed 5204.18 samples/sec Loss 2.1784 LearningRate 0.0212 Epoch: 10 Global Step: 179990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:19:26,176-Speed 5159.84 samples/sec Loss 2.1876 LearningRate 0.0212 Epoch: 10 Global Step: 180000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:19:52,778-[lfw][180000]XNorm: 23.573790 Training: 2022-04-11 11:19:52,778-[lfw][180000]Accuracy-Flip: 0.99767+-0.00281 Training: 2022-04-11 11:19:52,779-[lfw][180000]Accuracy-Highest: 0.99833 Training: 2022-04-11 11:20:23,654-[cfp_fp][180000]XNorm: 22.264645 Training: 2022-04-11 11:20:23,655-[cfp_fp][180000]Accuracy-Flip: 0.98443+-0.00573 Training: 2022-04-11 11:20:23,657-[cfp_fp][180000]Accuracy-Highest: 0.98629 Training: 2022-04-11 11:20:50,240-[agedb_30][180000]XNorm: 23.673159 Training: 2022-04-11 11:20:50,241-[agedb_30][180000]Accuracy-Flip: 0.97883+-0.00768 Training: 2022-04-11 11:20:50,241-[agedb_30][180000]Accuracy-Highest: 0.98167 Training: 2022-04-11 11:20:52,254-Speed 118.96 samples/sec Loss 2.1565 LearningRate 0.0212 Epoch: 10 Global Step: 180010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:20:54,232-Speed 5178.53 samples/sec Loss 2.1927 LearningRate 0.0212 Epoch: 10 Global Step: 180020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
11:20:56,195-Speed 5218.09 samples/sec Loss 2.1453 LearningRate 0.0212 Epoch: 10 Global Step: 180030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:20:58,159-Speed 5215.36 samples/sec Loss 2.1508 LearningRate 0.0212 Epoch: 10 Global Step: 180040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:21:00,143-Speed 5162.85 samples/sec Loss 2.2481 LearningRate 0.0212 Epoch: 10 Global Step: 180050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:21:02,170-Speed 5055.10 samples/sec Loss 2.2022 LearningRate 0.0212 Epoch: 10 Global Step: 180060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:21:04,219-Speed 4997.62 samples/sec Loss 2.2108 LearningRate 0.0212 Epoch: 10 Global Step: 180070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:21:06,190-Speed 5198.55 samples/sec Loss 2.2230 LearningRate 0.0212 Epoch: 10 Global Step: 180080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:21:08,177-Speed 5153.05 samples/sec Loss 2.1273 LearningRate 0.0212 Epoch: 10 Global Step: 180090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:21:10,152-Speed 5185.91 samples/sec Loss 2.1226 LearningRate 0.0212 Epoch: 10 Global Step: 180100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:21:12,179-Speed 5053.31 samples/sec Loss 2.1381 LearningRate 0.0212 Epoch: 10 Global Step: 180110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:21:14,162-Speed 5166.21 samples/sec Loss 2.1888 LearningRate 0.0212 Epoch: 10 Global Step: 180120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:21:16,169-Speed 5104.36 samples/sec Loss 2.0837 LearningRate 0.0212 Epoch: 10 Global Step: 180130 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:21:18,139-Speed 5201.01 samples/sec Loss 2.2536 LearningRate 0.0212 Epoch: 10 Global Step: 180140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:21:20,128-Speed 5149.84 
samples/sec Loss 2.1705 LearningRate 0.0212 Epoch: 10 Global Step: 180150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:21:22,121-Speed 5138.90 samples/sec Loss 2.1126 LearningRate 0.0212 Epoch: 10 Global Step: 180160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:21:24,093-Speed 5195.65 samples/sec Loss 2.1930 LearningRate 0.0212 Epoch: 10 Global Step: 180170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:21:26,078-Speed 5159.48 samples/sec Loss 2.1554 LearningRate 0.0212 Epoch: 10 Global Step: 180180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:21:28,065-Speed 5156.61 samples/sec Loss 2.1605 LearningRate 0.0212 Epoch: 10 Global Step: 180190 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:21:30,045-Speed 5173.89 samples/sec Loss 2.2042 LearningRate 0.0212 Epoch: 10 Global Step: 180200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:21:32,026-Speed 5170.09 samples/sec Loss 2.1297 LearningRate 0.0212 Epoch: 10 Global Step: 180210 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:21:34,014-Speed 5153.91 samples/sec Loss 2.1286 LearningRate 0.0212 Epoch: 10 Global Step: 180220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:21:35,987-Speed 5192.13 samples/sec Loss 2.1555 LearningRate 0.0212 Epoch: 10 Global Step: 180230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:21:37,971-Speed 5163.58 samples/sec Loss 2.1926 LearningRate 0.0212 Epoch: 10 Global Step: 180240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:21:39,969-Speed 5124.40 samples/sec Loss 2.2020 LearningRate 0.0212 Epoch: 10 Global Step: 180250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:21:41,945-Speed 5184.03 samples/sec Loss 2.1324 LearningRate 0.0212 Epoch: 10 Global Step: 180260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:21:43,918-Speed 5193.26 samples/sec Loss 2.1101 
LearningRate 0.0212 Epoch: 10 Global Step: 180270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:21:45,899-Speed 5170.45 samples/sec Loss 2.1946 LearningRate 0.0212 Epoch: 10 Global Step: 180280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:21:47,894-Speed 5133.86 samples/sec Loss 2.1487 LearningRate 0.0212 Epoch: 10 Global Step: 180290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:21:49,903-Speed 5099.70 samples/sec Loss 2.1669 LearningRate 0.0211 Epoch: 10 Global Step: 180300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:21:51,892-Speed 5147.97 samples/sec Loss 2.2058 LearningRate 0.0211 Epoch: 10 Global Step: 180310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:21:53,876-Speed 5163.65 samples/sec Loss 2.1950 LearningRate 0.0211 Epoch: 10 Global Step: 180320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:21:55,851-Speed 5187.04 samples/sec Loss 2.1642 LearningRate 0.0211 Epoch: 10 Global Step: 180330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:21:57,837-Speed 5160.10 samples/sec Loss 2.1388 LearningRate 0.0211 Epoch: 10 Global Step: 180340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:21:59,828-Speed 5144.40 samples/sec Loss 2.1164 LearningRate 0.0211 Epoch: 10 Global Step: 180350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:22:01,823-Speed 5133.24 samples/sec Loss 2.1304 LearningRate 0.0211 Epoch: 10 Global Step: 180360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:22:03,813-Speed 5147.09 samples/sec Loss 2.1461 LearningRate 0.0211 Epoch: 10 Global Step: 180370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:22:05,795-Speed 5168.36 samples/sec Loss 2.1567 LearningRate 0.0211 Epoch: 10 Global Step: 180380 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:22:07,783-Speed 5152.43 samples/sec Loss 2.0908 LearningRate 0.0211 Epoch: 10 
Global Step: 180390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:22:09,754-Speed 5197.44 samples/sec Loss 2.1285 LearningRate 0.0211 Epoch: 10 Global Step: 180400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:22:11,749-Speed 5135.46 samples/sec Loss 2.1300 LearningRate 0.0211 Epoch: 10 Global Step: 180410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:22:13,758-Speed 5098.94 samples/sec Loss 2.1796 LearningRate 0.0211 Epoch: 10 Global Step: 180420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:22:15,742-Speed 5163.77 samples/sec Loss 2.1096 LearningRate 0.0211 Epoch: 10 Global Step: 180430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:22:17,731-Speed 5149.25 samples/sec Loss 2.1617 LearningRate 0.0211 Epoch: 10 Global Step: 180440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:22:19,706-Speed 5187.61 samples/sec Loss 2.1794 LearningRate 0.0211 Epoch: 10 Global Step: 180450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:22:21,704-Speed 5124.91 samples/sec Loss 2.2318 LearningRate 0.0211 Epoch: 10 Global Step: 180460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:22:23,691-Speed 5157.14 samples/sec Loss 2.1463 LearningRate 0.0211 Epoch: 10 Global Step: 180470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:22:25,672-Speed 5170.58 samples/sec Loss 2.2224 LearningRate 0.0211 Epoch: 10 Global Step: 180480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:22:27,647-Speed 5185.37 samples/sec Loss 2.1503 LearningRate 0.0211 Epoch: 10 Global Step: 180490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:22:29,637-Speed 5148.84 samples/sec Loss 2.1790 LearningRate 0.0211 Epoch: 10 Global Step: 180500 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:22:31,619-Speed 5165.87 samples/sec Loss 2.1840 LearningRate 0.0211 Epoch: 10 Global Step: 180510 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-04-11 11:22:33,606-Speed 5156.37 samples/sec Loss 2.2169 LearningRate 0.0211 Epoch: 10 Global Step: 180520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:22:35,649-Speed 5014.60 samples/sec Loss 2.0886 LearningRate 0.0211 Epoch: 10 Global Step: 180530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:22:37,627-Speed 5177.94 samples/sec Loss 2.1581 LearningRate 0.0211 Epoch: 10 Global Step: 180540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:22:39,617-Speed 5148.09 samples/sec Loss 2.1265 LearningRate 0.0211 Epoch: 10 Global Step: 180550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:22:41,615-Speed 5135.14 samples/sec Loss 2.1740 LearningRate 0.0211 Epoch: 10 Global Step: 180560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:22:43,585-Speed 5198.95 samples/sec Loss 2.1993 LearningRate 0.0211 Epoch: 10 Global Step: 180570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:22:45,570-Speed 5161.47 samples/sec Loss 2.2456 LearningRate 0.0211 Epoch: 10 Global Step: 180580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:22:47,570-Speed 5120.68 samples/sec Loss 2.1997 LearningRate 0.0211 Epoch: 10 Global Step: 180590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:22:49,547-Speed 5180.27 samples/sec Loss 2.0848 LearningRate 0.0211 Epoch: 10 Global Step: 180600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:22:51,528-Speed 5173.75 samples/sec Loss 2.1473 LearningRate 0.0211 Epoch: 10 Global Step: 180610 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:22:53,500-Speed 5193.96 samples/sec Loss 2.1810 LearningRate 0.0211 Epoch: 10 Global Step: 180620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:22:55,474-Speed 5189.08 samples/sec Loss 2.1060 LearningRate 0.0211 Epoch: 10 Global Step: 180630 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-11 11:22:57,482-Speed 5194.63 samples/sec Loss 2.1389 LearningRate 0.0211 Epoch: 10 Global Step: 180640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:22:59,453-Speed 5196.96 samples/sec Loss 2.1926 LearningRate 0.0211 Epoch: 10 Global Step: 180650 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:23:01,428-Speed 5186.76 samples/sec Loss 2.2106 LearningRate 0.0211 Epoch: 10 Global Step: 180660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:23:03,414-Speed 5156.26 samples/sec Loss 2.1930 LearningRate 0.0210 Epoch: 10 Global Step: 180670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:23:05,397-Speed 5165.07 samples/sec Loss 2.1339 LearningRate 0.0210 Epoch: 10 Global Step: 180680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:23:07,398-Speed 5120.82 samples/sec Loss 2.1625 LearningRate 0.0210 Epoch: 10 Global Step: 180690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:23:09,384-Speed 5156.71 samples/sec Loss 2.1320 LearningRate 0.0210 Epoch: 10 Global Step: 180700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:23:11,373-Speed 5152.76 samples/sec Loss 2.2004 LearningRate 0.0210 Epoch: 10 Global Step: 180710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:23:13,346-Speed 5190.92 samples/sec Loss 2.1632 LearningRate 0.0210 Epoch: 10 Global Step: 180720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:23:15,347-Speed 5119.60 samples/sec Loss 2.1988 LearningRate 0.0210 Epoch: 10 Global Step: 180730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:23:17,339-Speed 5141.38 samples/sec Loss 2.2086 LearningRate 0.0210 Epoch: 10 Global Step: 180740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:23:19,315-Speed 5193.90 samples/sec Loss 2.1296 LearningRate 0.0210 Epoch: 10 Global Step: 180750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
11:23:21,317-Speed 5115.90 samples/sec Loss 2.1365 LearningRate 0.0210 Epoch: 10 Global Step: 180760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:23:23,303-Speed 5156.14 samples/sec Loss 2.1471 LearningRate 0.0210 Epoch: 10 Global Step: 180770 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:23:25,276-Speed 5193.20 samples/sec Loss 2.1918 LearningRate 0.0210 Epoch: 10 Global Step: 180780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:23:27,246-Speed 5210.97 samples/sec Loss 2.1709 LearningRate 0.0210 Epoch: 10 Global Step: 180790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:23:29,220-Speed 5189.92 samples/sec Loss 2.1677 LearningRate 0.0210 Epoch: 10 Global Step: 180800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:23:31,194-Speed 5187.56 samples/sec Loss 2.1858 LearningRate 0.0210 Epoch: 10 Global Step: 180810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:23:33,176-Speed 5168.25 samples/sec Loss 2.1288 LearningRate 0.0210 Epoch: 10 Global Step: 180820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:23:35,160-Speed 5162.62 samples/sec Loss 2.0941 LearningRate 0.0210 Epoch: 10 Global Step: 180830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:23:37,138-Speed 5179.49 samples/sec Loss 2.1783 LearningRate 0.0210 Epoch: 10 Global Step: 180840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:23:39,111-Speed 5191.09 samples/sec Loss 2.1186 LearningRate 0.0210 Epoch: 10 Global Step: 180850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:23:41,105-Speed 5154.00 samples/sec Loss 2.1405 LearningRate 0.0210 Epoch: 10 Global Step: 180860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:23:43,075-Speed 5201.14 samples/sec Loss 2.1504 LearningRate 0.0210 Epoch: 10 Global Step: 180870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:23:45,048-Speed 5190.52 
samples/sec Loss 2.2051 LearningRate 0.0210 Epoch: 10 Global Step: 180880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:23:47,027-Speed 5176.27 samples/sec Loss 2.2122 LearningRate 0.0210 Epoch: 10 Global Step: 180890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:23:49,030-Speed 5150.81 samples/sec Loss 2.1626 LearningRate 0.0210 Epoch: 10 Global Step: 180900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:23:51,009-Speed 5174.14 samples/sec Loss 2.1176 LearningRate 0.0210 Epoch: 10 Global Step: 180910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:23:52,995-Speed 5158.40 samples/sec Loss 2.2367 LearningRate 0.0210 Epoch: 10 Global Step: 180920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:23:54,979-Speed 5163.94 samples/sec Loss 2.1862 LearningRate 0.0210 Epoch: 10 Global Step: 180930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:23:56,948-Speed 5202.49 samples/sec Loss 2.1681 LearningRate 0.0210 Epoch: 10 Global Step: 180940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:23:58,921-Speed 5191.44 samples/sec Loss 2.1790 LearningRate 0.0210 Epoch: 10 Global Step: 180950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:24:00,911-Speed 5146.09 samples/sec Loss 2.2434 LearningRate 0.0210 Epoch: 10 Global Step: 180960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:24:02,891-Speed 5174.25 samples/sec Loss 2.1512 LearningRate 0.0210 Epoch: 10 Global Step: 180970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:24:04,876-Speed 5158.93 samples/sec Loss 2.2264 LearningRate 0.0210 Epoch: 10 Global Step: 180980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:24:06,863-Speed 5157.27 samples/sec Loss 2.1728 LearningRate 0.0210 Epoch: 10 Global Step: 180990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:24:08,846-Speed 5165.34 samples/sec Loss 2.1893 
LearningRate 0.0210 Epoch: 10 Global Step: 181000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:24:10,850-Speed 5135.74 samples/sec Loss 2.1540 LearningRate 0.0210 Epoch: 10 Global Step: 181010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:24:12,827-Speed 5182.30 samples/sec Loss 2.1576 LearningRate 0.0210 Epoch: 10 Global Step: 181020 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:24:14,813-Speed 5157.49 samples/sec Loss 2.1466 LearningRate 0.0209 Epoch: 10 Global Step: 181030 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:24:16,805-Speed 5141.99 samples/sec Loss 2.1854 LearningRate 0.0209 Epoch: 10 Global Step: 181040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:24:18,797-Speed 5188.65 samples/sec Loss 2.1508 LearningRate 0.0209 Epoch: 10 Global Step: 181050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:24:20,769-Speed 5192.35 samples/sec Loss 2.1191 LearningRate 0.0209 Epoch: 10 Global Step: 181060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:24:22,752-Speed 5166.41 samples/sec Loss 2.1944 LearningRate 0.0209 Epoch: 10 Global Step: 181070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:24:24,728-Speed 5184.34 samples/sec Loss 2.1842 LearningRate 0.0209 Epoch: 10 Global Step: 181080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:24:26,735-Speed 5138.03 samples/sec Loss 2.0830 LearningRate 0.0209 Epoch: 10 Global Step: 181090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:24:28,717-Speed 5169.08 samples/sec Loss 2.1393 LearningRate 0.0209 Epoch: 10 Global Step: 181100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:24:30,710-Speed 5139.31 samples/sec Loss 2.1913 LearningRate 0.0209 Epoch: 10 Global Step: 181110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:24:32,695-Speed 5160.46 samples/sec Loss 2.1887 LearningRate 0.0209 Epoch: 10 
Global Step: 181120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:24:34,668-Speed 5192.82 samples/sec Loss 2.1086 LearningRate 0.0209 Epoch: 10 Global Step: 181130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:24:36,656-Speed 5152.54 samples/sec Loss 2.1476 LearningRate 0.0209 Epoch: 10 Global Step: 181140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:24:38,636-Speed 5172.03 samples/sec Loss 2.1900 LearningRate 0.0209 Epoch: 10 Global Step: 181150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:24:40,620-Speed 5186.26 samples/sec Loss 2.1282 LearningRate 0.0209 Epoch: 10 Global Step: 181160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:24:42,592-Speed 5194.50 samples/sec Loss 2.1686 LearningRate 0.0209 Epoch: 10 Global Step: 181170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:24:44,565-Speed 5192.90 samples/sec Loss 2.0470 LearningRate 0.0209 Epoch: 10 Global Step: 181180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:24:46,541-Speed 5183.34 samples/sec Loss 2.1229 LearningRate 0.0209 Epoch: 10 Global Step: 181190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:24:48,532-Speed 5144.96 samples/sec Loss 2.0917 LearningRate 0.0209 Epoch: 10 Global Step: 181200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:24:50,561-Speed 5083.61 samples/sec Loss 2.1972 LearningRate 0.0209 Epoch: 10 Global Step: 181210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:24:52,562-Speed 5116.97 samples/sec Loss 2.2205 LearningRate 0.0209 Epoch: 10 Global Step: 181220 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:24:54,551-Speed 5151.08 samples/sec Loss 2.2200 LearningRate 0.0209 Epoch: 10 Global Step: 181230 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:24:56,526-Speed 5185.51 samples/sec Loss 2.1511 LearningRate 0.0209 Epoch: 10 Global Step: 181240 Fp16 Grad 
Scale: 131072 Required: 10 hours Training: 2022-04-11 11:24:58,499-Speed 5193.03 samples/sec Loss 2.2288 LearningRate 0.0209 Epoch: 10 Global Step: 181250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:25:00,485-Speed 5156.94 samples/sec Loss 2.1102 LearningRate 0.0209 Epoch: 10 Global Step: 181260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:25:02,459-Speed 5188.98 samples/sec Loss 2.1514 LearningRate 0.0209 Epoch: 10 Global Step: 181270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:25:04,447-Speed 5152.97 samples/sec Loss 2.1688 LearningRate 0.0209 Epoch: 10 Global Step: 181280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:25:06,429-Speed 5168.36 samples/sec Loss 2.0875 LearningRate 0.0209 Epoch: 10 Global Step: 181290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:25:08,399-Speed 5198.75 samples/sec Loss 2.1535 LearningRate 0.0209 Epoch: 10 Global Step: 181300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:25:10,408-Speed 5123.77 samples/sec Loss 2.2208 LearningRate 0.0209 Epoch: 10 Global Step: 181310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:25:12,412-Speed 5112.88 samples/sec Loss 2.1669 LearningRate 0.0209 Epoch: 10 Global Step: 181320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:25:14,389-Speed 5180.11 samples/sec Loss 2.1303 LearningRate 0.0209 Epoch: 10 Global Step: 181330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:25:16,369-Speed 5174.93 samples/sec Loss 2.1940 LearningRate 0.0209 Epoch: 10 Global Step: 181340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:25:18,343-Speed 5187.16 samples/sec Loss 2.1573 LearningRate 0.0209 Epoch: 10 Global Step: 181350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:25:20,315-Speed 5196.18 samples/sec Loss 2.2500 LearningRate 0.0209 Epoch: 10 Global Step: 181360 Fp16 Grad Scale: 65536 Required: 10 
hours Training: 2022-04-11 11:25:22,300-Speed 5160.56 samples/sec Loss 2.1502 LearningRate 0.0209 Epoch: 10 Global Step: 181370 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:25:24,293-Speed 5138.95 samples/sec Loss 2.1904 LearningRate 0.0209 Epoch: 10 Global Step: 181380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:25:26,286-Speed 5139.51 samples/sec Loss 2.1203 LearningRate 0.0209 Epoch: 10 Global Step: 181390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:25:28,310-Speed 5060.78 samples/sec Loss 2.0934 LearningRate 0.0208 Epoch: 10 Global Step: 181400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:25:30,309-Speed 5124.63 samples/sec Loss 2.1868 LearningRate 0.0208 Epoch: 10 Global Step: 181410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:25:32,274-Speed 5213.47 samples/sec Loss 2.1194 LearningRate 0.0208 Epoch: 10 Global Step: 181420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:25:34,263-Speed 5150.06 samples/sec Loss 2.1408 LearningRate 0.0208 Epoch: 10 Global Step: 181430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:25:36,238-Speed 5185.22 samples/sec Loss 2.1382 LearningRate 0.0208 Epoch: 10 Global Step: 181440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:25:38,222-Speed 5164.50 samples/sec Loss 2.1304 LearningRate 0.0208 Epoch: 10 Global Step: 181450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:25:40,197-Speed 5184.76 samples/sec Loss 2.1952 LearningRate 0.0208 Epoch: 10 Global Step: 181460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:25:42,195-Speed 5128.57 samples/sec Loss 2.0801 LearningRate 0.0208 Epoch: 10 Global Step: 181470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:25:44,178-Speed 5163.75 samples/sec Loss 2.1557 LearningRate 0.0208 Epoch: 10 Global Step: 181480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 
11:25:46,165-Speed 5156.37 samples/sec Loss 2.1067 LearningRate 0.0208 Epoch: 10 Global Step: 181490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:25:48,148-Speed 5164.42 samples/sec Loss 2.2671 LearningRate 0.0208 Epoch: 10 Global Step: 181500 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:25:50,152-Speed 5143.10 samples/sec Loss 2.1721 LearningRate 0.0208 Epoch: 10 Global Step: 181510 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:25:52,145-Speed 5141.24 samples/sec Loss 2.0787 LearningRate 0.0208 Epoch: 10 Global Step: 181520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:25:54,119-Speed 5188.41 samples/sec Loss 2.1626 LearningRate 0.0208 Epoch: 10 Global Step: 181530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:25:56,106-Speed 5156.13 samples/sec Loss 2.1575 LearningRate 0.0208 Epoch: 10 Global Step: 181540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:25:58,086-Speed 5183.16 samples/sec Loss 2.1356 LearningRate 0.0208 Epoch: 10 Global Step: 181550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:00,067-Speed 5170.73 samples/sec Loss 2.1088 LearningRate 0.0208 Epoch: 10 Global Step: 181560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:02,049-Speed 5167.27 samples/sec Loss 2.1654 LearningRate 0.0208 Epoch: 10 Global Step: 181570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:04,026-Speed 5180.61 samples/sec Loss 2.1074 LearningRate 0.0208 Epoch: 10 Global Step: 181580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:06,018-Speed 5143.41 samples/sec Loss 2.2193 LearningRate 0.0208 Epoch: 10 Global Step: 181590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:07,991-Speed 5190.14 samples/sec Loss 2.1740 LearningRate 0.0208 Epoch: 10 Global Step: 181600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:09,987-Speed 5131.80 
samples/sec Loss 2.1341 LearningRate 0.0208 Epoch: 10 Global Step: 181610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:11,984-Speed 5142.97 samples/sec Loss 2.1095 LearningRate 0.0208 Epoch: 10 Global Step: 181620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:26:13,958-Speed 5188.76 samples/sec Loss 2.1082 LearningRate 0.0208 Epoch: 10 Global Step: 181630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:26:15,939-Speed 5171.53 samples/sec Loss 2.1239 LearningRate 0.0208 Epoch: 10 Global Step: 181640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:17,915-Speed 5184.07 samples/sec Loss 2.1252 LearningRate 0.0208 Epoch: 10 Global Step: 181650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:19,893-Speed 5188.36 samples/sec Loss 2.1769 LearningRate 0.0208 Epoch: 10 Global Step: 181660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:21,876-Speed 5165.51 samples/sec Loss 2.2108 LearningRate 0.0208 Epoch: 10 Global Step: 181670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:23,884-Speed 5102.19 samples/sec Loss 2.1958 LearningRate 0.0208 Epoch: 10 Global Step: 181680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:25,871-Speed 5154.46 samples/sec Loss 2.1365 LearningRate 0.0208 Epoch: 10 Global Step: 181690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:27,861-Speed 5168.19 samples/sec Loss 2.1825 LearningRate 0.0208 Epoch: 10 Global Step: 181700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:29,839-Speed 5177.04 samples/sec Loss 2.1276 LearningRate 0.0208 Epoch: 10 Global Step: 181710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:31,814-Speed 5186.66 samples/sec Loss 2.0811 LearningRate 0.0208 Epoch: 10 Global Step: 181720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:33,792-Speed 5178.35 samples/sec Loss 2.1710 
LearningRate 0.0208 Epoch: 10 Global Step: 181730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:35,761-Speed 5202.33 samples/sec Loss 2.1538 LearningRate 0.0208 Epoch: 10 Global Step: 181740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:26:37,744-Speed 5166.62 samples/sec Loss 2.2186 LearningRate 0.0208 Epoch: 10 Global Step: 181750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:26:39,724-Speed 5173.22 samples/sec Loss 2.1504 LearningRate 0.0207 Epoch: 10 Global Step: 181760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:26:41,709-Speed 5160.34 samples/sec Loss 2.2043 LearningRate 0.0207 Epoch: 10 Global Step: 181770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:26:43,684-Speed 5186.00 samples/sec Loss 2.1639 LearningRate 0.0207 Epoch: 10 Global Step: 181780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:26:45,699-Speed 5083.63 samples/sec Loss 2.1192 LearningRate 0.0207 Epoch: 10 Global Step: 181790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:26:47,717-Speed 5077.54 samples/sec Loss 2.1729 LearningRate 0.0207 Epoch: 10 Global Step: 181800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:26:50,008-Speed 5106.58 samples/sec Loss 2.1372 LearningRate 0.0207 Epoch: 10 Global Step: 181810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:26:51,992-Speed 5163.33 samples/sec Loss 2.1071 LearningRate 0.0207 Epoch: 10 Global Step: 181820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:26:53,969-Speed 5182.40 samples/sec Loss 2.1702 LearningRate 0.0207 Epoch: 10 Global Step: 181830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:26:55,948-Speed 5185.75 samples/sec Loss 2.1613 LearningRate 0.0207 Epoch: 10 Global Step: 181840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:57,925-Speed 5179.55 samples/sec Loss 2.0858 LearningRate 0.0207 Epoch: 10 
Global Step: 181850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:26:59,945-Speed 5071.30 samples/sec Loss 2.1323 LearningRate 0.0207 Epoch: 10 Global Step: 181860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:27:01,931-Speed 5156.84 samples/sec Loss 2.2072 LearningRate 0.0207 Epoch: 10 Global Step: 181870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:27:03,926-Speed 5136.57 samples/sec Loss 2.1311 LearningRate 0.0207 Epoch: 10 Global Step: 181880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:27:05,909-Speed 5165.00 samples/sec Loss 2.1457 LearningRate 0.0207 Epoch: 10 Global Step: 181890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:27:07,883-Speed 5188.30 samples/sec Loss 2.1163 LearningRate 0.0207 Epoch: 10 Global Step: 181900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:27:09,868-Speed 5159.88 samples/sec Loss 2.1556 LearningRate 0.0207 Epoch: 10 Global Step: 181910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:27:11,871-Speed 5114.75 samples/sec Loss 2.1172 LearningRate 0.0207 Epoch: 10 Global Step: 181920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:27:13,848-Speed 5182.66 samples/sec Loss 2.1243 LearningRate 0.0207 Epoch: 10 Global Step: 181930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:27:15,835-Speed 5153.24 samples/sec Loss 2.1120 LearningRate 0.0207 Epoch: 10 Global Step: 181940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:27:17,810-Speed 5185.63 samples/sec Loss 2.1288 LearningRate 0.0207 Epoch: 10 Global Step: 181950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:27:19,800-Speed 5167.86 samples/sec Loss 2.1446 LearningRate 0.0207 Epoch: 10 Global Step: 181960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:27:21,774-Speed 5188.88 samples/sec Loss 2.1285 LearningRate 0.0207 Epoch: 10 Global Step: 181970 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-04-11 11:27:23,761-Speed 5155.87 samples/sec Loss 2.1652 LearningRate 0.0207 Epoch: 10 Global Step: 181980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:27:25,741-Speed 5172.54 samples/sec Loss 2.1121 LearningRate 0.0207 Epoch: 10 Global Step: 181990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:27:27,765-Speed 5171.92 samples/sec Loss 2.0868 LearningRate 0.0207 Epoch: 10 Global Step: 182000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:27:54,283-[lfw][182000]XNorm: 23.251915 Training: 2022-04-11 11:27:54,283-[lfw][182000]Accuracy-Flip: 0.99767+-0.00249 Training: 2022-04-11 11:27:54,284-[lfw][182000]Accuracy-Highest: 0.99833 Training: 2022-04-11 11:28:24,914-[cfp_fp][182000]XNorm: 21.807954 Training: 2022-04-11 11:28:24,917-[cfp_fp][182000]Accuracy-Flip: 0.98543+-0.00657 Training: 2022-04-11 11:28:24,918-[cfp_fp][182000]Accuracy-Highest: 0.98629 Training: 2022-04-11 11:28:51,355-[agedb_30][182000]XNorm: 23.524001 Training: 2022-04-11 11:28:51,355-[agedb_30][182000]Accuracy-Flip: 0.98250+-0.00684 Training: 2022-04-11 11:28:51,356-[agedb_30][182000]Accuracy-Highest: 0.98250 Training: 2022-04-11 11:28:53,338-Speed 119.66 samples/sec Loss 2.1822 LearningRate 0.0207 Epoch: 10 Global Step: 182010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:28:55,300-Speed 5219.45 samples/sec Loss 2.1715 LearningRate 0.0207 Epoch: 10 Global Step: 182020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:28:57,265-Speed 5213.14 samples/sec Loss 2.1317 LearningRate 0.0207 Epoch: 10 Global Step: 182030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:28:59,241-Speed 5183.03 samples/sec Loss 2.1527 LearningRate 0.0207 Epoch: 10 Global Step: 182040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:29:01,220-Speed 5213.83 samples/sec Loss 2.1300 LearningRate 0.0207 Epoch: 10 Global Step: 182050 Fp16 Grad Scale: 32768 
Required: 10 hours Training: 2022-04-11 11:29:03,224-Speed 5110.20 samples/sec Loss 2.1881 LearningRate 0.0207 Epoch: 10 Global Step: 182060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:05,246-Speed 5065.54 samples/sec Loss 2.1258 LearningRate 0.0207 Epoch: 10 Global Step: 182070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:07,217-Speed 5206.02 samples/sec Loss 2.2221 LearningRate 0.0207 Epoch: 10 Global Step: 182080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:09,184-Speed 5205.33 samples/sec Loss 2.1504 LearningRate 0.0207 Epoch: 10 Global Step: 182090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:11,153-Speed 5202.85 samples/sec Loss 2.1480 LearningRate 0.0207 Epoch: 10 Global Step: 182100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:13,128-Speed 5187.42 samples/sec Loss 2.1647 LearningRate 0.0207 Epoch: 10 Global Step: 182110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:15,112-Speed 5175.10 samples/sec Loss 2.2130 LearningRate 0.0207 Epoch: 10 Global Step: 182120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:17,086-Speed 5187.93 samples/sec Loss 2.1025 LearningRate 0.0206 Epoch: 10 Global Step: 182130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:19,056-Speed 5199.62 samples/sec Loss 2.1947 LearningRate 0.0206 Epoch: 10 Global Step: 182140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:21,026-Speed 5198.42 samples/sec Loss 2.1891 LearningRate 0.0206 Epoch: 10 Global Step: 182150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:29:23,023-Speed 5131.69 samples/sec Loss 2.1803 LearningRate 0.0206 Epoch: 10 Global Step: 182160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:29:25,021-Speed 5126.10 samples/sec Loss 2.1008 LearningRate 0.0206 Epoch: 10 Global Step: 182170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 
2022-04-11 11:29:27,011-Speed 5148.43 samples/sec Loss 2.1165 LearningRate 0.0206 Epoch: 10 Global Step: 182180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:29:28,985-Speed 5189.39 samples/sec Loss 2.1662 LearningRate 0.0206 Epoch: 10 Global Step: 182190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:29:30,954-Speed 5202.59 samples/sec Loss 2.1784 LearningRate 0.0206 Epoch: 10 Global Step: 182200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:32,941-Speed 5155.15 samples/sec Loss 2.1232 LearningRate 0.0206 Epoch: 10 Global Step: 182210 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:34,934-Speed 5138.42 samples/sec Loss 2.1625 LearningRate 0.0206 Epoch: 10 Global Step: 182220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:36,928-Speed 5137.27 samples/sec Loss 2.1147 LearningRate 0.0206 Epoch: 10 Global Step: 182230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:38,963-Speed 5169.78 samples/sec Loss 2.1351 LearningRate 0.0206 Epoch: 10 Global Step: 182240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:40,961-Speed 5126.93 samples/sec Loss 2.1734 LearningRate 0.0206 Epoch: 10 Global Step: 182250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:42,935-Speed 5190.09 samples/sec Loss 2.1465 LearningRate 0.0206 Epoch: 10 Global Step: 182260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:44,908-Speed 5191.84 samples/sec Loss 2.2415 LearningRate 0.0206 Epoch: 10 Global Step: 182270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:46,895-Speed 5155.35 samples/sec Loss 2.1762 LearningRate 0.0206 Epoch: 10 Global Step: 182280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:48,882-Speed 5154.82 samples/sec Loss 2.1533 LearningRate 0.0206 Epoch: 10 Global Step: 182290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:29:50,858-Speed 
5181.90 samples/sec Loss 2.1530 LearningRate 0.0206 Epoch: 10 Global Step: 182300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:29:52,886-Speed 5191.42 samples/sec Loss 2.1131 LearningRate 0.0206 Epoch: 10 Global Step: 182310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:29:54,861-Speed 5186.58 samples/sec Loss 2.1767 LearningRate 0.0206 Epoch: 10 Global Step: 182320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:29:56,855-Speed 5138.38 samples/sec Loss 2.2010 LearningRate 0.0206 Epoch: 10 Global Step: 182330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:29:58,840-Speed 5159.19 samples/sec Loss 2.1775 LearningRate 0.0206 Epoch: 10 Global Step: 182340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:00,824-Speed 5175.58 samples/sec Loss 2.1800 LearningRate 0.0206 Epoch: 10 Global Step: 182350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:02,802-Speed 5179.34 samples/sec Loss 2.1442 LearningRate 0.0206 Epoch: 10 Global Step: 182360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:04,792-Speed 5147.56 samples/sec Loss 2.2150 LearningRate 0.0206 Epoch: 10 Global Step: 182370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:06,796-Speed 5111.16 samples/sec Loss 2.1386 LearningRate 0.0206 Epoch: 10 Global Step: 182380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:08,780-Speed 5174.06 samples/sec Loss 2.1201 LearningRate 0.0206 Epoch: 10 Global Step: 182390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:10,759-Speed 5175.56 samples/sec Loss 2.1043 LearningRate 0.0206 Epoch: 10 Global Step: 182400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:30:12,752-Speed 5139.20 samples/sec Loss 2.0930 LearningRate 0.0206 Epoch: 10 Global Step: 182410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:14,737-Speed 5160.61 samples/sec Loss 
2.1207 LearningRate 0.0206 Epoch: 10 Global Step: 182420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:30:16,749-Speed 5091.68 samples/sec Loss 2.1250 LearningRate 0.0206 Epoch: 10 Global Step: 182430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:30:18,760-Speed 5094.05 samples/sec Loss 2.1562 LearningRate 0.0206 Epoch: 10 Global Step: 182440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:30:20,741-Speed 5170.45 samples/sec Loss 2.1157 LearningRate 0.0206 Epoch: 10 Global Step: 182450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:30:22,730-Speed 5150.38 samples/sec Loss 2.1567 LearningRate 0.0206 Epoch: 10 Global Step: 182460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:30:24,717-Speed 5155.88 samples/sec Loss 2.2242 LearningRate 0.0206 Epoch: 10 Global Step: 182470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:30:26,708-Speed 5144.79 samples/sec Loss 2.1118 LearningRate 0.0206 Epoch: 10 Global Step: 182480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:30:28,686-Speed 5178.70 samples/sec Loss 2.1474 LearningRate 0.0206 Epoch: 10 Global Step: 182490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:30:30,661-Speed 5185.32 samples/sec Loss 2.1080 LearningRate 0.0205 Epoch: 10 Global Step: 182500 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:30:32,642-Speed 5172.10 samples/sec Loss 2.0440 LearningRate 0.0205 Epoch: 10 Global Step: 182510 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:30:34,622-Speed 5172.82 samples/sec Loss 2.1293 LearningRate 0.0205 Epoch: 10 Global Step: 182520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:36,611-Speed 5150.23 samples/sec Loss 2.1499 LearningRate 0.0205 Epoch: 10 Global Step: 182530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:38,607-Speed 5133.20 samples/sec Loss 2.1347 LearningRate 0.0205 
Epoch: 10 Global Step: 182540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:40,595-Speed 5154.82 samples/sec Loss 2.1110 LearningRate 0.0205 Epoch: 10 Global Step: 182550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:42,577-Speed 5168.13 samples/sec Loss 2.1249 LearningRate 0.0205 Epoch: 10 Global Step: 182560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:44,574-Speed 5127.82 samples/sec Loss 2.2096 LearningRate 0.0205 Epoch: 10 Global Step: 182570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:46,633-Speed 4977.50 samples/sec Loss 2.1240 LearningRate 0.0205 Epoch: 10 Global Step: 182580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:48,620-Speed 5153.79 samples/sec Loss 2.1576 LearningRate 0.0205 Epoch: 10 Global Step: 182590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:50,626-Speed 5107.23 samples/sec Loss 2.1167 LearningRate 0.0205 Epoch: 10 Global Step: 182600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:52,602-Speed 5185.32 samples/sec Loss 2.1256 LearningRate 0.0205 Epoch: 10 Global Step: 182610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:54,581-Speed 5176.61 samples/sec Loss 2.1451 LearningRate 0.0205 Epoch: 10 Global Step: 182620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:56,568-Speed 5153.20 samples/sec Loss 2.1988 LearningRate 0.0205 Epoch: 10 Global Step: 182630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:30:58,548-Speed 5174.00 samples/sec Loss 2.1111 LearningRate 0.0205 Epoch: 10 Global Step: 182640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:31:00,541-Speed 5141.19 samples/sec Loss 2.1870 LearningRate 0.0205 Epoch: 10 Global Step: 182650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:31:02,524-Speed 5163.62 samples/sec Loss 2.1493 LearningRate 0.0205 Epoch: 10 Global Step: 182660 
Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:31:04,516-Speed 5143.50 samples/sec Loss 2.0802 LearningRate 0.0205 Epoch: 10 Global Step: 182670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:31:06,495-Speed 5175.60 samples/sec Loss 2.0913 LearningRate 0.0205 Epoch: 10 Global Step: 182680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:31:08,478-Speed 5165.01 samples/sec Loss 2.1242 LearningRate 0.0205 Epoch: 10 Global Step: 182690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:31:10,449-Speed 5198.04 samples/sec Loss 2.1628 LearningRate 0.0205 Epoch: 10 Global Step: 182700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:31:12,424-Speed 5187.21 samples/sec Loss 2.2158 LearningRate 0.0205 Epoch: 10 Global Step: 182710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:31:14,416-Speed 5140.91 samples/sec Loss 2.1181 LearningRate 0.0205 Epoch: 10 Global Step: 182720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:31:16,396-Speed 5174.20 samples/sec Loss 2.1772 LearningRate 0.0205 Epoch: 10 Global Step: 182730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:31:18,376-Speed 5174.45 samples/sec Loss 2.1690 LearningRate 0.0205 Epoch: 10 Global Step: 182740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:31:20,349-Speed 5190.69 samples/sec Loss 2.1510 LearningRate 0.0205 Epoch: 10 Global Step: 182750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:31:22,318-Speed 5203.06 samples/sec Loss 2.1412 LearningRate 0.0205 Epoch: 10 Global Step: 182760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:31:24,296-Speed 5179.04 samples/sec Loss 2.1091 LearningRate 0.0205 Epoch: 10 Global Step: 182770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:31:26,268-Speed 5194.25 samples/sec Loss 2.1685 LearningRate 0.0205 Epoch: 10 Global Step: 182780 Fp16 Grad Scale: 65536 
Required: 10 hours Training: 2022-04-11 11:31:28,253-Speed 5160.09 samples/sec Loss 2.1767 LearningRate 0.0205 Epoch: 10 Global Step: 182790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:31:30,224-Speed 5196.29 samples/sec Loss 2.1375 LearningRate 0.0205 Epoch: 10 Global Step: 182800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:31:32,196-Speed 5194.65 samples/sec Loss 2.1197 LearningRate 0.0205 Epoch: 10 Global Step: 182810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:31:34,170-Speed 5190.20 samples/sec Loss 2.1190 LearningRate 0.0205 Epoch: 10 Global Step: 182820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:31:36,156-Speed 5158.38 samples/sec Loss 2.1377 LearningRate 0.0205 Epoch: 10 Global Step: 182830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:31:38,124-Speed 5204.17 samples/sec Loss 2.1596 LearningRate 0.0205 Epoch: 10 Global Step: 182840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:31:40,115-Speed 5145.68 samples/sec Loss 2.1119 LearningRate 0.0205 Epoch: 10 Global Step: 182850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:31:42,089-Speed 5189.35 samples/sec Loss 2.0910 LearningRate 0.0205 Epoch: 10 Global Step: 182860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:31:44,061-Speed 5193.01 samples/sec Loss 2.1603 LearningRate 0.0204 Epoch: 10 Global Step: 182870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:31:46,050-Speed 5149.59 samples/sec Loss 2.1499 LearningRate 0.0204 Epoch: 10 Global Step: 182880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:31:48,067-Speed 5080.44 samples/sec Loss 2.1943 LearningRate 0.0204 Epoch: 10 Global Step: 182890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:31:50,059-Speed 5140.35 samples/sec Loss 2.1536 LearningRate 0.0204 Epoch: 10 Global Step: 182900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 
2022-04-11 11:31:52,057-Speed 5127.07 samples/sec Loss 2.1882 LearningRate 0.0204 Epoch: 10 Global Step: 182910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:31:54,037-Speed 5173.63 samples/sec Loss 2.1494 LearningRate 0.0204 Epoch: 10 Global Step: 182920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:31:56,015-Speed 5181.34 samples/sec Loss 2.1477 LearningRate 0.0204 Epoch: 10 Global Step: 182930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:31:57,988-Speed 5190.34 samples/sec Loss 2.1629 LearningRate 0.0204 Epoch: 10 Global Step: 182940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:31:59,977-Speed 5148.93 samples/sec Loss 2.0725 LearningRate 0.0204 Epoch: 10 Global Step: 182950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:32:01,959-Speed 5169.24 samples/sec Loss 2.1710 LearningRate 0.0204 Epoch: 10 Global Step: 182960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:32:03,931-Speed 5194.56 samples/sec Loss 2.0903 LearningRate 0.0204 Epoch: 10 Global Step: 182970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:32:05,907-Speed 5182.86 samples/sec Loss 2.1809 LearningRate 0.0204 Epoch: 10 Global Step: 182980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:32:07,882-Speed 5188.05 samples/sec Loss 2.1743 LearningRate 0.0204 Epoch: 10 Global Step: 182990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:32:09,875-Speed 5140.83 samples/sec Loss 2.0694 LearningRate 0.0204 Epoch: 10 Global Step: 183000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:32:11,857-Speed 5166.66 samples/sec Loss 2.1324 LearningRate 0.0204 Epoch: 10 Global Step: 183010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:32:13,854-Speed 5129.16 samples/sec Loss 2.1690 LearningRate 0.0204 Epoch: 10 Global Step: 183020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 
11:32:15,834-Speed 5175.30 samples/sec Loss 2.1283 LearningRate 0.0204 Epoch: 10 Global Step: 183030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:32:17,829-Speed 5134.41 samples/sec Loss 2.1563 LearningRate 0.0204 Epoch: 10 Global Step: 183040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:32:19,796-Speed 5207.08 samples/sec Loss 2.1249 LearningRate 0.0204 Epoch: 10 Global Step: 183050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:32:21,781-Speed 5163.67 samples/sec Loss 2.1732 LearningRate 0.0204 Epoch: 10 Global Step: 183060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:32:23,768-Speed 5155.30 samples/sec Loss 2.1080 LearningRate 0.0204 Epoch: 10 Global Step: 183070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:32:25,758-Speed 5148.33 samples/sec Loss 2.1404 LearningRate 0.0204 Epoch: 10 Global Step: 183080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:32:27,742-Speed 5163.04 samples/sec Loss 2.1354 LearningRate 0.0204 Epoch: 10 Global Step: 183090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:32:29,722-Speed 5171.63 samples/sec Loss 2.1493 LearningRate 0.0204 Epoch: 10 Global Step: 183100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:32:31,703-Speed 5172.34 samples/sec Loss 2.1050 LearningRate 0.0204 Epoch: 10 Global Step: 183110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:32:33,688-Speed 5158.86 samples/sec Loss 2.1625 LearningRate 0.0204 Epoch: 10 Global Step: 183120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:32:35,671-Speed 5167.85 samples/sec Loss 2.1524 LearningRate 0.0204 Epoch: 10 Global Step: 183130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:32:37,643-Speed 5192.94 samples/sec Loss 2.1422 LearningRate 0.0204 Epoch: 10 Global Step: 183140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:32:39,651-Speed 5102.20 
samples/sec Loss 2.1755 LearningRate 0.0204 Epoch: 10 Global Step: 183150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:32:41,641-Speed 5146.47 samples/sec Loss 2.1333 LearningRate 0.0204 Epoch: 10 Global Step: 183160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:32:43,617-Speed 5185.52 samples/sec Loss 2.0880 LearningRate 0.0204 Epoch: 10 Global Step: 183170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:32:45,630-Speed 5087.85 samples/sec Loss 2.1750 LearningRate 0.0204 Epoch: 10 Global Step: 183180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:32:47,623-Speed 5139.59 samples/sec Loss 2.1899 LearningRate 0.0204 Epoch: 10 Global Step: 183190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:32:49,617-Speed 5135.88 samples/sec Loss 2.1407 LearningRate 0.0204 Epoch: 10 Global Step: 183200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:32:51,607-Speed 5148.47 samples/sec Loss 2.1288 LearningRate 0.0204 Epoch: 10 Global Step: 183210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:32:53,583-Speed 5182.95 samples/sec Loss 2.1689 LearningRate 0.0204 Epoch: 10 Global Step: 183220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:32:55,558-Speed 5188.31 samples/sec Loss 2.1765 LearningRate 0.0204 Epoch: 10 Global Step: 183230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:32:57,536-Speed 5177.87 samples/sec Loss 2.2057 LearningRate 0.0203 Epoch: 10 Global Step: 183240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:32:59,526-Speed 5149.24 samples/sec Loss 2.1413 LearningRate 0.0203 Epoch: 10 Global Step: 183250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:33:01,502-Speed 5183.10 samples/sec Loss 2.1709 LearningRate 0.0203 Epoch: 10 Global Step: 183260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:33:03,477-Speed 5187.23 samples/sec Loss 2.1484 
LearningRate 0.0203 Epoch: 10 Global Step: 183270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:33:05,462-Speed 5160.05 samples/sec Loss 2.1862 LearningRate 0.0203 Epoch: 10 Global Step: 183280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:33:07,430-Speed 5205.70 samples/sec Loss 2.1486 LearningRate 0.0203 Epoch: 10 Global Step: 183290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:33:09,412-Speed 5168.19 samples/sec Loss 2.1219 LearningRate 0.0203 Epoch: 10 Global Step: 183300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:33:11,398-Speed 5157.58 samples/sec Loss 2.0966 LearningRate 0.0203 Epoch: 10 Global Step: 183310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:33:13,370-Speed 5193.59 samples/sec Loss 2.2116 LearningRate 0.0203 Epoch: 10 Global Step: 183320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:33:15,372-Speed 5116.82 samples/sec Loss 2.1490 LearningRate 0.0203 Epoch: 10 Global Step: 183330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:33:17,350-Speed 5177.48 samples/sec Loss 2.0847 LearningRate 0.0203 Epoch: 10 Global Step: 183340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:33:19,323-Speed 5193.42 samples/sec Loss 2.1632 LearningRate 0.0203 Epoch: 10 Global Step: 183350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:33:21,305-Speed 5166.74 samples/sec Loss 2.1341 LearningRate 0.0203 Epoch: 10 Global Step: 183360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:33:23,291-Speed 5158.38 samples/sec Loss 2.0964 LearningRate 0.0203 Epoch: 10 Global Step: 183370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:33:25,287-Speed 5132.28 samples/sec Loss 2.1485 LearningRate 0.0203 Epoch: 10 Global Step: 183380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:33:27,264-Speed 5181.99 samples/sec Loss 2.0593 LearningRate 0.0203 Epoch: 10 
Global Step: 183390 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:33:29,243-Speed 5175.99 samples/sec Loss 2.1908 LearningRate 0.0203 Epoch: 10 Global Step: 183400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:33:31,222-Speed 5176.78 samples/sec Loss 2.1188 LearningRate 0.0203 Epoch: 10 Global Step: 183410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:33:33,196-Speed 5188.69 samples/sec Loss 2.0786 LearningRate 0.0203 Epoch: 10 Global Step: 183420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:33:35,180-Speed 5162.54 samples/sec Loss 2.1265 LearningRate 0.0203 Epoch: 10 Global Step: 183430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:33:37,147-Speed 5208.74 samples/sec Loss 2.1714 LearningRate 0.0203 Epoch: 10 Global Step: 183440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:33:39,151-Speed 5111.38 samples/sec Loss 2.1236 LearningRate 0.0203 Epoch: 10 Global Step: 183450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:33:41,133-Speed 5169.84 samples/sec Loss 2.2078 LearningRate 0.0203 Epoch: 10 Global Step: 183460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:33:43,103-Speed 5197.64 samples/sec Loss 2.1044 LearningRate 0.0203 Epoch: 10 Global Step: 183470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:33:45,092-Speed 5150.80 samples/sec Loss 2.1132 LearningRate 0.0203 Epoch: 10 Global Step: 183480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:33:47,082-Speed 5146.97 samples/sec Loss 2.1546 LearningRate 0.0203 Epoch: 10 Global Step: 183490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:33:49,074-Speed 5141.96 samples/sec Loss 2.1750 LearningRate 0.0203 Epoch: 10 Global Step: 183500 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:33:51,052-Speed 5179.86 samples/sec Loss 2.1384 LearningRate 0.0203 Epoch: 10 Global Step: 183510 Fp16 
Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:33:53,025-Speed 5189.72 samples/sec Loss 2.1742 LearningRate 0.0203 Epoch: 10 Global Step: 183520 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:33:54,999-Speed 5189.94 samples/sec Loss 2.1731 LearningRate 0.0203 Epoch: 10 Global Step: 183530 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:33:56,970-Speed 5196.36 samples/sec Loss 2.1281 LearningRate 0.0203 Epoch: 10 Global Step: 183540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:33:58,964-Speed 5139.18 samples/sec Loss 2.1058 LearningRate 0.0203 Epoch: 10 Global Step: 183550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:34:00,943-Speed 5175.75 samples/sec Loss 2.1156 LearningRate 0.0203 Epoch: 10 Global Step: 183560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:34:02,920-Speed 5181.44 samples/sec Loss 2.1290 LearningRate 0.0203 Epoch: 10 Global Step: 183570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:34:04,897-Speed 5180.81 samples/sec Loss 2.1185 LearningRate 0.0203 Epoch: 10 Global Step: 183580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:34:07,454-Speed 4005.55 samples/sec Loss 2.0943 LearningRate 0.0203 Epoch: 10 Global Step: 183590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:34:09,406-Speed 5249.00 samples/sec Loss 2.1330 LearningRate 0.0203 Epoch: 10 Global Step: 183600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:34:38,529-Speed 351.63 samples/sec Loss 1.6824 LearningRate 0.0202 Epoch: 11 Global Step: 183610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:34:40,840-Speed 4432.50 samples/sec Loss 1.5941 LearningRate 0.0202 Epoch: 11 Global Step: 183620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:34:42,818-Speed 5180.73 samples/sec Loss 1.6473 LearningRate 0.0202 Epoch: 11 Global Step: 183630 Fp16 Grad Scale: 32768 Required: 10 
hours Training: 2022-04-11 11:34:44,779-Speed 5222.05 samples/sec Loss 1.5610 LearningRate 0.0202 Epoch: 11 Global Step: 183640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:34:46,787-Speed 5102.65 samples/sec Loss 1.5490 LearningRate 0.0202 Epoch: 11 Global Step: 183650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:34:48,912-Speed 4820.29 samples/sec Loss 1.5960 LearningRate 0.0202 Epoch: 11 Global Step: 183660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:34:50,888-Speed 5184.02 samples/sec Loss 1.6051 LearningRate 0.0202 Epoch: 11 Global Step: 183670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:34:52,863-Speed 5186.08 samples/sec Loss 1.5265 LearningRate 0.0202 Epoch: 11 Global Step: 183680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:34:54,826-Speed 5218.15 samples/sec Loss 1.5882 LearningRate 0.0202 Epoch: 11 Global Step: 183690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:34:56,793-Speed 5209.50 samples/sec Loss 1.6783 LearningRate 0.0202 Epoch: 11 Global Step: 183700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:34:58,775-Speed 5169.01 samples/sec Loss 1.6636 LearningRate 0.0202 Epoch: 11 Global Step: 183710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:00,775-Speed 5122.62 samples/sec Loss 1.5862 LearningRate 0.0202 Epoch: 11 Global Step: 183720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:02,751-Speed 5182.76 samples/sec Loss 1.5329 LearningRate 0.0202 Epoch: 11 Global Step: 183730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:04,725-Speed 5188.33 samples/sec Loss 1.6369 LearningRate 0.0202 Epoch: 11 Global Step: 183740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:06,690-Speed 5213.76 samples/sec Loss 1.6336 LearningRate 0.0202 Epoch: 11 Global Step: 183750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
11:35:08,679-Speed 5150.91 samples/sec Loss 1.6589 LearningRate 0.0202 Epoch: 11 Global Step: 183760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:10,658-Speed 5174.88 samples/sec Loss 1.6094 LearningRate 0.0202 Epoch: 11 Global Step: 183770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:12,638-Speed 5172.44 samples/sec Loss 1.6516 LearningRate 0.0202 Epoch: 11 Global Step: 183780 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:35:14,623-Speed 5159.98 samples/sec Loss 1.5398 LearningRate 0.0202 Epoch: 11 Global Step: 183790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:16,616-Speed 5140.34 samples/sec Loss 1.6037 LearningRate 0.0202 Epoch: 11 Global Step: 183800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:18,585-Speed 5203.09 samples/sec Loss 1.6253 LearningRate 0.0202 Epoch: 11 Global Step: 183810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:20,551-Speed 5209.85 samples/sec Loss 1.6258 LearningRate 0.0202 Epoch: 11 Global Step: 183820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:22,525-Speed 5189.89 samples/sec Loss 1.5380 LearningRate 0.0202 Epoch: 11 Global Step: 183830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:24,496-Speed 5196.18 samples/sec Loss 1.5889 LearningRate 0.0202 Epoch: 11 Global Step: 183840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:26,482-Speed 5159.67 samples/sec Loss 1.5689 LearningRate 0.0202 Epoch: 11 Global Step: 183850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:28,465-Speed 5165.49 samples/sec Loss 1.6475 LearningRate 0.0202 Epoch: 11 Global Step: 183860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:30,438-Speed 5191.03 samples/sec Loss 1.6001 LearningRate 0.0202 Epoch: 11 Global Step: 183870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:32,417-Speed 5176.95 
samples/sec Loss 1.6100 LearningRate 0.0202 Epoch: 11 Global Step: 183880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:34,392-Speed 5184.23 samples/sec Loss 1.6148 LearningRate 0.0202 Epoch: 11 Global Step: 183890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:35:36,378-Speed 5159.43 samples/sec Loss 1.6424 LearningRate 0.0202 Epoch: 11 Global Step: 183900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:35:38,366-Speed 5153.77 samples/sec Loss 1.6263 LearningRate 0.0202 Epoch: 11 Global Step: 183910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:35:40,341-Speed 5187.12 samples/sec Loss 1.6404 LearningRate 0.0202 Epoch: 11 Global Step: 183920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:42,331-Speed 5145.69 samples/sec Loss 1.5626 LearningRate 0.0202 Epoch: 11 Global Step: 183930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:44,308-Speed 5181.78 samples/sec Loss 1.5959 LearningRate 0.0202 Epoch: 11 Global Step: 183940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:46,284-Speed 5183.82 samples/sec Loss 1.6157 LearningRate 0.0202 Epoch: 11 Global Step: 183950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:48,843-Speed 4001.47 samples/sec Loss 1.6020 LearningRate 0.0202 Epoch: 11 Global Step: 183960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:50,834-Speed 5147.06 samples/sec Loss 1.5705 LearningRate 0.0202 Epoch: 11 Global Step: 183970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:52,823-Speed 5148.67 samples/sec Loss 1.6180 LearningRate 0.0201 Epoch: 11 Global Step: 183980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:54,799-Speed 5183.90 samples/sec Loss 1.6170 LearningRate 0.0201 Epoch: 11 Global Step: 183990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:35:56,788-Speed 5151.62 samples/sec Loss 1.6444 
LearningRate 0.0201 Epoch: 11 Global Step: 184000 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-11 11:36:23,469-[lfw][184000]XNorm: 22.063912
Training: 2022-04-11 11:36:23,470-[lfw][184000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-11 11:36:23,470-[lfw][184000]Accuracy-Highest: 0.99833
Training: 2022-04-11 11:36:54,351-[cfp_fp][184000]XNorm: 21.058685
Training: 2022-04-11 11:36:54,351-[cfp_fp][184000]Accuracy-Flip: 0.98600+-0.00437
Training: 2022-04-11 11:36:54,352-[cfp_fp][184000]Accuracy-Highest: 0.98629
Training: 2022-04-11 11:37:20,874-[agedb_30][184000]XNorm: 22.473175
Training: 2022-04-11 11:37:20,875-[agedb_30][184000]Accuracy-Flip: 0.98067+-0.00750
Training: 2022-04-11 11:37:20,875-[agedb_30][184000]Accuracy-Highest: 0.98250
Training: 2022-04-11 11:37:22,878-Speed 118.95 samples/sec Loss 1.6175 LearningRate 0.0201 Epoch: 11 Global Step: 184010 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-11 11:37:24,852-Speed 5189.50 samples/sec Loss 1.6000 LearningRate 0.0201 Epoch: 11 Global Step: 184020 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-11 11:37:26,834-Speed 5166.36 samples/sec Loss 1.6701 LearningRate 0.0201 Epoch: 11 Global Step: 184030 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-11 11:37:28,840-Speed 5106.55 samples/sec Loss 1.6586 LearningRate 0.0201 Epoch: 11 Global Step: 184040 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-11 11:37:30,792-Speed 5249.28 samples/sec Loss 1.6491 LearningRate 0.0201 Epoch: 11 Global Step: 184050 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-11 11:37:32,749-Speed 5233.75 samples/sec Loss 1.6440 LearningRate 0.0201 Epoch: 11 Global Step: 184060 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-11 11:37:34,721-Speed 5193.04 samples/sec Loss 1.6905 LearningRate 0.0201 Epoch: 11 Global Step: 184070 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-11 11:37:36,682-Speed 5222.86 samples/sec Loss 1.6207 LearningRate
0.0201 Epoch: 11 Global Step: 184080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:37:38,662-Speed 5175.20 samples/sec Loss 1.6123 LearningRate 0.0201 Epoch: 11 Global Step: 184090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:37:40,622-Speed 5225.79 samples/sec Loss 1.6635 LearningRate 0.0201 Epoch: 11 Global Step: 184100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:37:42,595-Speed 5190.47 samples/sec Loss 1.6384 LearningRate 0.0201 Epoch: 11 Global Step: 184110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:37:44,558-Speed 5218.78 samples/sec Loss 1.6591 LearningRate 0.0201 Epoch: 11 Global Step: 184120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:37:46,521-Speed 5219.34 samples/sec Loss 1.5598 LearningRate 0.0201 Epoch: 11 Global Step: 184130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:37:48,486-Speed 5213.61 samples/sec Loss 1.6330 LearningRate 0.0201 Epoch: 11 Global Step: 184140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:37:50,456-Speed 5199.09 samples/sec Loss 1.6568 LearningRate 0.0201 Epoch: 11 Global Step: 184150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:37:52,416-Speed 5226.56 samples/sec Loss 1.6148 LearningRate 0.0201 Epoch: 11 Global Step: 184160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:37:54,389-Speed 5191.55 samples/sec Loss 1.6532 LearningRate 0.0201 Epoch: 11 Global Step: 184170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:37:56,354-Speed 5214.66 samples/sec Loss 1.6578 LearningRate 0.0201 Epoch: 11 Global Step: 184180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:37:58,340-Speed 5155.43 samples/sec Loss 1.5951 LearningRate 0.0201 Epoch: 11 Global Step: 184190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:38:00,338-Speed 5127.45 samples/sec Loss 1.6651 LearningRate 0.0201 Epoch: 11 Global Step: 
184200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:38:02,311-Speed 5191.30 samples/sec Loss 1.6911 LearningRate 0.0201 Epoch: 11 Global Step: 184210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:38:04,306-Speed 5135.40 samples/sec Loss 1.6813 LearningRate 0.0201 Epoch: 11 Global Step: 184220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:38:06,433-Speed 4815.21 samples/sec Loss 1.6040 LearningRate 0.0201 Epoch: 11 Global Step: 184230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:38:08,427-Speed 5138.60 samples/sec Loss 1.6391 LearningRate 0.0201 Epoch: 11 Global Step: 184240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:38:10,401-Speed 5188.84 samples/sec Loss 1.6280 LearningRate 0.0201 Epoch: 11 Global Step: 184250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:38:12,373-Speed 5194.52 samples/sec Loss 1.6395 LearningRate 0.0201 Epoch: 11 Global Step: 184260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:38:14,353-Speed 5173.10 samples/sec Loss 1.5987 LearningRate 0.0201 Epoch: 11 Global Step: 184270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:38:16,330-Speed 5181.35 samples/sec Loss 1.5950 LearningRate 0.0201 Epoch: 11 Global Step: 184280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:38:18,302-Speed 5195.89 samples/sec Loss 1.6532 LearningRate 0.0201 Epoch: 11 Global Step: 184290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:38:20,272-Speed 5197.77 samples/sec Loss 1.6114 LearningRate 0.0201 Epoch: 11 Global Step: 184300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:38:22,259-Speed 5154.72 samples/sec Loss 1.6266 LearningRate 0.0201 Epoch: 11 Global Step: 184310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:38:24,241-Speed 5168.67 samples/sec Loss 1.6171 LearningRate 0.0201 Epoch: 11 Global Step: 184320 Fp16 Grad Scale: 
65536 Required: 10 hours Training: 2022-04-11 11:38:26,224-Speed 5165.91 samples/sec Loss 1.6412 LearningRate 0.0201 Epoch: 11 Global Step: 184330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:38:28,191-Speed 5206.64 samples/sec Loss 1.6221 LearningRate 0.0201 Epoch: 11 Global Step: 184340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:38:30,157-Speed 5211.79 samples/sec Loss 1.6984 LearningRate 0.0200 Epoch: 11 Global Step: 184350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:38:32,121-Speed 5216.67 samples/sec Loss 1.6294 LearningRate 0.0200 Epoch: 11 Global Step: 184360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:38:34,088-Speed 5206.25 samples/sec Loss 1.6249 LearningRate 0.0200 Epoch: 11 Global Step: 184370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:38:36,083-Speed 5136.56 samples/sec Loss 1.7078 LearningRate 0.0200 Epoch: 11 Global Step: 184380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:38:38,071-Speed 5151.01 samples/sec Loss 1.6140 LearningRate 0.0200 Epoch: 11 Global Step: 184390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:38:40,052-Speed 5171.17 samples/sec Loss 1.6349 LearningRate 0.0200 Epoch: 11 Global Step: 184400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:38:42,039-Speed 5155.32 samples/sec Loss 1.6169 LearningRate 0.0200 Epoch: 11 Global Step: 184410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:38:44,002-Speed 5216.20 samples/sec Loss 1.6479 LearningRate 0.0200 Epoch: 11 Global Step: 184420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:38:45,980-Speed 5179.39 samples/sec Loss 1.7005 LearningRate 0.0200 Epoch: 11 Global Step: 184430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:38:47,959-Speed 5176.74 samples/sec Loss 1.6547 LearningRate 0.0200 Epoch: 11 Global Step: 184440 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-04-11 11:38:49,935-Speed 5182.89 samples/sec Loss 1.6790 LearningRate 0.0200 Epoch: 11 Global Step: 184450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:38:51,927-Speed 5145.41 samples/sec Loss 1.6736 LearningRate 0.0200 Epoch: 11 Global Step: 184460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:38:53,899-Speed 5194.18 samples/sec Loss 1.6426 LearningRate 0.0200 Epoch: 11 Global Step: 184470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:38:55,875-Speed 5183.20 samples/sec Loss 1.6333 LearningRate 0.0200 Epoch: 11 Global Step: 184480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:38:57,840-Speed 5211.93 samples/sec Loss 1.6642 LearningRate 0.0200 Epoch: 11 Global Step: 184490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:38:59,814-Speed 5189.32 samples/sec Loss 1.6311 LearningRate 0.0200 Epoch: 11 Global Step: 184500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:01,784-Speed 5199.73 samples/sec Loss 1.6383 LearningRate 0.0200 Epoch: 11 Global Step: 184510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:03,757-Speed 5193.26 samples/sec Loss 1.6282 LearningRate 0.0200 Epoch: 11 Global Step: 184520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:05,731-Speed 5189.26 samples/sec Loss 1.6153 LearningRate 0.0200 Epoch: 11 Global Step: 184530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:07,696-Speed 5211.62 samples/sec Loss 1.6056 LearningRate 0.0200 Epoch: 11 Global Step: 184540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:09,681-Speed 5161.81 samples/sec Loss 1.6256 LearningRate 0.0200 Epoch: 11 Global Step: 184550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:11,654-Speed 5191.13 samples/sec Loss 1.6067 LearningRate 0.0200 Epoch: 11 Global Step: 184560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
11:39:13,653-Speed 5124.37 samples/sec Loss 1.7358 LearningRate 0.0200 Epoch: 11 Global Step: 184570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:15,619-Speed 5211.50 samples/sec Loss 1.6492 LearningRate 0.0200 Epoch: 11 Global Step: 184580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:17,603-Speed 5161.09 samples/sec Loss 1.7145 LearningRate 0.0200 Epoch: 11 Global Step: 184590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:39:19,573-Speed 5201.23 samples/sec Loss 1.6808 LearningRate 0.0200 Epoch: 11 Global Step: 184600 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:39:21,532-Speed 5228.95 samples/sec Loss 1.6190 LearningRate 0.0200 Epoch: 11 Global Step: 184610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:23,495-Speed 5218.68 samples/sec Loss 1.6881 LearningRate 0.0200 Epoch: 11 Global Step: 184620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:25,487-Speed 5141.40 samples/sec Loss 1.6435 LearningRate 0.0200 Epoch: 11 Global Step: 184630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:27,495-Speed 5102.78 samples/sec Loss 1.6087 LearningRate 0.0200 Epoch: 11 Global Step: 184640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:29,468-Speed 5190.66 samples/sec Loss 1.6291 LearningRate 0.0200 Epoch: 11 Global Step: 184650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:31,435-Speed 5207.96 samples/sec Loss 1.6843 LearningRate 0.0200 Epoch: 11 Global Step: 184660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:33,423-Speed 5153.09 samples/sec Loss 1.6620 LearningRate 0.0200 Epoch: 11 Global Step: 184670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:35,392-Speed 5202.02 samples/sec Loss 1.5979 LearningRate 0.0200 Epoch: 11 Global Step: 184680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:37,357-Speed 5212.47 
samples/sec Loss 1.6424 LearningRate 0.0200 Epoch: 11 Global Step: 184690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:39,358-Speed 5119.98 samples/sec Loss 1.6254 LearningRate 0.0200 Epoch: 11 Global Step: 184700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:41,344-Speed 5158.06 samples/sec Loss 1.6530 LearningRate 0.0200 Epoch: 11 Global Step: 184710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:39:43,308-Speed 5214.98 samples/sec Loss 1.6822 LearningRate 0.0199 Epoch: 11 Global Step: 184720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:39:45,274-Speed 5209.61 samples/sec Loss 1.7249 LearningRate 0.0199 Epoch: 11 Global Step: 184730 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:39:47,241-Speed 5207.16 samples/sec Loss 1.6654 LearningRate 0.0199 Epoch: 11 Global Step: 184740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:39:49,218-Speed 5183.40 samples/sec Loss 1.6725 LearningRate 0.0199 Epoch: 11 Global Step: 184750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:51,204-Speed 5157.52 samples/sec Loss 1.6497 LearningRate 0.0199 Epoch: 11 Global Step: 184760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:53,186-Speed 5167.32 samples/sec Loss 1.6496 LearningRate 0.0199 Epoch: 11 Global Step: 184770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:55,155-Speed 5204.37 samples/sec Loss 1.6552 LearningRate 0.0199 Epoch: 11 Global Step: 184780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:57,129-Speed 5188.40 samples/sec Loss 1.5718 LearningRate 0.0199 Epoch: 11 Global Step: 184790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:39:59,118-Speed 5150.25 samples/sec Loss 1.6618 LearningRate 0.0199 Epoch: 11 Global Step: 184800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:40:01,111-Speed 5139.08 samples/sec Loss 1.6459 
LearningRate 0.0199 Epoch: 11 Global Step: 184810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:40:03,099-Speed 5151.97 samples/sec Loss 1.6606 LearningRate 0.0199 Epoch: 11 Global Step: 184820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:40:05,077-Speed 5180.43 samples/sec Loss 1.6205 LearningRate 0.0199 Epoch: 11 Global Step: 184830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:40:07,048-Speed 5195.61 samples/sec Loss 1.6557 LearningRate 0.0199 Epoch: 11 Global Step: 184840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:40:09,012-Speed 5217.53 samples/sec Loss 1.6903 LearningRate 0.0199 Epoch: 11 Global Step: 184850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:40:11,006-Speed 5135.87 samples/sec Loss 1.6820 LearningRate 0.0199 Epoch: 11 Global Step: 184860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:40:12,977-Speed 5196.59 samples/sec Loss 1.7089 LearningRate 0.0199 Epoch: 11 Global Step: 184870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:40:14,956-Speed 5177.34 samples/sec Loss 1.7081 LearningRate 0.0199 Epoch: 11 Global Step: 184880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:40:16,937-Speed 5169.78 samples/sec Loss 1.6829 LearningRate 0.0199 Epoch: 11 Global Step: 184890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:40:18,909-Speed 5194.57 samples/sec Loss 1.6668 LearningRate 0.0199 Epoch: 11 Global Step: 184900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:40:20,883-Speed 5190.96 samples/sec Loss 1.7069 LearningRate 0.0199 Epoch: 11 Global Step: 184910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:40:22,858-Speed 5186.50 samples/sec Loss 1.7374 LearningRate 0.0199 Epoch: 11 Global Step: 184920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:40:24,825-Speed 5206.98 samples/sec Loss 1.6663 LearningRate 0.0199 
Epoch: 11 Global Step: 184930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:40:26,804-Speed 5176.12 samples/sec Loss 1.6712 LearningRate 0.0199 Epoch: 11 Global Step: 184940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:40:28,785-Speed 5171.80 samples/sec Loss 1.6489 LearningRate 0.0199 Epoch: 11 Global Step: 184950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:40:30,758-Speed 5189.20 samples/sec Loss 1.6681 LearningRate 0.0199 Epoch: 11 Global Step: 184960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:40:32,751-Speed 5141.23 samples/sec Loss 1.6577 LearningRate 0.0199 Epoch: 11 Global Step: 184970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:40:34,742-Speed 5144.35 samples/sec Loss 1.7100 LearningRate 0.0199 Epoch: 11 Global Step: 184980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:40:36,721-Speed 5176.21 samples/sec Loss 1.6979 LearningRate 0.0199 Epoch: 11 Global Step: 184990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:40:38,712-Speed 5146.86 samples/sec Loss 1.7257 LearningRate 0.0199 Epoch: 11 Global Step: 185000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:40:40,694-Speed 5166.35 samples/sec Loss 1.7191 LearningRate 0.0199 Epoch: 11 Global Step: 185010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:40:42,679-Speed 5162.34 samples/sec Loss 1.7094 LearningRate 0.0199 Epoch: 11 Global Step: 185020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:40:44,661-Speed 5167.75 samples/sec Loss 1.6605 LearningRate 0.0199 Epoch: 11 Global Step: 185030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:40:46,647-Speed 5156.37 samples/sec Loss 1.6791 LearningRate 0.0199 Epoch: 11 Global Step: 185040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:40:48,627-Speed 5172.87 samples/sec Loss 1.6660 LearningRate 0.0199 Epoch: 11 Global Step: 
185050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:40:50,638-Speed 5094.85 samples/sec Loss 1.6918 LearningRate 0.0199 Epoch: 11 Global Step: 185060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:40:52,614-Speed 5184.09 samples/sec Loss 1.6843 LearningRate 0.0199 Epoch: 11 Global Step: 185070 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:40:54,605-Speed 5143.99 samples/sec Loss 1.6630 LearningRate 0.0199 Epoch: 11 Global Step: 185080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:40:56,604-Speed 5124.18 samples/sec Loss 1.7069 LearningRate 0.0199 Epoch: 11 Global Step: 185090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:40:58,581-Speed 5182.85 samples/sec Loss 1.6661 LearningRate 0.0198 Epoch: 11 Global Step: 185100 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:41:00,567-Speed 5158.28 samples/sec Loss 1.6939 LearningRate 0.0198 Epoch: 11 Global Step: 185110 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:41:02,553-Speed 5156.93 samples/sec Loss 1.6935 LearningRate 0.0198 Epoch: 11 Global Step: 185120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:41:04,529-Speed 5183.94 samples/sec Loss 1.6694 LearningRate 0.0198 Epoch: 11 Global Step: 185130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:41:06,497-Speed 5204.54 samples/sec Loss 1.7257 LearningRate 0.0198 Epoch: 11 Global Step: 185140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:41:08,460-Speed 5217.41 samples/sec Loss 1.6844 LearningRate 0.0198 Epoch: 11 Global Step: 185150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:41:10,433-Speed 5192.28 samples/sec Loss 1.6346 LearningRate 0.0198 Epoch: 11 Global Step: 185160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:41:12,421-Speed 5153.83 samples/sec Loss 1.7256 LearningRate 0.0198 Epoch: 11 Global Step: 185170 Fp16 Grad Scale: 
65536 Required: 10 hours Training: 2022-04-11 11:41:14,399-Speed 5178.42 samples/sec Loss 1.7405 LearningRate 0.0198 Epoch: 11 Global Step: 185180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:41:16,380-Speed 5169.94 samples/sec Loss 1.7172 LearningRate 0.0198 Epoch: 11 Global Step: 185190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:41:18,355-Speed 5189.84 samples/sec Loss 1.6557 LearningRate 0.0198 Epoch: 11 Global Step: 185200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:41:20,325-Speed 5201.71 samples/sec Loss 1.6654 LearningRate 0.0198 Epoch: 11 Global Step: 185210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:41:22,285-Speed 5224.26 samples/sec Loss 1.7224 LearningRate 0.0198 Epoch: 11 Global Step: 185220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:41:24,250-Speed 5212.70 samples/sec Loss 1.6889 LearningRate 0.0198 Epoch: 11 Global Step: 185230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:41:26,219-Speed 5201.95 samples/sec Loss 1.6920 LearningRate 0.0198 Epoch: 11 Global Step: 185240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:41:28,189-Speed 5203.32 samples/sec Loss 1.7225 LearningRate 0.0198 Epoch: 11 Global Step: 185250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:41:30,170-Speed 5168.68 samples/sec Loss 1.6749 LearningRate 0.0198 Epoch: 11 Global Step: 185260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:41:32,137-Speed 5208.78 samples/sec Loss 1.6845 LearningRate 0.0198 Epoch: 11 Global Step: 185270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:41:34,141-Speed 5112.28 samples/sec Loss 1.6548 LearningRate 0.0198 Epoch: 11 Global Step: 185280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:41:36,131-Speed 5145.70 samples/sec Loss 1.6701 LearningRate 0.0198 Epoch: 11 Global Step: 185290 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-04-11 11:41:38,102-Speed 5199.88 samples/sec Loss 1.7137 LearningRate 0.0198 Epoch: 11 Global Step: 185300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:41:40,072-Speed 5200.36 samples/sec Loss 1.6572 LearningRate 0.0198 Epoch: 11 Global Step: 185310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:41:42,039-Speed 5206.33 samples/sec Loss 1.7541 LearningRate 0.0198 Epoch: 11 Global Step: 185320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:41:44,009-Speed 5201.01 samples/sec Loss 1.6598 LearningRate 0.0198 Epoch: 11 Global Step: 185330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:41:45,976-Speed 5204.84 samples/sec Loss 1.7453 LearningRate 0.0198 Epoch: 11 Global Step: 185340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:41:47,949-Speed 5193.57 samples/sec Loss 1.6787 LearningRate 0.0198 Epoch: 11 Global Step: 185350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:41:49,914-Speed 5212.33 samples/sec Loss 1.7045 LearningRate 0.0198 Epoch: 11 Global Step: 185360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:41:51,893-Speed 5174.25 samples/sec Loss 1.6727 LearningRate 0.0198 Epoch: 11 Global Step: 185370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:41:53,875-Speed 5168.83 samples/sec Loss 1.6983 LearningRate 0.0198 Epoch: 11 Global Step: 185380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:41:55,847-Speed 5195.50 samples/sec Loss 1.7015 LearningRate 0.0198 Epoch: 11 Global Step: 185390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:41:57,828-Speed 5169.92 samples/sec Loss 1.6414 LearningRate 0.0198 Epoch: 11 Global Step: 185400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:41:59,797-Speed 5204.16 samples/sec Loss 1.6698 LearningRate 0.0198 Epoch: 11 Global Step: 185410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
11:42:01,775-Speed 5179.76 samples/sec Loss 1.6820 LearningRate 0.0198 Epoch: 11 Global Step: 185420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:42:03,752-Speed 5181.36 samples/sec Loss 1.6764 LearningRate 0.0198 Epoch: 11 Global Step: 185430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:42:05,722-Speed 5197.14 samples/sec Loss 1.7025 LearningRate 0.0198 Epoch: 11 Global Step: 185440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:42:07,689-Speed 5208.66 samples/sec Loss 1.6503 LearningRate 0.0198 Epoch: 11 Global Step: 185450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:42:09,667-Speed 5178.81 samples/sec Loss 1.7395 LearningRate 0.0198 Epoch: 11 Global Step: 185460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:42:11,683-Speed 5081.26 samples/sec Loss 1.7285 LearningRate 0.0197 Epoch: 11 Global Step: 185470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:42:13,666-Speed 5166.31 samples/sec Loss 1.6867 LearningRate 0.0197 Epoch: 11 Global Step: 185480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:42:15,641-Speed 5184.59 samples/sec Loss 1.6755 LearningRate 0.0197 Epoch: 11 Global Step: 185490 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:42:17,629-Speed 5153.09 samples/sec Loss 1.6960 LearningRate 0.0197 Epoch: 11 Global Step: 185500 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:42:19,601-Speed 5195.03 samples/sec Loss 1.6850 LearningRate 0.0197 Epoch: 11 Global Step: 185510 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:42:21,581-Speed 5173.88 samples/sec Loss 1.7732 LearningRate 0.0197 Epoch: 11 Global Step: 185520 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:42:23,565-Speed 5161.59 samples/sec Loss 1.7167 LearningRate 0.0197 Epoch: 11 Global Step: 185530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:42:25,549-Speed 5163.18 
samples/sec Loss 1.6474 LearningRate 0.0197 Epoch: 11 Global Step: 185540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:42:27,551-Speed 5117.42 samples/sec Loss 1.7216 LearningRate 0.0197 Epoch: 11 Global Step: 185550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:42:29,520-Speed 5201.87 samples/sec Loss 1.7460 LearningRate 0.0197 Epoch: 11 Global Step: 185560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:42:31,517-Speed 5130.91 samples/sec Loss 1.7166 LearningRate 0.0197 Epoch: 11 Global Step: 185570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:42:33,495-Speed 5178.25 samples/sec Loss 1.7389 LearningRate 0.0197 Epoch: 11 Global Step: 185580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:42:35,470-Speed 5186.56 samples/sec Loss 1.7215 LearningRate 0.0197 Epoch: 11 Global Step: 185590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:42:37,450-Speed 5173.03 samples/sec Loss 1.7972 LearningRate 0.0197 Epoch: 11 Global Step: 185600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:42:39,433-Speed 5166.05 samples/sec Loss 1.7374 LearningRate 0.0197 Epoch: 11 Global Step: 185610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:42:41,418-Speed 5160.84 samples/sec Loss 1.6938 LearningRate 0.0197 Epoch: 11 Global Step: 185620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:42:43,383-Speed 5212.74 samples/sec Loss 1.7164 LearningRate 0.0197 Epoch: 11 Global Step: 185630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:42:45,365-Speed 5169.54 samples/sec Loss 1.7260 LearningRate 0.0197 Epoch: 11 Global Step: 185640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:42:47,339-Speed 5188.00 samples/sec Loss 1.7618 LearningRate 0.0197 Epoch: 11 Global Step: 185650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:42:49,326-Speed 5156.36 samples/sec Loss 1.6953 
LearningRate 0.0197 Epoch: 11 Global Step: 185660 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:42:51,301-Speed 5183.75 samples/sec Loss 1.7263 LearningRate 0.0197 Epoch: 11 Global Step: 185670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:42:53,278-Speed 5182.08 samples/sec Loss 1.7084 LearningRate 0.0197 Epoch: 11 Global Step: 185680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:42:55,251-Speed 5194.38 samples/sec Loss 1.7429 LearningRate 0.0197 Epoch: 11 Global Step: 185690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:42:57,238-Speed 5154.18 samples/sec Loss 1.7386 LearningRate 0.0197 Epoch: 11 Global Step: 185700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:42:59,221-Speed 5164.59 samples/sec Loss 1.7154 LearningRate 0.0197 Epoch: 11 Global Step: 185710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:01,182-Speed 5223.72 samples/sec Loss 1.6587 LearningRate 0.0197 Epoch: 11 Global Step: 185720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:43:03,151-Speed 5202.70 samples/sec Loss 1.8122 LearningRate 0.0197 Epoch: 11 Global Step: 185730 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:43:05,147-Speed 5131.56 samples/sec Loss 1.7333 LearningRate 0.0197 Epoch: 11 Global Step: 185740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:43:07,115-Speed 5205.19 samples/sec Loss 1.7126 LearningRate 0.0197 Epoch: 11 Global Step: 185750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:43:09,101-Speed 5157.75 samples/sec Loss 1.7205 LearningRate 0.0197 Epoch: 11 Global Step: 185760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:43:11,087-Speed 5158.86 samples/sec Loss 1.7391 LearningRate 0.0197 Epoch: 11 Global Step: 185770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:43:13,083-Speed 5133.77 samples/sec Loss 1.6766 LearningRate 0.0197 Epoch: 
11 Global Step: 185780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:43:15,052-Speed 5202.50 samples/sec Loss 1.6734 LearningRate 0.0197 Epoch: 11 Global Step: 185790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:43:17,038-Speed 5156.96 samples/sec Loss 1.7213 LearningRate 0.0197 Epoch: 11 Global Step: 185800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:43:19,011-Speed 5190.82 samples/sec Loss 1.6826 LearningRate 0.0197 Epoch: 11 Global Step: 185810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:43:20,979-Speed 5205.48 samples/sec Loss 1.7424 LearningRate 0.0197 Epoch: 11 Global Step: 185820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:22,960-Speed 5171.99 samples/sec Loss 1.7034 LearningRate 0.0197 Epoch: 11 Global Step: 185830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:24,935-Speed 5185.07 samples/sec Loss 1.7298 LearningRate 0.0197 Epoch: 11 Global Step: 185840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:26,951-Speed 5082.55 samples/sec Loss 1.7271 LearningRate 0.0196 Epoch: 11 Global Step: 185850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:28,938-Speed 5153.97 samples/sec Loss 1.6995 LearningRate 0.0196 Epoch: 11 Global Step: 185860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:30,905-Speed 5206.82 samples/sec Loss 1.7471 LearningRate 0.0196 Epoch: 11 Global Step: 185870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:32,885-Speed 5174.04 samples/sec Loss 1.7211 LearningRate 0.0196 Epoch: 11 Global Step: 185880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:34,859-Speed 5191.45 samples/sec Loss 1.7690 LearningRate 0.0196 Epoch: 11 Global Step: 185890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:36,833-Speed 5187.05 samples/sec Loss 1.7462 LearningRate 0.0196 Epoch: 11 Global Step: 185900 Fp16 
Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:38,833-Speed 5123.34 samples/sec Loss 1.7476 LearningRate 0.0196 Epoch: 11 Global Step: 185910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:40,821-Speed 5153.35 samples/sec Loss 1.7262 LearningRate 0.0196 Epoch: 11 Global Step: 185920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:42,794-Speed 5190.05 samples/sec Loss 1.7083 LearningRate 0.0196 Epoch: 11 Global Step: 185930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:44,768-Speed 5190.41 samples/sec Loss 1.7593 LearningRate 0.0196 Epoch: 11 Global Step: 185940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:46,760-Speed 5140.40 samples/sec Loss 1.7318 LearningRate 0.0196 Epoch: 11 Global Step: 185950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:48,749-Speed 5150.96 samples/sec Loss 1.7595 LearningRate 0.0196 Epoch: 11 Global Step: 185960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:50,735-Speed 5158.18 samples/sec Loss 1.7987 LearningRate 0.0196 Epoch: 11 Global Step: 185970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:52,710-Speed 5186.98 samples/sec Loss 1.7598 LearningRate 0.0196 Epoch: 11 Global Step: 185980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:54,682-Speed 5193.10 samples/sec Loss 1.7599 LearningRate 0.0196 Epoch: 11 Global Step: 185990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:43:56,653-Speed 5197.69 samples/sec Loss 1.6571 LearningRate 0.0196 Epoch: 11 Global Step: 186000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:44:23,233-[lfw][186000]XNorm: 21.993380 Training: 2022-04-11 11:44:23,233-[lfw][186000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 11:44:23,234-[lfw][186000]Accuracy-Highest: 0.99833 Training: 2022-04-11 11:44:53,879-[cfp_fp][186000]XNorm: 20.847259 Training: 2022-04-11 
11:44:53,880-[cfp_fp][186000]Accuracy-Flip: 0.98586+-0.00562 Training: 2022-04-11 11:44:53,880-[cfp_fp][186000]Accuracy-Highest: 0.98629 Training: 2022-04-11 11:45:20,273-[agedb_30][186000]XNorm: 22.099676 Training: 2022-04-11 11:45:20,274-[agedb_30][186000]Accuracy-Flip: 0.97950+-0.00813 Training: 2022-04-11 11:45:20,274-[agedb_30][186000]Accuracy-Highest: 0.98250 Training: 2022-04-11 11:45:22,260-Speed 119.62 samples/sec Loss 1.7613 LearningRate 0.0196 Epoch: 11 Global Step: 186010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:45:24,270-Speed 5096.04 samples/sec Loss 1.7670 LearningRate 0.0196 Epoch: 11 Global Step: 186020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:45:26,254-Speed 5163.55 samples/sec Loss 1.7597 LearningRate 0.0196 Epoch: 11 Global Step: 186030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:45:28,236-Speed 5167.58 samples/sec Loss 1.7724 LearningRate 0.0196 Epoch: 11 Global Step: 186040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:45:30,209-Speed 5191.06 samples/sec Loss 1.7812 LearningRate 0.0196 Epoch: 11 Global Step: 186050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:45:32,182-Speed 5192.48 samples/sec Loss 1.7516 LearningRate 0.0196 Epoch: 11 Global Step: 186060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:45:34,149-Speed 5208.30 samples/sec Loss 1.7695 LearningRate 0.0196 Epoch: 11 Global Step: 186070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:45:36,118-Speed 5200.89 samples/sec Loss 1.7009 LearningRate 0.0196 Epoch: 11 Global Step: 186080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:45:38,121-Speed 5114.06 samples/sec Loss 1.7388 LearningRate 0.0196 Epoch: 11 Global Step: 186090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:45:40,104-Speed 5164.44 samples/sec Loss 1.7104 LearningRate 0.0196 Epoch: 11 Global Step: 186100 Fp16 Grad Scale: 65536 Required: 
10 hours Training: 2022-04-11 11:45:42,081-Speed 5182.25 samples/sec Loss 1.7928 LearningRate 0.0196 Epoch: 11 Global Step: 186110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:45:44,054-Speed 5191.62 samples/sec Loss 1.7148 LearningRate 0.0196 Epoch: 11 Global Step: 186120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:45:46,028-Speed 5190.44 samples/sec Loss 1.6936 LearningRate 0.0196 Epoch: 11 Global Step: 186130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:45:48,067-Speed 5024.86 samples/sec Loss 1.8000 LearningRate 0.0196 Epoch: 11 Global Step: 186140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:45:50,036-Speed 5200.72 samples/sec Loss 1.7662 LearningRate 0.0196 Epoch: 11 Global Step: 186150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:45:52,018-Speed 5169.30 samples/sec Loss 1.7641 LearningRate 0.0196 Epoch: 11 Global Step: 186160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:45:53,994-Speed 5182.67 samples/sec Loss 1.7610 LearningRate 0.0196 Epoch: 11 Global Step: 186170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:45:55,968-Speed 5188.55 samples/sec Loss 1.7558 LearningRate 0.0196 Epoch: 11 Global Step: 186180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:45:57,952-Speed 5163.70 samples/sec Loss 1.7695 LearningRate 0.0196 Epoch: 11 Global Step: 186190 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:45:59,938-Speed 5159.16 samples/sec Loss 1.7638 LearningRate 0.0196 Epoch: 11 Global Step: 186200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:01,917-Speed 5175.42 samples/sec Loss 1.7278 LearningRate 0.0196 Epoch: 11 Global Step: 186210 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:03,890-Speed 5191.30 samples/sec Loss 1.7543 LearningRate 0.0196 Epoch: 11 Global Step: 186220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 
11:46:05,865-Speed 5187.93 samples/sec Loss 1.7502 LearningRate 0.0195 Epoch: 11 Global Step: 186230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:46:07,845-Speed 5174.52 samples/sec Loss 1.7521 LearningRate 0.0195 Epoch: 11 Global Step: 186240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:46:09,828-Speed 5163.34 samples/sec Loss 1.8271 LearningRate 0.0195 Epoch: 11 Global Step: 186250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:46:11,830-Speed 5116.95 samples/sec Loss 1.7251 LearningRate 0.0195 Epoch: 11 Global Step: 186260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:46:13,815-Speed 5161.31 samples/sec Loss 1.7956 LearningRate 0.0195 Epoch: 11 Global Step: 186270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:46:15,808-Speed 5139.73 samples/sec Loss 1.7511 LearningRate 0.0195 Epoch: 11 Global Step: 186280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:46:17,793-Speed 5159.39 samples/sec Loss 1.7840 LearningRate 0.0195 Epoch: 11 Global Step: 186290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:46:19,784-Speed 5145.43 samples/sec Loss 1.7109 LearningRate 0.0195 Epoch: 11 Global Step: 186300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:21,799-Speed 5083.33 samples/sec Loss 1.7114 LearningRate 0.0195 Epoch: 11 Global Step: 186310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:23,777-Speed 5177.63 samples/sec Loss 1.7983 LearningRate 0.0195 Epoch: 11 Global Step: 186320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:25,763-Speed 5158.60 samples/sec Loss 1.7315 LearningRate 0.0195 Epoch: 11 Global Step: 186330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:27,745-Speed 5168.58 samples/sec Loss 1.7524 LearningRate 0.0195 Epoch: 11 Global Step: 186340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:29,735-Speed 5149.19 
samples/sec Loss 1.7346 LearningRate 0.0195 Epoch: 11 Global Step: 186350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:31,713-Speed 5177.25 samples/sec Loss 1.7349 LearningRate 0.0195 Epoch: 11 Global Step: 186360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:33,686-Speed 5192.49 samples/sec Loss 1.7971 LearningRate 0.0195 Epoch: 11 Global Step: 186370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:35,667-Speed 5171.03 samples/sec Loss 1.7345 LearningRate 0.0195 Epoch: 11 Global Step: 186380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:37,659-Speed 5141.84 samples/sec Loss 1.7348 LearningRate 0.0195 Epoch: 11 Global Step: 186390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:39,665-Speed 5105.13 samples/sec Loss 1.7718 LearningRate 0.0195 Epoch: 11 Global Step: 186400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:46:41,652-Speed 5155.39 samples/sec Loss 1.7563 LearningRate 0.0195 Epoch: 11 Global Step: 186410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:46:43,655-Speed 5113.76 samples/sec Loss 1.7840 LearningRate 0.0195 Epoch: 11 Global Step: 186420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:45,650-Speed 5134.52 samples/sec Loss 1.7351 LearningRate 0.0195 Epoch: 11 Global Step: 186430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:47,633-Speed 5166.86 samples/sec Loss 1.7141 LearningRate 0.0195 Epoch: 11 Global Step: 186440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:49,645-Speed 5091.89 samples/sec Loss 1.6534 LearningRate 0.0195 Epoch: 11 Global Step: 186450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:51,628-Speed 5165.58 samples/sec Loss 1.8640 LearningRate 0.0195 Epoch: 11 Global Step: 186460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:53,604-Speed 5183.06 samples/sec Loss 1.7662 
LearningRate 0.0195 Epoch: 11 Global Step: 186470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:55,575-Speed 5197.33 samples/sec Loss 1.7993 LearningRate 0.0195 Epoch: 11 Global Step: 186480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:57,542-Speed 5206.89 samples/sec Loss 1.7806 LearningRate 0.0195 Epoch: 11 Global Step: 186490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:46:59,536-Speed 5137.39 samples/sec Loss 1.7733 LearningRate 0.0195 Epoch: 11 Global Step: 186500 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:47:01,515-Speed 5176.42 samples/sec Loss 1.7911 LearningRate 0.0195 Epoch: 11 Global Step: 186510 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:47:03,519-Speed 5111.71 samples/sec Loss 1.8010 LearningRate 0.0195 Epoch: 11 Global Step: 186520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:05,530-Speed 5092.22 samples/sec Loss 1.7488 LearningRate 0.0195 Epoch: 11 Global Step: 186530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:07,505-Speed 5188.32 samples/sec Loss 1.7013 LearningRate 0.0195 Epoch: 11 Global Step: 186540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:09,474-Speed 5203.06 samples/sec Loss 1.7475 LearningRate 0.0195 Epoch: 11 Global Step: 186550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:11,449-Speed 5186.07 samples/sec Loss 1.7460 LearningRate 0.0195 Epoch: 11 Global Step: 186560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:13,416-Speed 5206.98 samples/sec Loss 1.7475 LearningRate 0.0195 Epoch: 11 Global Step: 186570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:15,386-Speed 5200.13 samples/sec Loss 1.7374 LearningRate 0.0195 Epoch: 11 Global Step: 186580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:17,355-Speed 5203.12 samples/sec Loss 1.7324 LearningRate 0.0195 Epoch: 11 
Global Step: 186590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:19,322-Speed 5206.91 samples/sec Loss 1.6953 LearningRate 0.0194 Epoch: 11 Global Step: 186600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:21,293-Speed 5195.46 samples/sec Loss 1.7165 LearningRate 0.0194 Epoch: 11 Global Step: 186610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:23,269-Speed 5184.93 samples/sec Loss 1.7399 LearningRate 0.0194 Epoch: 11 Global Step: 186620 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:47:25,244-Speed 5185.82 samples/sec Loss 1.7688 LearningRate 0.0194 Epoch: 11 Global Step: 186630 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:47:27,267-Speed 5063.66 samples/sec Loss 1.7955 LearningRate 0.0194 Epoch: 11 Global Step: 186640 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:47:29,235-Speed 5206.09 samples/sec Loss 1.7129 LearningRate 0.0194 Epoch: 11 Global Step: 186650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:31,204-Speed 5203.52 samples/sec Loss 1.7520 LearningRate 0.0194 Epoch: 11 Global Step: 186660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:33,178-Speed 5194.83 samples/sec Loss 1.7803 LearningRate 0.0194 Epoch: 11 Global Step: 186670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:35,152-Speed 5188.96 samples/sec Loss 1.7966 LearningRate 0.0194 Epoch: 11 Global Step: 186680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:37,127-Speed 5188.36 samples/sec Loss 1.8172 LearningRate 0.0194 Epoch: 11 Global Step: 186690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:39,096-Speed 5201.15 samples/sec Loss 1.7299 LearningRate 0.0194 Epoch: 11 Global Step: 186700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:41,063-Speed 5208.07 samples/sec Loss 1.7454 LearningRate 0.0194 Epoch: 11 Global Step: 186710 Fp16 
Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:43,048-Speed 5159.02 samples/sec Loss 1.8324 LearningRate 0.0194 Epoch: 11 Global Step: 186720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:45,023-Speed 5188.14 samples/sec Loss 1.8148 LearningRate 0.0194 Epoch: 11 Global Step: 186730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:47,009-Speed 5156.01 samples/sec Loss 1.7783 LearningRate 0.0194 Epoch: 11 Global Step: 186740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:49,003-Speed 5137.94 samples/sec Loss 1.7596 LearningRate 0.0194 Epoch: 11 Global Step: 186750 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:47:50,981-Speed 5179.05 samples/sec Loss 1.7776 LearningRate 0.0194 Epoch: 11 Global Step: 186760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:52,952-Speed 5197.96 samples/sec Loss 1.8182 LearningRate 0.0194 Epoch: 11 Global Step: 186770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:54,920-Speed 5205.60 samples/sec Loss 1.7904 LearningRate 0.0194 Epoch: 11 Global Step: 186780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:56,896-Speed 5182.78 samples/sec Loss 1.7938 LearningRate 0.0194 Epoch: 11 Global Step: 186790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:47:58,876-Speed 5174.69 samples/sec Loss 1.7517 LearningRate 0.0194 Epoch: 11 Global Step: 186800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:48:00,861-Speed 5159.28 samples/sec Loss 1.8076 LearningRate 0.0194 Epoch: 11 Global Step: 186810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:48:02,834-Speed 5192.50 samples/sec Loss 1.8052 LearningRate 0.0194 Epoch: 11 Global Step: 186820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:48:04,825-Speed 5144.15 samples/sec Loss 1.7895 LearningRate 0.0194 Epoch: 11 Global Step: 186830 Fp16 Grad Scale: 65536 Required: 10 
hours Training: 2022-04-11 11:48:06,794-Speed 5201.84 samples/sec Loss 1.7003 LearningRate 0.0194 Epoch: 11 Global Step: 186840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:48:08,762-Speed 5205.18 samples/sec Loss 1.8085 LearningRate 0.0194 Epoch: 11 Global Step: 186850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:48:10,748-Speed 5158.06 samples/sec Loss 1.7409 LearningRate 0.0194 Epoch: 11 Global Step: 186860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:48:12,719-Speed 5196.31 samples/sec Loss 1.7612 LearningRate 0.0194 Epoch: 11 Global Step: 186870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:48:14,689-Speed 5199.07 samples/sec Loss 1.7497 LearningRate 0.0194 Epoch: 11 Global Step: 186880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:48:16,657-Speed 5205.90 samples/sec Loss 1.7815 LearningRate 0.0194 Epoch: 11 Global Step: 186890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:48:18,640-Speed 5166.80 samples/sec Loss 1.7753 LearningRate 0.0194 Epoch: 11 Global Step: 186900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:48:20,601-Speed 5222.21 samples/sec Loss 1.8365 LearningRate 0.0194 Epoch: 11 Global Step: 186910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:48:22,568-Speed 5208.17 samples/sec Loss 1.7505 LearningRate 0.0194 Epoch: 11 Global Step: 186920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:48:24,542-Speed 5188.50 samples/sec Loss 1.7841 LearningRate 0.0194 Epoch: 11 Global Step: 186930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:48:26,515-Speed 5191.02 samples/sec Loss 1.7737 LearningRate 0.0194 Epoch: 11 Global Step: 186940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:48:28,510-Speed 5136.45 samples/sec Loss 1.7650 LearningRate 0.0194 Epoch: 11 Global Step: 186950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
11:48:30,501-Speed 5144.47 samples/sec Loss 1.8150 LearningRate 0.0194 Epoch: 11 Global Step: 186960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:48:32,467-Speed 5209.17 samples/sec Loss 1.7937 LearningRate 0.0194 Epoch: 11 Global Step: 186970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:48:34,453-Speed 5159.56 samples/sec Loss 1.7984 LearningRate 0.0193 Epoch: 11 Global Step: 186980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:48:36,432-Speed 5176.79 samples/sec Loss 1.7522 LearningRate 0.0193 Epoch: 11 Global Step: 186990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:48:38,412-Speed 5173.06 samples/sec Loss 1.7806 LearningRate 0.0193 Epoch: 11 Global Step: 187000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:48:40,397-Speed 5158.90 samples/sec Loss 1.7669 LearningRate 0.0193 Epoch: 11 Global Step: 187010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:48:42,387-Speed 5148.01 samples/sec Loss 1.8182 LearningRate 0.0193 Epoch: 11 Global Step: 187020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:48:44,349-Speed 5220.08 samples/sec Loss 1.7659 LearningRate 0.0193 Epoch: 11 Global Step: 187030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:48:46,321-Speed 5193.40 samples/sec Loss 1.7991 LearningRate 0.0193 Epoch: 11 Global Step: 187040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:48:48,290-Speed 5204.53 samples/sec Loss 1.8053 LearningRate 0.0193 Epoch: 11 Global Step: 187050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:48:50,259-Speed 5200.93 samples/sec Loss 1.7788 LearningRate 0.0193 Epoch: 11 Global Step: 187060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:48:52,270-Speed 5093.77 samples/sec Loss 1.8042 LearningRate 0.0193 Epoch: 11 Global Step: 187070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:48:54,267-Speed 5131.09 
samples/sec Loss 1.7466 LearningRate 0.0193 Epoch: 11 Global Step: 187080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:48:56,238-Speed 5197.34 samples/sec Loss 1.7842 LearningRate 0.0193 Epoch: 11 Global Step: 187090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:48:58,209-Speed 5196.24 samples/sec Loss 1.7482 LearningRate 0.0193 Epoch: 11 Global Step: 187100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:49:00,177-Speed 5204.37 samples/sec Loss 1.7464 LearningRate 0.0193 Epoch: 11 Global Step: 187110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:49:02,144-Speed 5207.38 samples/sec Loss 1.7460 LearningRate 0.0193 Epoch: 11 Global Step: 187120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:49:04,134-Speed 5148.81 samples/sec Loss 1.7753 LearningRate 0.0193 Epoch: 11 Global Step: 187130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:49:06,119-Speed 5161.24 samples/sec Loss 1.7889 LearningRate 0.0193 Epoch: 11 Global Step: 187140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:49:08,086-Speed 5206.12 samples/sec Loss 1.7829 LearningRate 0.0193 Epoch: 11 Global Step: 187150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:49:10,059-Speed 5190.65 samples/sec Loss 1.7810 LearningRate 0.0193 Epoch: 11 Global Step: 187160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:49:12,051-Speed 5142.21 samples/sec Loss 1.7823 LearningRate 0.0193 Epoch: 11 Global Step: 187170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:49:14,031-Speed 5174.18 samples/sec Loss 1.8069 LearningRate 0.0193 Epoch: 11 Global Step: 187180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:49:16,000-Speed 5202.91 samples/sec Loss 1.7485 LearningRate 0.0193 Epoch: 11 Global Step: 187190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:49:17,972-Speed 5196.31 samples/sec Loss 1.7416 
LearningRate 0.0193 Epoch: 11 Global Step: 187200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:49:19,945-Speed 5190.49 samples/sec Loss 1.7261 LearningRate 0.0193 Epoch: 11 Global Step: 187210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:49:21,916-Speed 5196.94 samples/sec Loss 1.7717 LearningRate 0.0193 Epoch: 11 Global Step: 187220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:49:23,897-Speed 5171.79 samples/sec Loss 1.7594 LearningRate 0.0193 Epoch: 11 Global Step: 187230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:49:25,901-Speed 5109.81 samples/sec Loss 1.7833 LearningRate 0.0193 Epoch: 11 Global Step: 187240 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:49:27,872-Speed 5197.32 samples/sec Loss 1.7507 LearningRate 0.0193 Epoch: 11 Global Step: 187250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:49:29,840-Speed 5205.49 samples/sec Loss 1.7270 LearningRate 0.0193 Epoch: 11 Global Step: 187260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:49:31,815-Speed 5186.93 samples/sec Loss 1.7925 LearningRate 0.0193 Epoch: 11 Global Step: 187270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:49:33,818-Speed 5114.88 samples/sec Loss 1.7462 LearningRate 0.0193 Epoch: 11 Global Step: 187280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:49:35,784-Speed 5209.08 samples/sec Loss 1.8109 LearningRate 0.0193 Epoch: 11 Global Step: 187290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:49:37,755-Speed 5198.28 samples/sec Loss 1.7906 LearningRate 0.0193 Epoch: 11 Global Step: 187300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:49:39,724-Speed 5202.66 samples/sec Loss 1.7283 LearningRate 0.0193 Epoch: 11 Global Step: 187310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:49:41,703-Speed 5176.40 samples/sec Loss 1.7336 LearningRate 0.0193 Epoch: 11 
Global Step: 187320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:49:43,682-Speed 5173.66 samples/sec Loss 1.8016 LearningRate 0.0193 Epoch: 11 Global Step: 187330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:49:45,654-Speed 5194.45 samples/sec Loss 1.7696 LearningRate 0.0193 Epoch: 11 Global Step: 187340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:49:47,624-Speed 5201.10 samples/sec Loss 1.8152 LearningRate 0.0193 Epoch: 11 Global Step: 187350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:49:49,626-Speed 5115.84 samples/sec Loss 1.7314 LearningRate 0.0192 Epoch: 11 Global Step: 187360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:49:51,598-Speed 5195.21 samples/sec Loss 1.8460 LearningRate 0.0192 Epoch: 11 Global Step: 187370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:49:53,568-Speed 5199.67 samples/sec Loss 1.8204 LearningRate 0.0192 Epoch: 11 Global Step: 187380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:49:55,540-Speed 5192.42 samples/sec Loss 1.7611 LearningRate 0.0192 Epoch: 11 Global Step: 187390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:49:57,521-Speed 5172.67 samples/sec Loss 1.8129 LearningRate 0.0192 Epoch: 11 Global Step: 187400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:49:59,491-Speed 5199.49 samples/sec Loss 1.8203 LearningRate 0.0192 Epoch: 11 Global Step: 187410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:01,469-Speed 5180.78 samples/sec Loss 1.7890 LearningRate 0.0192 Epoch: 11 Global Step: 187420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:03,463-Speed 5136.70 samples/sec Loss 1.7724 LearningRate 0.0192 Epoch: 11 Global Step: 187430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:05,446-Speed 5165.72 samples/sec Loss 1.7526 LearningRate 0.0192 Epoch: 11 Global Step: 187440 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:07,429-Speed 5165.01 samples/sec Loss 1.7790 LearningRate 0.0192 Epoch: 11 Global Step: 187450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:09,399-Speed 5200.58 samples/sec Loss 1.7042 LearningRate 0.0192 Epoch: 11 Global Step: 187460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:11,398-Speed 5123.38 samples/sec Loss 1.8063 LearningRate 0.0192 Epoch: 11 Global Step: 187470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:13,383-Speed 5161.01 samples/sec Loss 1.7761 LearningRate 0.0192 Epoch: 11 Global Step: 187480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:15,358-Speed 5187.13 samples/sec Loss 1.7402 LearningRate 0.0192 Epoch: 11 Global Step: 187490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:17,331-Speed 5192.01 samples/sec Loss 1.8390 LearningRate 0.0192 Epoch: 11 Global Step: 187500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:19,304-Speed 5191.84 samples/sec Loss 1.8293 LearningRate 0.0192 Epoch: 11 Global Step: 187510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:21,283-Speed 5173.50 samples/sec Loss 1.8183 LearningRate 0.0192 Epoch: 11 Global Step: 187520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:23,270-Speed 5156.36 samples/sec Loss 1.8069 LearningRate 0.0192 Epoch: 11 Global Step: 187530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:25,247-Speed 5181.74 samples/sec Loss 1.7802 LearningRate 0.0192 Epoch: 11 Global Step: 187540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:27,220-Speed 5192.03 samples/sec Loss 1.7987 LearningRate 0.0192 Epoch: 11 Global Step: 187550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:29,190-Speed 5198.23 samples/sec Loss 1.8106 LearningRate 0.0192 Epoch: 11 Global Step: 187560 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-04-11 11:50:31,159-Speed 5203.48 samples/sec Loss 1.8274 LearningRate 0.0192 Epoch: 11 Global Step: 187570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:33,135-Speed 5183.01 samples/sec Loss 1.8196 LearningRate 0.0192 Epoch: 11 Global Step: 187580 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:50:35,121-Speed 5159.85 samples/sec Loss 1.7898 LearningRate 0.0192 Epoch: 11 Global Step: 187590 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:50:37,086-Speed 5212.58 samples/sec Loss 1.8048 LearningRate 0.0192 Epoch: 11 Global Step: 187600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:39,061-Speed 5186.74 samples/sec Loss 1.8264 LearningRate 0.0192 Epoch: 11 Global Step: 187610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:41,034-Speed 5190.75 samples/sec Loss 1.7880 LearningRate 0.0192 Epoch: 11 Global Step: 187620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:43,003-Speed 5203.46 samples/sec Loss 1.7651 LearningRate 0.0192 Epoch: 11 Global Step: 187630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:44,990-Speed 5155.34 samples/sec Loss 1.7671 LearningRate 0.0192 Epoch: 11 Global Step: 187640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:46,975-Speed 5158.01 samples/sec Loss 1.8316 LearningRate 0.0192 Epoch: 11 Global Step: 187650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:50:48,963-Speed 5152.29 samples/sec Loss 1.8144 LearningRate 0.0192 Epoch: 11 Global Step: 187660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:50:50,938-Speed 5187.21 samples/sec Loss 1.7531 LearningRate 0.0192 Epoch: 11 Global Step: 187670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:50:52,920-Speed 5169.74 samples/sec Loss 1.7842 LearningRate 0.0192 Epoch: 11 Global Step: 187680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 
11:50:54,892-Speed 5193.78 samples/sec Loss 1.7331 LearningRate 0.0192 Epoch: 11 Global Step: 187690 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:50:56,864-Speed 5194.99 samples/sec Loss 1.7871 LearningRate 0.0192 Epoch: 11 Global Step: 187700 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:50:58,838-Speed 5188.07 samples/sec Loss 1.8098 LearningRate 0.0192 Epoch: 11 Global Step: 187710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:51:00,840-Speed 5116.92 samples/sec Loss 1.7970 LearningRate 0.0192 Epoch: 11 Global Step: 187720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:51:02,817-Speed 5182.99 samples/sec Loss 1.8048 LearningRate 0.0192 Epoch: 11 Global Step: 187730 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:51:04,801-Speed 5161.85 samples/sec Loss 1.7411 LearningRate 0.0191 Epoch: 11 Global Step: 187740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:51:06,795-Speed 5138.31 samples/sec Loss 1.8008 LearningRate 0.0191 Epoch: 11 Global Step: 187750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:51:08,777-Speed 5166.01 samples/sec Loss 1.8348 LearningRate 0.0191 Epoch: 11 Global Step: 187760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:10,760-Speed 5166.63 samples/sec Loss 1.8130 LearningRate 0.0191 Epoch: 11 Global Step: 187770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:12,742-Speed 5167.71 samples/sec Loss 1.7807 LearningRate 0.0191 Epoch: 11 Global Step: 187780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:14,721-Speed 5178.52 samples/sec Loss 1.8260 LearningRate 0.0191 Epoch: 11 Global Step: 187790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:16,692-Speed 5197.13 samples/sec Loss 1.8547 LearningRate 0.0191 Epoch: 11 Global Step: 187800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:18,666-Speed 5187.52 
samples/sec Loss 1.7914 LearningRate 0.0191 Epoch: 11 Global Step: 187810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:20,638-Speed 5194.30 samples/sec Loss 1.7784 LearningRate 0.0191 Epoch: 11 Global Step: 187820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:22,609-Speed 5197.87 samples/sec Loss 1.8365 LearningRate 0.0191 Epoch: 11 Global Step: 187830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:24,587-Speed 5177.28 samples/sec Loss 1.7362 LearningRate 0.0191 Epoch: 11 Global Step: 187840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:26,571-Speed 5162.88 samples/sec Loss 1.8073 LearningRate 0.0191 Epoch: 11 Global Step: 187850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:28,545-Speed 5191.03 samples/sec Loss 1.8374 LearningRate 0.0191 Epoch: 11 Global Step: 187860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:51:30,520-Speed 5185.36 samples/sec Loss 1.7542 LearningRate 0.0191 Epoch: 11 Global Step: 187870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:51:32,487-Speed 5209.01 samples/sec Loss 1.8365 LearningRate 0.0191 Epoch: 11 Global Step: 187880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:34,469-Speed 5167.25 samples/sec Loss 1.8393 LearningRate 0.0191 Epoch: 11 Global Step: 187890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:36,439-Speed 5199.28 samples/sec Loss 1.7922 LearningRate 0.0191 Epoch: 11 Global Step: 187900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:38,434-Speed 5135.43 samples/sec Loss 1.8393 LearningRate 0.0191 Epoch: 11 Global Step: 187910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:40,430-Speed 5131.95 samples/sec Loss 1.7783 LearningRate 0.0191 Epoch: 11 Global Step: 187920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:42,399-Speed 5202.35 samples/sec Loss 1.8112 
LearningRate 0.0191 Epoch: 11 Global Step: 187930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:44,371-Speed 5195.91 samples/sec Loss 1.8056 LearningRate 0.0191 Epoch: 11 Global Step: 187940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:46,345-Speed 5187.06 samples/sec Loss 1.8010 LearningRate 0.0191 Epoch: 11 Global Step: 187950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:48,324-Speed 5177.70 samples/sec Loss 1.8624 LearningRate 0.0191 Epoch: 11 Global Step: 187960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:50,311-Speed 5153.62 samples/sec Loss 1.7842 LearningRate 0.0191 Epoch: 11 Global Step: 187970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:51:52,287-Speed 5183.54 samples/sec Loss 1.8476 LearningRate 0.0191 Epoch: 11 Global Step: 187980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:51:54,267-Speed 5175.39 samples/sec Loss 1.8363 LearningRate 0.0191 Epoch: 11 Global Step: 187990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:51:56,253-Speed 5157.59 samples/sec Loss 1.8173 LearningRate 0.0191 Epoch: 11 Global Step: 188000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:52:22,975-[lfw][188000]XNorm: 23.291106 Training: 2022-04-11 11:52:22,976-[lfw][188000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-04-11 11:52:22,976-[lfw][188000]Accuracy-Highest: 0.99833 Training: 2022-04-11 11:52:53,646-[cfp_fp][188000]XNorm: 21.882795 Training: 2022-04-11 11:52:53,647-[cfp_fp][188000]Accuracy-Flip: 0.98714+-0.00429 Training: 2022-04-11 11:52:53,647-[cfp_fp][188000]Accuracy-Highest: 0.98714 Training: 2022-04-11 11:53:20,185-[agedb_30][188000]XNorm: 23.644459 Training: 2022-04-11 11:53:20,185-[agedb_30][188000]Accuracy-Flip: 0.98217+-0.00792 Training: 2022-04-11 11:53:20,186-[agedb_30][188000]Accuracy-Highest: 0.98250 Training: 2022-04-11 11:53:22,172-Speed 119.18 samples/sec Loss 1.8562 LearningRate 
0.0191 Epoch: 11 Global Step: 188010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:53:24,138-Speed 5208.19 samples/sec Loss 1.8433 LearningRate 0.0191 Epoch: 11 Global Step: 188020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:53:26,129-Speed 5144.66 samples/sec Loss 1.7820 LearningRate 0.0191 Epoch: 11 Global Step: 188030 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:53:28,103-Speed 5189.66 samples/sec Loss 1.8116 LearningRate 0.0191 Epoch: 11 Global Step: 188040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:53:30,102-Speed 5124.81 samples/sec Loss 1.8093 LearningRate 0.0191 Epoch: 11 Global Step: 188050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:53:32,068-Speed 5210.98 samples/sec Loss 1.8227 LearningRate 0.0191 Epoch: 11 Global Step: 188060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:53:34,043-Speed 5184.66 samples/sec Loss 1.7994 LearningRate 0.0191 Epoch: 11 Global Step: 188070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:53:36,025-Speed 5167.20 samples/sec Loss 1.8339 LearningRate 0.0191 Epoch: 11 Global Step: 188080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:53:37,998-Speed 5192.40 samples/sec Loss 1.8008 LearningRate 0.0191 Epoch: 11 Global Step: 188090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:53:39,991-Speed 5140.66 samples/sec Loss 1.8577 LearningRate 0.0191 Epoch: 11 Global Step: 188100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:53:41,976-Speed 5159.70 samples/sec Loss 1.8202 LearningRate 0.0191 Epoch: 11 Global Step: 188110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:53:43,945-Speed 5201.96 samples/sec Loss 1.8035 LearningRate 0.0190 Epoch: 11 Global Step: 188120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:53:45,927-Speed 5168.35 samples/sec Loss 1.8026 LearningRate 0.0190 Epoch: 11 Global 
Step: 188130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:53:47,904-Speed 5182.52 samples/sec Loss 1.8092 LearningRate 0.0190 Epoch: 11 Global Step: 188140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:53:49,884-Speed 5172.58 samples/sec Loss 1.8112 LearningRate 0.0190 Epoch: 11 Global Step: 188150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:53:51,877-Speed 5139.38 samples/sec Loss 1.8035 LearningRate 0.0190 Epoch: 11 Global Step: 188160 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:53:53,855-Speed 5179.74 samples/sec Loss 1.8103 LearningRate 0.0190 Epoch: 11 Global Step: 188170 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:53:55,829-Speed 5190.06 samples/sec Loss 1.8679 LearningRate 0.0190 Epoch: 11 Global Step: 188180 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:53:57,813-Speed 5162.90 samples/sec Loss 1.8300 LearningRate 0.0190 Epoch: 11 Global Step: 188190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:53:59,801-Speed 5152.30 samples/sec Loss 1.8217 LearningRate 0.0190 Epoch: 11 Global Step: 188200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:01,789-Speed 5151.18 samples/sec Loss 1.8487 LearningRate 0.0190 Epoch: 11 Global Step: 188210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:03,773-Speed 5162.93 samples/sec Loss 1.7971 LearningRate 0.0190 Epoch: 11 Global Step: 188220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:05,751-Speed 5180.36 samples/sec Loss 1.7920 LearningRate 0.0190 Epoch: 11 Global Step: 188230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:07,733-Speed 5169.21 samples/sec Loss 1.8263 LearningRate 0.0190 Epoch: 11 Global Step: 188240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:09,708-Speed 5184.74 samples/sec Loss 1.8132 LearningRate 0.0190 Epoch: 11 Global Step: 188250 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:11,692-Speed 5164.61 samples/sec Loss 1.7619 LearningRate 0.0190 Epoch: 11 Global Step: 188260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:13,672-Speed 5173.74 samples/sec Loss 1.8397 LearningRate 0.0190 Epoch: 11 Global Step: 188270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:15,658-Speed 5157.41 samples/sec Loss 1.8399 LearningRate 0.0190 Epoch: 11 Global Step: 188280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:17,654-Speed 5130.74 samples/sec Loss 1.8339 LearningRate 0.0190 Epoch: 11 Global Step: 188290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:54:19,637-Speed 5166.58 samples/sec Loss 1.8259 LearningRate 0.0190 Epoch: 11 Global Step: 188300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:21,627-Speed 5146.85 samples/sec Loss 1.8334 LearningRate 0.0190 Epoch: 11 Global Step: 188310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:23,608-Speed 5169.34 samples/sec Loss 1.8211 LearningRate 0.0190 Epoch: 11 Global Step: 188320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:25,599-Speed 5145.71 samples/sec Loss 1.8076 LearningRate 0.0190 Epoch: 11 Global Step: 188330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:27,587-Speed 5153.93 samples/sec Loss 1.7902 LearningRate 0.0190 Epoch: 11 Global Step: 188340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:29,565-Speed 5178.27 samples/sec Loss 1.7546 LearningRate 0.0190 Epoch: 11 Global Step: 188350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:31,544-Speed 5175.96 samples/sec Loss 1.8731 LearningRate 0.0190 Epoch: 11 Global Step: 188360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:33,537-Speed 5142.62 samples/sec Loss 1.7843 LearningRate 0.0190 Epoch: 11 Global Step: 188370 Fp16 Grad Scale: 65536 Required: 10 
hours Training: 2022-04-11 11:54:35,530-Speed 5139.68 samples/sec Loss 1.7738 LearningRate 0.0190 Epoch: 11 Global Step: 188380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:37,520-Speed 5147.17 samples/sec Loss 1.7981 LearningRate 0.0190 Epoch: 11 Global Step: 188390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:39,505-Speed 5159.82 samples/sec Loss 1.8247 LearningRate 0.0190 Epoch: 11 Global Step: 188400 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:54:41,490-Speed 5160.01 samples/sec Loss 1.8072 LearningRate 0.0190 Epoch: 11 Global Step: 188410 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:54:43,477-Speed 5155.47 samples/sec Loss 1.7907 LearningRate 0.0190 Epoch: 11 Global Step: 188420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:54:45,457-Speed 5172.30 samples/sec Loss 1.8195 LearningRate 0.0190 Epoch: 11 Global Step: 188430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:54:47,432-Speed 5189.07 samples/sec Loss 1.7852 LearningRate 0.0190 Epoch: 11 Global Step: 188440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:49,411-Speed 5175.27 samples/sec Loss 1.8180 LearningRate 0.0190 Epoch: 11 Global Step: 188450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:51,399-Speed 5153.16 samples/sec Loss 1.7676 LearningRate 0.0190 Epoch: 11 Global Step: 188460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:53,388-Speed 5150.63 samples/sec Loss 1.8132 LearningRate 0.0190 Epoch: 11 Global Step: 188470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:55,368-Speed 5173.95 samples/sec Loss 1.8197 LearningRate 0.0190 Epoch: 11 Global Step: 188480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:54:57,346-Speed 5178.77 samples/sec Loss 1.8037 LearningRate 0.0190 Epoch: 11 Global Step: 188490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
11:54:59,339-Speed 5137.31 samples/sec Loss 1.8047 LearningRate 0.0190 Epoch: 11 Global Step: 188500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:01,335-Speed 5132.98 samples/sec Loss 1.7743 LearningRate 0.0189 Epoch: 11 Global Step: 188510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:03,314-Speed 5177.17 samples/sec Loss 1.8407 LearningRate 0.0189 Epoch: 11 Global Step: 188520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:05,306-Speed 5140.62 samples/sec Loss 1.8137 LearningRate 0.0189 Epoch: 11 Global Step: 188530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:07,285-Speed 5176.94 samples/sec Loss 1.8668 LearningRate 0.0189 Epoch: 11 Global Step: 188540 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:55:09,270-Speed 5160.47 samples/sec Loss 1.8231 LearningRate 0.0189 Epoch: 11 Global Step: 188550 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:55:11,245-Speed 5188.46 samples/sec Loss 1.8433 LearningRate 0.0189 Epoch: 11 Global Step: 188560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:55:13,226-Speed 5169.73 samples/sec Loss 1.7639 LearningRate 0.0189 Epoch: 11 Global Step: 188570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:15,219-Speed 5139.92 samples/sec Loss 1.8359 LearningRate 0.0189 Epoch: 11 Global Step: 188580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:17,203-Speed 5162.16 samples/sec Loss 1.8075 LearningRate 0.0189 Epoch: 11 Global Step: 188590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:19,180-Speed 5182.04 samples/sec Loss 1.8246 LearningRate 0.0189 Epoch: 11 Global Step: 188600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:21,175-Speed 5134.73 samples/sec Loss 1.8183 LearningRate 0.0189 Epoch: 11 Global Step: 188610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:23,165-Speed 5147.02 
samples/sec Loss 1.8302 LearningRate 0.0189 Epoch: 11 Global Step: 188620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:25,148-Speed 5165.48 samples/sec Loss 1.8614 LearningRate 0.0189 Epoch: 11 Global Step: 188630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:27,140-Speed 5141.87 samples/sec Loss 1.8038 LearningRate 0.0189 Epoch: 11 Global Step: 188640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:29,117-Speed 5181.09 samples/sec Loss 1.7619 LearningRate 0.0189 Epoch: 11 Global Step: 188650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:31,092-Speed 5188.84 samples/sec Loss 1.7955 LearningRate 0.0189 Epoch: 11 Global Step: 188660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:33,066-Speed 5187.88 samples/sec Loss 1.8174 LearningRate 0.0189 Epoch: 11 Global Step: 188670 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:55:35,058-Speed 5141.65 samples/sec Loss 1.7769 LearningRate 0.0189 Epoch: 11 Global Step: 188680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:37,045-Speed 5155.62 samples/sec Loss 1.8242 LearningRate 0.0189 Epoch: 11 Global Step: 188690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:39,025-Speed 5173.37 samples/sec Loss 1.7755 LearningRate 0.0189 Epoch: 11 Global Step: 188700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:41,009-Speed 5162.11 samples/sec Loss 1.8026 LearningRate 0.0189 Epoch: 11 Global Step: 188710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:42,989-Speed 5174.19 samples/sec Loss 1.8295 LearningRate 0.0189 Epoch: 11 Global Step: 188720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:44,969-Speed 5173.81 samples/sec Loss 1.8201 LearningRate 0.0189 Epoch: 11 Global Step: 188730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:55:46,925-Speed 5235.68 samples/sec Loss 1.8196 
LearningRate 0.0189 Epoch: 11 Global Step: 188740 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-11 11:55:48,920-Speed 5136.94 samples/sec Loss 1.8322 LearningRate 0.0189 Epoch: 11 Global Step: 188750 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-11 11:55:50,912-Speed 5140.42 samples/sec Loss 1.8939 LearningRate 0.0189 Epoch: 11 Global Step: 188760 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-11 11:55:52,890-Speed 5180.45 samples/sec Loss 1.8556 LearningRate 0.0189 Epoch: 11 Global Step: 188770 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-11 11:55:54,864-Speed 5189.15 samples/sec Loss 1.7777 LearningRate 0.0189 Epoch: 11 Global Step: 188780 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-11 11:55:56,846-Speed 5167.84 samples/sec Loss 1.7966 LearningRate 0.0189 Epoch: 11 Global Step: 188790 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-11 11:55:58,819-Speed 5191.42 samples/sec Loss 1.8948 LearningRate 0.0189 Epoch: 11 Global Step: 188800 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-11 11:56:00,810-Speed 5143.93 samples/sec Loss 1.8742 LearningRate 0.0189 Epoch: 11 Global Step: 188810 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-11 11:56:02,808-Speed 5127.91 samples/sec Loss 1.8091 LearningRate 0.0189 Epoch: 11 Global Step: 188820 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-11 11:56:04,782-Speed 5189.98 samples/sec Loss 1.8020 LearningRate 0.0189 Epoch: 11 Global Step: 188830 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-11 11:56:06,777-Speed 5133.97 samples/sec Loss 1.8586 LearningRate 0.0189 Epoch: 11 Global Step: 188840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:56:08,746-Speed 5201.28 samples/sec Loss 1.8367 LearningRate 0.0189 Epoch: 11 Global Step: 188850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:56:10,736-Speed 5147.60 samples/sec Loss 1.8474 LearningRate 0.0189 Epoch: 11 
Global Step: 188860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:56:12,712-Speed 5184.19 samples/sec Loss 1.8415 LearningRate 0.0189 Epoch: 11 Global Step: 188870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:56:14,690-Speed 5177.76 samples/sec Loss 1.8291 LearningRate 0.0189 Epoch: 11 Global Step: 188880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:56:16,672-Speed 5167.87 samples/sec Loss 1.8680 LearningRate 0.0188 Epoch: 11 Global Step: 188890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:56:18,685-Speed 5090.91 samples/sec Loss 1.7899 LearningRate 0.0188 Epoch: 11 Global Step: 188900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:56:20,679-Speed 5136.43 samples/sec Loss 1.8228 LearningRate 0.0188 Epoch: 11 Global Step: 188910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:56:22,667-Speed 5152.09 samples/sec Loss 1.8976 LearningRate 0.0188 Epoch: 11 Global Step: 188920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:56:24,666-Speed 5123.43 samples/sec Loss 1.8173 LearningRate 0.0188 Epoch: 11 Global Step: 188930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 11:56:26,643-Speed 5181.83 samples/sec Loss 1.8694 LearningRate 0.0188 Epoch: 11 Global Step: 188940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:56:28,622-Speed 5175.97 samples/sec Loss 1.8578 LearningRate 0.0188 Epoch: 11 Global Step: 188950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:56:30,604-Speed 5170.51 samples/sec Loss 1.8642 LearningRate 0.0188 Epoch: 11 Global Step: 188960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:56:32,579-Speed 5185.39 samples/sec Loss 1.8629 LearningRate 0.0188 Epoch: 11 Global Step: 188970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:56:34,566-Speed 5155.41 samples/sec Loss 1.8119 LearningRate 0.0188 Epoch: 11 Global Step: 188980 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-04-11 11:56:36,556-Speed 5147.45 samples/sec Loss 1.8449 LearningRate 0.0188 Epoch: 11 Global Step: 188990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:56:38,540-Speed 5162.05 samples/sec Loss 1.8152 LearningRate 0.0188 Epoch: 11 Global Step: 189000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:56:40,525-Speed 5160.87 samples/sec Loss 1.8222 LearningRate 0.0188 Epoch: 11 Global Step: 189010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:56:42,505-Speed 5173.62 samples/sec Loss 1.8548 LearningRate 0.0188 Epoch: 11 Global Step: 189020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:56:44,503-Speed 5126.62 samples/sec Loss 1.8985 LearningRate 0.0188 Epoch: 11 Global Step: 189030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:56:46,500-Speed 5128.79 samples/sec Loss 1.8334 LearningRate 0.0188 Epoch: 11 Global Step: 189040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:56:48,477-Speed 5180.95 samples/sec Loss 1.8080 LearningRate 0.0188 Epoch: 11 Global Step: 189050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:56:50,459-Speed 5170.50 samples/sec Loss 1.8974 LearningRate 0.0188 Epoch: 11 Global Step: 189060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:56:52,442-Speed 5164.51 samples/sec Loss 1.8668 LearningRate 0.0188 Epoch: 11 Global Step: 189070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:56:54,420-Speed 5179.48 samples/sec Loss 1.7801 LearningRate 0.0188 Epoch: 11 Global Step: 189080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:56:56,399-Speed 5175.91 samples/sec Loss 1.8317 LearningRate 0.0188 Epoch: 11 Global Step: 189090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:56:58,383-Speed 5162.13 samples/sec Loss 1.7737 LearningRate 0.0188 Epoch: 11 Global Step: 189100 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-04-11 11:57:00,361-Speed 5179.75 samples/sec Loss 1.8189 LearningRate 0.0188 Epoch: 11 Global Step: 189110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:02,351-Speed 5146.14 samples/sec Loss 1.8391 LearningRate 0.0188 Epoch: 11 Global Step: 189120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:04,327-Speed 5186.14 samples/sec Loss 1.8341 LearningRate 0.0188 Epoch: 11 Global Step: 189130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:06,329-Speed 5115.84 samples/sec Loss 1.8222 LearningRate 0.0188 Epoch: 11 Global Step: 189140 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:57:08,322-Speed 5138.00 samples/sec Loss 1.8467 LearningRate 0.0188 Epoch: 11 Global Step: 189150 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:57:10,293-Speed 5198.08 samples/sec Loss 1.8409 LearningRate 0.0188 Epoch: 11 Global Step: 189160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:12,275-Speed 5170.72 samples/sec Loss 1.9073 LearningRate 0.0188 Epoch: 11 Global Step: 189170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:14,256-Speed 5170.52 samples/sec Loss 1.9005 LearningRate 0.0188 Epoch: 11 Global Step: 189180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:16,237-Speed 5169.60 samples/sec Loss 1.8487 LearningRate 0.0188 Epoch: 11 Global Step: 189190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:18,216-Speed 5175.66 samples/sec Loss 1.8198 LearningRate 0.0188 Epoch: 11 Global Step: 189200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:20,192-Speed 5183.34 samples/sec Loss 1.8599 LearningRate 0.0188 Epoch: 11 Global Step: 189210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:22,169-Speed 5181.23 samples/sec Loss 1.8356 LearningRate 0.0188 Epoch: 11 Global Step: 189220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
11:57:24,167-Speed 5127.20 samples/sec Loss 1.8780 LearningRate 0.0188 Epoch: 11 Global Step: 189230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:26,143-Speed 5185.60 samples/sec Loss 1.8472 LearningRate 0.0188 Epoch: 11 Global Step: 189240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:28,126-Speed 5164.75 samples/sec Loss 1.9045 LearningRate 0.0188 Epoch: 11 Global Step: 189250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:30,109-Speed 5164.57 samples/sec Loss 1.8855 LearningRate 0.0188 Epoch: 11 Global Step: 189260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:57:32,083-Speed 5191.48 samples/sec Loss 1.8697 LearningRate 0.0188 Epoch: 11 Global Step: 189270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:57:34,058-Speed 5186.06 samples/sec Loss 1.8541 LearningRate 0.0187 Epoch: 11 Global Step: 189280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:57:36,050-Speed 5142.31 samples/sec Loss 1.7869 LearningRate 0.0187 Epoch: 11 Global Step: 189290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:57:38,053-Speed 5112.88 samples/sec Loss 1.8405 LearningRate 0.0187 Epoch: 11 Global Step: 189300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:57:40,038-Speed 5161.19 samples/sec Loss 1.8357 LearningRate 0.0187 Epoch: 11 Global Step: 189310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:57:42,007-Speed 5201.07 samples/sec Loss 1.8577 LearningRate 0.0187 Epoch: 11 Global Step: 189320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:43,986-Speed 5175.85 samples/sec Loss 1.8225 LearningRate 0.0187 Epoch: 11 Global Step: 189330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:45,962-Speed 5185.26 samples/sec Loss 1.8365 LearningRate 0.0187 Epoch: 11 Global Step: 189340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:47,965-Speed 
5114.96 samples/sec Loss 1.8228 LearningRate 0.0187 Epoch: 11 Global Step: 189350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:49,967-Speed 5115.60 samples/sec Loss 1.8098 LearningRate 0.0187 Epoch: 11 Global Step: 189360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:51,946-Speed 5176.24 samples/sec Loss 1.8101 LearningRate 0.0187 Epoch: 11 Global Step: 189370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:53,939-Speed 5138.87 samples/sec Loss 1.8558 LearningRate 0.0187 Epoch: 11 Global Step: 189380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:55,916-Speed 5183.06 samples/sec Loss 1.8505 LearningRate 0.0187 Epoch: 11 Global Step: 189390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:57,896-Speed 5171.89 samples/sec Loss 1.8471 LearningRate 0.0187 Epoch: 11 Global Step: 189400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:57:59,881-Speed 5160.90 samples/sec Loss 1.8770 LearningRate 0.0187 Epoch: 11 Global Step: 189410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:01,859-Speed 5179.56 samples/sec Loss 1.8838 LearningRate 0.0187 Epoch: 11 Global Step: 189420 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:58:03,852-Speed 5139.68 samples/sec Loss 1.7698 LearningRate 0.0187 Epoch: 11 Global Step: 189430 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:58:05,832-Speed 5173.66 samples/sec Loss 1.8687 LearningRate 0.0187 Epoch: 11 Global Step: 189440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:58:07,804-Speed 5193.21 samples/sec Loss 1.8585 LearningRate 0.0187 Epoch: 11 Global Step: 189450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:58:09,781-Speed 5181.51 samples/sec Loss 1.8096 LearningRate 0.0187 Epoch: 11 Global Step: 189460 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:58:11,768-Speed 5156.23 samples/sec Loss 
1.8575 LearningRate 0.0187 Epoch: 11 Global Step: 189470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:13,761-Speed 5137.81 samples/sec Loss 1.8704 LearningRate 0.0187 Epoch: 11 Global Step: 189480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:15,756-Speed 5134.50 samples/sec Loss 1.8344 LearningRate 0.0187 Epoch: 11 Global Step: 189490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:17,749-Speed 5140.37 samples/sec Loss 1.8870 LearningRate 0.0187 Epoch: 11 Global Step: 189500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:19,720-Speed 5199.00 samples/sec Loss 1.8658 LearningRate 0.0187 Epoch: 11 Global Step: 189510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:21,693-Speed 5190.31 samples/sec Loss 1.8810 LearningRate 0.0187 Epoch: 11 Global Step: 189520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:23,673-Speed 5173.77 samples/sec Loss 1.8700 LearningRate 0.0187 Epoch: 11 Global Step: 189530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:25,645-Speed 5194.46 samples/sec Loss 1.8475 LearningRate 0.0187 Epoch: 11 Global Step: 189540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:27,625-Speed 5174.29 samples/sec Loss 1.9613 LearningRate 0.0187 Epoch: 11 Global Step: 189550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:29,614-Speed 5148.66 samples/sec Loss 1.8725 LearningRate 0.0187 Epoch: 11 Global Step: 189560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:31,605-Speed 5143.86 samples/sec Loss 1.8621 LearningRate 0.0187 Epoch: 11 Global Step: 189570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:58:33,593-Speed 5154.86 samples/sec Loss 1.8543 LearningRate 0.0187 Epoch: 11 Global Step: 189580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:35,572-Speed 5174.47 samples/sec Loss 1.8162 LearningRate 0.0187 
Epoch: 11 Global Step: 189590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:37,569-Speed 5129.18 samples/sec Loss 1.8184 LearningRate 0.0187 Epoch: 11 Global Step: 189600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:39,543-Speed 5190.84 samples/sec Loss 1.8687 LearningRate 0.0187 Epoch: 11 Global Step: 189610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:41,524-Speed 5169.68 samples/sec Loss 1.8450 LearningRate 0.0187 Epoch: 11 Global Step: 189620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:43,499-Speed 5186.16 samples/sec Loss 1.7972 LearningRate 0.0187 Epoch: 11 Global Step: 189630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:45,480-Speed 5172.32 samples/sec Loss 1.7979 LearningRate 0.0187 Epoch: 11 Global Step: 189640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:47,472-Speed 5141.92 samples/sec Loss 1.8157 LearningRate 0.0187 Epoch: 11 Global Step: 189650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:49,472-Speed 5121.78 samples/sec Loss 1.8593 LearningRate 0.0186 Epoch: 11 Global Step: 189660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:51,458-Speed 5156.52 samples/sec Loss 1.8221 LearningRate 0.0186 Epoch: 11 Global Step: 189670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:58:53,451-Speed 5140.85 samples/sec Loss 1.8680 LearningRate 0.0186 Epoch: 11 Global Step: 189680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:58:55,443-Speed 5140.82 samples/sec Loss 1.8458 LearningRate 0.0186 Epoch: 11 Global Step: 189690 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:58:57,420-Speed 5182.68 samples/sec Loss 1.8018 LearningRate 0.0186 Epoch: 11 Global Step: 189700 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:58:59,395-Speed 5185.68 samples/sec Loss 1.9102 LearningRate 0.0186 Epoch: 11 Global Step: 
189710 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:59:01,386-Speed 5146.56 samples/sec Loss 1.8463 LearningRate 0.0186 Epoch: 11 Global Step: 189720 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:59:03,367-Speed 5169.71 samples/sec Loss 1.9025 LearningRate 0.0186 Epoch: 11 Global Step: 189730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:59:05,359-Speed 5142.88 samples/sec Loss 1.8492 LearningRate 0.0186 Epoch: 11 Global Step: 189740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:59:07,337-Speed 5179.86 samples/sec Loss 1.7827 LearningRate 0.0186 Epoch: 11 Global Step: 189750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:59:09,310-Speed 5191.46 samples/sec Loss 1.9124 LearningRate 0.0186 Epoch: 11 Global Step: 189760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:59:11,304-Speed 5136.07 samples/sec Loss 1.8687 LearningRate 0.0186 Epoch: 11 Global Step: 189770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:59:13,304-Speed 5122.45 samples/sec Loss 1.8012 LearningRate 0.0186 Epoch: 11 Global Step: 189780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:59:15,294-Speed 5146.61 samples/sec Loss 1.8867 LearningRate 0.0186 Epoch: 11 Global Step: 189790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:59:17,288-Speed 5136.54 samples/sec Loss 1.7637 LearningRate 0.0186 Epoch: 11 Global Step: 189800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:59:19,270-Speed 5168.58 samples/sec Loss 1.8974 LearningRate 0.0186 Epoch: 11 Global Step: 189810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:59:21,245-Speed 5185.62 samples/sec Loss 1.9190 LearningRate 0.0186 Epoch: 11 Global Step: 189820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 11:59:23,223-Speed 5180.16 samples/sec Loss 1.8752 LearningRate 0.0186 Epoch: 11 Global Step: 189830 Fp16 Grad Scale: 
131072 Required: 10 hours Training: 2022-04-11 11:59:25,207-Speed 5163.83 samples/sec Loss 1.8342 LearningRate 0.0186 Epoch: 11 Global Step: 189840 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:59:27,181-Speed 5188.45 samples/sec Loss 1.8108 LearningRate 0.0186 Epoch: 11 Global Step: 189850 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:59:29,166-Speed 5162.10 samples/sec Loss 1.8391 LearningRate 0.0186 Epoch: 11 Global Step: 189860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:59:31,144-Speed 5177.89 samples/sec Loss 1.8526 LearningRate 0.0186 Epoch: 11 Global Step: 189870 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:59:33,138-Speed 5135.69 samples/sec Loss 1.8820 LearningRate 0.0186 Epoch: 11 Global Step: 189880 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:59:35,127-Speed 5150.31 samples/sec Loss 1.8711 LearningRate 0.0186 Epoch: 11 Global Step: 189890 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:59:37,116-Speed 5149.99 samples/sec Loss 1.8975 LearningRate 0.0186 Epoch: 11 Global Step: 189900 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:59:39,096-Speed 5174.71 samples/sec Loss 1.8345 LearningRate 0.0186 Epoch: 11 Global Step: 189910 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:59:41,084-Speed 5150.55 samples/sec Loss 1.8242 LearningRate 0.0186 Epoch: 11 Global Step: 189920 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:59:43,064-Speed 5175.02 samples/sec Loss 1.8445 LearningRate 0.0186 Epoch: 11 Global Step: 189930 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:59:45,041-Speed 5181.12 samples/sec Loss 1.9008 LearningRate 0.0186 Epoch: 11 Global Step: 189940 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:59:47,018-Speed 5181.05 samples/sec Loss 1.8298 LearningRate 0.0186 Epoch: 11 Global Step: 189950 Fp16 Grad Scale: 131072 Required: 10 
hours Training: 2022-04-11 11:59:49,013-Speed 5136.28 samples/sec Loss 1.8272 LearningRate 0.0186 Epoch: 11 Global Step: 189960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:59:51,008-Speed 5132.71 samples/sec Loss 1.8411 LearningRate 0.0186 Epoch: 11 Global Step: 189970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:59:53,002-Speed 5137.29 samples/sec Loss 1.8414 LearningRate 0.0186 Epoch: 11 Global Step: 189980 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:59:54,988-Speed 5157.74 samples/sec Loss 1.8543 LearningRate 0.0186 Epoch: 11 Global Step: 189990 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 11:59:56,965-Speed 5181.29 samples/sec Loss 1.8959 LearningRate 0.0186 Epoch: 11 Global Step: 190000 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:00:23,615-[lfw][190000]XNorm: 21.922614 Training: 2022-04-11 12:00:23,615-[lfw][190000]Accuracy-Flip: 0.99800+-0.00287 Training: 2022-04-11 12:00:23,616-[lfw][190000]Accuracy-Highest: 0.99833 Training: 2022-04-11 12:00:54,308-[cfp_fp][190000]XNorm: 20.832989 Training: 2022-04-11 12:00:54,308-[cfp_fp][190000]Accuracy-Flip: 0.98529+-0.00332 Training: 2022-04-11 12:00:54,309-[cfp_fp][190000]Accuracy-Highest: 0.98714 Training: 2022-04-11 12:01:21,003-[agedb_30][190000]XNorm: 21.731463 Training: 2022-04-11 12:01:21,004-[agedb_30][190000]Accuracy-Flip: 0.97933+-0.00786 Training: 2022-04-11 12:01:21,004-[agedb_30][190000]Accuracy-Highest: 0.98250 Training: 2022-04-11 12:01:22,986-Speed 119.04 samples/sec Loss 1.8828 LearningRate 0.0186 Epoch: 11 Global Step: 190010 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:01:24,960-Speed 5187.71 samples/sec Loss 1.8605 LearningRate 0.0186 Epoch: 11 Global Step: 190020 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:01:26,916-Speed 5237.87 samples/sec Loss 1.8698 LearningRate 0.0186 Epoch: 11 Global Step: 190030 Fp16 Grad Scale: 131072 Required: 10 hours 
Training: 2022-04-11 12:01:28,883-Speed 5207.70 samples/sec Loss 1.9177 LearningRate 0.0186 Epoch: 11 Global Step: 190040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:01:30,856-Speed 5190.28 samples/sec Loss 1.9056 LearningRate 0.0185 Epoch: 11 Global Step: 190050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:01:32,825-Speed 5204.77 samples/sec Loss 1.8759 LearningRate 0.0185 Epoch: 11 Global Step: 190060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:01:34,793-Speed 5204.77 samples/sec Loss 1.8738 LearningRate 0.0185 Epoch: 11 Global Step: 190070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:01:36,788-Speed 5135.46 samples/sec Loss 1.8106 LearningRate 0.0185 Epoch: 11 Global Step: 190080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:01:38,758-Speed 5198.78 samples/sec Loss 1.8655 LearningRate 0.0185 Epoch: 11 Global Step: 190090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:01:40,729-Speed 5196.99 samples/sec Loss 1.8860 LearningRate 0.0185 Epoch: 11 Global Step: 190100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:01:42,697-Speed 5204.92 samples/sec Loss 1.8843 LearningRate 0.0185 Epoch: 11 Global Step: 190110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:01:44,676-Speed 5175.86 samples/sec Loss 1.8245 LearningRate 0.0185 Epoch: 11 Global Step: 190120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:01:46,670-Speed 5137.50 samples/sec Loss 1.8511 LearningRate 0.0185 Epoch: 11 Global Step: 190130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:01:48,662-Speed 5141.56 samples/sec Loss 1.8380 LearningRate 0.0185 Epoch: 11 Global Step: 190140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:01:50,656-Speed 5135.77 samples/sec Loss 1.8957 LearningRate 0.0185 Epoch: 11 Global Step: 190150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
12:01:52,652-Speed 5132.23 samples/sec Loss 1.8688 LearningRate 0.0185 Epoch: 11 Global Step: 190160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:01:54,633-Speed 5170.42 samples/sec Loss 1.8863 LearningRate 0.0185 Epoch: 11 Global Step: 190170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:01:56,606-Speed 5192.51 samples/sec Loss 1.8483 LearningRate 0.0185 Epoch: 11 Global Step: 190180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:01:58,605-Speed 5125.52 samples/sec Loss 1.8521 LearningRate 0.0185 Epoch: 11 Global Step: 190190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:00,600-Speed 5133.07 samples/sec Loss 1.8662 LearningRate 0.0185 Epoch: 11 Global Step: 190200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:02,585-Speed 5163.28 samples/sec Loss 1.9329 LearningRate 0.0185 Epoch: 11 Global Step: 190210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:04,565-Speed 5170.90 samples/sec Loss 1.8725 LearningRate 0.0185 Epoch: 11 Global Step: 190220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:06,548-Speed 5165.32 samples/sec Loss 1.8599 LearningRate 0.0185 Epoch: 11 Global Step: 190230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:08,524-Speed 5183.79 samples/sec Loss 1.8359 LearningRate 0.0185 Epoch: 11 Global Step: 190240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:10,513-Speed 5152.21 samples/sec Loss 1.9017 LearningRate 0.0185 Epoch: 11 Global Step: 190250 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:02:12,503-Speed 5146.72 samples/sec Loss 1.8816 LearningRate 0.0185 Epoch: 11 Global Step: 190260 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:02:14,490-Speed 5155.13 samples/sec Loss 1.8582 LearningRate 0.0185 Epoch: 11 Global Step: 190270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:02:16,475-Speed 5160.57 
samples/sec Loss 1.8655 LearningRate 0.0185 Epoch: 11 Global Step: 190280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:02:18,459-Speed 5163.20 samples/sec Loss 1.8572 LearningRate 0.0185 Epoch: 11 Global Step: 190290 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:02:20,440-Speed 5170.32 samples/sec Loss 1.8540 LearningRate 0.0185 Epoch: 11 Global Step: 190300 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:02:22,424-Speed 5164.57 samples/sec Loss 1.9400 LearningRate 0.0185 Epoch: 11 Global Step: 190310 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:02:24,402-Speed 5178.87 samples/sec Loss 1.8457 LearningRate 0.0185 Epoch: 11 Global Step: 190320 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:02:26,400-Speed 5126.90 samples/sec Loss 1.9153 LearningRate 0.0185 Epoch: 11 Global Step: 190330 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:02:28,400-Speed 5121.97 samples/sec Loss 1.8713 LearningRate 0.0185 Epoch: 11 Global Step: 190340 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:02:30,379-Speed 5174.63 samples/sec Loss 1.9038 LearningRate 0.0185 Epoch: 11 Global Step: 190350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:32,361-Speed 5169.51 samples/sec Loss 1.8561 LearningRate 0.0185 Epoch: 11 Global Step: 190360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:34,346-Speed 5158.35 samples/sec Loss 1.8477 LearningRate 0.0185 Epoch: 11 Global Step: 190370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:36,355-Speed 5099.80 samples/sec Loss 1.8833 LearningRate 0.0185 Epoch: 11 Global Step: 190380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:38,344-Speed 5150.04 samples/sec Loss 1.8896 LearningRate 0.0185 Epoch: 11 Global Step: 190390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:40,324-Speed 5173.52 samples/sec Loss 1.8639 
LearningRate 0.0185 Epoch: 11 Global Step: 190400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:42,309-Speed 5161.63 samples/sec Loss 1.9622 LearningRate 0.0185 Epoch: 11 Global Step: 190410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:44,287-Speed 5179.33 samples/sec Loss 1.8065 LearningRate 0.0185 Epoch: 11 Global Step: 190420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:46,268-Speed 5168.69 samples/sec Loss 1.8138 LearningRate 0.0185 Epoch: 11 Global Step: 190430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:48,259-Speed 5144.45 samples/sec Loss 1.8561 LearningRate 0.0184 Epoch: 11 Global Step: 190440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:50,260-Speed 5120.60 samples/sec Loss 1.8719 LearningRate 0.0184 Epoch: 11 Global Step: 190450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:02:52,244-Speed 5161.57 samples/sec Loss 1.8892 LearningRate 0.0184 Epoch: 11 Global Step: 190460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:54,230-Speed 5157.95 samples/sec Loss 1.8607 LearningRate 0.0184 Epoch: 11 Global Step: 190470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:56,218-Speed 5152.80 samples/sec Loss 1.8254 LearningRate 0.0184 Epoch: 11 Global Step: 190480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:02:58,193-Speed 5186.42 samples/sec Loss 1.8363 LearningRate 0.0184 Epoch: 11 Global Step: 190490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:00,173-Speed 5173.65 samples/sec Loss 1.8045 LearningRate 0.0184 Epoch: 11 Global Step: 190500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:02,159-Speed 5158.49 samples/sec Loss 1.8145 LearningRate 0.0184 Epoch: 11 Global Step: 190510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:04,133-Speed 5188.92 samples/sec Loss 1.8166 LearningRate 0.0184 Epoch: 11 
Global Step: 190520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:06,100-Speed 5208.78 samples/sec Loss 1.8665 LearningRate 0.0184 Epoch: 11 Global Step: 190530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:08,063-Speed 5217.86 samples/sec Loss 1.8780 LearningRate 0.0184 Epoch: 11 Global Step: 190540 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:03:10,032-Speed 5203.08 samples/sec Loss 1.9120 LearningRate 0.0184 Epoch: 11 Global Step: 190550 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:03:12,019-Speed 5155.77 samples/sec Loss 1.8877 LearningRate 0.0184 Epoch: 11 Global Step: 190560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:03:13,986-Speed 5207.43 samples/sec Loss 1.7915 LearningRate 0.0184 Epoch: 11 Global Step: 190570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:03:15,960-Speed 5188.70 samples/sec Loss 1.8656 LearningRate 0.0184 Epoch: 11 Global Step: 190580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:03:17,928-Speed 5204.38 samples/sec Loss 1.8378 LearningRate 0.0184 Epoch: 11 Global Step: 190590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:03:19,905-Speed 5182.83 samples/sec Loss 1.8424 LearningRate 0.0184 Epoch: 11 Global Step: 190600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:03:21,880-Speed 5187.31 samples/sec Loss 1.8446 LearningRate 0.0184 Epoch: 11 Global Step: 190610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:03:23,865-Speed 5160.01 samples/sec Loss 1.9088 LearningRate 0.0184 Epoch: 11 Global Step: 190620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:03:25,872-Speed 5103.89 samples/sec Loss 1.9130 LearningRate 0.0184 Epoch: 11 Global Step: 190630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:03:27,857-Speed 5162.01 samples/sec Loss 1.8504 LearningRate 0.0184 Epoch: 11 Global Step: 190640 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:29,847-Speed 5145.85 samples/sec Loss 1.8501 LearningRate 0.0184 Epoch: 11 Global Step: 190650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:31,813-Speed 5212.81 samples/sec Loss 1.8812 LearningRate 0.0184 Epoch: 11 Global Step: 190660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:33,801-Speed 5151.91 samples/sec Loss 1.8657 LearningRate 0.0184 Epoch: 11 Global Step: 190670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:35,768-Speed 5206.32 samples/sec Loss 1.8767 LearningRate 0.0184 Epoch: 11 Global Step: 190680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:37,751-Speed 5165.39 samples/sec Loss 1.8590 LearningRate 0.0184 Epoch: 11 Global Step: 190690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:39,719-Speed 5206.16 samples/sec Loss 1.8448 LearningRate 0.0184 Epoch: 11 Global Step: 190700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:41,689-Speed 5199.96 samples/sec Loss 1.8841 LearningRate 0.0184 Epoch: 11 Global Step: 190710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:43,655-Speed 5211.64 samples/sec Loss 1.8842 LearningRate 0.0184 Epoch: 11 Global Step: 190720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:45,622-Speed 5207.26 samples/sec Loss 1.8906 LearningRate 0.0184 Epoch: 11 Global Step: 190730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:47,589-Speed 5206.11 samples/sec Loss 1.9283 LearningRate 0.0184 Epoch: 11 Global Step: 190740 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:03:49,556-Speed 5208.03 samples/sec Loss 1.8450 LearningRate 0.0184 Epoch: 11 Global Step: 190750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:51,545-Speed 5150.84 samples/sec Loss 1.9249 LearningRate 0.0184 Epoch: 11 Global Step: 190760 Fp16 Grad Scale: 65536 Required: 10 
hours Training: 2022-04-11 12:03:53,548-Speed 5113.30 samples/sec Loss 1.8719 LearningRate 0.0184 Epoch: 11 Global Step: 190770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:55,533-Speed 5158.94 samples/sec Loss 1.8458 LearningRate 0.0184 Epoch: 11 Global Step: 190780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:57,505-Speed 5195.52 samples/sec Loss 1.8526 LearningRate 0.0184 Epoch: 11 Global Step: 190790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:03:59,474-Speed 5202.24 samples/sec Loss 1.8893 LearningRate 0.0184 Epoch: 11 Global Step: 190800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:01,449-Speed 5188.26 samples/sec Loss 1.9358 LearningRate 0.0184 Epoch: 11 Global Step: 190810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:03,456-Speed 5102.48 samples/sec Loss 1.8826 LearningRate 0.0184 Epoch: 11 Global Step: 190820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:05,426-Speed 5200.76 samples/sec Loss 1.8444 LearningRate 0.0183 Epoch: 11 Global Step: 190830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:07,388-Speed 5220.19 samples/sec Loss 1.8981 LearningRate 0.0183 Epoch: 11 Global Step: 190840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:09,347-Speed 5228.38 samples/sec Loss 1.8323 LearningRate 0.0183 Epoch: 11 Global Step: 190850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:11,323-Speed 5185.71 samples/sec Loss 1.8772 LearningRate 0.0183 Epoch: 11 Global Step: 190860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:13,293-Speed 5200.02 samples/sec Loss 1.9243 LearningRate 0.0183 Epoch: 11 Global Step: 190870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:15,291-Speed 5126.88 samples/sec Loss 1.8689 LearningRate 0.0183 Epoch: 11 Global Step: 190880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
12:04:17,260-Speed 5202.17 samples/sec Loss 1.8780 LearningRate 0.0183 Epoch: 11 Global Step: 190890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:19,226-Speed 5207.99 samples/sec Loss 1.8398 LearningRate 0.0183 Epoch: 11 Global Step: 190900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:21,200-Speed 5190.16 samples/sec Loss 1.8598 LearningRate 0.0183 Epoch: 11 Global Step: 190910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:23,183-Speed 5167.30 samples/sec Loss 1.9272 LearningRate 0.0183 Epoch: 11 Global Step: 190920 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:25,153-Speed 5198.96 samples/sec Loss 1.8585 LearningRate 0.0183 Epoch: 11 Global Step: 190930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:27,147-Speed 5137.28 samples/sec Loss 1.8646 LearningRate 0.0183 Epoch: 11 Global Step: 190940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:29,115-Speed 5203.90 samples/sec Loss 1.8707 LearningRate 0.0183 Epoch: 11 Global Step: 190950 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:04:31,101-Speed 5158.66 samples/sec Loss 1.8789 LearningRate 0.0183 Epoch: 11 Global Step: 190960 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:04:33,075-Speed 5188.18 samples/sec Loss 1.8808 LearningRate 0.0183 Epoch: 11 Global Step: 190970 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:04:35,041-Speed 5211.32 samples/sec Loss 1.8739 LearningRate 0.0183 Epoch: 11 Global Step: 190980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:37,036-Speed 5133.29 samples/sec Loss 1.8562 LearningRate 0.0183 Epoch: 11 Global Step: 190990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:39,008-Speed 5195.96 samples/sec Loss 1.8695 LearningRate 0.0183 Epoch: 11 Global Step: 191000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:41,005-Speed 5127.78 
samples/sec Loss 1.8347 LearningRate 0.0183 Epoch: 11 Global Step: 191010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:42,978-Speed 5192.92 samples/sec Loss 1.9060 LearningRate 0.0183 Epoch: 11 Global Step: 191020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:44,970-Speed 5142.12 samples/sec Loss 1.8875 LearningRate 0.0183 Epoch: 11 Global Step: 191030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:46,987-Speed 5080.21 samples/sec Loss 1.8378 LearningRate 0.0183 Epoch: 11 Global Step: 191040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:48,958-Speed 5194.76 samples/sec Loss 1.8898 LearningRate 0.0183 Epoch: 11 Global Step: 191050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:50,928-Speed 5200.16 samples/sec Loss 1.8700 LearningRate 0.0183 Epoch: 11 Global Step: 191060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:52,901-Speed 5192.88 samples/sec Loss 1.8416 LearningRate 0.0183 Epoch: 11 Global Step: 191070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:04:54,890-Speed 5148.52 samples/sec Loss 1.8477 LearningRate 0.0183 Epoch: 11 Global Step: 191080 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:04:56,868-Speed 5180.22 samples/sec Loss 1.8783 LearningRate 0.0183 Epoch: 11 Global Step: 191090 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:04:58,832-Speed 5215.07 samples/sec Loss 1.8965 LearningRate 0.0183 Epoch: 11 Global Step: 191100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:00,818-Speed 5157.50 samples/sec Loss 1.8942 LearningRate 0.0183 Epoch: 11 Global Step: 191110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:02,798-Speed 5174.23 samples/sec Loss 1.8500 LearningRate 0.0183 Epoch: 11 Global Step: 191120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:04,789-Speed 5145.27 samples/sec Loss 1.8697 
LearningRate 0.0183 Epoch: 11 Global Step: 191130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:06,762-Speed 5191.45 samples/sec Loss 1.8732 LearningRate 0.0183 Epoch: 11 Global Step: 191140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:08,728-Speed 5208.76 samples/sec Loss 1.8906 LearningRate 0.0183 Epoch: 11 Global Step: 191150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:10,701-Speed 5194.39 samples/sec Loss 1.8888 LearningRate 0.0183 Epoch: 11 Global Step: 191160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:12,670-Speed 5200.30 samples/sec Loss 1.8828 LearningRate 0.0183 Epoch: 11 Global Step: 191170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:14,637-Speed 5208.95 samples/sec Loss 1.9276 LearningRate 0.0183 Epoch: 11 Global Step: 191180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:16,610-Speed 5191.43 samples/sec Loss 1.8386 LearningRate 0.0183 Epoch: 11 Global Step: 191190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:18,585-Speed 5186.15 samples/sec Loss 1.8469 LearningRate 0.0183 Epoch: 11 Global Step: 191200 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:05:20,557-Speed 5195.11 samples/sec Loss 1.8958 LearningRate 0.0183 Epoch: 11 Global Step: 191210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:22,551-Speed 5134.64 samples/sec Loss 1.7752 LearningRate 0.0182 Epoch: 11 Global Step: 191220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:24,532-Speed 5171.57 samples/sec Loss 1.8118 LearningRate 0.0182 Epoch: 11 Global Step: 191230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:26,494-Speed 5222.61 samples/sec Loss 1.8869 LearningRate 0.0182 Epoch: 11 Global Step: 191240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:05:28,494-Speed 5121.36 samples/sec Loss 1.8641 LearningRate 0.0182 Epoch: 11 
Global Step: 191250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:05:30,474-Speed 5175.12 samples/sec Loss 1.8500 LearningRate 0.0182 Epoch: 11 Global Step: 191260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:05:32,444-Speed 5199.26 samples/sec Loss 1.8451 LearningRate 0.0182 Epoch: 11 Global Step: 191270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:05:34,414-Speed 5198.57 samples/sec Loss 1.8512 LearningRate 0.0182 Epoch: 11 Global Step: 191280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:05:36,391-Speed 5183.34 samples/sec Loss 1.8440 LearningRate 0.0182 Epoch: 11 Global Step: 191290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:05:38,361-Speed 5198.92 samples/sec Loss 1.8847 LearningRate 0.0182 Epoch: 11 Global Step: 191300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:05:40,341-Speed 5173.86 samples/sec Loss 1.8676 LearningRate 0.0182 Epoch: 11 Global Step: 191310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:05:42,316-Speed 5186.28 samples/sec Loss 1.8434 LearningRate 0.0182 Epoch: 11 Global Step: 191320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:05:44,286-Speed 5198.18 samples/sec Loss 1.8655 LearningRate 0.0182 Epoch: 11 Global Step: 191330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:05:46,253-Speed 5209.03 samples/sec Loss 1.8740 LearningRate 0.0182 Epoch: 11 Global Step: 191340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:48,234-Speed 5170.23 samples/sec Loss 1.8808 LearningRate 0.0182 Epoch: 11 Global Step: 191350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:50,206-Speed 5194.89 samples/sec Loss 1.8333 LearningRate 0.0182 Epoch: 11 Global Step: 191360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:52,181-Speed 5186.35 samples/sec Loss 1.9138 LearningRate 0.0182 Epoch: 11 Global Step: 191370 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:54,156-Speed 5187.40 samples/sec Loss 1.8875 LearningRate 0.0182 Epoch: 11 Global Step: 191380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:56,123-Speed 5205.41 samples/sec Loss 1.9162 LearningRate 0.0182 Epoch: 11 Global Step: 191390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:05:58,093-Speed 5202.63 samples/sec Loss 1.9067 LearningRate 0.0182 Epoch: 11 Global Step: 191400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:00,062-Speed 5202.21 samples/sec Loss 1.8848 LearningRate 0.0182 Epoch: 11 Global Step: 191410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:02,061-Speed 5124.08 samples/sec Loss 1.9017 LearningRate 0.0182 Epoch: 11 Global Step: 191420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:04,037-Speed 5182.06 samples/sec Loss 1.8814 LearningRate 0.0182 Epoch: 11 Global Step: 191430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:06,007-Speed 5198.77 samples/sec Loss 1.9197 LearningRate 0.0182 Epoch: 11 Global Step: 191440 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:06:07,980-Speed 5193.37 samples/sec Loss 1.9182 LearningRate 0.0182 Epoch: 11 Global Step: 191450 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:06:09,955-Speed 5186.09 samples/sec Loss 1.8610 LearningRate 0.0182 Epoch: 11 Global Step: 191460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:11,938-Speed 5167.23 samples/sec Loss 1.8234 LearningRate 0.0182 Epoch: 11 Global Step: 191470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:13,910-Speed 5195.30 samples/sec Loss 1.9286 LearningRate 0.0182 Epoch: 11 Global Step: 191480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:15,892-Speed 5167.64 samples/sec Loss 1.8998 LearningRate 0.0182 Epoch: 11 Global Step: 191490 Fp16 Grad Scale: 65536 Required: 10 
hours Training: 2022-04-11 12:06:17,871-Speed 5175.53 samples/sec Loss 1.8968 LearningRate 0.0182 Epoch: 11 Global Step: 191500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:19,839-Speed 5206.29 samples/sec Loss 1.9118 LearningRate 0.0182 Epoch: 11 Global Step: 191510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:21,807-Speed 5202.88 samples/sec Loss 1.8543 LearningRate 0.0182 Epoch: 11 Global Step: 191520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:23,804-Speed 5130.62 samples/sec Loss 1.9141 LearningRate 0.0182 Epoch: 11 Global Step: 191530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:25,799-Speed 5133.76 samples/sec Loss 1.8490 LearningRate 0.0182 Epoch: 11 Global Step: 191540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:27,785-Speed 5156.77 samples/sec Loss 1.8947 LearningRate 0.0182 Epoch: 11 Global Step: 191550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:29,773-Speed 5153.69 samples/sec Loss 1.8626 LearningRate 0.0182 Epoch: 11 Global Step: 191560 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:06:31,745-Speed 5195.59 samples/sec Loss 1.9105 LearningRate 0.0182 Epoch: 11 Global Step: 191570 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:06:33,709-Speed 5215.73 samples/sec Loss 1.9132 LearningRate 0.0182 Epoch: 11 Global Step: 191580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:35,700-Speed 5143.94 samples/sec Loss 1.8789 LearningRate 0.0182 Epoch: 11 Global Step: 191590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:37,673-Speed 5192.04 samples/sec Loss 1.8750 LearningRate 0.0182 Epoch: 11 Global Step: 191600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:39,653-Speed 5173.74 samples/sec Loss 1.8494 LearningRate 0.0181 Epoch: 11 Global Step: 191610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 
12:06:41,640-Speed 5155.20 samples/sec Loss 1.8208 LearningRate 0.0181 Epoch: 11 Global Step: 191620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:43,614-Speed 5189.98 samples/sec Loss 1.8559 LearningRate 0.0181 Epoch: 11 Global Step: 191630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:45,598-Speed 5165.39 samples/sec Loss 1.8727 LearningRate 0.0181 Epoch: 11 Global Step: 191640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:47,612-Speed 5085.92 samples/sec Loss 1.9236 LearningRate 0.0181 Epoch: 11 Global Step: 191650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:49,601-Speed 5149.29 samples/sec Loss 1.8366 LearningRate 0.0181 Epoch: 11 Global Step: 191660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:51,591-Speed 5149.31 samples/sec Loss 1.8987 LearningRate 0.0181 Epoch: 11 Global Step: 191670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:53,571-Speed 5173.31 samples/sec Loss 1.8616 LearningRate 0.0181 Epoch: 11 Global Step: 191680 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:06:55,538-Speed 5208.05 samples/sec Loss 1.8268 LearningRate 0.0181 Epoch: 11 Global Step: 191690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:57,516-Speed 5178.71 samples/sec Loss 1.9335 LearningRate 0.0181 Epoch: 11 Global Step: 191700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:06:59,486-Speed 5200.01 samples/sec Loss 1.8568 LearningRate 0.0181 Epoch: 11 Global Step: 191710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:07:01,459-Speed 5189.87 samples/sec Loss 1.9009 LearningRate 0.0181 Epoch: 11 Global Step: 191720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:07:03,434-Speed 5186.61 samples/sec Loss 1.8776 LearningRate 0.0181 Epoch: 11 Global Step: 191730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:07:05,410-Speed 5183.78 
samples/sec Loss 1.8931 LearningRate 0.0181 Epoch: 11 Global Step: 191740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:07:07,397-Speed 5156.65 samples/sec Loss 1.8418 LearningRate 0.0181 Epoch: 11 Global Step: 191750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:07:09,385-Speed 5151.75 samples/sec Loss 1.9024 LearningRate 0.0181 Epoch: 11 Global Step: 191760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:07:11,353-Speed 5205.15 samples/sec Loss 1.9103 LearningRate 0.0181 Epoch: 11 Global Step: 191770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:07:13,326-Speed 5191.59 samples/sec Loss 1.9184 LearningRate 0.0181 Epoch: 11 Global Step: 191780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:07:15,316-Speed 5148.38 samples/sec Loss 1.9000 LearningRate 0.0181 Epoch: 11 Global Step: 191790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:07:17,289-Speed 5191.49 samples/sec Loss 1.8617 LearningRate 0.0181 Epoch: 11 Global Step: 191800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:07:19,272-Speed 5166.45 samples/sec Loss 1.8984 LearningRate 0.0181 Epoch: 11 Global Step: 191810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:07:21,242-Speed 5199.59 samples/sec Loss 1.9034 LearningRate 0.0181 Epoch: 11 Global Step: 191820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:07:23,214-Speed 5193.48 samples/sec Loss 1.8226 LearningRate 0.0181 Epoch: 11 Global Step: 191830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:07:25,192-Speed 5178.40 samples/sec Loss 1.9420 LearningRate 0.0181 Epoch: 11 Global Step: 191840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:07:27,173-Speed 5170.25 samples/sec Loss 1.8930 LearningRate 0.0181 Epoch: 11 Global Step: 191850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:07:29,148-Speed 5188.06 samples/sec Loss 1.9775 
LearningRate 0.0181 Epoch: 11 Global Step: 191860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:07:31,148-Speed 5122.20 samples/sec Loss 1.8218 LearningRate 0.0181 Epoch: 11 Global Step: 191870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:07:33,120-Speed 5192.78 samples/sec Loss 1.8787 LearningRate 0.0181 Epoch: 11 Global Step: 191880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:07:35,097-Speed 5182.66 samples/sec Loss 1.8671 LearningRate 0.0181 Epoch: 11 Global Step: 191890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:07:37,076-Speed 5175.90 samples/sec Loss 1.8611 LearningRate 0.0181 Epoch: 11 Global Step: 191900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:07:39,065-Speed 5150.30 samples/sec Loss 1.8225 LearningRate 0.0181 Epoch: 11 Global Step: 191910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:07:41,040-Speed 5186.97 samples/sec Loss 1.8495 LearningRate 0.0181 Epoch: 11 Global Step: 191920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:07:43,008-Speed 5204.06 samples/sec Loss 1.8655 LearningRate 0.0181 Epoch: 11 Global Step: 191930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:07:44,989-Speed 5170.78 samples/sec Loss 1.9005 LearningRate 0.0181 Epoch: 11 Global Step: 191940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:07:46,968-Speed 5174.95 samples/sec Loss 1.9110 LearningRate 0.0181 Epoch: 11 Global Step: 191950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:07:48,946-Speed 5180.03 samples/sec Loss 1.9046 LearningRate 0.0181 Epoch: 11 Global Step: 191960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:07:50,926-Speed 5173.59 samples/sec Loss 1.9234 LearningRate 0.0181 Epoch: 11 Global Step: 191970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:07:52,895-Speed 5200.37 samples/sec Loss 1.8965 LearningRate 0.0181 Epoch: 11 
Global Step: 191980 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-11 12:07:54,872-Speed 5182.15 samples/sec Loss 1.8273 LearningRate 0.0181 Epoch: 11 Global Step: 191990 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-11 12:07:56,841-Speed 5204.61 samples/sec Loss 1.8818 LearningRate 0.0180 Epoch: 11 Global Step: 192000 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-11 12:08:23,630-[lfw][192000]XNorm: 22.070554
Training: 2022-04-11 12:08:23,631-[lfw][192000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-11 12:08:23,631-[lfw][192000]Accuracy-Highest: 0.99833
Training: 2022-04-11 12:08:54,433-[cfp_fp][192000]XNorm: 20.788151
Training: 2022-04-11 12:08:54,433-[cfp_fp][192000]Accuracy-Flip: 0.98600+-0.00567
Training: 2022-04-11 12:08:54,434-[cfp_fp][192000]Accuracy-Highest: 0.98714
Training: 2022-04-11 12:09:21,172-[agedb_30][192000]XNorm: 22.201020
Training: 2022-04-11 12:09:21,173-[agedb_30][192000]Accuracy-Flip: 0.97967+-0.00686
Training: 2022-04-11 12:09:21,173-[agedb_30][192000]Accuracy-Highest: 0.98250
Training: 2022-04-11 12:09:23,174-Speed 118.61 samples/sec Loss 1.9373 LearningRate 0.0180 Epoch: 11 Global Step: 192010 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-11 12:09:25,161-Speed 5157.18 samples/sec Loss 1.8711 LearningRate 0.0180 Epoch: 11 Global Step: 192020 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-11 12:09:27,133-Speed 5194.25 samples/sec Loss 1.9175 LearningRate 0.0180 Epoch: 11 Global Step: 192030 Fp16 Grad Scale: 65536 Required: 10 hours
Training: 2022-04-11 12:09:29,127-Speed 5135.95 samples/sec Loss 1.8735 LearningRate 0.0180 Epoch: 11 Global Step: 192040 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-11 12:09:31,098-Speed 5198.23 samples/sec Loss 1.8475 LearningRate 0.0180 Epoch: 11 Global Step: 192050 Fp16 Grad Scale: 131072 Required: 10 hours
Training: 2022-04-11 12:09:33,056-Speed 5231.57 samples/sec Loss 1.8530 LearningRate 0.0180 Epoch: 11 Global Step: 
192060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:09:35,038-Speed 5166.71 samples/sec Loss 1.8535 LearningRate 0.0180 Epoch: 11 Global Step: 192070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:09:37,032-Speed 5138.46 samples/sec Loss 1.9064 LearningRate 0.0180 Epoch: 11 Global Step: 192080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:09:39,011-Speed 5173.73 samples/sec Loss 1.8958 LearningRate 0.0180 Epoch: 11 Global Step: 192090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:09:40,986-Speed 5186.83 samples/sec Loss 1.8994 LearningRate 0.0180 Epoch: 11 Global Step: 192100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:09:42,965-Speed 5177.87 samples/sec Loss 1.8684 LearningRate 0.0180 Epoch: 11 Global Step: 192110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:09:44,936-Speed 5197.56 samples/sec Loss 1.9229 LearningRate 0.0180 Epoch: 11 Global Step: 192120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:09:46,908-Speed 5195.02 samples/sec Loss 1.8616 LearningRate 0.0180 Epoch: 11 Global Step: 192130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:09:48,876-Speed 5202.45 samples/sec Loss 1.9282 LearningRate 0.0180 Epoch: 11 Global Step: 192140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:09:50,844-Speed 5207.17 samples/sec Loss 1.8952 LearningRate 0.0180 Epoch: 11 Global Step: 192150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:09:52,813-Speed 5201.10 samples/sec Loss 1.9119 LearningRate 0.0180 Epoch: 11 Global Step: 192160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-11 12:09:54,781-Speed 5204.84 samples/sec Loss 1.8829 LearningRate 0.0180 Epoch: 11 Global Step: 192170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:09:56,756-Speed 5187.61 samples/sec Loss 1.9282 LearningRate 0.0180 Epoch: 11 Global Step: 192180 Fp16 Grad Scale: 65536 
Required: 10 hours Training: 2022-04-11 12:09:58,743-Speed 5153.03 samples/sec Loss 1.9526 LearningRate 0.0180 Epoch: 11 Global Step: 192190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:00,735-Speed 5141.97 samples/sec Loss 1.8873 LearningRate 0.0180 Epoch: 11 Global Step: 192200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:02,721-Speed 5159.99 samples/sec Loss 1.9338 LearningRate 0.0180 Epoch: 11 Global Step: 192210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:04,694-Speed 5190.24 samples/sec Loss 1.8736 LearningRate 0.0180 Epoch: 11 Global Step: 192220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:06,677-Speed 5166.56 samples/sec Loss 1.9040 LearningRate 0.0180 Epoch: 11 Global Step: 192230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:08,648-Speed 5197.02 samples/sec Loss 1.9216 LearningRate 0.0180 Epoch: 11 Global Step: 192240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:10,615-Speed 5209.02 samples/sec Loss 1.9061 LearningRate 0.0180 Epoch: 11 Global Step: 192250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:12,613-Speed 5125.00 samples/sec Loss 1.9118 LearningRate 0.0180 Epoch: 11 Global Step: 192260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:14,624-Speed 5094.73 samples/sec Loss 1.9112 LearningRate 0.0180 Epoch: 11 Global Step: 192270 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:10:16,606-Speed 5167.98 samples/sec Loss 1.8818 LearningRate 0.0180 Epoch: 11 Global Step: 192280 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-11 12:10:18,573-Speed 5208.67 samples/sec Loss 1.9109 LearningRate 0.0180 Epoch: 11 Global Step: 192290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:20,559-Speed 5155.64 samples/sec Loss 1.9468 LearningRate 0.0180 Epoch: 11 Global Step: 192300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 
2022-04-11 12:10:22,554-Speed 5134.82 samples/sec Loss 1.8726 LearningRate 0.0180 Epoch: 11 Global Step: 192310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:24,533-Speed 5176.27 samples/sec Loss 1.8791 LearningRate 0.0180 Epoch: 11 Global Step: 192320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:26,520-Speed 5156.76 samples/sec Loss 1.9440 LearningRate 0.0180 Epoch: 11 Global Step: 192330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:28,492-Speed 5192.97 samples/sec Loss 1.9546 LearningRate 0.0180 Epoch: 11 Global Step: 192340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:30,465-Speed 5193.03 samples/sec Loss 1.8608 LearningRate 0.0180 Epoch: 11 Global Step: 192350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:32,437-Speed 5194.87 samples/sec Loss 1.8168 LearningRate 0.0180 Epoch: 11 Global Step: 192360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:34,409-Speed 5194.90 samples/sec Loss 1.8867 LearningRate 0.0180 Epoch: 11 Global Step: 192370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:36,385-Speed 5183.07 samples/sec Loss 1.9021 LearningRate 0.0180 Epoch: 11 Global Step: 192380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:38,373-Speed 5152.42 samples/sec Loss 1.9240 LearningRate 0.0179 Epoch: 11 Global Step: 192390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:40,343-Speed 5198.33 samples/sec Loss 1.8642 LearningRate 0.0179 Epoch: 11 Global Step: 192400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:42,333-Speed 5148.70 samples/sec Loss 1.9089 LearningRate 0.0179 Epoch: 11 Global Step: 192410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-11 12:10:44,319-Speed 5158.11 samples/sec Loss 1.8637 LearningRate 0.0179 Epoch: 11 Global Step: 192420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:10:46,301-Speed 
5166.75 samples/sec Loss 1.8344 LearningRate 0.0179 Epoch: 11 Global Step: 192430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:10:48,296-Speed 5134.83 samples/sec Loss 1.8775 LearningRate 0.0179 Epoch: 11 Global Step: 192440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:10:50,290-Speed 5137.70 samples/sec Loss 1.8531 LearningRate 0.0179 Epoch: 11 Global Step: 192450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:10:52,276-Speed 5157.61 samples/sec Loss 1.8907 LearningRate 0.0179 Epoch: 11 Global Step: 192460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:10:54,248-Speed 5196.50 samples/sec Loss 1.9016 LearningRate 0.0179 Epoch: 11 Global Step: 192470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:10:56,217-Speed 5200.53 samples/sec Loss 1.8667 LearningRate 0.0179 Epoch: 11 Global Step: 192480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:10:58,194-Speed 5182.44 samples/sec Loss 1.8752 LearningRate 0.0179 Epoch: 11 Global Step: 192490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:11:00,184-Speed 5148.03 samples/sec Loss 1.9091 LearningRate 0.0179 Epoch: 11 Global Step: 192500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:11:02,154-Speed 5198.11 samples/sec Loss 1.8815 LearningRate 0.0179 Epoch: 11 Global Step: 192510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:11:04,146-Speed 5142.24 samples/sec Loss 1.9137 LearningRate 0.0179 Epoch: 11 Global Step: 192520 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:11:06,136-Speed 5146.79 samples/sec Loss 1.9205 LearningRate 0.0179 Epoch: 11 Global Step: 192530 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:11:08,112-Speed 5184.14 samples/sec Loss 1.8937 LearningRate 0.0179 Epoch: 11 Global Step: 192540 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:11:10,090-Speed 5181.38 samples/sec Loss 1.9363 
LearningRate 0.0179 Epoch: 11 Global Step: 192550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:11:12,069-Speed 5175.86 samples/sec Loss 1.8645 LearningRate 0.0179 Epoch: 11 Global Step: 192560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:11:14,032-Speed 5215.96 samples/sec Loss 1.8730 LearningRate 0.0179 Epoch: 11 Global Step: 192570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:16,004-Speed 5194.97 samples/sec Loss 1.8327 LearningRate 0.0179 Epoch: 11 Global Step: 192580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:17,974-Speed 5200.71 samples/sec Loss 1.8945 LearningRate 0.0179 Epoch: 11 Global Step: 192590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:19,943-Speed 5201.15 samples/sec Loss 1.9511 LearningRate 0.0179 Epoch: 11 Global Step: 192600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:21,918-Speed 5186.08 samples/sec Loss 1.8889 LearningRate 0.0179 Epoch: 11 Global Step: 192610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:23,936-Speed 5077.14 samples/sec Loss 1.8378 LearningRate 0.0179 Epoch: 11 Global Step: 192620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:25,941-Speed 5108.51 samples/sec Loss 1.9157 LearningRate 0.0179 Epoch: 11 Global Step: 192630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:27,925-Speed 5164.26 samples/sec Loss 1.8847 LearningRate 0.0179 Epoch: 11 Global Step: 192640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:29,900-Speed 5186.67 samples/sec Loss 1.8970 LearningRate 0.0179 Epoch: 11 Global Step: 192650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:31,869-Speed 5201.01 samples/sec Loss 1.9226 LearningRate 0.0179 Epoch: 11 Global Step: 192660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:33,847-Speed 5180.28 samples/sec Loss 1.9334 LearningRate 0.0179 Epoch: 11 Global 
Step: 192670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:35,823-Speed 5182.77 samples/sec Loss 1.8611 LearningRate 0.0179 Epoch: 11 Global Step: 192680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:37,815-Speed 5142.16 samples/sec Loss 1.8716 LearningRate 0.0179 Epoch: 11 Global Step: 192690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:39,799-Speed 5163.77 samples/sec Loss 1.9309 LearningRate 0.0179 Epoch: 11 Global Step: 192700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:41,769-Speed 5199.50 samples/sec Loss 1.8926 LearningRate 0.0179 Epoch: 11 Global Step: 192710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:43,740-Speed 5196.09 samples/sec Loss 1.8610 LearningRate 0.0179 Epoch: 11 Global Step: 192720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:45,721-Speed 5171.71 samples/sec Loss 1.9096 LearningRate 0.0179 Epoch: 11 Global Step: 192730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:47,689-Speed 5203.43 samples/sec Loss 1.9036 LearningRate 0.0179 Epoch: 11 Global Step: 192740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:49,682-Speed 5140.65 samples/sec Loss 1.9293 LearningRate 0.0179 Epoch: 11 Global Step: 192750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:51,664-Speed 5167.94 samples/sec Loss 1.9375 LearningRate 0.0179 Epoch: 11 Global Step: 192760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:11:53,649-Speed 5161.12 samples/sec Loss 1.8600 LearningRate 0.0179 Epoch: 11 Global Step: 192770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:11:55,620-Speed 5198.28 samples/sec Loss 1.9544 LearningRate 0.0179 Epoch: 11 Global Step: 192780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:11:57,598-Speed 5178.40 samples/sec Loss 1.8783 LearningRate 0.0178 Epoch: 11 Global Step: 192790 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-04-11 12:11:59,601-Speed 5114.03 samples/sec Loss 1.8559 LearningRate 0.0178 Epoch: 11 Global Step: 192800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:01,582-Speed 5171.05 samples/sec Loss 1.8594 LearningRate 0.0178 Epoch: 11 Global Step: 192810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:03,554-Speed 5194.40 samples/sec Loss 1.9071 LearningRate 0.0178 Epoch: 11 Global Step: 192820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:05,536-Speed 5171.30 samples/sec Loss 1.9432 LearningRate 0.0178 Epoch: 11 Global Step: 192830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:07,527-Speed 5142.63 samples/sec Loss 1.9192 LearningRate 0.0178 Epoch: 11 Global Step: 192840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:09,512-Speed 5161.37 samples/sec Loss 1.8763 LearningRate 0.0178 Epoch: 11 Global Step: 192850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:11,504-Speed 5140.87 samples/sec Loss 1.8934 LearningRate 0.0178 Epoch: 11 Global Step: 192860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:13,498-Speed 5138.21 samples/sec Loss 1.9601 LearningRate 0.0178 Epoch: 11 Global Step: 192870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:15,486-Speed 5154.16 samples/sec Loss 1.9208 LearningRate 0.0178 Epoch: 11 Global Step: 192880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:17,467-Speed 5170.41 samples/sec Loss 1.9289 LearningRate 0.0178 Epoch: 11 Global Step: 192890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:19,438-Speed 5199.53 samples/sec Loss 1.9357 LearningRate 0.0178 Epoch: 11 Global Step: 192900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:12:21,414-Speed 5183.57 samples/sec Loss 1.9478 LearningRate 0.0178 Epoch: 11 Global Step: 192910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
12:12:23,390-Speed 5183.28 samples/sec Loss 1.9001 LearningRate 0.0178 Epoch: 11 Global Step: 192920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:12:25,365-Speed 5186.14 samples/sec Loss 1.8828 LearningRate 0.0178 Epoch: 11 Global Step: 192930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:12:27,334-Speed 5204.48 samples/sec Loss 1.9470 LearningRate 0.0178 Epoch: 11 Global Step: 192940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:29,330-Speed 5129.86 samples/sec Loss 1.8950 LearningRate 0.0178 Epoch: 11 Global Step: 192950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:31,308-Speed 5180.95 samples/sec Loss 1.9499 LearningRate 0.0178 Epoch: 11 Global Step: 192960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:33,295-Speed 5154.38 samples/sec Loss 1.8667 LearningRate 0.0178 Epoch: 11 Global Step: 192970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:35,276-Speed 5170.50 samples/sec Loss 1.8421 LearningRate 0.0178 Epoch: 11 Global Step: 192980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:37,254-Speed 5179.59 samples/sec Loss 1.8791 LearningRate 0.0178 Epoch: 11 Global Step: 192990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:39,228-Speed 5188.23 samples/sec Loss 1.9160 LearningRate 0.0178 Epoch: 11 Global Step: 193000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:41,208-Speed 5174.60 samples/sec Loss 1.9705 LearningRate 0.0178 Epoch: 11 Global Step: 193010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:43,177-Speed 5201.32 samples/sec Loss 1.8268 LearningRate 0.0178 Epoch: 11 Global Step: 193020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:45,160-Speed 5166.30 samples/sec Loss 1.8888 LearningRate 0.0178 Epoch: 11 Global Step: 193030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:47,148-Speed 5151.01 samples/sec 
Loss 1.8753 LearningRate 0.0178 Epoch: 11 Global Step: 193040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:12:49,162-Speed 5088.69 samples/sec Loss 1.8661 LearningRate 0.0178 Epoch: 11 Global Step: 193050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:51,158-Speed 5130.68 samples/sec Loss 1.9140 LearningRate 0.0178 Epoch: 11 Global Step: 193060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:53,128-Speed 5199.71 samples/sec Loss 1.8845 LearningRate 0.0178 Epoch: 11 Global Step: 193070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:55,109-Speed 5171.62 samples/sec Loss 1.8599 LearningRate 0.0178 Epoch: 11 Global Step: 193080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:57,077-Speed 5203.90 samples/sec Loss 1.8883 LearningRate 0.0178 Epoch: 11 Global Step: 193090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:12:59,048-Speed 5199.43 samples/sec Loss 1.8701 LearningRate 0.0178 Epoch: 11 Global Step: 193100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:01,016-Speed 5205.53 samples/sec Loss 1.9546 LearningRate 0.0178 Epoch: 11 Global Step: 193110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:02,986-Speed 5198.17 samples/sec Loss 1.9362 LearningRate 0.0178 Epoch: 11 Global Step: 193120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:04,969-Speed 5166.26 samples/sec Loss 1.9216 LearningRate 0.0178 Epoch: 11 Global Step: 193130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:06,944-Speed 5186.54 samples/sec Loss 1.8893 LearningRate 0.0178 Epoch: 11 Global Step: 193140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:08,917-Speed 5192.08 samples/sec Loss 1.9033 LearningRate 0.0178 Epoch: 11 Global Step: 193150 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:13:10,889-Speed 5194.02 samples/sec Loss 1.9705 LearningRate 0.0178 Epoch: 11 
Global Step: 193160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:12,886-Speed 5129.04 samples/sec Loss 1.8393 LearningRate 0.0178 Epoch: 11 Global Step: 193170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:14,868-Speed 5167.80 samples/sec Loss 1.9213 LearningRate 0.0177 Epoch: 11 Global Step: 193180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:16,836-Speed 5203.43 samples/sec Loss 1.8321 LearningRate 0.0177 Epoch: 11 Global Step: 193190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:18,806-Speed 5201.71 samples/sec Loss 1.9098 LearningRate 0.0177 Epoch: 11 Global Step: 193200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:20,785-Speed 5175.58 samples/sec Loss 1.9114 LearningRate 0.0177 Epoch: 11 Global Step: 193210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:22,784-Speed 5124.21 samples/sec Loss 1.8613 LearningRate 0.0177 Epoch: 11 Global Step: 193220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:24,773-Speed 5150.44 samples/sec Loss 1.8815 LearningRate 0.0177 Epoch: 11 Global Step: 193230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:26,754-Speed 5171.49 samples/sec Loss 1.9290 LearningRate 0.0177 Epoch: 11 Global Step: 193240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:28,721-Speed 5207.64 samples/sec Loss 1.8467 LearningRate 0.0177 Epoch: 11 Global Step: 193250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:13:30,716-Speed 5132.87 samples/sec Loss 1.8680 LearningRate 0.0177 Epoch: 11 Global Step: 193260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:13:32,685-Speed 5204.23 samples/sec Loss 1.8324 LearningRate 0.0177 Epoch: 11 Global Step: 193270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:13:34,657-Speed 5195.12 samples/sec Loss 1.9145 LearningRate 0.0177 Epoch: 11 Global Step: 193280 Fp16 Grad Scale: 32768 
Required: 9 hours Training: 2022-04-11 12:13:36,631-Speed 5187.34 samples/sec Loss 1.8883 LearningRate 0.0177 Epoch: 11 Global Step: 193290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:13:38,657-Speed 5057.53 samples/sec Loss 1.8533 LearningRate 0.0177 Epoch: 11 Global Step: 193300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:13:40,642-Speed 5159.12 samples/sec Loss 1.8971 LearningRate 0.0177 Epoch: 11 Global Step: 193310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:13:42,638-Speed 5131.69 samples/sec Loss 1.9327 LearningRate 0.0177 Epoch: 11 Global Step: 193320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:13:44,609-Speed 5199.46 samples/sec Loss 1.8783 LearningRate 0.0177 Epoch: 11 Global Step: 193330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:13:46,579-Speed 5198.68 samples/sec Loss 1.9123 LearningRate 0.0177 Epoch: 11 Global Step: 193340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:13:48,552-Speed 5191.78 samples/sec Loss 1.8740 LearningRate 0.0177 Epoch: 11 Global Step: 193350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:50,528-Speed 5184.59 samples/sec Loss 1.9311 LearningRate 0.0177 Epoch: 11 Global Step: 193360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:52,506-Speed 5177.31 samples/sec Loss 1.8652 LearningRate 0.0177 Epoch: 11 Global Step: 193370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:54,483-Speed 5182.98 samples/sec Loss 1.8456 LearningRate 0.0177 Epoch: 11 Global Step: 193380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:56,453-Speed 5198.82 samples/sec Loss 1.8882 LearningRate 0.0177 Epoch: 11 Global Step: 193390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:13:58,426-Speed 5192.42 samples/sec Loss 1.8629 LearningRate 0.0177 Epoch: 11 Global Step: 193400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
12:14:00,416-Speed 5146.96 samples/sec Loss 1.8822 LearningRate 0.0177 Epoch: 11 Global Step: 193410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:02,391-Speed 5187.15 samples/sec Loss 1.8536 LearningRate 0.0177 Epoch: 11 Global Step: 193420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:04,361-Speed 5197.58 samples/sec Loss 1.8917 LearningRate 0.0177 Epoch: 11 Global Step: 193430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:06,331-Speed 5200.97 samples/sec Loss 1.9175 LearningRate 0.0177 Epoch: 11 Global Step: 193440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:08,299-Speed 5205.92 samples/sec Loss 1.9118 LearningRate 0.0177 Epoch: 11 Global Step: 193450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:14:10,263-Speed 5216.05 samples/sec Loss 1.8789 LearningRate 0.0177 Epoch: 11 Global Step: 193460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:12,265-Speed 5115.12 samples/sec Loss 1.8991 LearningRate 0.0177 Epoch: 11 Global Step: 193470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:14,249-Speed 5164.19 samples/sec Loss 1.9266 LearningRate 0.0177 Epoch: 11 Global Step: 193480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:16,237-Speed 5152.38 samples/sec Loss 1.9199 LearningRate 0.0177 Epoch: 11 Global Step: 193490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:18,211-Speed 5190.67 samples/sec Loss 1.8606 LearningRate 0.0177 Epoch: 11 Global Step: 193500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:20,189-Speed 5176.85 samples/sec Loss 1.8524 LearningRate 0.0177 Epoch: 11 Global Step: 193510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:22,172-Speed 5166.90 samples/sec Loss 1.8945 LearningRate 0.0177 Epoch: 11 Global Step: 193520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:24,139-Speed 5207.37 samples/sec 
Loss 1.9344 LearningRate 0.0177 Epoch: 11 Global Step: 193530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:26,112-Speed 5191.92 samples/sec Loss 1.9145 LearningRate 0.0177 Epoch: 11 Global Step: 193540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:28,095-Speed 5165.54 samples/sec Loss 1.8575 LearningRate 0.0177 Epoch: 11 Global Step: 193550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:30,071-Speed 5183.73 samples/sec Loss 1.8945 LearningRate 0.0177 Epoch: 11 Global Step: 193560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:14:32,044-Speed 5192.19 samples/sec Loss 1.9810 LearningRate 0.0177 Epoch: 11 Global Step: 193570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:14:34,013-Speed 5204.30 samples/sec Loss 1.8684 LearningRate 0.0176 Epoch: 11 Global Step: 193580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:36,014-Speed 5118.64 samples/sec Loss 1.9223 LearningRate 0.0176 Epoch: 11 Global Step: 193590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:37,996-Speed 5168.38 samples/sec Loss 1.9165 LearningRate 0.0176 Epoch: 11 Global Step: 193600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:39,974-Speed 5178.55 samples/sec Loss 1.9212 LearningRate 0.0176 Epoch: 11 Global Step: 193610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:41,973-Speed 5123.86 samples/sec Loss 1.8608 LearningRate 0.0176 Epoch: 11 Global Step: 193620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:43,946-Speed 5193.45 samples/sec Loss 1.9124 LearningRate 0.0176 Epoch: 11 Global Step: 193630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:45,919-Speed 5190.34 samples/sec Loss 1.8547 LearningRate 0.0176 Epoch: 11 Global Step: 193640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:47,916-Speed 5128.09 samples/sec Loss 1.8703 LearningRate 0.0176 Epoch: 11 
Global Step: 193650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:49,921-Speed 5110.59 samples/sec Loss 1.9112 LearningRate 0.0176 Epoch: 11 Global Step: 193660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:51,891-Speed 5198.19 samples/sec Loss 1.9327 LearningRate 0.0176 Epoch: 11 Global Step: 193670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:14:53,863-Speed 5196.73 samples/sec Loss 1.8808 LearningRate 0.0176 Epoch: 11 Global Step: 193680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:14:55,851-Speed 5152.47 samples/sec Loss 1.9345 LearningRate 0.0176 Epoch: 11 Global Step: 193690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:14:57,825-Speed 5189.66 samples/sec Loss 1.8699 LearningRate 0.0176 Epoch: 11 Global Step: 193700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:14:59,800-Speed 5185.06 samples/sec Loss 1.9032 LearningRate 0.0176 Epoch: 11 Global Step: 193710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:15:01,773-Speed 5192.85 samples/sec Loss 1.8315 LearningRate 0.0176 Epoch: 11 Global Step: 193720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:15:03,764-Speed 5143.04 samples/sec Loss 1.9015 LearningRate 0.0176 Epoch: 11 Global Step: 193730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:15:05,760-Speed 5132.54 samples/sec Loss 1.9270 LearningRate 0.0176 Epoch: 11 Global Step: 193740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:15:07,732-Speed 5194.63 samples/sec Loss 1.8550 LearningRate 0.0176 Epoch: 11 Global Step: 193750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:15:09,720-Speed 5153.85 samples/sec Loss 1.8989 LearningRate 0.0176 Epoch: 11 Global Step: 193760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:15:11,716-Speed 5131.29 samples/sec Loss 1.8635 LearningRate 0.0176 Epoch: 11 Global Step: 193770 Fp16 Grad Scale: 
65536 Required: 9 hours Training: 2022-04-11 12:15:13,686-Speed 5200.07 samples/sec Loss 1.8920 LearningRate 0.0176 Epoch: 11 Global Step: 193780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:15:15,672-Speed 5158.19 samples/sec Loss 1.9167 LearningRate 0.0176 Epoch: 11 Global Step: 193790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:15:17,646-Speed 5187.47 samples/sec Loss 1.8995 LearningRate 0.0176 Epoch: 11 Global Step: 193800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:15:19,633-Speed 5156.16 samples/sec Loss 1.9393 LearningRate 0.0176 Epoch: 11 Global Step: 193810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:15:21,628-Speed 5134.10 samples/sec Loss 1.8490 LearningRate 0.0176 Epoch: 11 Global Step: 193820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:15:23,605-Speed 5182.82 samples/sec Loss 1.9084 LearningRate 0.0176 Epoch: 11 Global Step: 193830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:15:25,580-Speed 5184.37 samples/sec Loss 1.8666 LearningRate 0.0176 Epoch: 11 Global Step: 193840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:15:27,552-Speed 5196.46 samples/sec Loss 1.9008 LearningRate 0.0176 Epoch: 11 Global Step: 193850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:15:29,543-Speed 5144.66 samples/sec Loss 1.8507 LearningRate 0.0176 Epoch: 11 Global Step: 193860 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:15:31,515-Speed 5194.06 samples/sec Loss 1.8584 LearningRate 0.0176 Epoch: 11 Global Step: 193870 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:15:33,524-Speed 5098.52 samples/sec Loss 1.8507 LearningRate 0.0176 Epoch: 11 Global Step: 193880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:15:35,537-Speed 5089.24 samples/sec Loss 1.9435 LearningRate 0.0176 Epoch: 11 Global Step: 193890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 
2022-04-11 12:15:37,511-Speed 5189.19 samples/sec Loss 1.8014 LearningRate 0.0176 Epoch: 11 Global Step: 193900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:15:39,485-Speed 5189.42 samples/sec Loss 1.8802 LearningRate 0.0176 Epoch: 11 Global Step: 193910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:15:41,477-Speed 5141.94 samples/sec Loss 1.8922 LearningRate 0.0176 Epoch: 11 Global Step: 193920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:15:43,453-Speed 5184.35 samples/sec Loss 1.8918 LearningRate 0.0176 Epoch: 11 Global Step: 193930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:15:45,443-Speed 5147.75 samples/sec Loss 1.8928 LearningRate 0.0176 Epoch: 11 Global Step: 193940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:15:47,430-Speed 5154.16 samples/sec Loss 1.9022 LearningRate 0.0176 Epoch: 11 Global Step: 193950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:15:49,420-Speed 5147.48 samples/sec Loss 1.9190 LearningRate 0.0176 Epoch: 11 Global Step: 193960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:15:51,412-Speed 5142.55 samples/sec Loss 1.8400 LearningRate 0.0176 Epoch: 11 Global Step: 193970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:15:53,417-Speed 5109.23 samples/sec Loss 1.9462 LearningRate 0.0175 Epoch: 11 Global Step: 193980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:15:55,389-Speed 5194.68 samples/sec Loss 1.9386 LearningRate 0.0175 Epoch: 11 Global Step: 193990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:15:57,386-Speed 5130.25 samples/sec Loss 1.9305 LearningRate 0.0175 Epoch: 11 Global Step: 194000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:16:23,846-[lfw][194000]XNorm: 21.758637 Training: 2022-04-11 12:16:23,847-[lfw][194000]Accuracy-Flip: 0.99750+-0.00261 Training: 2022-04-11 12:16:23,848-[lfw][194000]Accuracy-Highest: 0.99833 
Training: 2022-04-11 12:16:54,516-[cfp_fp][194000]XNorm: 20.884260 Training: 2022-04-11 12:16:54,517-[cfp_fp][194000]Accuracy-Flip: 0.98500+-0.00439 Training: 2022-04-11 12:16:54,518-[cfp_fp][194000]Accuracy-Highest: 0.98714 Training: 2022-04-11 12:17:21,046-[agedb_30][194000]XNorm: 22.213885 Training: 2022-04-11 12:17:21,047-[agedb_30][194000]Accuracy-Flip: 0.98117+-0.00813 Training: 2022-04-11 12:17:21,047-[agedb_30][194000]Accuracy-Highest: 0.98250 Training: 2022-04-11 12:17:23,042-Speed 119.55 samples/sec Loss 1.8942 LearningRate 0.0175 Epoch: 11 Global Step: 194010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:17:25,001-Speed 5229.22 samples/sec Loss 1.8898 LearningRate 0.0175 Epoch: 11 Global Step: 194020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:17:26,974-Speed 5192.62 samples/sec Loss 1.8469 LearningRate 0.0175 Epoch: 11 Global Step: 194030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:17:28,964-Speed 5146.68 samples/sec Loss 1.9441 LearningRate 0.0175 Epoch: 11 Global Step: 194040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:17:30,929-Speed 5214.00 samples/sec Loss 1.8574 LearningRate 0.0175 Epoch: 11 Global Step: 194050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:17:32,894-Speed 5211.25 samples/sec Loss 1.9053 LearningRate 0.0175 Epoch: 11 Global Step: 194060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:17:34,871-Speed 5181.49 samples/sec Loss 1.9044 LearningRate 0.0175 Epoch: 11 Global Step: 194070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:17:36,862-Speed 5144.89 samples/sec Loss 1.9291 LearningRate 0.0175 Epoch: 11 Global Step: 194080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:17:38,848-Speed 5158.64 samples/sec Loss 1.8066 LearningRate 0.0175 Epoch: 11 Global Step: 194090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:17:40,816-Speed 5204.11 samples/sec Loss 1.8780 
LearningRate 0.0175 Epoch: 11 Global Step: 194100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:17:42,785-Speed 5202.82 samples/sec Loss 1.8874 LearningRate 0.0175 Epoch: 11 Global Step: 194110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:17:44,751-Speed 5211.17 samples/sec Loss 1.8578 LearningRate 0.0175 Epoch: 11 Global Step: 194120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:17:46,729-Speed 5177.44 samples/sec Loss 1.8609 LearningRate 0.0175 Epoch: 11 Global Step: 194130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:17:48,700-Speed 5198.91 samples/sec Loss 1.9174 LearningRate 0.0175 Epoch: 11 Global Step: 194140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:17:50,668-Speed 5203.58 samples/sec Loss 1.8300 LearningRate 0.0175 Epoch: 11 Global Step: 194150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:17:52,643-Speed 5187.24 samples/sec Loss 1.8762 LearningRate 0.0175 Epoch: 11 Global Step: 194160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:17:54,621-Speed 5177.01 samples/sec Loss 1.9026 LearningRate 0.0175 Epoch: 11 Global Step: 194170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:17:56,590-Speed 5202.31 samples/sec Loss 1.9479 LearningRate 0.0175 Epoch: 11 Global Step: 194180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:17:58,573-Speed 5166.11 samples/sec Loss 1.9130 LearningRate 0.0175 Epoch: 11 Global Step: 194190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:18:00,590-Speed 5079.62 samples/sec Loss 1.9313 LearningRate 0.0175 Epoch: 11 Global Step: 194200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:18:02,581-Speed 5142.91 samples/sec Loss 1.9267 LearningRate 0.0175 Epoch: 11 Global Step: 194210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:18:04,559-Speed 5179.16 samples/sec Loss 1.9019 LearningRate 0.0175 Epoch: 11 Global 
Step: 194220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:06,535-Speed 5186.46 samples/sec Loss 1.8884 LearningRate 0.0175 Epoch: 11 Global Step: 194230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:08,506-Speed 5196.88 samples/sec Loss 1.8784 LearningRate 0.0175 Epoch: 11 Global Step: 194240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:10,492-Speed 5156.76 samples/sec Loss 1.8699 LearningRate 0.0175 Epoch: 11 Global Step: 194250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:12,466-Speed 5190.56 samples/sec Loss 1.8943 LearningRate 0.0175 Epoch: 11 Global Step: 194260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:14,434-Speed 5203.44 samples/sec Loss 1.8625 LearningRate 0.0175 Epoch: 11 Global Step: 194270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:16,416-Speed 5168.47 samples/sec Loss 1.8014 LearningRate 0.0175 Epoch: 11 Global Step: 194280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:18,386-Speed 5199.41 samples/sec Loss 1.8506 LearningRate 0.0175 Epoch: 11 Global Step: 194290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:20,357-Speed 5196.50 samples/sec Loss 1.9431 LearningRate 0.0175 Epoch: 11 Global Step: 194300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:22,363-Speed 5106.11 samples/sec Loss 1.8445 LearningRate 0.0175 Epoch: 11 Global Step: 194310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:24,336-Speed 5192.33 samples/sec Loss 1.9186 LearningRate 0.0175 Epoch: 11 Global Step: 194320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:26,318-Speed 5169.02 samples/sec Loss 1.8767 LearningRate 0.0175 Epoch: 11 Global Step: 194330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:28,301-Speed 5166.07 samples/sec Loss 1.9051 LearningRate 0.0175 Epoch: 11 Global Step: 194340 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 12:18:30,270-Speed 5201.52 samples/sec Loss 1.9151 LearningRate 0.0175 Epoch: 11 Global Step: 194350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:32,246-Speed 5185.30 samples/sec Loss 1.9384 LearningRate 0.0175 Epoch: 11 Global Step: 194360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:34,217-Speed 5196.90 samples/sec Loss 1.8752 LearningRate 0.0175 Epoch: 11 Global Step: 194370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:36,207-Speed 5147.94 samples/sec Loss 1.8897 LearningRate 0.0174 Epoch: 11 Global Step: 194380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:38,195-Speed 5153.13 samples/sec Loss 1.9045 LearningRate 0.0174 Epoch: 11 Global Step: 194390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:40,174-Speed 5176.14 samples/sec Loss 1.8859 LearningRate 0.0174 Epoch: 11 Global Step: 194400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:42,169-Speed 5132.47 samples/sec Loss 1.9551 LearningRate 0.0174 Epoch: 11 Global Step: 194410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:44,148-Speed 5175.36 samples/sec Loss 1.9585 LearningRate 0.0174 Epoch: 11 Global Step: 194420 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:18:46,121-Speed 5192.26 samples/sec Loss 1.8870 LearningRate 0.0174 Epoch: 11 Global Step: 194430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:48,108-Speed 5158.10 samples/sec Loss 1.8580 LearningRate 0.0174 Epoch: 11 Global Step: 194440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:50,078-Speed 5199.30 samples/sec Loss 1.9542 LearningRate 0.0174 Epoch: 11 Global Step: 194450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:52,050-Speed 5194.00 samples/sec Loss 1.8776 LearningRate 0.0174 Epoch: 11 Global Step: 194460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
12:18:54,019-Speed 5203.02 samples/sec Loss 1.9314 LearningRate 0.0174 Epoch: 11 Global Step: 194470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:55,987-Speed 5203.53 samples/sec Loss 1.8811 LearningRate 0.0174 Epoch: 11 Global Step: 194480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:18:57,977-Speed 5149.15 samples/sec Loss 1.8550 LearningRate 0.0174 Epoch: 11 Global Step: 194490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:18:59,954-Speed 5180.30 samples/sec Loss 1.8376 LearningRate 0.0174 Epoch: 11 Global Step: 194500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:19:01,933-Speed 5176.48 samples/sec Loss 1.9712 LearningRate 0.0174 Epoch: 11 Global Step: 194510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:19:03,913-Speed 5173.28 samples/sec Loss 1.9494 LearningRate 0.0174 Epoch: 11 Global Step: 194520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:19:05,897-Speed 5162.42 samples/sec Loss 1.9099 LearningRate 0.0174 Epoch: 11 Global Step: 194530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:19:07,868-Speed 5196.14 samples/sec Loss 1.9092 LearningRate 0.0174 Epoch: 11 Global Step: 194540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:19:09,844-Speed 5186.26 samples/sec Loss 1.9677 LearningRate 0.0174 Epoch: 11 Global Step: 194550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:19:11,821-Speed 5181.31 samples/sec Loss 1.9010 LearningRate 0.0174 Epoch: 11 Global Step: 194560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:19:13,796-Speed 5187.06 samples/sec Loss 1.8896 LearningRate 0.0174 Epoch: 11 Global Step: 194570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:19:15,769-Speed 5191.17 samples/sec Loss 1.9937 LearningRate 0.0174 Epoch: 11 Global Step: 194580 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:19:17,777-Speed 5099.32 samples/sec Loss 
1.9270 LearningRate 0.0174 Epoch: 11 Global Step: 194590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:19,764-Speed 5157.53 samples/sec Loss 1.8801 LearningRate 0.0174 Epoch: 11 Global Step: 194600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:21,743-Speed 5174.61 samples/sec Loss 1.8976 LearningRate 0.0174 Epoch: 11 Global Step: 194610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:23,713-Speed 5199.82 samples/sec Loss 1.9255 LearningRate 0.0174 Epoch: 11 Global Step: 194620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:25,696-Speed 5166.89 samples/sec Loss 1.8961 LearningRate 0.0174 Epoch: 11 Global Step: 194630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:27,681-Speed 5160.79 samples/sec Loss 1.8858 LearningRate 0.0174 Epoch: 11 Global Step: 194640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:29,650-Speed 5201.65 samples/sec Loss 1.8439 LearningRate 0.0174 Epoch: 11 Global Step: 194650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:31,624-Speed 5190.32 samples/sec Loss 1.8844 LearningRate 0.0174 Epoch: 11 Global Step: 194660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:33,606-Speed 5167.48 samples/sec Loss 1.9314 LearningRate 0.0174 Epoch: 11 Global Step: 194670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:35,579-Speed 5193.15 samples/sec Loss 1.9158 LearningRate 0.0174 Epoch: 11 Global Step: 194680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:37,548-Speed 5200.38 samples/sec Loss 1.8864 LearningRate 0.0174 Epoch: 11 Global Step: 194690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:19:39,535-Speed 5156.43 samples/sec Loss 1.9548 LearningRate 0.0174 Epoch: 11 Global Step: 194700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:19:41,510-Speed 5184.87 samples/sec Loss 1.8925 LearningRate 0.0174 Epoch: 11 
Global Step: 194710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:43,476-Speed 5211.44 samples/sec Loss 1.9214 LearningRate 0.0174 Epoch: 11 Global Step: 194720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:45,454-Speed 5177.30 samples/sec Loss 1.9950 LearningRate 0.0174 Epoch: 11 Global Step: 194730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:47,433-Speed 5177.58 samples/sec Loss 1.9475 LearningRate 0.0174 Epoch: 11 Global Step: 194740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:49,416-Speed 5166.39 samples/sec Loss 1.9114 LearningRate 0.0174 Epoch: 11 Global Step: 194750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:51,385-Speed 5202.50 samples/sec Loss 1.8962 LearningRate 0.0174 Epoch: 11 Global Step: 194760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:53,358-Speed 5190.57 samples/sec Loss 1.9187 LearningRate 0.0174 Epoch: 11 Global Step: 194770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:55,338-Speed 5173.69 samples/sec Loss 1.9065 LearningRate 0.0173 Epoch: 11 Global Step: 194780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:57,328-Speed 5149.40 samples/sec Loss 1.8692 LearningRate 0.0173 Epoch: 11 Global Step: 194790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:19:59,301-Speed 5191.39 samples/sec Loss 1.9267 LearningRate 0.0173 Epoch: 11 Global Step: 194800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:01,282-Speed 5170.29 samples/sec Loss 1.8862 LearningRate 0.0173 Epoch: 11 Global Step: 194810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:20:03,257-Speed 5185.99 samples/sec Loss 1.8988 LearningRate 0.0173 Epoch: 11 Global Step: 194820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:05,239-Speed 5168.03 samples/sec Loss 1.8786 LearningRate 0.0173 Epoch: 11 Global Step: 194830 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 12:20:07,206-Speed 5206.49 samples/sec Loss 1.8570 LearningRate 0.0173 Epoch: 11 Global Step: 194840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:09,181-Speed 5188.68 samples/sec Loss 1.9124 LearningRate 0.0173 Epoch: 11 Global Step: 194850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:11,163-Speed 5167.68 samples/sec Loss 1.9580 LearningRate 0.0173 Epoch: 11 Global Step: 194860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:13,146-Speed 5163.86 samples/sec Loss 1.9335 LearningRate 0.0173 Epoch: 11 Global Step: 194870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:15,125-Speed 5177.57 samples/sec Loss 1.9629 LearningRate 0.0173 Epoch: 11 Global Step: 194880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:17,097-Speed 5195.63 samples/sec Loss 1.8930 LearningRate 0.0173 Epoch: 11 Global Step: 194890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:19,077-Speed 5173.36 samples/sec Loss 1.9489 LearningRate 0.0173 Epoch: 11 Global Step: 194900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:21,050-Speed 5190.89 samples/sec Loss 1.9280 LearningRate 0.0173 Epoch: 11 Global Step: 194910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:23,033-Speed 5165.48 samples/sec Loss 1.9013 LearningRate 0.0173 Epoch: 11 Global Step: 194920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:20:25,012-Speed 5174.84 samples/sec Loss 1.9065 LearningRate 0.0173 Epoch: 11 Global Step: 194930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:27,002-Speed 5148.41 samples/sec Loss 1.8545 LearningRate 0.0173 Epoch: 11 Global Step: 194940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:28,974-Speed 5196.15 samples/sec Loss 1.9759 LearningRate 0.0173 Epoch: 11 Global Step: 194950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
12:20:30,949-Speed 5185.94 samples/sec Loss 1.8498 LearningRate 0.0173 Epoch: 11 Global Step: 194960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:32,930-Speed 5169.23 samples/sec Loss 1.9153 LearningRate 0.0173 Epoch: 11 Global Step: 194970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:34,938-Speed 5101.85 samples/sec Loss 1.9142 LearningRate 0.0173 Epoch: 11 Global Step: 194980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:36,909-Speed 5198.23 samples/sec Loss 1.9268 LearningRate 0.0173 Epoch: 11 Global Step: 194990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:38,898-Speed 5150.79 samples/sec Loss 1.9512 LearningRate 0.0173 Epoch: 11 Global Step: 195000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:40,876-Speed 5176.50 samples/sec Loss 1.9515 LearningRate 0.0173 Epoch: 11 Global Step: 195010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:42,863-Speed 5156.04 samples/sec Loss 1.9121 LearningRate 0.0173 Epoch: 11 Global Step: 195020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:20:44,860-Speed 5128.36 samples/sec Loss 1.8495 LearningRate 0.0173 Epoch: 11 Global Step: 195030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:20:46,845-Speed 5160.85 samples/sec Loss 1.9085 LearningRate 0.0173 Epoch: 11 Global Step: 195040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:20:48,819-Speed 5189.10 samples/sec Loss 1.9079 LearningRate 0.0173 Epoch: 11 Global Step: 195050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:20:50,809-Speed 5147.97 samples/sec Loss 1.9393 LearningRate 0.0173 Epoch: 11 Global Step: 195060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:20:52,780-Speed 5195.87 samples/sec Loss 1.8828 LearningRate 0.0173 Epoch: 11 Global Step: 195070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:20:54,749-Speed 5202.14 samples/sec 
Loss 1.9103 LearningRate 0.0173 Epoch: 11 Global Step: 195080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:20:56,733-Speed 5165.79 samples/sec Loss 1.9216 LearningRate 0.0173 Epoch: 11 Global Step: 195090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:20:58,730-Speed 5129.23 samples/sec Loss 1.8962 LearningRate 0.0173 Epoch: 11 Global Step: 195100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:21:00,721-Speed 5144.26 samples/sec Loss 1.8839 LearningRate 0.0173 Epoch: 11 Global Step: 195110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:21:02,701-Speed 5174.21 samples/sec Loss 1.8674 LearningRate 0.0173 Epoch: 11 Global Step: 195120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:21:04,662-Speed 5224.30 samples/sec Loss 1.9021 LearningRate 0.0173 Epoch: 11 Global Step: 195130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:21:06,639-Speed 5179.77 samples/sec Loss 1.8594 LearningRate 0.0173 Epoch: 11 Global Step: 195140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:08,618-Speed 5176.38 samples/sec Loss 1.8948 LearningRate 0.0173 Epoch: 11 Global Step: 195150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:10,619-Speed 5119.15 samples/sec Loss 1.9319 LearningRate 0.0173 Epoch: 11 Global Step: 195160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:12,623-Speed 5110.71 samples/sec Loss 1.8922 LearningRate 0.0173 Epoch: 11 Global Step: 195170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:14,635-Speed 5091.63 samples/sec Loss 1.9357 LearningRate 0.0172 Epoch: 11 Global Step: 195180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:16,627-Speed 5142.91 samples/sec Loss 1.9131 LearningRate 0.0172 Epoch: 11 Global Step: 195190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:18,612-Speed 5159.07 samples/sec Loss 1.8920 LearningRate 0.0172 
Epoch: 11 Global Step: 195200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:20,584-Speed 5194.05 samples/sec Loss 1.9364 LearningRate 0.0172 Epoch: 11 Global Step: 195210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:22,568-Speed 5163.77 samples/sec Loss 1.9293 LearningRate 0.0172 Epoch: 11 Global Step: 195220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:24,554-Speed 5157.36 samples/sec Loss 1.8571 LearningRate 0.0172 Epoch: 11 Global Step: 195230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:26,529-Speed 5187.62 samples/sec Loss 1.8935 LearningRate 0.0172 Epoch: 11 Global Step: 195240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:21:28,508-Speed 5176.68 samples/sec Loss 1.8720 LearningRate 0.0172 Epoch: 11 Global Step: 195250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:21:30,483-Speed 5185.35 samples/sec Loss 1.9123 LearningRate 0.0172 Epoch: 11 Global Step: 195260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:21:32,469-Speed 5158.98 samples/sec Loss 1.8954 LearningRate 0.0172 Epoch: 11 Global Step: 195270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:34,446-Speed 5180.00 samples/sec Loss 1.9275 LearningRate 0.0172 Epoch: 11 Global Step: 195280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:36,449-Speed 5115.55 samples/sec Loss 1.9874 LearningRate 0.0172 Epoch: 11 Global Step: 195290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:38,423-Speed 5189.81 samples/sec Loss 1.9026 LearningRate 0.0172 Epoch: 11 Global Step: 195300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:40,407-Speed 5163.69 samples/sec Loss 1.8591 LearningRate 0.0172 Epoch: 11 Global Step: 195310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:42,378-Speed 5195.84 samples/sec Loss 1.9000 LearningRate 0.0172 Epoch: 11 Global Step: 195320 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:44,351-Speed 5191.40 samples/sec Loss 1.8207 LearningRate 0.0172 Epoch: 11 Global Step: 195330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:46,324-Speed 5191.50 samples/sec Loss 1.9025 LearningRate 0.0172 Epoch: 11 Global Step: 195340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:48,310-Speed 5158.00 samples/sec Loss 1.9144 LearningRate 0.0172 Epoch: 11 Global Step: 195350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:50,286-Speed 5183.26 samples/sec Loss 1.9085 LearningRate 0.0172 Epoch: 11 Global Step: 195360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:52,266-Speed 5174.38 samples/sec Loss 1.9045 LearningRate 0.0172 Epoch: 11 Global Step: 195370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:21:54,236-Speed 5199.54 samples/sec Loss 1.9274 LearningRate 0.0172 Epoch: 11 Global Step: 195380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:56,220-Speed 5161.76 samples/sec Loss 1.8482 LearningRate 0.0172 Epoch: 11 Global Step: 195390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:21:58,207-Speed 5156.65 samples/sec Loss 1.8994 LearningRate 0.0172 Epoch: 11 Global Step: 195400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:00,179-Speed 5195.01 samples/sec Loss 1.9284 LearningRate 0.0172 Epoch: 11 Global Step: 195410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:02,151-Speed 5194.51 samples/sec Loss 1.9167 LearningRate 0.0172 Epoch: 11 Global Step: 195420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:04,117-Speed 5209.31 samples/sec Loss 1.9181 LearningRate 0.0172 Epoch: 11 Global Step: 195430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:06,093-Speed 5185.72 samples/sec Loss 1.9041 LearningRate 0.0172 Epoch: 11 Global Step: 195440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-04-11 12:22:08,080-Speed 5153.94 samples/sec Loss 1.8561 LearningRate 0.0172 Epoch: 11 Global Step: 195450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:10,057-Speed 5181.41 samples/sec Loss 1.9418 LearningRate 0.0172 Epoch: 11 Global Step: 195460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:12,042-Speed 5161.41 samples/sec Loss 1.8930 LearningRate 0.0172 Epoch: 11 Global Step: 195470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:14,030-Speed 5151.69 samples/sec Loss 1.9220 LearningRate 0.0172 Epoch: 11 Global Step: 195480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:22:15,997-Speed 5207.68 samples/sec Loss 1.8859 LearningRate 0.0172 Epoch: 11 Global Step: 195490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:17,970-Speed 5190.61 samples/sec Loss 1.8627 LearningRate 0.0172 Epoch: 11 Global Step: 195500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:19,941-Speed 5197.54 samples/sec Loss 1.8938 LearningRate 0.0172 Epoch: 11 Global Step: 195510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:21,914-Speed 5192.67 samples/sec Loss 1.8866 LearningRate 0.0172 Epoch: 11 Global Step: 195520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:23,899-Speed 5160.04 samples/sec Loss 1.8862 LearningRate 0.0172 Epoch: 11 Global Step: 195530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:25,878-Speed 5177.25 samples/sec Loss 1.9278 LearningRate 0.0172 Epoch: 11 Global Step: 195540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:27,851-Speed 5190.45 samples/sec Loss 1.9622 LearningRate 0.0172 Epoch: 11 Global Step: 195550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:29,831-Speed 5174.19 samples/sec Loss 1.8860 LearningRate 0.0172 Epoch: 11 Global Step: 195560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:31,811-Speed 5173.53 
samples/sec Loss 1.8709 LearningRate 0.0172 Epoch: 11 Global Step: 195570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:33,798-Speed 5155.24 samples/sec Loss 1.8571 LearningRate 0.0171 Epoch: 11 Global Step: 195580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:35,802-Speed 5109.87 samples/sec Loss 1.8969 LearningRate 0.0171 Epoch: 11 Global Step: 195590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:22:37,778-Speed 5184.46 samples/sec Loss 1.9311 LearningRate 0.0171 Epoch: 11 Global Step: 195600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:22:39,755-Speed 5182.30 samples/sec Loss 1.9250 LearningRate 0.0171 Epoch: 11 Global Step: 195610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:22:41,734-Speed 5177.88 samples/sec Loss 1.8876 LearningRate 0.0171 Epoch: 11 Global Step: 195620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:22:43,715-Speed 5168.79 samples/sec Loss 1.8980 LearningRate 0.0171 Epoch: 11 Global Step: 195630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:45,686-Speed 5197.90 samples/sec Loss 1.9070 LearningRate 0.0171 Epoch: 11 Global Step: 195640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:47,670-Speed 5162.38 samples/sec Loss 1.8991 LearningRate 0.0171 Epoch: 11 Global Step: 195650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:49,670-Speed 5121.35 samples/sec Loss 1.9179 LearningRate 0.0171 Epoch: 11 Global Step: 195660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:51,658-Speed 5153.64 samples/sec Loss 1.8950 LearningRate 0.0171 Epoch: 11 Global Step: 195670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:22:53,641-Speed 5164.99 samples/sec Loss 1.9390 LearningRate 0.0171 Epoch: 11 Global Step: 195680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:22:55,618-Speed 5182.15 samples/sec Loss 1.8882 LearningRate 
0.0171 Epoch: 11 Global Step: 195690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:22:57,600-Speed 5167.93 samples/sec Loss 1.8003 LearningRate 0.0171 Epoch: 11 Global Step: 195700 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:22:59,571-Speed 5196.19 samples/sec Loss 1.9223 LearningRate 0.0171 Epoch: 11 Global Step: 195710 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:23:01,544-Speed 5192.86 samples/sec Loss 1.8925 LearningRate 0.0171 Epoch: 11 Global Step: 195720 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:23:03,531-Speed 5156.62 samples/sec Loss 1.8631 LearningRate 0.0171 Epoch: 11 Global Step: 195730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:23:05,503-Speed 5194.72 samples/sec Loss 1.9172 LearningRate 0.0171 Epoch: 11 Global Step: 195740 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:23:07,477-Speed 5188.36 samples/sec Loss 1.9745 LearningRate 0.0171 Epoch: 11 Global Step: 195750 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:23:09,459-Speed 5167.13 samples/sec Loss 1.8931 LearningRate 0.0171 Epoch: 11 Global Step: 195760 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:23:11,432-Speed 5195.06 samples/sec Loss 1.8923 LearningRate 0.0171 Epoch: 11 Global Step: 195770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:23:13,422-Speed 5146.12 samples/sec Loss 1.8942 LearningRate 0.0171 Epoch: 11 Global Step: 195780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:23:15,421-Speed 5125.50 samples/sec Loss 1.8715 LearningRate 0.0171 Epoch: 11 Global Step: 195790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:23:17,421-Speed 5121.68 samples/sec Loss 1.8667 LearningRate 0.0171 Epoch: 11 Global Step: 195800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:23:19,395-Speed 5187.60 samples/sec Loss 1.8598 LearningRate 0.0171 Epoch: 11 Global Step: 195810 Fp16 
Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:23:21,379-Speed 5164.13 samples/sec Loss 1.8904 LearningRate 0.0171 Epoch: 11 Global Step: 195820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:23:23,350-Speed 5198.81 samples/sec Loss 1.8198 LearningRate 0.0171 Epoch: 11 Global Step: 195830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:23:25,329-Speed 5175.54 samples/sec Loss 1.8630 LearningRate 0.0171 Epoch: 11 Global Step: 195840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:23:27,324-Speed 5134.98 samples/sec Loss 1.9285 LearningRate 0.0171 Epoch: 11 Global Step: 195850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:23:29,305-Speed 5170.86 samples/sec Loss 1.8686 LearningRate 0.0171 Epoch: 11 Global Step: 195860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:23:31,280-Speed 5184.50 samples/sec Loss 1.8601 LearningRate 0.0171 Epoch: 11 Global Step: 195870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:23:33,270-Speed 5147.95 samples/sec Loss 1.8864 LearningRate 0.0171 Epoch: 11 Global Step: 195880 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:23:35,265-Speed 5134.38 samples/sec Loss 1.8555 LearningRate 0.0171 Epoch: 11 Global Step: 195890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:23:37,245-Speed 5172.36 samples/sec Loss 1.9576 LearningRate 0.0171 Epoch: 11 Global Step: 195900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:23:39,216-Speed 5198.81 samples/sec Loss 1.8681 LearningRate 0.0171 Epoch: 11 Global Step: 195910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:23:41,188-Speed 5195.04 samples/sec Loss 1.8987 LearningRate 0.0171 Epoch: 11 Global Step: 195920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:23:43,168-Speed 5173.71 samples/sec Loss 1.9153 LearningRate 0.0171 Epoch: 11 Global Step: 195930 Fp16 Grad Scale: 65536 Required: 9 hours 
Training: 2022-04-11 12:23:45,141-Speed 5190.69 samples/sec Loss 1.8685 LearningRate 0.0171 Epoch: 11 Global Step: 195940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:23:47,113-Speed 5195.07 samples/sec Loss 1.8843 LearningRate 0.0171 Epoch: 11 Global Step: 195950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:23:49,083-Speed 5198.20 samples/sec Loss 1.9228 LearningRate 0.0171 Epoch: 11 Global Step: 195960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:23:51,060-Speed 5181.35 samples/sec Loss 1.8778 LearningRate 0.0171 Epoch: 11 Global Step: 195970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:23:53,041-Speed 5172.05 samples/sec Loss 1.8816 LearningRate 0.0171 Epoch: 11 Global Step: 195980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:23:55,014-Speed 5190.83 samples/sec Loss 1.9032 LearningRate 0.0170 Epoch: 11 Global Step: 195990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:23:57,018-Speed 5112.97 samples/sec Loss 1.9099 LearningRate 0.0170 Epoch: 11 Global Step: 196000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:24:23,763-[lfw][196000]XNorm: 22.737724 Training: 2022-04-11 12:24:23,763-[lfw][196000]Accuracy-Flip: 0.99733+-0.00271 Training: 2022-04-11 12:24:23,764-[lfw][196000]Accuracy-Highest: 0.99833 Training: 2022-04-11 12:24:54,770-[cfp_fp][196000]XNorm: 21.339430 Training: 2022-04-11 12:24:54,770-[cfp_fp][196000]Accuracy-Flip: 0.98529+-0.00394 Training: 2022-04-11 12:24:54,771-[cfp_fp][196000]Accuracy-Highest: 0.98714 Training: 2022-04-11 12:25:21,587-[agedb_30][196000]XNorm: 22.809233 Training: 2022-04-11 12:25:21,588-[agedb_30][196000]Accuracy-Flip: 0.98017+-0.00713 Training: 2022-04-11 12:25:21,589-[agedb_30][196000]Accuracy-Highest: 0.98250 Training: 2022-04-11 12:25:23,573-Speed 118.31 samples/sec Loss 1.9458 LearningRate 0.0170 Epoch: 11 Global Step: 196010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 
12:25:25,544-Speed 5197.56 samples/sec Loss 1.9223 LearningRate 0.0170 Epoch: 11 Global Step: 196020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:25:27,508-Speed 5215.67 samples/sec Loss 1.8932 LearningRate 0.0170 Epoch: 11 Global Step: 196030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:25:29,472-Speed 5216.37 samples/sec Loss 1.9398 LearningRate 0.0170 Epoch: 11 Global Step: 196040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:25:31,436-Speed 5215.70 samples/sec Loss 1.8502 LearningRate 0.0170 Epoch: 11 Global Step: 196050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:25:33,400-Speed 5213.89 samples/sec Loss 1.9469 LearningRate 0.0170 Epoch: 11 Global Step: 196060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:25:35,371-Speed 5197.67 samples/sec Loss 1.8994 LearningRate 0.0170 Epoch: 11 Global Step: 196070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:25:37,369-Speed 5127.49 samples/sec Loss 1.9035 LearningRate 0.0170 Epoch: 11 Global Step: 196080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:25:39,345-Speed 5183.40 samples/sec Loss 1.9007 LearningRate 0.0170 Epoch: 11 Global Step: 196090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:25:41,312-Speed 5206.90 samples/sec Loss 1.9471 LearningRate 0.0170 Epoch: 11 Global Step: 196100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:25:43,286-Speed 5190.01 samples/sec Loss 1.8781 LearningRate 0.0170 Epoch: 11 Global Step: 196110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:25:45,256-Speed 5200.51 samples/sec Loss 1.8657 LearningRate 0.0170 Epoch: 11 Global Step: 196120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:25:47,221-Speed 5211.67 samples/sec Loss 1.8996 LearningRate 0.0170 Epoch: 11 Global Step: 196130 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:25:49,193-Speed 5194.08 samples/sec Loss 
1.8791 LearningRate 0.0170 Epoch: 11 Global Step: 196140 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:25:51,181-Speed 5153.51 samples/sec Loss 1.9032 LearningRate 0.0170 Epoch: 11 Global Step: 196150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:25:53,159-Speed 5179.14 samples/sec Loss 1.8778 LearningRate 0.0170 Epoch: 11 Global Step: 196160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:25:55,148-Speed 5150.47 samples/sec Loss 1.8892 LearningRate 0.0170 Epoch: 11 Global Step: 196170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:25:57,131-Speed 5165.75 samples/sec Loss 1.9187 LearningRate 0.0170 Epoch: 11 Global Step: 196180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:25:59,108-Speed 5179.11 samples/sec Loss 1.9260 LearningRate 0.0170 Epoch: 11 Global Step: 196190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:26:01,093-Speed 5162.21 samples/sec Loss 1.8928 LearningRate 0.0170 Epoch: 11 Global Step: 196200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:26:03,079-Speed 5157.73 samples/sec Loss 1.9159 LearningRate 0.0170 Epoch: 11 Global Step: 196210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:26:05,064-Speed 5159.35 samples/sec Loss 1.9059 LearningRate 0.0170 Epoch: 11 Global Step: 196220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:26:07,034-Speed 5199.50 samples/sec Loss 1.9289 LearningRate 0.0170 Epoch: 11 Global Step: 196230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:09,020-Speed 5157.91 samples/sec Loss 1.8718 LearningRate 0.0170 Epoch: 11 Global Step: 196240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:11,007-Speed 5154.42 samples/sec Loss 1.8786 LearningRate 0.0170 Epoch: 11 Global Step: 196250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:12,985-Speed 5178.70 samples/sec Loss 1.8828 LearningRate 0.0170 Epoch: 11 Global 
Step: 196260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:14,956-Speed 5197.65 samples/sec Loss 1.9538 LearningRate 0.0170 Epoch: 11 Global Step: 196270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:16,929-Speed 5192.75 samples/sec Loss 1.9460 LearningRate 0.0170 Epoch: 11 Global Step: 196280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:18,899-Speed 5199.20 samples/sec Loss 1.9061 LearningRate 0.0170 Epoch: 11 Global Step: 196290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:20,865-Speed 5210.38 samples/sec Loss 1.8962 LearningRate 0.0170 Epoch: 11 Global Step: 196300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:22,866-Speed 5118.83 samples/sec Loss 1.9052 LearningRate 0.0170 Epoch: 11 Global Step: 196310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:24,886-Speed 5071.28 samples/sec Loss 1.8473 LearningRate 0.0170 Epoch: 11 Global Step: 196320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:26,880-Speed 5137.21 samples/sec Loss 1.8653 LearningRate 0.0170 Epoch: 11 Global Step: 196330 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:26:28,865-Speed 5161.67 samples/sec Loss 1.8712 LearningRate 0.0170 Epoch: 11 Global Step: 196340 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:26:30,838-Speed 5190.91 samples/sec Loss 1.8838 LearningRate 0.0170 Epoch: 11 Global Step: 196350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:32,828-Speed 5147.98 samples/sec Loss 1.9199 LearningRate 0.0170 Epoch: 11 Global Step: 196360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:34,814-Speed 5156.28 samples/sec Loss 1.8802 LearningRate 0.0170 Epoch: 11 Global Step: 196370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:36,823-Speed 5100.26 samples/sec Loss 1.9095 LearningRate 0.0170 Epoch: 11 Global Step: 196380 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 12:26:38,796-Speed 5192.75 samples/sec Loss 1.9053 LearningRate 0.0169 Epoch: 11 Global Step: 196390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:40,784-Speed 5152.63 samples/sec Loss 1.8949 LearningRate 0.0169 Epoch: 11 Global Step: 196400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:42,755-Speed 5197.09 samples/sec Loss 1.9131 LearningRate 0.0169 Epoch: 11 Global Step: 196410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:44,749-Speed 5135.25 samples/sec Loss 1.9200 LearningRate 0.0169 Epoch: 11 Global Step: 196420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:46,736-Speed 5156.16 samples/sec Loss 1.9062 LearningRate 0.0169 Epoch: 11 Global Step: 196430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:48,714-Speed 5179.40 samples/sec Loss 2.0063 LearningRate 0.0169 Epoch: 11 Global Step: 196440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:50,699-Speed 5160.37 samples/sec Loss 1.9747 LearningRate 0.0169 Epoch: 11 Global Step: 196450 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:26:52,692-Speed 5137.66 samples/sec Loss 1.8954 LearningRate 0.0169 Epoch: 11 Global Step: 196460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:54,673-Speed 5171.94 samples/sec Loss 1.9009 LearningRate 0.0169 Epoch: 11 Global Step: 196470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:56,654-Speed 5169.72 samples/sec Loss 1.8916 LearningRate 0.0169 Epoch: 11 Global Step: 196480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:26:58,641-Speed 5156.27 samples/sec Loss 1.8887 LearningRate 0.0169 Epoch: 11 Global Step: 196490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:00,618-Speed 5183.35 samples/sec Loss 1.9146 LearningRate 0.0169 Epoch: 11 Global Step: 196500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
12:27:02,624-Speed 5105.48 samples/sec Loss 1.9094 LearningRate 0.0169 Epoch: 11 Global Step: 196510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:04,607-Speed 5165.83 samples/sec Loss 1.9541 LearningRate 0.0169 Epoch: 11 Global Step: 196520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:06,594-Speed 5154.82 samples/sec Loss 1.8894 LearningRate 0.0169 Epoch: 11 Global Step: 196530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:08,584-Speed 5147.72 samples/sec Loss 1.8934 LearningRate 0.0169 Epoch: 11 Global Step: 196540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:10,553-Speed 5200.65 samples/sec Loss 1.8226 LearningRate 0.0169 Epoch: 11 Global Step: 196550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:12,528-Speed 5187.82 samples/sec Loss 1.8739 LearningRate 0.0169 Epoch: 11 Global Step: 196560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:27:14,502-Speed 5189.69 samples/sec Loss 1.9350 LearningRate 0.0169 Epoch: 11 Global Step: 196570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:27:16,473-Speed 5195.69 samples/sec Loss 1.9081 LearningRate 0.0169 Epoch: 11 Global Step: 196580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:27:18,446-Speed 5190.85 samples/sec Loss 1.7824 LearningRate 0.0169 Epoch: 11 Global Step: 196590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:27:20,416-Speed 5202.86 samples/sec Loss 1.8471 LearningRate 0.0169 Epoch: 11 Global Step: 196600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:22,388-Speed 5192.97 samples/sec Loss 1.8776 LearningRate 0.0169 Epoch: 11 Global Step: 196610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:24,364-Speed 5185.27 samples/sec Loss 1.9925 LearningRate 0.0169 Epoch: 11 Global Step: 196620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:26,341-Speed 5180.18 samples/sec 
Loss 1.8837 LearningRate 0.0169 Epoch: 11 Global Step: 196630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:28,315-Speed 5190.74 samples/sec Loss 1.8569 LearningRate 0.0169 Epoch: 11 Global Step: 196640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:30,312-Speed 5129.60 samples/sec Loss 1.9258 LearningRate 0.0169 Epoch: 11 Global Step: 196650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:32,282-Speed 5198.91 samples/sec Loss 1.9125 LearningRate 0.0169 Epoch: 11 Global Step: 196660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:34,274-Speed 5141.09 samples/sec Loss 1.9716 LearningRate 0.0169 Epoch: 11 Global Step: 196670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:36,259-Speed 5160.04 samples/sec Loss 1.8683 LearningRate 0.0169 Epoch: 11 Global Step: 196680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:38,247-Speed 5152.47 samples/sec Loss 1.9268 LearningRate 0.0169 Epoch: 11 Global Step: 196690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:40,228-Speed 5171.29 samples/sec Loss 1.8970 LearningRate 0.0169 Epoch: 11 Global Step: 196700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:27:42,196-Speed 5205.74 samples/sec Loss 1.8859 LearningRate 0.0169 Epoch: 11 Global Step: 196710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:44,177-Speed 5171.44 samples/sec Loss 1.9273 LearningRate 0.0169 Epoch: 11 Global Step: 196720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:46,149-Speed 5193.56 samples/sec Loss 1.9119 LearningRate 0.0169 Epoch: 11 Global Step: 196730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:48,146-Speed 5129.23 samples/sec Loss 1.9328 LearningRate 0.0169 Epoch: 11 Global Step: 196740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:50,124-Speed 5180.81 samples/sec Loss 1.8832 LearningRate 0.0169 Epoch: 11 
Global Step: 196750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:52,091-Speed 5206.42 samples/sec Loss 1.9161 LearningRate 0.0169 Epoch: 11 Global Step: 196760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:54,066-Speed 5186.06 samples/sec Loss 1.9346 LearningRate 0.0169 Epoch: 11 Global Step: 196770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:56,039-Speed 5192.22 samples/sec Loss 1.9359 LearningRate 0.0169 Epoch: 11 Global Step: 196780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:58,022-Speed 5166.46 samples/sec Loss 1.8946 LearningRate 0.0169 Epoch: 11 Global Step: 196790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:27:59,992-Speed 5199.24 samples/sec Loss 1.9113 LearningRate 0.0168 Epoch: 11 Global Step: 196800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:28:02,006-Speed 5086.95 samples/sec Loss 1.8315 LearningRate 0.0168 Epoch: 11 Global Step: 196810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:28:04,025-Speed 5072.54 samples/sec Loss 1.9892 LearningRate 0.0168 Epoch: 11 Global Step: 196820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:28:05,990-Speed 5214.68 samples/sec Loss 1.8918 LearningRate 0.0168 Epoch: 11 Global Step: 196830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:28:07,958-Speed 5203.43 samples/sec Loss 1.9512 LearningRate 0.0168 Epoch: 11 Global Step: 196840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:28:09,927-Speed 5202.34 samples/sec Loss 1.9412 LearningRate 0.0168 Epoch: 11 Global Step: 196850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:28:11,900-Speed 5190.53 samples/sec Loss 1.9268 LearningRate 0.0168 Epoch: 11 Global Step: 196860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:28:13,871-Speed 5199.38 samples/sec Loss 1.8797 LearningRate 0.0168 Epoch: 11 Global Step: 196870 Fp16 Grad Scale: 
65536 Required: 9 hours Training: 2022-04-11 12:28:15,862-Speed 5145.20 samples/sec Loss 1.8659 LearningRate 0.0168 Epoch: 11 Global Step: 196880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:28:17,852-Speed 5147.02 samples/sec Loss 1.9224 LearningRate 0.0168 Epoch: 11 Global Step: 196890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:28:19,830-Speed 5179.25 samples/sec Loss 1.9754 LearningRate 0.0168 Epoch: 11 Global Step: 196900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:28:21,807-Speed 5181.26 samples/sec Loss 1.8454 LearningRate 0.0168 Epoch: 11 Global Step: 196910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:28:23,813-Speed 5105.02 samples/sec Loss 1.9597 LearningRate 0.0168 Epoch: 11 Global Step: 196920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:28:25,785-Speed 5194.87 samples/sec Loss 1.9135 LearningRate 0.0168 Epoch: 11 Global Step: 196930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:28:27,758-Speed 5192.15 samples/sec Loss 1.8849 LearningRate 0.0168 Epoch: 11 Global Step: 196940 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:28:29,750-Speed 5143.10 samples/sec Loss 1.9353 LearningRate 0.0168 Epoch: 11 Global Step: 196950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:28:31,737-Speed 5153.76 samples/sec Loss 1.9625 LearningRate 0.0168 Epoch: 11 Global Step: 196960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:28:33,723-Speed 5157.52 samples/sec Loss 1.8815 LearningRate 0.0168 Epoch: 11 Global Step: 196970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:28:35,697-Speed 5191.14 samples/sec Loss 1.9692 LearningRate 0.0168 Epoch: 11 Global Step: 196980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:28:37,672-Speed 5186.97 samples/sec Loss 1.8828 LearningRate 0.0168 Epoch: 11 Global Step: 196990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-04-11 12:28:39,651-Speed 5175.46 samples/sec Loss 1.9158 LearningRate 0.0168 Epoch: 11 Global Step: 197000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:28:41,624-Speed 5190.99 samples/sec Loss 1.8958 LearningRate 0.0168 Epoch: 11 Global Step: 197010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:28:43,596-Speed 5193.86 samples/sec Loss 1.8951 LearningRate 0.0168 Epoch: 11 Global Step: 197020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:28:45,573-Speed 5181.22 samples/sec Loss 1.9680 LearningRate 0.0168 Epoch: 11 Global Step: 197030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:28:47,549-Speed 5183.81 samples/sec Loss 1.9525 LearningRate 0.0168 Epoch: 11 Global Step: 197040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:28:49,531-Speed 5170.26 samples/sec Loss 1.9939 LearningRate 0.0168 Epoch: 11 Global Step: 197050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:28:51,512-Speed 5170.37 samples/sec Loss 1.9796 LearningRate 0.0168 Epoch: 11 Global Step: 197060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:28:53,482-Speed 5198.51 samples/sec Loss 1.9216 LearningRate 0.0168 Epoch: 11 Global Step: 197070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:28:55,455-Speed 5192.80 samples/sec Loss 1.9520 LearningRate 0.0168 Epoch: 11 Global Step: 197080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:28:57,472-Speed 5077.89 samples/sec Loss 1.8935 LearningRate 0.0168 Epoch: 11 Global Step: 197090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:28:59,462-Speed 5146.60 samples/sec Loss 1.9855 LearningRate 0.0168 Epoch: 11 Global Step: 197100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:29:01,446-Speed 5164.10 samples/sec Loss 1.8595 LearningRate 0.0168 Epoch: 11 Global Step: 197110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:29:03,438-Speed 5141.89 
samples/sec Loss 1.8783 LearningRate 0.0168 Epoch: 11 Global Step: 197120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:29:05,423-Speed 5161.92 samples/sec Loss 1.9765 LearningRate 0.0168 Epoch: 11 Global Step: 197130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:29:07,398-Speed 5186.06 samples/sec Loss 1.9162 LearningRate 0.0168 Epoch: 11 Global Step: 197140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:29:09,377-Speed 5174.71 samples/sec Loss 1.8757 LearningRate 0.0168 Epoch: 11 Global Step: 197150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:29:11,364-Speed 5154.46 samples/sec Loss 1.8789 LearningRate 0.0168 Epoch: 11 Global Step: 197160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:29:13,329-Speed 5213.79 samples/sec Loss 1.8920 LearningRate 0.0168 Epoch: 11 Global Step: 197170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:29:15,317-Speed 5151.93 samples/sec Loss 1.9037 LearningRate 0.0168 Epoch: 11 Global Step: 197180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:29:17,311-Speed 5136.48 samples/sec Loss 1.8642 LearningRate 0.0168 Epoch: 11 Global Step: 197190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:29:19,291-Speed 5175.54 samples/sec Loss 1.8030 LearningRate 0.0167 Epoch: 11 Global Step: 197200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:29:21,280-Speed 5150.80 samples/sec Loss 1.9197 LearningRate 0.0167 Epoch: 11 Global Step: 197210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:29:23,264-Speed 5163.82 samples/sec Loss 1.8757 LearningRate 0.0167 Epoch: 11 Global Step: 197220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:29:25,255-Speed 5144.19 samples/sec Loss 1.9153 LearningRate 0.0167 Epoch: 11 Global Step: 197230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:29:27,225-Speed 5199.68 samples/sec Loss 1.8969 LearningRate 0.0167 
Epoch: 11 Global Step: 197240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:29:29,210-Speed 5160.85 samples/sec Loss 1.9280 LearningRate 0.0167 Epoch: 11 Global Step: 197250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:29:31,184-Speed 5189.05 samples/sec Loss 1.8513 LearningRate 0.0167 Epoch: 11 Global Step: 197260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:29:33,176-Speed 5142.44 samples/sec Loss 1.9012 LearningRate 0.0167 Epoch: 11 Global Step: 197270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:29:35,149-Speed 5191.77 samples/sec Loss 1.9151 LearningRate 0.0167 Epoch: 11 Global Step: 197280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:29:37,120-Speed 5196.38 samples/sec Loss 1.9738 LearningRate 0.0167 Epoch: 11 Global Step: 197290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:29:39,101-Speed 5171.21 samples/sec Loss 1.9202 LearningRate 0.0167 Epoch: 11 Global Step: 197300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:29:41,088-Speed 5155.08 samples/sec Loss 1.9496 LearningRate 0.0167 Epoch: 11 Global Step: 197310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:29:43,059-Speed 5197.91 samples/sec Loss 1.9658 LearningRate 0.0167 Epoch: 11 Global Step: 197320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:29:45,035-Speed 5182.56 samples/sec Loss 1.9158 LearningRate 0.0167 Epoch: 11 Global Step: 197330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:29:47,032-Speed 5129.41 samples/sec Loss 1.9527 LearningRate 0.0167 Epoch: 11 Global Step: 197340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:29:49,009-Speed 5182.64 samples/sec Loss 1.8853 LearningRate 0.0167 Epoch: 11 Global Step: 197350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:29:50,990-Speed 5171.24 samples/sec Loss 1.8788 LearningRate 0.0167 Epoch: 11 Global Step: 197360 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-04-11 12:29:52,992-Speed 5114.28 samples/sec Loss 1.9322 LearningRate 0.0167 Epoch: 11 Global Step: 197370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:29:54,969-Speed 5182.68 samples/sec Loss 1.9368 LearningRate 0.0167 Epoch: 11 Global Step: 197380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:29:56,944-Speed 5185.49 samples/sec Loss 1.8896 LearningRate 0.0167 Epoch: 11 Global Step: 197390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:29:58,914-Speed 5201.45 samples/sec Loss 1.9163 LearningRate 0.0167 Epoch: 11 Global Step: 197400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:00,906-Speed 5141.31 samples/sec Loss 1.8556 LearningRate 0.0167 Epoch: 11 Global Step: 197410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:02,897-Speed 5144.80 samples/sec Loss 1.9171 LearningRate 0.0167 Epoch: 11 Global Step: 197420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:04,889-Speed 5143.56 samples/sec Loss 1.9152 LearningRate 0.0167 Epoch: 11 Global Step: 197430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:06,868-Speed 5174.60 samples/sec Loss 1.9050 LearningRate 0.0167 Epoch: 11 Global Step: 197440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:08,842-Speed 5191.12 samples/sec Loss 1.9134 LearningRate 0.0167 Epoch: 11 Global Step: 197450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:10,826-Speed 5160.91 samples/sec Loss 1.9299 LearningRate 0.0167 Epoch: 11 Global Step: 197460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:12,804-Speed 5179.42 samples/sec Loss 1.9002 LearningRate 0.0167 Epoch: 11 Global Step: 197470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:14,778-Speed 5188.90 samples/sec Loss 1.9431 LearningRate 0.0167 Epoch: 11 Global Step: 197480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-04-11 12:30:16,764-Speed 5158.25 samples/sec Loss 1.8594 LearningRate 0.0167 Epoch: 11 Global Step: 197490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:18,753-Speed 5149.02 samples/sec Loss 1.8551 LearningRate 0.0167 Epoch: 11 Global Step: 197500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:30:20,739-Speed 5157.88 samples/sec Loss 1.8760 LearningRate 0.0167 Epoch: 11 Global Step: 197510 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:30:22,731-Speed 5143.33 samples/sec Loss 1.8409 LearningRate 0.0167 Epoch: 11 Global Step: 197520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:24,709-Speed 5178.25 samples/sec Loss 1.8994 LearningRate 0.0167 Epoch: 11 Global Step: 197530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:26,684-Speed 5187.58 samples/sec Loss 1.9762 LearningRate 0.0167 Epoch: 11 Global Step: 197540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:28,683-Speed 5123.66 samples/sec Loss 1.8977 LearningRate 0.0167 Epoch: 11 Global Step: 197550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:30,664-Speed 5172.27 samples/sec Loss 1.9110 LearningRate 0.0167 Epoch: 11 Global Step: 197560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:32,648-Speed 5162.14 samples/sec Loss 1.9179 LearningRate 0.0167 Epoch: 11 Global Step: 197570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:34,624-Speed 5183.35 samples/sec Loss 1.9396 LearningRate 0.0167 Epoch: 11 Global Step: 197580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:36,598-Speed 5188.61 samples/sec Loss 1.9041 LearningRate 0.0167 Epoch: 11 Global Step: 197590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:38,576-Speed 5180.70 samples/sec Loss 1.8734 LearningRate 0.0167 Epoch: 11 Global Step: 197600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:40,576-Speed 5121.63 
samples/sec Loss 1.9061 LearningRate 0.0166 Epoch: 11 Global Step: 197610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:42,563-Speed 5153.35 samples/sec Loss 1.8936 LearningRate 0.0166 Epoch: 11 Global Step: 197620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:44,552-Speed 5149.27 samples/sec Loss 1.9146 LearningRate 0.0166 Epoch: 11 Global Step: 197630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:46,533-Speed 5171.33 samples/sec Loss 1.9131 LearningRate 0.0166 Epoch: 11 Global Step: 197640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:48,512-Speed 5177.94 samples/sec Loss 1.8806 LearningRate 0.0166 Epoch: 11 Global Step: 197650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:50,501-Speed 5150.79 samples/sec Loss 1.8868 LearningRate 0.0166 Epoch: 11 Global Step: 197660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:52,481-Speed 5174.34 samples/sec Loss 1.8258 LearningRate 0.0166 Epoch: 11 Global Step: 197670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:54,454-Speed 5192.22 samples/sec Loss 1.9310 LearningRate 0.0166 Epoch: 11 Global Step: 197680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:56,426-Speed 5192.58 samples/sec Loss 1.8888 LearningRate 0.0166 Epoch: 11 Global Step: 197690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:30:58,399-Speed 5191.83 samples/sec Loss 1.9061 LearningRate 0.0166 Epoch: 11 Global Step: 197700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:31:00,374-Speed 5187.49 samples/sec Loss 1.9035 LearningRate 0.0166 Epoch: 11 Global Step: 197710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:31:02,348-Speed 5189.40 samples/sec Loss 1.8537 LearningRate 0.0166 Epoch: 11 Global Step: 197720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:31:04,310-Speed 5219.32 samples/sec Loss 1.9161 LearningRate 
0.0166 Epoch: 11 Global Step: 197730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:31:06,290-Speed 5174.77 samples/sec Loss 1.9166 LearningRate 0.0166 Epoch: 11 Global Step: 197740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:31:08,279-Speed 5149.25 samples/sec Loss 1.9643 LearningRate 0.0166 Epoch: 11 Global Step: 197750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:31:10,255-Speed 5184.11 samples/sec Loss 1.9266 LearningRate 0.0166 Epoch: 11 Global Step: 197760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:31:12,232-Speed 5182.05 samples/sec Loss 1.8879 LearningRate 0.0166 Epoch: 11 Global Step: 197770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:31:14,198-Speed 5209.86 samples/sec Loss 1.9529 LearningRate 0.0166 Epoch: 11 Global Step: 197780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:31:16,197-Speed 5125.22 samples/sec Loss 1.9605 LearningRate 0.0166 Epoch: 11 Global Step: 197790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:31:18,176-Speed 5173.85 samples/sec Loss 1.9487 LearningRate 0.0166 Epoch: 11 Global Step: 197800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:31:20,146-Speed 5200.04 samples/sec Loss 1.9029 LearningRate 0.0166 Epoch: 11 Global Step: 197810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:31:22,132-Speed 5159.96 samples/sec Loss 1.9404 LearningRate 0.0166 Epoch: 11 Global Step: 197820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:31:24,118-Speed 5156.49 samples/sec Loss 1.8677 LearningRate 0.0166 Epoch: 11 Global Step: 197830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:31:26,114-Speed 5132.60 samples/sec Loss 1.8904 LearningRate 0.0166 Epoch: 11 Global Step: 197840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:31:28,099-Speed 5159.78 samples/sec Loss 1.8567 LearningRate 0.0166 Epoch: 11 Global Step: 197850 Fp16 
Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:31:30,086-Speed 5154.11 samples/sec Loss 1.8965 LearningRate 0.0166 Epoch: 11 Global Step: 197860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:31:32,059-Speed 5194.26 samples/sec Loss 1.9303 LearningRate 0.0166 Epoch: 11 Global Step: 197870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:31:34,029-Speed 5198.71 samples/sec Loss 1.8670 LearningRate 0.0166 Epoch: 11 Global Step: 197880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:31:36,023-Speed 5138.25 samples/sec Loss 1.9224 LearningRate 0.0166 Epoch: 11 Global Step: 197890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:31:38,000-Speed 5179.90 samples/sec Loss 1.9523 LearningRate 0.0166 Epoch: 11 Global Step: 197900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:31:39,986-Speed 5159.13 samples/sec Loss 1.9088 LearningRate 0.0166 Epoch: 11 Global Step: 197910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:31:41,973-Speed 5155.27 samples/sec Loss 1.9107 LearningRate 0.0166 Epoch: 11 Global Step: 197920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:31:43,952-Speed 5176.59 samples/sec Loss 1.9504 LearningRate 0.0166 Epoch: 11 Global Step: 197930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:31:45,930-Speed 5178.25 samples/sec Loss 1.9106 LearningRate 0.0166 Epoch: 11 Global Step: 197940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:31:47,907-Speed 5180.54 samples/sec Loss 1.9199 LearningRate 0.0166 Epoch: 11 Global Step: 197950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:31:49,888-Speed 5170.93 samples/sec Loss 1.8943 LearningRate 0.0166 Epoch: 11 Global Step: 197960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:31:51,869-Speed 5169.71 samples/sec Loss 1.8753 LearningRate 0.0166 Epoch: 11 Global Step: 197970 Fp16 Grad Scale: 65536 Required: 9 hours 
Training: 2022-04-11 12:31:53,851-Speed 5168.00 samples/sec Loss 1.8817 LearningRate 0.0166 Epoch: 11 Global Step: 197980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:31:55,820-Speed 5204.15 samples/sec Loss 1.9314 LearningRate 0.0166 Epoch: 11 Global Step: 197990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:31:57,814-Speed 5138.97 samples/sec Loss 1.9778 LearningRate 0.0166 Epoch: 11 Global Step: 198000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:32:24,356-[lfw][198000]XNorm: 22.312343 Training: 2022-04-11 12:32:24,357-[lfw][198000]Accuracy-Flip: 0.99800+-0.00256 Training: 2022-04-11 12:32:24,357-[lfw][198000]Accuracy-Highest: 0.99833 Training: 2022-04-11 12:32:55,159-[cfp_fp][198000]XNorm: 21.282661 Training: 2022-04-11 12:32:55,159-[cfp_fp][198000]Accuracy-Flip: 0.98757+-0.00523 Training: 2022-04-11 12:32:55,160-[cfp_fp][198000]Accuracy-Highest: 0.98757 Training: 2022-04-11 12:33:21,671-[agedb_30][198000]XNorm: 22.731196 Training: 2022-04-11 12:33:21,672-[agedb_30][198000]Accuracy-Flip: 0.98117+-0.00775 Training: 2022-04-11 12:33:21,672-[agedb_30][198000]Accuracy-Highest: 0.98250 Training: 2022-04-11 12:33:23,670-Speed 119.27 samples/sec Loss 1.9478 LearningRate 0.0166 Epoch: 11 Global Step: 198010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:33:25,632-Speed 5220.68 samples/sec Loss 1.9281 LearningRate 0.0165 Epoch: 11 Global Step: 198020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:33:27,603-Speed 5198.04 samples/sec Loss 1.8425 LearningRate 0.0165 Epoch: 11 Global Step: 198030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:33:29,566-Speed 5217.55 samples/sec Loss 1.9123 LearningRate 0.0165 Epoch: 11 Global Step: 198040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:33:31,530-Speed 5217.73 samples/sec Loss 1.8647 LearningRate 0.0165 Epoch: 11 Global Step: 198050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 
12:33:33,505-Speed 5184.44 samples/sec Loss 1.8920 LearningRate 0.0165 Epoch: 11 Global Step: 198060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:33:35,480-Speed 5187.16 samples/sec Loss 1.9024 LearningRate 0.0165 Epoch: 11 Global Step: 198070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:33:37,448-Speed 5206.49 samples/sec Loss 1.8446 LearningRate 0.0165 Epoch: 11 Global Step: 198080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:33:39,413-Speed 5212.07 samples/sec Loss 1.9002 LearningRate 0.0165 Epoch: 11 Global Step: 198090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:33:41,389-Speed 5184.09 samples/sec Loss 1.9142 LearningRate 0.0165 Epoch: 11 Global Step: 198100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:33:43,364-Speed 5185.68 samples/sec Loss 1.8763 LearningRate 0.0165 Epoch: 11 Global Step: 198110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:33:45,330-Speed 5209.74 samples/sec Loss 1.8954 LearningRate 0.0165 Epoch: 11 Global Step: 198120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:33:47,302-Speed 5195.77 samples/sec Loss 1.9164 LearningRate 0.0165 Epoch: 11 Global Step: 198130 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:33:49,272-Speed 5199.89 samples/sec Loss 1.9516 LearningRate 0.0165 Epoch: 11 Global Step: 198140 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:33:51,253-Speed 5170.47 samples/sec Loss 1.8894 LearningRate 0.0165 Epoch: 11 Global Step: 198150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:33:53,223-Speed 5199.89 samples/sec Loss 1.9325 LearningRate 0.0165 Epoch: 11 Global Step: 198160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:33:55,214-Speed 5145.80 samples/sec Loss 1.9047 LearningRate 0.0165 Epoch: 11 Global Step: 198170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:33:57,185-Speed 5196.80 samples/sec Loss 
1.8622 LearningRate 0.0165 Epoch: 11 Global Step: 198180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:33:59,171-Speed 5157.79 samples/sec Loss 1.9338 LearningRate 0.0165 Epoch: 11 Global Step: 198190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:34:01,190-Speed 5072.49 samples/sec Loss 1.8691 LearningRate 0.0165 Epoch: 11 Global Step: 198200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:34:03,179-Speed 5151.83 samples/sec Loss 1.8886 LearningRate 0.0165 Epoch: 11 Global Step: 198210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:34:05,183-Speed 5111.10 samples/sec Loss 1.8839 LearningRate 0.0165 Epoch: 11 Global Step: 198220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:34:07,166-Speed 5164.98 samples/sec Loss 1.8846 LearningRate 0.0165 Epoch: 11 Global Step: 198230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:34:09,132-Speed 5208.80 samples/sec Loss 1.8490 LearningRate 0.0165 Epoch: 11 Global Step: 198240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:34:11,142-Speed 5099.36 samples/sec Loss 1.9467 LearningRate 0.0165 Epoch: 11 Global Step: 198250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:34:13,114-Speed 5193.70 samples/sec Loss 1.9200 LearningRate 0.0165 Epoch: 11 Global Step: 198260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:34:15,072-Speed 5232.46 samples/sec Loss 1.8516 LearningRate 0.0165 Epoch: 11 Global Step: 198270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:34:17,046-Speed 5188.68 samples/sec Loss 1.8955 LearningRate 0.0165 Epoch: 11 Global Step: 198280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:34:19,013-Speed 5206.75 samples/sec Loss 1.8635 LearningRate 0.0165 Epoch: 11 Global Step: 198290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:34:20,987-Speed 5189.41 samples/sec Loss 1.8776 LearningRate 0.0165 Epoch: 11 
Global Step: 198300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:34:22,958-Speed 5197.14 samples/sec Loss 1.8895 LearningRate 0.0165 Epoch: 11 Global Step: 198310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:34:24,937-Speed 5176.81 samples/sec Loss 1.8953 LearningRate 0.0165 Epoch: 11 Global Step: 198320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:34:26,912-Speed 5186.29 samples/sec Loss 1.8826 LearningRate 0.0165 Epoch: 11 Global Step: 198330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:34:28,880-Speed 5203.31 samples/sec Loss 1.9073 LearningRate 0.0165 Epoch: 11 Global Step: 198340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:34:30,853-Speed 5190.93 samples/sec Loss 1.9293 LearningRate 0.0165 Epoch: 11 Global Step: 198350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:34:32,830-Speed 5181.75 samples/sec Loss 1.8983 LearningRate 0.0165 Epoch: 11 Global Step: 198360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:34:34,802-Speed 5195.44 samples/sec Loss 1.9288 LearningRate 0.0165 Epoch: 11 Global Step: 198370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:34:36,776-Speed 5191.43 samples/sec Loss 1.7946 LearningRate 0.0165 Epoch: 11 Global Step: 198380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:34:38,764-Speed 5152.52 samples/sec Loss 1.9462 LearningRate 0.0165 Epoch: 11 Global Step: 198390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:34:40,739-Speed 5186.62 samples/sec Loss 1.9027 LearningRate 0.0165 Epoch: 11 Global Step: 198400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:34:42,707-Speed 5204.84 samples/sec Loss 1.8944 LearningRate 0.0165 Epoch: 11 Global Step: 198410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:34:44,690-Speed 5164.92 samples/sec Loss 1.8124 LearningRate 0.0165 Epoch: 11 Global Step: 198420 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 12:34:46,676-Speed 5156.29 samples/sec Loss 1.9245 LearningRate 0.0164 Epoch: 11 Global Step: 198430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:34:48,652-Speed 5185.67 samples/sec Loss 1.8966 LearningRate 0.0164 Epoch: 11 Global Step: 198440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:34:50,627-Speed 5185.95 samples/sec Loss 1.8024 LearningRate 0.0164 Epoch: 11 Global Step: 198450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:34:52,625-Speed 5124.94 samples/sec Loss 1.8784 LearningRate 0.0164 Epoch: 11 Global Step: 198460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:34:54,606-Speed 5173.18 samples/sec Loss 1.7926 LearningRate 0.0164 Epoch: 11 Global Step: 198470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:34:56,587-Speed 5171.35 samples/sec Loss 1.9079 LearningRate 0.0164 Epoch: 11 Global Step: 198480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:34:58,555-Speed 5203.84 samples/sec Loss 1.9207 LearningRate 0.0164 Epoch: 11 Global Step: 198490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:00,532-Speed 5181.07 samples/sec Loss 1.8585 LearningRate 0.0164 Epoch: 11 Global Step: 198500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:02,515-Speed 5167.01 samples/sec Loss 1.9719 LearningRate 0.0164 Epoch: 11 Global Step: 198510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:04,486-Speed 5196.15 samples/sec Loss 1.9118 LearningRate 0.0164 Epoch: 11 Global Step: 198520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:06,453-Speed 5209.42 samples/sec Loss 1.9277 LearningRate 0.0164 Epoch: 11 Global Step: 198530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:08,428-Speed 5186.02 samples/sec Loss 1.8618 LearningRate 0.0164 Epoch: 11 Global Step: 198540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
12:35:10,398-Speed 5200.91 samples/sec Loss 1.8689 LearningRate 0.0164 Epoch: 11 Global Step: 198550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:12,390-Speed 5142.02 samples/sec Loss 1.9545 LearningRate 0.0164 Epoch: 11 Global Step: 198560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:14,372-Speed 5165.86 samples/sec Loss 1.9726 LearningRate 0.0164 Epoch: 11 Global Step: 198570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:16,356-Speed 5163.34 samples/sec Loss 1.8407 LearningRate 0.0164 Epoch: 11 Global Step: 198580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:18,346-Speed 5149.34 samples/sec Loss 1.9142 LearningRate 0.0164 Epoch: 11 Global Step: 198590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:20,312-Speed 5208.50 samples/sec Loss 1.8559 LearningRate 0.0164 Epoch: 11 Global Step: 198600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:22,285-Speed 5191.92 samples/sec Loss 1.9531 LearningRate 0.0164 Epoch: 11 Global Step: 198610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:24,255-Speed 5201.32 samples/sec Loss 1.8233 LearningRate 0.0164 Epoch: 11 Global Step: 198620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:26,223-Speed 5204.53 samples/sec Loss 1.8822 LearningRate 0.0164 Epoch: 11 Global Step: 198630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:28,206-Speed 5164.66 samples/sec Loss 1.9025 LearningRate 0.0164 Epoch: 11 Global Step: 198640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:30,180-Speed 5189.80 samples/sec Loss 1.8941 LearningRate 0.0164 Epoch: 11 Global Step: 198650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:32,150-Speed 5199.93 samples/sec Loss 1.9258 LearningRate 0.0164 Epoch: 11 Global Step: 198660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:34,124-Speed 5189.88 samples/sec Loss 
1.9546 LearningRate 0.0164 Epoch: 11 Global Step: 198670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:36,102-Speed 5177.66 samples/sec Loss 1.9356 LearningRate 0.0164 Epoch: 11 Global Step: 198680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:38,073-Speed 5197.95 samples/sec Loss 1.8995 LearningRate 0.0164 Epoch: 11 Global Step: 198690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:40,077-Speed 5111.46 samples/sec Loss 1.8454 LearningRate 0.0164 Epoch: 11 Global Step: 198700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:42,046-Speed 5203.63 samples/sec Loss 1.8871 LearningRate 0.0164 Epoch: 11 Global Step: 198710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:44,022-Speed 5181.87 samples/sec Loss 1.8962 LearningRate 0.0164 Epoch: 11 Global Step: 198720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:46,021-Speed 5123.73 samples/sec Loss 1.8561 LearningRate 0.0164 Epoch: 11 Global Step: 198730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:47,989-Speed 5204.96 samples/sec Loss 1.8640 LearningRate 0.0164 Epoch: 11 Global Step: 198740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:49,962-Speed 5193.11 samples/sec Loss 1.9010 LearningRate 0.0164 Epoch: 11 Global Step: 198750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:51,933-Speed 5195.61 samples/sec Loss 1.9002 LearningRate 0.0164 Epoch: 11 Global Step: 198760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:53,913-Speed 5174.71 samples/sec Loss 1.8960 LearningRate 0.0164 Epoch: 11 Global Step: 198770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:55,902-Speed 5150.53 samples/sec Loss 1.8729 LearningRate 0.0164 Epoch: 11 Global Step: 198780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:35:57,879-Speed 5181.85 samples/sec Loss 1.8730 LearningRate 0.0164 Epoch: 11 
Global Step: 198790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:35:59,856-Speed 5179.77 samples/sec Loss 1.8691 LearningRate 0.0164 Epoch: 11 Global Step: 198800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:36:01,854-Speed 5126.89 samples/sec Loss 1.8902 LearningRate 0.0164 Epoch: 11 Global Step: 198810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:36:03,840-Speed 5158.08 samples/sec Loss 1.9083 LearningRate 0.0164 Epoch: 11 Global Step: 198820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:36:05,805-Speed 5212.69 samples/sec Loss 1.9180 LearningRate 0.0164 Epoch: 11 Global Step: 198830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:36:07,793-Speed 5152.68 samples/sec Loss 1.9956 LearningRate 0.0163 Epoch: 11 Global Step: 198840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:36:09,760-Speed 5207.49 samples/sec Loss 1.9174 LearningRate 0.0163 Epoch: 11 Global Step: 198850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:36:11,745-Speed 5162.13 samples/sec Loss 1.8997 LearningRate 0.0163 Epoch: 11 Global Step: 198860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:36:13,712-Speed 5208.64 samples/sec Loss 1.8679 LearningRate 0.0163 Epoch: 11 Global Step: 198870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:36:15,688-Speed 5182.98 samples/sec Loss 1.9383 LearningRate 0.0163 Epoch: 11 Global Step: 198880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:36:17,662-Speed 5189.64 samples/sec Loss 1.9699 LearningRate 0.0163 Epoch: 11 Global Step: 198890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:36:19,635-Speed 5192.33 samples/sec Loss 1.9171 LearningRate 0.0163 Epoch: 11 Global Step: 198900 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:36:21,618-Speed 5164.39 samples/sec Loss 1.9567 LearningRate 0.0163 Epoch: 11 Global Step: 198910 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 12:36:23,621-Speed 5114.54 samples/sec Loss 1.8712 LearningRate 0.0163 Epoch: 11 Global Step: 198920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:36:25,606-Speed 5159.75 samples/sec Loss 1.9612 LearningRate 0.0163 Epoch: 11 Global Step: 198930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:36:27,591-Speed 5161.76 samples/sec Loss 1.8690 LearningRate 0.0163 Epoch: 11 Global Step: 198940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:36:29,581-Speed 5145.02 samples/sec Loss 1.8866 LearningRate 0.0163 Epoch: 11 Global Step: 198950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:36:31,551-Speed 5202.00 samples/sec Loss 1.8572 LearningRate 0.0163 Epoch: 11 Global Step: 198960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:36:33,525-Speed 5189.65 samples/sec Loss 1.8941 LearningRate 0.0163 Epoch: 11 Global Step: 198970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:36:35,536-Speed 5093.32 samples/sec Loss 1.8961 LearningRate 0.0163 Epoch: 11 Global Step: 198980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:36:37,523-Speed 5155.19 samples/sec Loss 1.8625 LearningRate 0.0163 Epoch: 11 Global Step: 198990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:36:39,505-Speed 5168.47 samples/sec Loss 1.8674 LearningRate 0.0163 Epoch: 11 Global Step: 199000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:36:41,477-Speed 5193.84 samples/sec Loss 1.9077 LearningRate 0.0163 Epoch: 11 Global Step: 199010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:36:43,444-Speed 5208.62 samples/sec Loss 1.8848 LearningRate 0.0163 Epoch: 11 Global Step: 199020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:36:45,440-Speed 5131.21 samples/sec Loss 1.9056 LearningRate 0.0163 Epoch: 11 Global Step: 199030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
12:36:47,417-Speed 5181.85 samples/sec Loss 1.8592 LearningRate 0.0163 Epoch: 11 Global Step: 199040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:36:49,413-Speed 5129.59 samples/sec Loss 1.8129 LearningRate 0.0163 Epoch: 11 Global Step: 199050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:36:51,408-Speed 5136.41 samples/sec Loss 1.9172 LearningRate 0.0163 Epoch: 11 Global Step: 199060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:36:53,379-Speed 5195.31 samples/sec Loss 1.9010 LearningRate 0.0163 Epoch: 11 Global Step: 199070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:36:55,364-Speed 5163.37 samples/sec Loss 1.8625 LearningRate 0.0163 Epoch: 11 Global Step: 199080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:36:57,349-Speed 5161.03 samples/sec Loss 1.9131 LearningRate 0.0163 Epoch: 11 Global Step: 199090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:36:59,344-Speed 5134.21 samples/sec Loss 1.9248 LearningRate 0.0163 Epoch: 11 Global Step: 199100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:37:01,318-Speed 5189.16 samples/sec Loss 1.8605 LearningRate 0.0163 Epoch: 11 Global Step: 199110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:37:03,286-Speed 5204.56 samples/sec Loss 1.8889 LearningRate 0.0163 Epoch: 11 Global Step: 199120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:37:05,278-Speed 5143.04 samples/sec Loss 1.8825 LearningRate 0.0163 Epoch: 11 Global Step: 199130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:37:07,252-Speed 5189.21 samples/sec Loss 1.9176 LearningRate 0.0163 Epoch: 11 Global Step: 199140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:37:09,226-Speed 5187.72 samples/sec Loss 1.8847 LearningRate 0.0163 Epoch: 11 Global Step: 199150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:37:11,215-Speed 5151.24 samples/sec Loss 
1.8907 LearningRate 0.0163 Epoch: 11 Global Step: 199160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:37:13,188-Speed 5191.92 samples/sec Loss 1.9349 LearningRate 0.0163 Epoch: 11 Global Step: 199170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:37:15,173-Speed 5159.94 samples/sec Loss 1.8559 LearningRate 0.0163 Epoch: 11 Global Step: 199180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:37:17,170-Speed 5129.82 samples/sec Loss 1.9116 LearningRate 0.0163 Epoch: 11 Global Step: 199190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:37:19,149-Speed 5176.43 samples/sec Loss 1.9541 LearningRate 0.0163 Epoch: 11 Global Step: 199200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:37:21,118-Speed 5201.91 samples/sec Loss 1.8482 LearningRate 0.0163 Epoch: 11 Global Step: 199210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:37:23,122-Speed 5111.67 samples/sec Loss 1.9467 LearningRate 0.0163 Epoch: 11 Global Step: 199220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:37:25,101-Speed 5175.42 samples/sec Loss 1.8910 LearningRate 0.0163 Epoch: 11 Global Step: 199230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:37:27,090-Speed 5151.33 samples/sec Loss 1.9356 LearningRate 0.0163 Epoch: 11 Global Step: 199240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:37:29,083-Speed 5137.90 samples/sec Loss 1.8815 LearningRate 0.0163 Epoch: 11 Global Step: 199250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:37:31,057-Speed 5189.22 samples/sec Loss 1.9892 LearningRate 0.0162 Epoch: 11 Global Step: 199260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:37:33,024-Speed 5210.08 samples/sec Loss 1.9939 LearningRate 0.0162 Epoch: 11 Global Step: 199270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:37:34,998-Speed 5188.51 samples/sec Loss 1.9431 LearningRate 0.0162 Epoch: 11 
Global Step: 199280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:37:37,000-Speed 5117.18 samples/sec Loss 1.8919 LearningRate 0.0162 Epoch: 11 Global Step: 199290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:37:39,013-Speed 5086.95 samples/sec Loss 1.8612 LearningRate 0.0162 Epoch: 11 Global Step: 199300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:37:40,992-Speed 5178.39 samples/sec Loss 1.8846 LearningRate 0.0162 Epoch: 11 Global Step: 199310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:37:42,963-Speed 5196.86 samples/sec Loss 1.8624 LearningRate 0.0162 Epoch: 11 Global Step: 199320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:37:44,953-Speed 5145.84 samples/sec Loss 1.9241 LearningRate 0.0162 Epoch: 11 Global Step: 199330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:37:46,933-Speed 5174.74 samples/sec Loss 1.8617 LearningRate 0.0162 Epoch: 11 Global Step: 199340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:37:48,937-Speed 5111.34 samples/sec Loss 1.8690 LearningRate 0.0162 Epoch: 11 Global Step: 199350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:37:50,909-Speed 5193.73 samples/sec Loss 1.9335 LearningRate 0.0162 Epoch: 11 Global Step: 199360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:37:52,913-Speed 5112.26 samples/sec Loss 1.8920 LearningRate 0.0162 Epoch: 11 Global Step: 199370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:37:54,887-Speed 5190.49 samples/sec Loss 1.8792 LearningRate 0.0162 Epoch: 11 Global Step: 199380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:37:56,869-Speed 5166.08 samples/sec Loss 1.8869 LearningRate 0.0162 Epoch: 11 Global Step: 199390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:37:58,852-Speed 5166.84 samples/sec Loss 1.9353 LearningRate 0.0162 Epoch: 11 Global Step: 199400 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 12:38:00,827-Speed 5187.04 samples/sec Loss 1.8952 LearningRate 0.0162 Epoch: 11 Global Step: 199410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:02,813-Speed 5157.08 samples/sec Loss 1.8603 LearningRate 0.0162 Epoch: 11 Global Step: 199420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:04,786-Speed 5192.94 samples/sec Loss 1.9296 LearningRate 0.0162 Epoch: 11 Global Step: 199430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:06,772-Speed 5156.96 samples/sec Loss 1.8771 LearningRate 0.0162 Epoch: 11 Global Step: 199440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:08,755-Speed 5165.01 samples/sec Loss 1.8802 LearningRate 0.0162 Epoch: 11 Global Step: 199450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:10,737-Speed 5169.86 samples/sec Loss 1.8989 LearningRate 0.0162 Epoch: 11 Global Step: 199460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:12,712-Speed 5185.57 samples/sec Loss 1.8962 LearningRate 0.0162 Epoch: 11 Global Step: 199470 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:38:14,683-Speed 5196.74 samples/sec Loss 1.9060 LearningRate 0.0162 Epoch: 11 Global Step: 199480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:16,657-Speed 5189.82 samples/sec Loss 1.9087 LearningRate 0.0162 Epoch: 11 Global Step: 199490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:18,636-Speed 5175.27 samples/sec Loss 1.8803 LearningRate 0.0162 Epoch: 11 Global Step: 199500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:20,629-Speed 5140.25 samples/sec Loss 1.9385 LearningRate 0.0162 Epoch: 11 Global Step: 199510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:22,626-Speed 5129.03 samples/sec Loss 1.8995 LearningRate 0.0162 Epoch: 11 Global Step: 199520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
12:38:24,614-Speed 5153.91 samples/sec Loss 1.9018 LearningRate 0.0162 Epoch: 11 Global Step: 199530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:26,583-Speed 5199.63 samples/sec Loss 1.8784 LearningRate 0.0162 Epoch: 11 Global Step: 199540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:28,564-Speed 5172.64 samples/sec Loss 1.8504 LearningRate 0.0162 Epoch: 11 Global Step: 199550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:30,542-Speed 5177.25 samples/sec Loss 1.9431 LearningRate 0.0162 Epoch: 11 Global Step: 199560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:32,527-Speed 5160.93 samples/sec Loss 1.9115 LearningRate 0.0162 Epoch: 11 Global Step: 199570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:34,504-Speed 5184.08 samples/sec Loss 1.8316 LearningRate 0.0162 Epoch: 11 Global Step: 199580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:36,480-Speed 5183.75 samples/sec Loss 1.8736 LearningRate 0.0162 Epoch: 11 Global Step: 199590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:38,450-Speed 5198.95 samples/sec Loss 1.9111 LearningRate 0.0162 Epoch: 11 Global Step: 199600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:40,428-Speed 5179.00 samples/sec Loss 1.8936 LearningRate 0.0162 Epoch: 11 Global Step: 199610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:42,411-Speed 5165.31 samples/sec Loss 1.8592 LearningRate 0.0162 Epoch: 11 Global Step: 199620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:44,407-Speed 5130.21 samples/sec Loss 1.8706 LearningRate 0.0162 Epoch: 11 Global Step: 199630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:46,402-Speed 5135.62 samples/sec Loss 1.9192 LearningRate 0.0162 Epoch: 11 Global Step: 199640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:48,427-Speed 5058.88 samples/sec Loss 
1.9337 LearningRate 0.0162 Epoch: 11 Global Step: 199650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:50,410-Speed 5163.59 samples/sec Loss 1.8677 LearningRate 0.0162 Epoch: 11 Global Step: 199660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:52,382-Speed 5195.95 samples/sec Loss 1.9150 LearningRate 0.0161 Epoch: 11 Global Step: 199670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:54,370-Speed 5154.65 samples/sec Loss 1.8855 LearningRate 0.0161 Epoch: 11 Global Step: 199680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:38:56,337-Speed 5207.49 samples/sec Loss 1.8348 LearningRate 0.0161 Epoch: 11 Global Step: 199690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:38:58,314-Speed 5181.06 samples/sec Loss 1.8610 LearningRate 0.0161 Epoch: 11 Global Step: 199700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:00,306-Speed 5142.03 samples/sec Loss 1.9190 LearningRate 0.0161 Epoch: 11 Global Step: 199710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:02,280-Speed 5190.13 samples/sec Loss 1.9219 LearningRate 0.0161 Epoch: 11 Global Step: 199720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:04,267-Speed 5154.57 samples/sec Loss 1.9413 LearningRate 0.0161 Epoch: 11 Global Step: 199730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:06,253-Speed 5158.18 samples/sec Loss 1.8991 LearningRate 0.0161 Epoch: 11 Global Step: 199740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:08,228-Speed 5185.74 samples/sec Loss 1.9116 LearningRate 0.0161 Epoch: 11 Global Step: 199750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:10,206-Speed 5177.76 samples/sec Loss 1.8496 LearningRate 0.0161 Epoch: 11 Global Step: 199760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:12,187-Speed 5172.04 samples/sec Loss 1.9356 LearningRate 0.0161 Epoch: 11 
Global Step: 199770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:14,185-Speed 5128.15 samples/sec Loss 1.8727 LearningRate 0.0161 Epoch: 11 Global Step: 199780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:16,181-Speed 5132.00 samples/sec Loss 1.9089 LearningRate 0.0161 Epoch: 11 Global Step: 199790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:18,159-Speed 5179.19 samples/sec Loss 1.8560 LearningRate 0.0161 Epoch: 11 Global Step: 199800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:20,154-Speed 5134.31 samples/sec Loss 1.9081 LearningRate 0.0161 Epoch: 11 Global Step: 199810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:22,128-Speed 5187.35 samples/sec Loss 1.9368 LearningRate 0.0161 Epoch: 11 Global Step: 199820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:24,105-Speed 5182.19 samples/sec Loss 1.9084 LearningRate 0.0161 Epoch: 11 Global Step: 199830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:26,104-Speed 5126.13 samples/sec Loss 1.9290 LearningRate 0.0161 Epoch: 11 Global Step: 199840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:28,096-Speed 5140.26 samples/sec Loss 1.9339 LearningRate 0.0161 Epoch: 11 Global Step: 199850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:30,086-Speed 5146.86 samples/sec Loss 1.8895 LearningRate 0.0161 Epoch: 11 Global Step: 199860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:32,090-Speed 5110.95 samples/sec Loss 1.8911 LearningRate 0.0161 Epoch: 11 Global Step: 199870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:34,068-Speed 5180.53 samples/sec Loss 1.8666 LearningRate 0.0161 Epoch: 11 Global Step: 199880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:36,054-Speed 5156.51 samples/sec Loss 1.8363 LearningRate 0.0161 Epoch: 11 Global Step: 199890 Fp16 Grad Scale: 131072 
Required: 9 hours Training: 2022-04-11 12:39:38,047-Speed 5141.19 samples/sec Loss 1.8141 LearningRate 0.0161 Epoch: 11 Global Step: 199900 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:39:40,034-Speed 5156.59 samples/sec Loss 1.8465 LearningRate 0.0161 Epoch: 11 Global Step: 199910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:39:42,005-Speed 5196.23 samples/sec Loss 1.8908 LearningRate 0.0161 Epoch: 11 Global Step: 199920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:39:43,983-Speed 5179.80 samples/sec Loss 1.8745 LearningRate 0.0161 Epoch: 11 Global Step: 199930 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:39:45,947-Speed 5214.43 samples/sec Loss 1.9349 LearningRate 0.0161 Epoch: 11 Global Step: 199940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:47,931-Speed 5163.27 samples/sec Loss 1.8996 LearningRate 0.0161 Epoch: 11 Global Step: 199950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:49,925-Speed 5136.24 samples/sec Loss 1.8609 LearningRate 0.0161 Epoch: 11 Global Step: 199960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:51,898-Speed 5190.30 samples/sec Loss 1.9256 LearningRate 0.0161 Epoch: 11 Global Step: 199970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:53,878-Speed 5175.57 samples/sec Loss 1.8978 LearningRate 0.0161 Epoch: 11 Global Step: 199980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:55,862-Speed 5163.06 samples/sec Loss 1.9325 LearningRate 0.0161 Epoch: 11 Global Step: 199990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:39:57,855-Speed 5138.24 samples/sec Loss 1.9303 LearningRate 0.0161 Epoch: 11 Global Step: 200000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:40:24,448-[lfw][200000]XNorm: 21.696615 Training: 2022-04-11 12:40:24,449-[lfw][200000]Accuracy-Flip: 0.99783+-0.00269 Training: 2022-04-11 
12:40:24,449-[lfw][200000]Accuracy-Highest: 0.99833 Training: 2022-04-11 12:40:55,170-[cfp_fp][200000]XNorm: 21.012344 Training: 2022-04-11 12:40:55,171-[cfp_fp][200000]Accuracy-Flip: 0.98514+-0.00368 Training: 2022-04-11 12:40:55,171-[cfp_fp][200000]Accuracy-Highest: 0.98757 Training: 2022-04-11 12:41:21,696-[agedb_30][200000]XNorm: 22.141549 Training: 2022-04-11 12:41:21,696-[agedb_30][200000]Accuracy-Flip: 0.97900+-0.00772 Training: 2022-04-11 12:41:21,696-[agedb_30][200000]Accuracy-Highest: 0.98250 Training: 2022-04-11 12:41:23,681-Speed 119.31 samples/sec Loss 1.9173 LearningRate 0.0161 Epoch: 11 Global Step: 200010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:41:25,643-Speed 5219.53 samples/sec Loss 1.9157 LearningRate 0.0161 Epoch: 11 Global Step: 200020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:41:27,606-Speed 5218.63 samples/sec Loss 1.9353 LearningRate 0.0161 Epoch: 11 Global Step: 200030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:41:29,571-Speed 5214.03 samples/sec Loss 1.9654 LearningRate 0.0161 Epoch: 11 Global Step: 200040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:41:31,539-Speed 5204.47 samples/sec Loss 1.9372 LearningRate 0.0161 Epoch: 11 Global Step: 200050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:41:33,509-Speed 5198.92 samples/sec Loss 1.9303 LearningRate 0.0161 Epoch: 11 Global Step: 200060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:41:35,484-Speed 5186.03 samples/sec Loss 1.8185 LearningRate 0.0161 Epoch: 11 Global Step: 200070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:41:37,489-Speed 5109.65 samples/sec Loss 1.9089 LearningRate 0.0161 Epoch: 11 Global Step: 200080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:41:39,465-Speed 5184.25 samples/sec Loss 1.8892 LearningRate 0.0160 Epoch: 11 Global Step: 200090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 
12:41:41,431-Speed 5210.44 samples/sec Loss 1.8976 LearningRate 0.0160 Epoch: 11 Global Step: 200100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:41:43,393-Speed 5221.20 samples/sec Loss 1.8815 LearningRate 0.0160 Epoch: 11 Global Step: 200110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:41:45,359-Speed 5209.49 samples/sec Loss 1.8642 LearningRate 0.0160 Epoch: 11 Global Step: 200120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:41:47,326-Speed 5207.48 samples/sec Loss 1.9277 LearningRate 0.0160 Epoch: 11 Global Step: 200130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:41:49,304-Speed 5179.62 samples/sec Loss 1.8692 LearningRate 0.0160 Epoch: 11 Global Step: 200140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:41:51,284-Speed 5172.78 samples/sec Loss 1.8782 LearningRate 0.0160 Epoch: 11 Global Step: 200150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:41:53,272-Speed 5152.86 samples/sec Loss 1.9289 LearningRate 0.0160 Epoch: 11 Global Step: 200160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:41:55,242-Speed 5200.14 samples/sec Loss 1.9927 LearningRate 0.0160 Epoch: 11 Global Step: 200170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:41:57,215-Speed 5191.26 samples/sec Loss 1.9341 LearningRate 0.0160 Epoch: 11 Global Step: 200180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:41:59,188-Speed 5192.11 samples/sec Loss 1.9176 LearningRate 0.0160 Epoch: 11 Global Step: 200190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:42:01,172-Speed 5163.45 samples/sec Loss 1.8835 LearningRate 0.0160 Epoch: 11 Global Step: 200200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:42:03,147-Speed 5185.81 samples/sec Loss 2.0073 LearningRate 0.0160 Epoch: 11 Global Step: 200210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:42:05,126-Speed 5176.95 samples/sec Loss 
1.8758 LearningRate 0.0160 Epoch: 11 Global Step: 200220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:42:07,112-Speed 5156.12 samples/sec Loss 1.8426 LearningRate 0.0160 Epoch: 11 Global Step: 200230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:42:09,106-Speed 5136.89 samples/sec Loss 1.8586 LearningRate 0.0160 Epoch: 11 Global Step: 200240 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:42:11,086-Speed 5173.00 samples/sec Loss 1.8627 LearningRate 0.0160 Epoch: 11 Global Step: 200250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:42:13,065-Speed 5177.87 samples/sec Loss 1.9516 LearningRate 0.0160 Epoch: 11 Global Step: 200260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:42:15,047-Speed 5170.54 samples/sec Loss 1.8754 LearningRate 0.0160 Epoch: 11 Global Step: 200270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:42:17,023-Speed 5182.36 samples/sec Loss 1.9515 LearningRate 0.0160 Epoch: 11 Global Step: 200280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:42:19,218-Speed 4667.95 samples/sec Loss 1.9030 LearningRate 0.0160 Epoch: 11 Global Step: 200290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:42:48,879-Speed 345.24 samples/sec Loss 1.5207 LearningRate 0.0160 Epoch: 12 Global Step: 200300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:42:51,130-Speed 4551.68 samples/sec Loss 1.4002 LearningRate 0.0160 Epoch: 12 Global Step: 200310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:42:53,819-Speed 3808.50 samples/sec Loss 1.3804 LearningRate 0.0160 Epoch: 12 Global Step: 200320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:42:55,804-Speed 5160.71 samples/sec Loss 1.4001 LearningRate 0.0160 Epoch: 12 Global Step: 200330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:42:57,790-Speed 5159.01 samples/sec Loss 1.4120 LearningRate 0.0160 Epoch: 12 
Global Step: 200340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:42:59,883-Speed 4892.93 samples/sec Loss 1.4201 LearningRate 0.0160 Epoch: 12 Global Step: 200350 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:43:01,869-Speed 5156.52 samples/sec Loss 1.3664 LearningRate 0.0160 Epoch: 12 Global Step: 200360 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:43:04,554-Speed 3815.12 samples/sec Loss 1.3583 LearningRate 0.0160 Epoch: 12 Global Step: 200370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:43:06,525-Speed 5198.17 samples/sec Loss 1.3616 LearningRate 0.0160 Epoch: 12 Global Step: 200380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:43:08,526-Speed 5120.28 samples/sec Loss 1.3640 LearningRate 0.0160 Epoch: 12 Global Step: 200390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:43:10,493-Speed 5208.10 samples/sec Loss 1.3752 LearningRate 0.0160 Epoch: 12 Global Step: 200400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:43:12,487-Speed 5137.01 samples/sec Loss 1.3884 LearningRate 0.0160 Epoch: 12 Global Step: 200410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:43:14,468-Speed 5170.69 samples/sec Loss 1.3644 LearningRate 0.0160 Epoch: 12 Global Step: 200420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:43:16,471-Speed 5112.84 samples/sec Loss 1.3360 LearningRate 0.0160 Epoch: 12 Global Step: 200430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:43:18,447-Speed 5184.08 samples/sec Loss 1.3611 LearningRate 0.0160 Epoch: 12 Global Step: 200440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:43:20,425-Speed 5179.06 samples/sec Loss 1.4338 LearningRate 0.0160 Epoch: 12 Global Step: 200450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:43:22,413-Speed 5151.52 samples/sec Loss 1.3668 LearningRate 0.0160 Epoch: 12 Global Step: 200460 Fp16 Grad Scale: 
65536 Required: 9 hours Training: 2022-04-11 12:43:24,381-Speed 5206.66 samples/sec Loss 1.3991 LearningRate 0.0160 Epoch: 12 Global Step: 200470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:43:26,369-Speed 5153.12 samples/sec Loss 1.3483 LearningRate 0.0160 Epoch: 12 Global Step: 200480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:43:28,360-Speed 5144.10 samples/sec Loss 1.3525 LearningRate 0.0160 Epoch: 12 Global Step: 200490 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:43:30,329-Speed 5203.53 samples/sec Loss 1.3567 LearningRate 0.0160 Epoch: 12 Global Step: 200500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:43:32,306-Speed 5182.35 samples/sec Loss 1.3569 LearningRate 0.0159 Epoch: 12 Global Step: 200510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:43:34,416-Speed 4854.49 samples/sec Loss 1.4132 LearningRate 0.0159 Epoch: 12 Global Step: 200520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:43:36,410-Speed 5136.32 samples/sec Loss 1.4115 LearningRate 0.0159 Epoch: 12 Global Step: 200530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:43:38,413-Speed 5115.08 samples/sec Loss 1.3859 LearningRate 0.0159 Epoch: 12 Global Step: 200540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:43:40,393-Speed 5175.27 samples/sec Loss 1.3836 LearningRate 0.0159 Epoch: 12 Global Step: 200550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:43:42,373-Speed 5173.32 samples/sec Loss 1.3858 LearningRate 0.0159 Epoch: 12 Global Step: 200560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:43:44,348-Speed 5185.46 samples/sec Loss 1.3441 LearningRate 0.0159 Epoch: 12 Global Step: 200570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:43:46,345-Speed 5132.10 samples/sec Loss 1.3695 LearningRate 0.0159 Epoch: 12 Global Step: 200580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-04-11 12:43:48,325-Speed 5171.18 samples/sec Loss 1.4124 LearningRate 0.0159 Epoch: 12 Global Step: 200590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:43:50,310-Speed 5162.26 samples/sec Loss 1.3912 LearningRate 0.0159 Epoch: 12 Global Step: 200600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:43:52,302-Speed 5140.76 samples/sec Loss 1.4154 LearningRate 0.0159 Epoch: 12 Global Step: 200610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:43:54,286-Speed 5163.76 samples/sec Loss 1.3879 LearningRate 0.0159 Epoch: 12 Global Step: 200620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:43:56,270-Speed 5162.60 samples/sec Loss 1.4020 LearningRate 0.0159 Epoch: 12 Global Step: 200630 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:43:58,262-Speed 5142.70 samples/sec Loss 1.3507 LearningRate 0.0159 Epoch: 12 Global Step: 200640 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:44:00,233-Speed 5197.15 samples/sec Loss 1.3695 LearningRate 0.0159 Epoch: 12 Global Step: 200650 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:44:02,208-Speed 5185.01 samples/sec Loss 1.3941 LearningRate 0.0159 Epoch: 12 Global Step: 200660 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:44:04,195-Speed 5157.94 samples/sec Loss 1.3486 LearningRate 0.0159 Epoch: 12 Global Step: 200670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:44:06,183-Speed 5150.80 samples/sec Loss 1.3739 LearningRate 0.0159 Epoch: 12 Global Step: 200680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:44:08,163-Speed 5174.25 samples/sec Loss 1.4026 LearningRate 0.0159 Epoch: 12 Global Step: 200690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:44:10,136-Speed 5193.00 samples/sec Loss 1.3616 LearningRate 0.0159 Epoch: 12 Global Step: 200700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:44:12,121-Speed 
5160.25 samples/sec Loss 1.4004 LearningRate 0.0159 Epoch: 12 Global Step: 200710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:44:14,108-Speed 5153.94 samples/sec Loss 1.4085 LearningRate 0.0159 Epoch: 12 Global Step: 200720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:44:16,089-Speed 5172.85 samples/sec Loss 1.4280 LearningRate 0.0159 Epoch: 12 Global Step: 200730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:44:18,064-Speed 5184.67 samples/sec Loss 1.4270 LearningRate 0.0159 Epoch: 12 Global Step: 200740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:44:20,059-Speed 5134.64 samples/sec Loss 1.3818 LearningRate 0.0159 Epoch: 12 Global Step: 200750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:44:22,064-Speed 5108.70 samples/sec Loss 1.4329 LearningRate 0.0159 Epoch: 12 Global Step: 200760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:44:24,092-Speed 5052.02 samples/sec Loss 1.3735 LearningRate 0.0159 Epoch: 12 Global Step: 200770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:44:26,091-Speed 5124.22 samples/sec Loss 1.3358 LearningRate 0.0159 Epoch: 12 Global Step: 200780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:44:28,095-Speed 5111.62 samples/sec Loss 1.3791 LearningRate 0.0159 Epoch: 12 Global Step: 200790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:44:30,082-Speed 5154.84 samples/sec Loss 1.3812 LearningRate 0.0159 Epoch: 12 Global Step: 200800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:44:32,057-Speed 5184.80 samples/sec Loss 1.3706 LearningRate 0.0159 Epoch: 12 Global Step: 200810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:44:34,039-Speed 5168.65 samples/sec Loss 1.4086 LearningRate 0.0159 Epoch: 12 Global Step: 200820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:44:36,041-Speed 5117.86 samples/sec Loss 1.4578 
LearningRate 0.0159 Epoch: 12 Global Step: 200830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:44:38,017-Speed 5183.79 samples/sec Loss 1.3604 LearningRate 0.0159 Epoch: 12 Global Step: 200840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:44:40,011-Speed 5137.06 samples/sec Loss 1.3824 LearningRate 0.0159 Epoch: 12 Global Step: 200850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:44:41,992-Speed 5172.08 samples/sec Loss 1.4032 LearningRate 0.0159 Epoch: 12 Global Step: 200860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:44:43,963-Speed 5196.99 samples/sec Loss 1.3591 LearningRate 0.0159 Epoch: 12 Global Step: 200870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:44:45,966-Speed 5112.74 samples/sec Loss 1.3755 LearningRate 0.0159 Epoch: 12 Global Step: 200880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:44:47,961-Speed 5135.02 samples/sec Loss 1.3937 LearningRate 0.0159 Epoch: 12 Global Step: 200890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:44:49,939-Speed 5178.18 samples/sec Loss 1.4263 LearningRate 0.0159 Epoch: 12 Global Step: 200900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:44:51,919-Speed 5175.42 samples/sec Loss 1.4146 LearningRate 0.0159 Epoch: 12 Global Step: 200910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:44:53,894-Speed 5185.60 samples/sec Loss 1.4295 LearningRate 0.0158 Epoch: 12 Global Step: 200920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:44:55,872-Speed 5178.32 samples/sec Loss 1.4112 LearningRate 0.0158 Epoch: 12 Global Step: 200930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:44:57,856-Speed 5164.64 samples/sec Loss 1.3633 LearningRate 0.0158 Epoch: 12 Global Step: 200940 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:44:59,829-Speed 5190.09 samples/sec Loss 1.4256 LearningRate 0.0158 Epoch: 12 Global Step: 
200950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:45:01,806-Speed 5182.85 samples/sec Loss 1.4112 LearningRate 0.0158 Epoch: 12 Global Step: 200960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:45:03,791-Speed 5159.76 samples/sec Loss 1.4225 LearningRate 0.0158 Epoch: 12 Global Step: 200970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:45:05,776-Speed 5158.72 samples/sec Loss 1.3802 LearningRate 0.0158 Epoch: 12 Global Step: 200980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:45:07,779-Speed 5115.47 samples/sec Loss 1.4003 LearningRate 0.0158 Epoch: 12 Global Step: 200990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:45:09,759-Speed 5172.38 samples/sec Loss 1.3761 LearningRate 0.0158 Epoch: 12 Global Step: 201000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:45:11,738-Speed 5177.72 samples/sec Loss 1.3842 LearningRate 0.0158 Epoch: 12 Global Step: 201010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:45:13,744-Speed 5105.01 samples/sec Loss 1.4004 LearningRate 0.0158 Epoch: 12 Global Step: 201020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:45:15,729-Speed 5160.41 samples/sec Loss 1.3982 LearningRate 0.0158 Epoch: 12 Global Step: 201030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:45:17,707-Speed 5180.53 samples/sec Loss 1.3796 LearningRate 0.0158 Epoch: 12 Global Step: 201040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:45:19,715-Speed 5101.71 samples/sec Loss 1.4103 LearningRate 0.0158 Epoch: 12 Global Step: 201050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:45:21,692-Speed 5180.85 samples/sec Loss 1.4200 LearningRate 0.0158 Epoch: 12 Global Step: 201060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:45:23,666-Speed 5189.43 samples/sec Loss 1.3561 LearningRate 0.0158 Epoch: 12 Global Step: 201070 Fp16 Grad Scale: 65536 Required: 9 
hours Training: 2022-04-11 12:45:25,640-Speed 5189.26 samples/sec Loss 1.4033 LearningRate 0.0158 Epoch: 12 Global Step: 201080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:45:27,623-Speed 5166.02 samples/sec Loss 1.3736 LearningRate 0.0158 Epoch: 12 Global Step: 201090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:45:29,594-Speed 5195.06 samples/sec Loss 1.4425 LearningRate 0.0158 Epoch: 12 Global Step: 201100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:45:31,575-Speed 5170.93 samples/sec Loss 1.3973 LearningRate 0.0158 Epoch: 12 Global Step: 201110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:45:33,567-Speed 5142.64 samples/sec Loss 1.3670 LearningRate 0.0158 Epoch: 12 Global Step: 201120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:45:35,555-Speed 5153.52 samples/sec Loss 1.4495 LearningRate 0.0158 Epoch: 12 Global Step: 201130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:45:37,555-Speed 5120.74 samples/sec Loss 1.3737 LearningRate 0.0158 Epoch: 12 Global Step: 201140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:45:39,541-Speed 5158.78 samples/sec Loss 1.4246 LearningRate 0.0158 Epoch: 12 Global Step: 201150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:45:41,535-Speed 5137.64 samples/sec Loss 1.3831 LearningRate 0.0158 Epoch: 12 Global Step: 201160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:45:43,499-Speed 5214.06 samples/sec Loss 1.4307 LearningRate 0.0158 Epoch: 12 Global Step: 201170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:45:45,481-Speed 5169.06 samples/sec Loss 1.4184 LearningRate 0.0158 Epoch: 12 Global Step: 201180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:45:47,465-Speed 5163.59 samples/sec Loss 1.4808 LearningRate 0.0158 Epoch: 12 Global Step: 201190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 
12:45:49,495-Speed 5046.35 samples/sec Loss 1.4540 LearningRate 0.0158 Epoch: 12 Global Step: 201200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:45:51,499-Speed 5109.43 samples/sec Loss 1.4103 LearningRate 0.0158 Epoch: 12 Global Step: 201210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:45:53,486-Speed 5156.48 samples/sec Loss 1.4184 LearningRate 0.0158 Epoch: 12 Global Step: 201220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:45:55,462-Speed 5183.59 samples/sec Loss 1.4404 LearningRate 0.0158 Epoch: 12 Global Step: 201230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:45:57,456-Speed 5137.71 samples/sec Loss 1.3945 LearningRate 0.0158 Epoch: 12 Global Step: 201240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:45:59,441-Speed 5161.39 samples/sec Loss 1.4076 LearningRate 0.0158 Epoch: 12 Global Step: 201250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:46:01,413-Speed 5192.93 samples/sec Loss 1.4299 LearningRate 0.0158 Epoch: 12 Global Step: 201260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:46:03,419-Speed 5107.52 samples/sec Loss 1.4410 LearningRate 0.0158 Epoch: 12 Global Step: 201270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:05,413-Speed 5137.47 samples/sec Loss 1.4268 LearningRate 0.0158 Epoch: 12 Global Step: 201280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:07,406-Speed 5139.51 samples/sec Loss 1.3837 LearningRate 0.0158 Epoch: 12 Global Step: 201290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:09,404-Speed 5127.22 samples/sec Loss 1.4198 LearningRate 0.0158 Epoch: 12 Global Step: 201300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:11,408-Speed 5109.06 samples/sec Loss 1.3819 LearningRate 0.0158 Epoch: 12 Global Step: 201310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:13,384-Speed 5185.32 samples/sec Loss 
1.4281 LearningRate 0.0158 Epoch: 12 Global Step: 201320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:15,360-Speed 5184.52 samples/sec Loss 1.4872 LearningRate 0.0158 Epoch: 12 Global Step: 201330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:17,333-Speed 5190.19 samples/sec Loss 1.3946 LearningRate 0.0157 Epoch: 12 Global Step: 201340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:19,328-Speed 5134.65 samples/sec Loss 1.4142 LearningRate 0.0157 Epoch: 12 Global Step: 201350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:21,302-Speed 5190.63 samples/sec Loss 1.4525 LearningRate 0.0157 Epoch: 12 Global Step: 201360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:23,325-Speed 5062.78 samples/sec Loss 1.4370 LearningRate 0.0157 Epoch: 12 Global Step: 201370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:46:25,322-Speed 5130.63 samples/sec Loss 1.4055 LearningRate 0.0157 Epoch: 12 Global Step: 201380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:46:27,301-Speed 5176.24 samples/sec Loss 1.4414 LearningRate 0.0157 Epoch: 12 Global Step: 201390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:46:29,294-Speed 5139.44 samples/sec Loss 1.4273 LearningRate 0.0157 Epoch: 12 Global Step: 201400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:46:31,268-Speed 5188.05 samples/sec Loss 1.4477 LearningRate 0.0157 Epoch: 12 Global Step: 201410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:33,256-Speed 5151.50 samples/sec Loss 1.3949 LearningRate 0.0157 Epoch: 12 Global Step: 201420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:35,244-Speed 5152.10 samples/sec Loss 1.4203 LearningRate 0.0157 Epoch: 12 Global Step: 201430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:37,225-Speed 5173.17 samples/sec Loss 1.4364 LearningRate 0.0157 Epoch: 12 
Global Step: 201440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:39,206-Speed 5169.83 samples/sec Loss 1.4431 LearningRate 0.0157 Epoch: 12 Global Step: 201450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:41,187-Speed 5170.80 samples/sec Loss 1.4514 LearningRate 0.0157 Epoch: 12 Global Step: 201460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:43,165-Speed 5180.54 samples/sec Loss 1.4278 LearningRate 0.0157 Epoch: 12 Global Step: 201470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:45,146-Speed 5171.16 samples/sec Loss 1.3975 LearningRate 0.0157 Epoch: 12 Global Step: 201480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:47,124-Speed 5176.01 samples/sec Loss 1.4061 LearningRate 0.0157 Epoch: 12 Global Step: 201490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:49,102-Speed 5180.39 samples/sec Loss 1.4300 LearningRate 0.0157 Epoch: 12 Global Step: 201500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:51,077-Speed 5185.37 samples/sec Loss 1.4188 LearningRate 0.0157 Epoch: 12 Global Step: 201510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:53,057-Speed 5173.47 samples/sec Loss 1.4511 LearningRate 0.0157 Epoch: 12 Global Step: 201520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:55,036-Speed 5176.03 samples/sec Loss 1.4200 LearningRate 0.0157 Epoch: 12 Global Step: 201530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:57,019-Speed 5165.53 samples/sec Loss 1.3832 LearningRate 0.0157 Epoch: 12 Global Step: 201540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:46:58,999-Speed 5174.63 samples/sec Loss 1.4693 LearningRate 0.0157 Epoch: 12 Global Step: 201550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:47:00,978-Speed 5175.38 samples/sec Loss 1.3885 LearningRate 0.0157 Epoch: 12 Global Step: 201560 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 12:47:02,975-Speed 5130.09 samples/sec Loss 1.5117 LearningRate 0.0157 Epoch: 12 Global Step: 201570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:47:04,972-Speed 5129.98 samples/sec Loss 1.4184 LearningRate 0.0157 Epoch: 12 Global Step: 201580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:47:06,958-Speed 5158.15 samples/sec Loss 1.4363 LearningRate 0.0157 Epoch: 12 Global Step: 201590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:47:08,955-Speed 5127.03 samples/sec Loss 1.4162 LearningRate 0.0157 Epoch: 12 Global Step: 201600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:47:10,941-Speed 5158.37 samples/sec Loss 1.4297 LearningRate 0.0157 Epoch: 12 Global Step: 201610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:47:12,920-Speed 5176.24 samples/sec Loss 1.4542 LearningRate 0.0157 Epoch: 12 Global Step: 201620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:47:14,963-Speed 5013.08 samples/sec Loss 1.4172 LearningRate 0.0157 Epoch: 12 Global Step: 201630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:47:16,958-Speed 5134.93 samples/sec Loss 1.4615 LearningRate 0.0157 Epoch: 12 Global Step: 201640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:47:18,937-Speed 5176.94 samples/sec Loss 1.4043 LearningRate 0.0157 Epoch: 12 Global Step: 201650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:47:20,914-Speed 5181.67 samples/sec Loss 1.4174 LearningRate 0.0157 Epoch: 12 Global Step: 201660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:47:22,901-Speed 5155.89 samples/sec Loss 1.4530 LearningRate 0.0157 Epoch: 12 Global Step: 201670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:47:24,892-Speed 5145.51 samples/sec Loss 1.4402 LearningRate 0.0157 Epoch: 12 Global Step: 201680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
12:47:26,891-Speed 5122.35 samples/sec Loss 1.4197 LearningRate 0.0157 Epoch: 12 Global Step: 201690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:47:28,889-Speed 5126.91 samples/sec Loss 1.4193 LearningRate 0.0157 Epoch: 12 Global Step: 201700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:47:30,870-Speed 5171.91 samples/sec Loss 1.4383 LearningRate 0.0157 Epoch: 12 Global Step: 201710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:47:32,849-Speed 5175.29 samples/sec Loss 1.4306 LearningRate 0.0157 Epoch: 12 Global Step: 201720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:47:34,827-Speed 5179.70 samples/sec Loss 1.5332 LearningRate 0.0157 Epoch: 12 Global Step: 201730 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:47:36,825-Speed 5126.75 samples/sec Loss 1.4746 LearningRate 0.0157 Epoch: 12 Global Step: 201740 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:47:38,820-Speed 5132.21 samples/sec Loss 1.4228 LearningRate 0.0157 Epoch: 12 Global Step: 201750 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:47:40,810-Speed 5148.11 samples/sec Loss 1.4784 LearningRate 0.0157 Epoch: 12 Global Step: 201760 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:47:42,803-Speed 5140.62 samples/sec Loss 1.3793 LearningRate 0.0156 Epoch: 12 Global Step: 201770 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:47:44,777-Speed 5188.43 samples/sec Loss 1.4691 LearningRate 0.0156 Epoch: 12 Global Step: 201780 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:47:46,768-Speed 5145.50 samples/sec Loss 1.3962 LearningRate 0.0156 Epoch: 12 Global Step: 201790 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:47:48,744-Speed 5182.95 samples/sec Loss 1.4201 LearningRate 0.0156 Epoch: 12 Global Step: 201800 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:47:50,730-Speed 5157.72 
samples/sec Loss 1.4504 LearningRate 0.0156 Epoch: 12 Global Step: 201810 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:47:52,702-Speed 5196.81 samples/sec Loss 1.4246 LearningRate 0.0156 Epoch: 12 Global Step: 201820 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:47:54,680-Speed 5178.36 samples/sec Loss 1.4614 LearningRate 0.0156 Epoch: 12 Global Step: 201830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:47:56,684-Speed 5109.30 samples/sec Loss 1.4713 LearningRate 0.0156 Epoch: 12 Global Step: 201840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:47:58,669-Speed 5162.24 samples/sec Loss 1.4814 LearningRate 0.0156 Epoch: 12 Global Step: 201850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:48:00,681-Speed 5088.95 samples/sec Loss 1.4064 LearningRate 0.0156 Epoch: 12 Global Step: 201860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:48:02,664-Speed 5166.05 samples/sec Loss 1.4506 LearningRate 0.0156 Epoch: 12 Global Step: 201870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:48:04,651-Speed 5155.15 samples/sec Loss 1.4405 LearningRate 0.0156 Epoch: 12 Global Step: 201880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:48:06,648-Speed 5132.07 samples/sec Loss 1.4558 LearningRate 0.0156 Epoch: 12 Global Step: 201890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:48:08,621-Speed 5190.10 samples/sec Loss 1.4335 LearningRate 0.0156 Epoch: 12 Global Step: 201900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:48:10,613-Speed 5141.24 samples/sec Loss 1.4545 LearningRate 0.0156 Epoch: 12 Global Step: 201910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:48:12,615-Speed 5117.82 samples/sec Loss 1.4582 LearningRate 0.0156 Epoch: 12 Global Step: 201920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:48:14,614-Speed 5124.27 samples/sec Loss 1.4888 LearningRate 
0.0156 Epoch: 12 Global Step: 201930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:48:16,608-Speed 5137.28 samples/sec Loss 1.4292 LearningRate 0.0156 Epoch: 12 Global Step: 201940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:48:18,598-Speed 5146.16 samples/sec Loss 1.4490 LearningRate 0.0156 Epoch: 12 Global Step: 201950 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:48:20,612-Speed 5087.02 samples/sec Loss 1.3894 LearningRate 0.0156 Epoch: 12 Global Step: 201960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:48:22,585-Speed 5192.42 samples/sec Loss 1.4407 LearningRate 0.0156 Epoch: 12 Global Step: 201970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:48:24,586-Speed 5119.20 samples/sec Loss 1.4635 LearningRate 0.0156 Epoch: 12 Global Step: 201980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:48:26,568-Speed 5168.83 samples/sec Loss 1.4748 LearningRate 0.0156 Epoch: 12 Global Step: 201990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:48:28,554-Speed 5158.21 samples/sec Loss 1.4491 LearningRate 0.0156 Epoch: 12 Global Step: 202000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:48:55,363-[lfw][202000]XNorm: 23.352157 Training: 2022-04-11 12:48:55,364-[lfw][202000]Accuracy-Flip: 0.99800+-0.00287 Training: 2022-04-11 12:48:55,364-[lfw][202000]Accuracy-Highest: 0.99833 Training: 2022-04-11 12:49:26,155-[cfp_fp][202000]XNorm: 22.124081 Training: 2022-04-11 12:49:26,155-[cfp_fp][202000]Accuracy-Flip: 0.98629+-0.00470 Training: 2022-04-11 12:49:26,156-[cfp_fp][202000]Accuracy-Highest: 0.98757 Training: 2022-04-11 12:49:52,724-[agedb_30][202000]XNorm: 23.612824 Training: 2022-04-11 12:49:52,725-[agedb_30][202000]Accuracy-Flip: 0.98067+-0.00750 Training: 2022-04-11 12:49:52,725-[agedb_30][202000]Accuracy-Highest: 0.98250 Training: 2022-04-11 12:49:54,707-Speed 118.86 samples/sec Loss 1.4895 LearningRate 0.0156 Epoch: 12 
Global Step: 202010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:49:56,671-Speed 5216.67 samples/sec Loss 1.4593 LearningRate 0.0156 Epoch: 12 Global Step: 202020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:49:58,642-Speed 5196.41 samples/sec Loss 1.4888 LearningRate 0.0156 Epoch: 12 Global Step: 202030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:00,623-Speed 5172.18 samples/sec Loss 1.4675 LearningRate 0.0156 Epoch: 12 Global Step: 202040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:02,593-Speed 5199.55 samples/sec Loss 1.4789 LearningRate 0.0156 Epoch: 12 Global Step: 202050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:04,570-Speed 5181.78 samples/sec Loss 1.4636 LearningRate 0.0156 Epoch: 12 Global Step: 202060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:06,549-Speed 5173.76 samples/sec Loss 1.5219 LearningRate 0.0156 Epoch: 12 Global Step: 202070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:08,548-Speed 5124.35 samples/sec Loss 1.4430 LearningRate 0.0156 Epoch: 12 Global Step: 202080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:10,532-Speed 5163.83 samples/sec Loss 1.4571 LearningRate 0.0156 Epoch: 12 Global Step: 202090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:12,503-Speed 5198.04 samples/sec Loss 1.4606 LearningRate 0.0156 Epoch: 12 Global Step: 202100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:14,480-Speed 5181.02 samples/sec Loss 1.5099 LearningRate 0.0156 Epoch: 12 Global Step: 202110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:16,462-Speed 5168.61 samples/sec Loss 1.5051 LearningRate 0.0156 Epoch: 12 Global Step: 202120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:18,437-Speed 5186.32 samples/sec Loss 1.4116 LearningRate 0.0156 Epoch: 12 Global Step: 202130 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 12:50:20,407-Speed 5200.16 samples/sec Loss 1.4679 LearningRate 0.0156 Epoch: 12 Global Step: 202140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:22,387-Speed 5173.87 samples/sec Loss 1.4267 LearningRate 0.0156 Epoch: 12 Global Step: 202150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:24,369-Speed 5165.65 samples/sec Loss 1.4926 LearningRate 0.0156 Epoch: 12 Global Step: 202160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:26,384-Speed 5084.98 samples/sec Loss 1.4972 LearningRate 0.0156 Epoch: 12 Global Step: 202170 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:50:28,358-Speed 5187.68 samples/sec Loss 1.5302 LearningRate 0.0156 Epoch: 12 Global Step: 202180 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:50:30,359-Speed 5120.71 samples/sec Loss 1.4439 LearningRate 0.0155 Epoch: 12 Global Step: 202190 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:50:32,339-Speed 5173.03 samples/sec Loss 1.4896 LearningRate 0.0155 Epoch: 12 Global Step: 202200 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:50:34,314-Speed 5187.28 samples/sec Loss 1.4744 LearningRate 0.0155 Epoch: 12 Global Step: 202210 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:50:36,317-Speed 5113.81 samples/sec Loss 1.4572 LearningRate 0.0155 Epoch: 12 Global Step: 202220 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:50:38,285-Speed 5205.04 samples/sec Loss 1.4839 LearningRate 0.0155 Epoch: 12 Global Step: 202230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:40,269-Speed 5161.29 samples/sec Loss 1.4891 LearningRate 0.0155 Epoch: 12 Global Step: 202240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:42,259-Speed 5146.91 samples/sec Loss 1.4439 LearningRate 0.0155 Epoch: 12 Global Step: 202250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-04-11 12:50:44,240-Speed 5171.84 samples/sec Loss 1.4936 LearningRate 0.0155 Epoch: 12 Global Step: 202260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:46,215-Speed 5186.13 samples/sec Loss 1.5055 LearningRate 0.0155 Epoch: 12 Global Step: 202270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:48,209-Speed 5136.80 samples/sec Loss 1.4580 LearningRate 0.0155 Epoch: 12 Global Step: 202280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:50,196-Speed 5154.77 samples/sec Loss 1.4193 LearningRate 0.0155 Epoch: 12 Global Step: 202290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:52,182-Speed 5160.65 samples/sec Loss 1.4428 LearningRate 0.0155 Epoch: 12 Global Step: 202300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:54,178-Speed 5132.58 samples/sec Loss 1.4815 LearningRate 0.0155 Epoch: 12 Global Step: 202310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:56,180-Speed 5116.33 samples/sec Loss 1.5111 LearningRate 0.0155 Epoch: 12 Global Step: 202320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:50:58,146-Speed 5209.12 samples/sec Loss 1.4848 LearningRate 0.0155 Epoch: 12 Global Step: 202330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:00,125-Speed 5175.68 samples/sec Loss 1.3962 LearningRate 0.0155 Epoch: 12 Global Step: 202340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:02,113-Speed 5152.91 samples/sec Loss 1.4046 LearningRate 0.0155 Epoch: 12 Global Step: 202350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:04,090-Speed 5181.44 samples/sec Loss 1.4579 LearningRate 0.0155 Epoch: 12 Global Step: 202360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:06,086-Speed 5131.26 samples/sec Loss 1.4855 LearningRate 0.0155 Epoch: 12 Global Step: 202370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:08,088-Speed 5117.24 
samples/sec Loss 1.5444 LearningRate 0.0155 Epoch: 12 Global Step: 202380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:10,097-Speed 5099.38 samples/sec Loss 1.4326 LearningRate 0.0155 Epoch: 12 Global Step: 202390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:12,069-Speed 5192.92 samples/sec Loss 1.5148 LearningRate 0.0155 Epoch: 12 Global Step: 202400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:51:14,081-Speed 5092.83 samples/sec Loss 1.4823 LearningRate 0.0155 Epoch: 12 Global Step: 202410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:51:16,079-Speed 5124.95 samples/sec Loss 1.5064 LearningRate 0.0155 Epoch: 12 Global Step: 202420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:51:18,060-Speed 5171.08 samples/sec Loss 1.5036 LearningRate 0.0155 Epoch: 12 Global Step: 202430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:51:20,047-Speed 5156.19 samples/sec Loss 1.4148 LearningRate 0.0155 Epoch: 12 Global Step: 202440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:51:22,051-Speed 5111.92 samples/sec Loss 1.5248 LearningRate 0.0155 Epoch: 12 Global Step: 202450 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:51:24,046-Speed 5133.73 samples/sec Loss 1.4658 LearningRate 0.0155 Epoch: 12 Global Step: 202460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:51:26,032-Speed 5156.70 samples/sec Loss 1.4870 LearningRate 0.0155 Epoch: 12 Global Step: 202470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:51:28,012-Speed 5175.61 samples/sec Loss 1.4708 LearningRate 0.0155 Epoch: 12 Global Step: 202480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:51:30,010-Speed 5125.86 samples/sec Loss 1.4418 LearningRate 0.0155 Epoch: 12 Global Step: 202490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:51:31,988-Speed 5178.83 samples/sec Loss 1.4745 LearningRate 0.0155 
Epoch: 12 Global Step: 202500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:33,972-Speed 5162.68 samples/sec Loss 1.4215 LearningRate 0.0155 Epoch: 12 Global Step: 202510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:35,968-Speed 5131.44 samples/sec Loss 1.4690 LearningRate 0.0155 Epoch: 12 Global Step: 202520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:37,969-Speed 5118.86 samples/sec Loss 1.4927 LearningRate 0.0155 Epoch: 12 Global Step: 202530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:39,944-Speed 5187.42 samples/sec Loss 1.5077 LearningRate 0.0155 Epoch: 12 Global Step: 202540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:41,928-Speed 5164.21 samples/sec Loss 1.4847 LearningRate 0.0155 Epoch: 12 Global Step: 202550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:43,905-Speed 5180.00 samples/sec Loss 1.5016 LearningRate 0.0155 Epoch: 12 Global Step: 202560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:45,886-Speed 5171.28 samples/sec Loss 1.5015 LearningRate 0.0155 Epoch: 12 Global Step: 202570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:47,872-Speed 5157.65 samples/sec Loss 1.5406 LearningRate 0.0155 Epoch: 12 Global Step: 202580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:49,860-Speed 5153.31 samples/sec Loss 1.4604 LearningRate 0.0155 Epoch: 12 Global Step: 202590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:51,846-Speed 5155.75 samples/sec Loss 1.5213 LearningRate 0.0155 Epoch: 12 Global Step: 202600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:51:53,825-Speed 5177.43 samples/sec Loss 1.4690 LearningRate 0.0154 Epoch: 12 Global Step: 202610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:55,801-Speed 5183.18 samples/sec Loss 1.4801 LearningRate 0.0154 Epoch: 12 Global Step: 202620 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:57,797-Speed 5130.88 samples/sec Loss 1.4689 LearningRate 0.0154 Epoch: 12 Global Step: 202630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:51:59,794-Speed 5130.12 samples/sec Loss 1.4456 LearningRate 0.0154 Epoch: 12 Global Step: 202640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:52:01,773-Speed 5176.14 samples/sec Loss 1.4401 LearningRate 0.0154 Epoch: 12 Global Step: 202650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:52:03,752-Speed 5177.02 samples/sec Loss 1.5065 LearningRate 0.0154 Epoch: 12 Global Step: 202660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:52:05,743-Speed 5145.22 samples/sec Loss 1.5486 LearningRate 0.0154 Epoch: 12 Global Step: 202670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:52:07,737-Speed 5135.72 samples/sec Loss 1.5221 LearningRate 0.0154 Epoch: 12 Global Step: 202680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:52:09,726-Speed 5151.58 samples/sec Loss 1.5790 LearningRate 0.0154 Epoch: 12 Global Step: 202690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:52:11,707-Speed 5171.23 samples/sec Loss 1.4925 LearningRate 0.0154 Epoch: 12 Global Step: 202700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:52:13,685-Speed 5176.57 samples/sec Loss 1.5130 LearningRate 0.0154 Epoch: 12 Global Step: 202710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:52:15,684-Speed 5126.14 samples/sec Loss 1.5107 LearningRate 0.0154 Epoch: 12 Global Step: 202720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:52:17,675-Speed 5144.76 samples/sec Loss 1.4755 LearningRate 0.0154 Epoch: 12 Global Step: 202730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:52:19,649-Speed 5189.29 samples/sec Loss 1.4944 LearningRate 0.0154 Epoch: 12 Global Step: 202740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-04-11 12:52:21,660-Speed 5091.92 samples/sec Loss 1.4923 LearningRate 0.0154 Epoch: 12 Global Step: 202750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:52:23,644-Speed 5164.84 samples/sec Loss 1.5083 LearningRate 0.0154 Epoch: 12 Global Step: 202760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:52:25,620-Speed 5182.13 samples/sec Loss 1.5110 LearningRate 0.0154 Epoch: 12 Global Step: 202770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:52:27,599-Speed 5176.70 samples/sec Loss 1.4876 LearningRate 0.0154 Epoch: 12 Global Step: 202780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:52:29,581-Speed 5168.82 samples/sec Loss 1.5091 LearningRate 0.0154 Epoch: 12 Global Step: 202790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:52:31,574-Speed 5139.35 samples/sec Loss 1.5283 LearningRate 0.0154 Epoch: 12 Global Step: 202800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:52:33,546-Speed 5194.75 samples/sec Loss 1.4597 LearningRate 0.0154 Epoch: 12 Global Step: 202810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:52:35,533-Speed 5154.75 samples/sec Loss 1.4952 LearningRate 0.0154 Epoch: 12 Global Step: 202820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:52:37,517-Speed 5164.29 samples/sec Loss 1.4518 LearningRate 0.0154 Epoch: 12 Global Step: 202830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:52:39,528-Speed 5091.96 samples/sec Loss 1.5108 LearningRate 0.0154 Epoch: 12 Global Step: 202840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:52:41,526-Speed 5127.67 samples/sec Loss 1.4705 LearningRate 0.0154 Epoch: 12 Global Step: 202850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:52:43,498-Speed 5195.03 samples/sec Loss 1.4646 LearningRate 0.0154 Epoch: 12 Global Step: 202860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:52:45,511-Speed 5088.31 
samples/sec Loss 1.4916 LearningRate 0.0154 Epoch: 12 Global Step: 202870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:52:47,487-Speed 5182.20 samples/sec Loss 1.4650 LearningRate 0.0154 Epoch: 12 Global Step: 202880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:52:49,486-Speed 5127.19 samples/sec Loss 1.5442 LearningRate 0.0154 Epoch: 12 Global Step: 202890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:52:51,472-Speed 5155.48 samples/sec Loss 1.4917 LearningRate 0.0154 Epoch: 12 Global Step: 202900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:52:53,456-Speed 5164.14 samples/sec Loss 1.5267 LearningRate 0.0154 Epoch: 12 Global Step: 202910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:52:55,437-Speed 5171.05 samples/sec Loss 1.4873 LearningRate 0.0154 Epoch: 12 Global Step: 202920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:52:57,412-Speed 5186.08 samples/sec Loss 1.5943 LearningRate 0.0154 Epoch: 12 Global Step: 202930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:52:59,393-Speed 5170.42 samples/sec Loss 1.5130 LearningRate 0.0154 Epoch: 12 Global Step: 202940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:01,376-Speed 5166.42 samples/sec Loss 1.5148 LearningRate 0.0154 Epoch: 12 Global Step: 202950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:03,375-Speed 5122.71 samples/sec Loss 1.4945 LearningRate 0.0154 Epoch: 12 Global Step: 202960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:05,385-Speed 5096.02 samples/sec Loss 1.5251 LearningRate 0.0154 Epoch: 12 Global Step: 202970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:07,359-Speed 5190.49 samples/sec Loss 1.5082 LearningRate 0.0154 Epoch: 12 Global Step: 202980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:09,361-Speed 5116.97 samples/sec Loss 1.5464 LearningRate 0.0154 
Epoch: 12 Global Step: 202990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:11,342-Speed 5171.21 samples/sec Loss 1.4811 LearningRate 0.0154 Epoch: 12 Global Step: 203000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:13,313-Speed 5196.58 samples/sec Loss 1.4974 LearningRate 0.0154 Epoch: 12 Global Step: 203010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:15,311-Speed 5127.55 samples/sec Loss 1.5483 LearningRate 0.0154 Epoch: 12 Global Step: 203020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:17,309-Speed 5127.46 samples/sec Loss 1.5139 LearningRate 0.0154 Epoch: 12 Global Step: 203030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:19,282-Speed 5190.96 samples/sec Loss 1.5185 LearningRate 0.0153 Epoch: 12 Global Step: 203040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:21,276-Speed 5135.72 samples/sec Loss 1.4683 LearningRate 0.0153 Epoch: 12 Global Step: 203050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:23,273-Speed 5129.56 samples/sec Loss 1.5263 LearningRate 0.0153 Epoch: 12 Global Step: 203060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:25,301-Speed 5051.46 samples/sec Loss 1.4736 LearningRate 0.0153 Epoch: 12 Global Step: 203070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:27,278-Speed 5180.81 samples/sec Loss 1.4828 LearningRate 0.0153 Epoch: 12 Global Step: 203080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:53:29,263-Speed 5159.64 samples/sec Loss 1.5174 LearningRate 0.0153 Epoch: 12 Global Step: 203090 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:53:31,247-Speed 5165.93 samples/sec Loss 1.4597 LearningRate 0.0153 Epoch: 12 Global Step: 203100 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:53:33,219-Speed 5193.24 samples/sec Loss 1.5129 LearningRate 0.0153 Epoch: 12 Global Step: 203110 Fp16 Grad 
Scale: 131072 Required: 9 hours Training: 2022-04-11 12:53:35,224-Speed 5109.98 samples/sec Loss 1.4930 LearningRate 0.0153 Epoch: 12 Global Step: 203120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:53:37,208-Speed 5163.48 samples/sec Loss 1.5033 LearningRate 0.0153 Epoch: 12 Global Step: 203130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:39,186-Speed 5176.25 samples/sec Loss 1.4847 LearningRate 0.0153 Epoch: 12 Global Step: 203140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:41,165-Speed 5175.76 samples/sec Loss 1.4682 LearningRate 0.0153 Epoch: 12 Global Step: 203150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:43,138-Speed 5193.46 samples/sec Loss 1.4669 LearningRate 0.0153 Epoch: 12 Global Step: 203160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:45,126-Speed 5153.40 samples/sec Loss 1.5043 LearningRate 0.0153 Epoch: 12 Global Step: 203170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:47,123-Speed 5126.85 samples/sec Loss 1.4968 LearningRate 0.0153 Epoch: 12 Global Step: 203180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:53:49,107-Speed 5163.70 samples/sec Loss 1.5403 LearningRate 0.0153 Epoch: 12 Global Step: 203190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:53:51,096-Speed 5149.00 samples/sec Loss 1.5448 LearningRate 0.0153 Epoch: 12 Global Step: 203200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:53:53,091-Speed 5135.69 samples/sec Loss 1.5197 LearningRate 0.0153 Epoch: 12 Global Step: 203210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:53:55,069-Speed 5180.15 samples/sec Loss 1.5042 LearningRate 0.0153 Epoch: 12 Global Step: 203220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:53:57,057-Speed 5153.14 samples/sec Loss 1.5453 LearningRate 0.0153 Epoch: 12 Global Step: 203230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-04-11 12:53:59,038-Speed 5169.85 samples/sec Loss 1.5134 LearningRate 0.0153 Epoch: 12 Global Step: 203240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:54:01,033-Speed 5134.00 samples/sec Loss 1.5076 LearningRate 0.0153 Epoch: 12 Global Step: 203250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:54:03,022-Speed 5148.69 samples/sec Loss 1.5038 LearningRate 0.0153 Epoch: 12 Global Step: 203260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:54:05,014-Speed 5142.35 samples/sec Loss 1.5509 LearningRate 0.0153 Epoch: 12 Global Step: 203270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:54:06,991-Speed 5183.58 samples/sec Loss 1.4869 LearningRate 0.0153 Epoch: 12 Global Step: 203280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:54:08,994-Speed 5113.59 samples/sec Loss 1.5322 LearningRate 0.0153 Epoch: 12 Global Step: 203290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:10,991-Speed 5129.70 samples/sec Loss 1.5433 LearningRate 0.0153 Epoch: 12 Global Step: 203300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:12,969-Speed 5176.29 samples/sec Loss 1.4644 LearningRate 0.0153 Epoch: 12 Global Step: 203310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:14,951-Speed 5171.25 samples/sec Loss 1.5721 LearningRate 0.0153 Epoch: 12 Global Step: 203320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:16,931-Speed 5171.29 samples/sec Loss 1.5056 LearningRate 0.0153 Epoch: 12 Global Step: 203330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:18,911-Speed 5174.62 samples/sec Loss 1.4879 LearningRate 0.0153 Epoch: 12 Global Step: 203340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:20,907-Speed 5131.41 samples/sec Loss 1.5533 LearningRate 0.0153 Epoch: 12 Global Step: 203350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:22,926-Speed 5073.67 
samples/sec Loss 1.5472 LearningRate 0.0153 Epoch: 12 Global Step: 203360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:24,920-Speed 5138.52 samples/sec Loss 1.5050 LearningRate 0.0153 Epoch: 12 Global Step: 203370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:26,929-Speed 5097.49 samples/sec Loss 1.5265 LearningRate 0.0153 Epoch: 12 Global Step: 203380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:28,933-Speed 5110.63 samples/sec Loss 1.5413 LearningRate 0.0153 Epoch: 12 Global Step: 203390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:54:30,912-Speed 5175.36 samples/sec Loss 1.5516 LearningRate 0.0153 Epoch: 12 Global Step: 203400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:32,925-Speed 5089.37 samples/sec Loss 1.5579 LearningRate 0.0153 Epoch: 12 Global Step: 203410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:34,912-Speed 5154.50 samples/sec Loss 1.4921 LearningRate 0.0153 Epoch: 12 Global Step: 203420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:36,911-Speed 5125.38 samples/sec Loss 1.5604 LearningRate 0.0153 Epoch: 12 Global Step: 203430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:38,891-Speed 5174.28 samples/sec Loss 1.5142 LearningRate 0.0153 Epoch: 12 Global Step: 203440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:40,883-Speed 5142.76 samples/sec Loss 1.4699 LearningRate 0.0153 Epoch: 12 Global Step: 203450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:42,868-Speed 5160.29 samples/sec Loss 1.5089 LearningRate 0.0152 Epoch: 12 Global Step: 203460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:44,865-Speed 5129.21 samples/sec Loss 1.5108 LearningRate 0.0152 Epoch: 12 Global Step: 203470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:46,851-Speed 5156.44 samples/sec Loss 1.4706 LearningRate 
0.0152 Epoch: 12 Global Step: 203480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:48,843-Speed 5141.89 samples/sec Loss 1.5001 LearningRate 0.0152 Epoch: 12 Global Step: 203490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:50,845-Speed 5116.42 samples/sec Loss 1.5363 LearningRate 0.0152 Epoch: 12 Global Step: 203500 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:54:52,825-Speed 5175.28 samples/sec Loss 1.4567 LearningRate 0.0152 Epoch: 12 Global Step: 203510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:54,801-Speed 5183.34 samples/sec Loss 1.5549 LearningRate 0.0152 Epoch: 12 Global Step: 203520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:56,772-Speed 5195.59 samples/sec Loss 1.5526 LearningRate 0.0152 Epoch: 12 Global Step: 203530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:54:58,769-Speed 5129.67 samples/sec Loss 1.5083 LearningRate 0.0152 Epoch: 12 Global Step: 203540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:55:00,773-Speed 5112.34 samples/sec Loss 1.5837 LearningRate 0.0152 Epoch: 12 Global Step: 203550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:55:02,763-Speed 5148.07 samples/sec Loss 1.4815 LearningRate 0.0152 Epoch: 12 Global Step: 203560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:55:04,755-Speed 5141.20 samples/sec Loss 1.5216 LearningRate 0.0152 Epoch: 12 Global Step: 203570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:55:06,754-Speed 5125.87 samples/sec Loss 1.5473 LearningRate 0.0152 Epoch: 12 Global Step: 203580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:55:08,729-Speed 5186.34 samples/sec Loss 1.4645 LearningRate 0.0152 Epoch: 12 Global Step: 203590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:55:10,733-Speed 5110.12 samples/sec Loss 1.5171 LearningRate 0.0152 Epoch: 12 Global Step: 203600 Fp16 
Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:55:12,713-Speed 5173.41 samples/sec Loss 1.5008 LearningRate 0.0152 Epoch: 12 Global Step: 203610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:55:14,722-Speed 5097.85 samples/sec Loss 1.5628 LearningRate 0.0152 Epoch: 12 Global Step: 203620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:55:16,732-Speed 5096.98 samples/sec Loss 1.5820 LearningRate 0.0152 Epoch: 12 Global Step: 203630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:55:18,710-Speed 5179.50 samples/sec Loss 1.5684 LearningRate 0.0152 Epoch: 12 Global Step: 203640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:55:20,726-Speed 5080.77 samples/sec Loss 1.5264 LearningRate 0.0152 Epoch: 12 Global Step: 203650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:55:22,711-Speed 5161.67 samples/sec Loss 1.5301 LearningRate 0.0152 Epoch: 12 Global Step: 203660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:55:24,709-Speed 5125.06 samples/sec Loss 1.5428 LearningRate 0.0152 Epoch: 12 Global Step: 203670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:55:26,697-Speed 5152.35 samples/sec Loss 1.6066 LearningRate 0.0152 Epoch: 12 Global Step: 203680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:55:28,695-Speed 5127.54 samples/sec Loss 1.5225 LearningRate 0.0152 Epoch: 12 Global Step: 203690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:55:30,692-Speed 5128.70 samples/sec Loss 1.5291 LearningRate 0.0152 Epoch: 12 Global Step: 203700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:55:32,669-Speed 5181.85 samples/sec Loss 1.5564 LearningRate 0.0152 Epoch: 12 Global Step: 203710 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:55:34,647-Speed 5178.04 samples/sec Loss 1.5008 LearningRate 0.0152 Epoch: 12 Global Step: 203720 Fp16 Grad Scale: 32768 Required: 9 hours 
Training: 2022-04-11 12:55:36,647-Speed 5123.44 samples/sec Loss 1.5779 LearningRate 0.0152 Epoch: 12 Global Step: 203730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:55:38,654-Speed 5103.79 samples/sec Loss 1.5258 LearningRate 0.0152 Epoch: 12 Global Step: 203740 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:55:40,656-Speed 5117.12 samples/sec Loss 1.5548 LearningRate 0.0152 Epoch: 12 Global Step: 203750 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:55:42,636-Speed 5173.93 samples/sec Loss 1.5096 LearningRate 0.0152 Epoch: 12 Global Step: 203760 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:55:44,631-Speed 5134.30 samples/sec Loss 1.5244 LearningRate 0.0152 Epoch: 12 Global Step: 203770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:55:46,613-Speed 5168.60 samples/sec Loss 1.5307 LearningRate 0.0152 Epoch: 12 Global Step: 203780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:55:48,592-Speed 5173.96 samples/sec Loss 1.4940 LearningRate 0.0152 Epoch: 12 Global Step: 203790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:55:50,578-Speed 5159.77 samples/sec Loss 1.4877 LearningRate 0.0152 Epoch: 12 Global Step: 203800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:55:52,558-Speed 5171.84 samples/sec Loss 1.5076 LearningRate 0.0152 Epoch: 12 Global Step: 203810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:55:54,560-Speed 5116.60 samples/sec Loss 1.5155 LearningRate 0.0152 Epoch: 12 Global Step: 203820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:55:56,558-Speed 5127.61 samples/sec Loss 1.5507 LearningRate 0.0152 Epoch: 12 Global Step: 203830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:55:58,548-Speed 5145.57 samples/sec Loss 1.5239 LearningRate 0.0152 Epoch: 12 Global Step: 203840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:56:00,543-Speed 
5135.76 samples/sec Loss 1.5653 LearningRate 0.0152 Epoch: 12 Global Step: 203850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:56:02,521-Speed 5178.66 samples/sec Loss 1.5567 LearningRate 0.0152 Epoch: 12 Global Step: 203860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:56:04,496-Speed 5186.38 samples/sec Loss 1.5492 LearningRate 0.0152 Epoch: 12 Global Step: 203870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:56:06,483-Speed 5155.31 samples/sec Loss 1.5581 LearningRate 0.0152 Epoch: 12 Global Step: 203880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:56:08,471-Speed 5153.70 samples/sec Loss 1.5417 LearningRate 0.0151 Epoch: 12 Global Step: 203890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:56:10,452-Speed 5169.45 samples/sec Loss 1.4999 LearningRate 0.0151 Epoch: 12 Global Step: 203900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:56:12,431-Speed 5176.38 samples/sec Loss 1.5403 LearningRate 0.0151 Epoch: 12 Global Step: 203910 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:56:14,427-Speed 5131.07 samples/sec Loss 1.5308 LearningRate 0.0151 Epoch: 12 Global Step: 203920 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:56:16,400-Speed 5192.39 samples/sec Loss 1.5438 LearningRate 0.0151 Epoch: 12 Global Step: 203930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:56:18,373-Speed 5191.28 samples/sec Loss 1.5036 LearningRate 0.0151 Epoch: 12 Global Step: 203940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:56:20,370-Speed 5129.40 samples/sec Loss 1.5605 LearningRate 0.0151 Epoch: 12 Global Step: 203950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:56:22,362-Speed 5142.84 samples/sec Loss 1.5243 LearningRate 0.0151 Epoch: 12 Global Step: 203960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:56:24,354-Speed 5142.22 samples/sec Loss 1.5101 
LearningRate 0.0151 Epoch: 12 Global Step: 203970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:56:26,336-Speed 5169.06 samples/sec Loss 1.5465 LearningRate 0.0151 Epoch: 12 Global Step: 203980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:56:28,326-Speed 5148.73 samples/sec Loss 1.5639 LearningRate 0.0151 Epoch: 12 Global Step: 203990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:56:30,297-Speed 5196.46 samples/sec Loss 1.5304 LearningRate 0.0151 Epoch: 12 Global Step: 204000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:56:57,031-[lfw][204000]XNorm: 21.898300 Training: 2022-04-11 12:56:57,031-[lfw][204000]Accuracy-Flip: 0.99767+-0.00281 Training: 2022-04-11 12:56:57,032-[lfw][204000]Accuracy-Highest: 0.99833 Training: 2022-04-11 12:57:28,043-[cfp_fp][204000]XNorm: 21.116233 Training: 2022-04-11 12:57:28,043-[cfp_fp][204000]Accuracy-Flip: 0.98557+-0.00525 Training: 2022-04-11 12:57:28,044-[cfp_fp][204000]Accuracy-Highest: 0.98757 Training: 2022-04-11 12:57:54,865-[agedb_30][204000]XNorm: 22.203815 Training: 2022-04-11 12:57:54,866-[agedb_30][204000]Accuracy-Flip: 0.98150+-0.00713 Training: 2022-04-11 12:57:54,866-[agedb_30][204000]Accuracy-Highest: 0.98250 Training: 2022-04-11 12:57:56,849-Speed 118.31 samples/sec Loss 1.5094 LearningRate 0.0151 Epoch: 12 Global Step: 204010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:57:58,838-Speed 5149.65 samples/sec Loss 1.5665 LearningRate 0.0151 Epoch: 12 Global Step: 204020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:58:00,815-Speed 5181.13 samples/sec Loss 1.5391 LearningRate 0.0151 Epoch: 12 Global Step: 204030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:58:02,790-Speed 5187.22 samples/sec Loss 1.5493 LearningRate 0.0151 Epoch: 12 Global Step: 204040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:58:04,762-Speed 5194.01 samples/sec Loss 1.5657 LearningRate 0.0151 
Epoch: 12 Global Step: 204050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:58:06,739-Speed 5182.12 samples/sec Loss 1.5944 LearningRate 0.0151 Epoch: 12 Global Step: 204060 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:58:08,706-Speed 5208.02 samples/sec Loss 1.5223 LearningRate 0.0151 Epoch: 12 Global Step: 204070 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:58:10,676-Speed 5199.79 samples/sec Loss 1.5254 LearningRate 0.0151 Epoch: 12 Global Step: 204080 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:58:12,650-Speed 5187.96 samples/sec Loss 1.5139 LearningRate 0.0151 Epoch: 12 Global Step: 204090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:58:14,629-Speed 5177.32 samples/sec Loss 1.5500 LearningRate 0.0151 Epoch: 12 Global Step: 204100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:58:16,604-Speed 5185.19 samples/sec Loss 1.5525 LearningRate 0.0151 Epoch: 12 Global Step: 204110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:58:18,569-Speed 5212.05 samples/sec Loss 1.6279 LearningRate 0.0151 Epoch: 12 Global Step: 204120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:58:20,546-Speed 5182.49 samples/sec Loss 1.5820 LearningRate 0.0151 Epoch: 12 Global Step: 204130 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:58:22,532-Speed 5158.56 samples/sec Loss 1.5966 LearningRate 0.0151 Epoch: 12 Global Step: 204140 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:58:24,526-Speed 5136.64 samples/sec Loss 1.5579 LearningRate 0.0151 Epoch: 12 Global Step: 204150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:58:26,506-Speed 5174.56 samples/sec Loss 1.5503 LearningRate 0.0151 Epoch: 12 Global Step: 204160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:58:28,480-Speed 5187.87 samples/sec Loss 1.6193 LearningRate 0.0151 Epoch: 12 Global Step: 204170 Fp16 Grad 
Scale: 32768 Required: 9 hours Training: 2022-04-11 12:58:30,452-Speed 5194.50 samples/sec Loss 1.4973 LearningRate 0.0151 Epoch: 12 Global Step: 204180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:58:32,435-Speed 5165.89 samples/sec Loss 1.5468 LearningRate 0.0151 Epoch: 12 Global Step: 204190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:58:34,410-Speed 5187.22 samples/sec Loss 1.4865 LearningRate 0.0151 Epoch: 12 Global Step: 204200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:58:36,394-Speed 5162.66 samples/sec Loss 1.5841 LearningRate 0.0151 Epoch: 12 Global Step: 204210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:58:38,380-Speed 5158.80 samples/sec Loss 1.5813 LearningRate 0.0151 Epoch: 12 Global Step: 204220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:58:40,397-Speed 5075.88 samples/sec Loss 1.5531 LearningRate 0.0151 Epoch: 12 Global Step: 204230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:58:42,384-Speed 5157.56 samples/sec Loss 1.5181 LearningRate 0.0151 Epoch: 12 Global Step: 204240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:58:44,357-Speed 5192.26 samples/sec Loss 1.5403 LearningRate 0.0151 Epoch: 12 Global Step: 204250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:58:46,348-Speed 5143.09 samples/sec Loss 1.5264 LearningRate 0.0151 Epoch: 12 Global Step: 204260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:58:48,326-Speed 5180.07 samples/sec Loss 1.4872 LearningRate 0.0151 Epoch: 12 Global Step: 204270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:58:50,315-Speed 5149.33 samples/sec Loss 1.5114 LearningRate 0.0151 Epoch: 12 Global Step: 204280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:58:52,290-Speed 5187.77 samples/sec Loss 1.6084 LearningRate 0.0151 Epoch: 12 Global Step: 204290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-04-11 12:58:54,277-Speed 5155.25 samples/sec Loss 1.6715 LearningRate 0.0151 Epoch: 12 Global Step: 204300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:58:56,271-Speed 5134.57 samples/sec Loss 1.5859 LearningRate 0.0151 Epoch: 12 Global Step: 204310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:58:58,244-Speed 5191.37 samples/sec Loss 1.5403 LearningRate 0.0150 Epoch: 12 Global Step: 204320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:59:00,234-Speed 5148.31 samples/sec Loss 1.5967 LearningRate 0.0150 Epoch: 12 Global Step: 204330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:59:02,235-Speed 5120.58 samples/sec Loss 1.5467 LearningRate 0.0150 Epoch: 12 Global Step: 204340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:59:04,211-Speed 5184.83 samples/sec Loss 1.5453 LearningRate 0.0150 Epoch: 12 Global Step: 204350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:59:06,180-Speed 5201.01 samples/sec Loss 1.5136 LearningRate 0.0150 Epoch: 12 Global Step: 204360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:59:08,156-Speed 5185.29 samples/sec Loss 1.5666 LearningRate 0.0150 Epoch: 12 Global Step: 204370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:59:10,155-Speed 5123.74 samples/sec Loss 1.4767 LearningRate 0.0150 Epoch: 12 Global Step: 204380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:59:12,159-Speed 5110.54 samples/sec Loss 1.5526 LearningRate 0.0150 Epoch: 12 Global Step: 204390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:59:14,143-Speed 5164.20 samples/sec Loss 1.5603 LearningRate 0.0150 Epoch: 12 Global Step: 204400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:59:16,123-Speed 5171.93 samples/sec Loss 1.5162 LearningRate 0.0150 Epoch: 12 Global Step: 204410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:59:18,102-Speed 5176.99 
samples/sec Loss 1.5377 LearningRate 0.0150 Epoch: 12 Global Step: 204420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:59:20,079-Speed 5179.23 samples/sec Loss 1.5071 LearningRate 0.0150 Epoch: 12 Global Step: 204430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:59:22,069-Speed 5147.91 samples/sec Loss 1.5943 LearningRate 0.0150 Epoch: 12 Global Step: 204440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 12:59:24,075-Speed 5107.95 samples/sec Loss 1.5666 LearningRate 0.0150 Epoch: 12 Global Step: 204450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:59:26,066-Speed 5143.84 samples/sec Loss 1.5630 LearningRate 0.0150 Epoch: 12 Global Step: 204460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:59:28,051-Speed 5159.88 samples/sec Loss 1.5589 LearningRate 0.0150 Epoch: 12 Global Step: 204470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:59:30,024-Speed 5192.77 samples/sec Loss 1.5773 LearningRate 0.0150 Epoch: 12 Global Step: 204480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:59:32,009-Speed 5159.97 samples/sec Loss 1.5566 LearningRate 0.0150 Epoch: 12 Global Step: 204490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:59:33,995-Speed 5159.06 samples/sec Loss 1.5393 LearningRate 0.0150 Epoch: 12 Global Step: 204500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:59:35,986-Speed 5144.76 samples/sec Loss 1.6024 LearningRate 0.0150 Epoch: 12 Global Step: 204510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:59:37,993-Speed 5103.64 samples/sec Loss 1.5718 LearningRate 0.0150 Epoch: 12 Global Step: 204520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:59:39,983-Speed 5147.67 samples/sec Loss 1.6152 LearningRate 0.0150 Epoch: 12 Global Step: 204530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:59:41,959-Speed 5184.32 samples/sec Loss 1.5749 LearningRate 0.0150 
Epoch: 12 Global Step: 204540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 12:59:43,938-Speed 5175.97 samples/sec Loss 1.5737 LearningRate 0.0150 Epoch: 12 Global Step: 204550 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:59:45,930-Speed 5141.29 samples/sec Loss 1.5790 LearningRate 0.0150 Epoch: 12 Global Step: 204560 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:59:47,922-Speed 5142.05 samples/sec Loss 1.5269 LearningRate 0.0150 Epoch: 12 Global Step: 204570 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:59:49,943-Speed 5068.06 samples/sec Loss 1.5408 LearningRate 0.0150 Epoch: 12 Global Step: 204580 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:59:51,934-Speed 5147.15 samples/sec Loss 1.6141 LearningRate 0.0150 Epoch: 12 Global Step: 204590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:59:53,923-Speed 5149.48 samples/sec Loss 1.5314 LearningRate 0.0150 Epoch: 12 Global Step: 204600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:59:55,905-Speed 5167.39 samples/sec Loss 1.5123 LearningRate 0.0150 Epoch: 12 Global Step: 204610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:59:57,892-Speed 5154.70 samples/sec Loss 1.5903 LearningRate 0.0150 Epoch: 12 Global Step: 204620 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 12:59:59,867-Speed 5186.45 samples/sec Loss 1.5382 LearningRate 0.0150 Epoch: 12 Global Step: 204630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:01,859-Speed 5143.09 samples/sec Loss 1.5181 LearningRate 0.0150 Epoch: 12 Global Step: 204640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:03,847-Speed 5151.93 samples/sec Loss 1.5583 LearningRate 0.0150 Epoch: 12 Global Step: 204650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:05,824-Speed 5180.62 samples/sec Loss 1.5319 LearningRate 0.0150 Epoch: 12 Global Step: 204660 Fp16 
Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:07,795-Speed 5198.66 samples/sec Loss 1.6238 LearningRate 0.0150 Epoch: 12 Global Step: 204670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:09,785-Speed 5147.83 samples/sec Loss 1.5657 LearningRate 0.0150 Epoch: 12 Global Step: 204680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:11,772-Speed 5156.35 samples/sec Loss 1.5721 LearningRate 0.0150 Epoch: 12 Global Step: 204690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:00:13,748-Speed 5183.55 samples/sec Loss 1.5560 LearningRate 0.0150 Epoch: 12 Global Step: 204700 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:00:15,729-Speed 5169.68 samples/sec Loss 1.5934 LearningRate 0.0150 Epoch: 12 Global Step: 204710 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:00:17,750-Speed 5067.66 samples/sec Loss 1.5771 LearningRate 0.0150 Epoch: 12 Global Step: 204720 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:00:19,725-Speed 5188.50 samples/sec Loss 1.5568 LearningRate 0.0150 Epoch: 12 Global Step: 204730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:00:21,707-Speed 5165.79 samples/sec Loss 1.5297 LearningRate 0.0150 Epoch: 12 Global Step: 204740 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:00:23,711-Speed 5113.41 samples/sec Loss 1.6087 LearningRate 0.0149 Epoch: 12 Global Step: 204750 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:00:25,710-Speed 5124.52 samples/sec Loss 1.5323 LearningRate 0.0149 Epoch: 12 Global Step: 204760 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:00:27,687-Speed 5180.49 samples/sec Loss 1.5393 LearningRate 0.0149 Epoch: 12 Global Step: 204770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:00:29,667-Speed 5173.99 samples/sec Loss 1.5865 LearningRate 0.0149 Epoch: 12 Global Step: 204780 Fp16 Grad Scale: 32768 Required: 9 hours 
Training: 2022-04-11 13:00:31,642-Speed 5186.16 samples/sec Loss 1.5486 LearningRate 0.0149 Epoch: 12 Global Step: 204790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:33,624-Speed 5168.17 samples/sec Loss 1.5172 LearningRate 0.0149 Epoch: 12 Global Step: 204800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:35,600-Speed 5183.02 samples/sec Loss 1.5397 LearningRate 0.0149 Epoch: 12 Global Step: 204810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:37,580-Speed 5173.89 samples/sec Loss 1.5794 LearningRate 0.0149 Epoch: 12 Global Step: 204820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:39,555-Speed 5187.03 samples/sec Loss 1.5855 LearningRate 0.0149 Epoch: 12 Global Step: 204830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:41,542-Speed 5154.87 samples/sec Loss 1.5126 LearningRate 0.0149 Epoch: 12 Global Step: 204840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:43,514-Speed 5194.50 samples/sec Loss 1.5816 LearningRate 0.0149 Epoch: 12 Global Step: 204850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:45,501-Speed 5154.96 samples/sec Loss 1.5655 LearningRate 0.0149 Epoch: 12 Global Step: 204860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:47,489-Speed 5153.54 samples/sec Loss 1.5767 LearningRate 0.0149 Epoch: 12 Global Step: 204870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:49,472-Speed 5166.00 samples/sec Loss 1.5839 LearningRate 0.0149 Epoch: 12 Global Step: 204880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:51,455-Speed 5165.66 samples/sec Loss 1.5483 LearningRate 0.0149 Epoch: 12 Global Step: 204890 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:00:53,436-Speed 5169.52 samples/sec Loss 1.5880 LearningRate 0.0149 Epoch: 12 Global Step: 204900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:55,412-Speed 
5183.94 samples/sec Loss 1.5454 LearningRate 0.0149 Epoch: 12 Global Step: 204910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:57,386-Speed 5190.48 samples/sec Loss 1.5798 LearningRate 0.0149 Epoch: 12 Global Step: 204920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:00:59,361-Speed 5185.81 samples/sec Loss 1.5400 LearningRate 0.0149 Epoch: 12 Global Step: 204930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:01,341-Speed 5174.19 samples/sec Loss 1.5972 LearningRate 0.0149 Epoch: 12 Global Step: 204940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:03,325-Speed 5161.39 samples/sec Loss 1.6204 LearningRate 0.0149 Epoch: 12 Global Step: 204950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:05,323-Speed 5127.60 samples/sec Loss 1.5708 LearningRate 0.0149 Epoch: 12 Global Step: 204960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:07,310-Speed 5155.36 samples/sec Loss 1.5664 LearningRate 0.0149 Epoch: 12 Global Step: 204970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:09,284-Speed 5189.52 samples/sec Loss 1.5784 LearningRate 0.0149 Epoch: 12 Global Step: 204980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:11,276-Speed 5143.38 samples/sec Loss 1.5702 LearningRate 0.0149 Epoch: 12 Global Step: 204990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:13,257-Speed 5169.00 samples/sec Loss 1.5547 LearningRate 0.0149 Epoch: 12 Global Step: 205000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:01:15,246-Speed 5151.39 samples/sec Loss 1.5699 LearningRate 0.0149 Epoch: 12 Global Step: 205010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:01:17,216-Speed 5198.05 samples/sec Loss 1.5335 LearningRate 0.0149 Epoch: 12 Global Step: 205020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:01:19,183-Speed 5207.61 samples/sec Loss 1.5773 
LearningRate 0.0149 Epoch: 12 Global Step: 205030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:21,201-Speed 5077.48 samples/sec Loss 1.5606 LearningRate 0.0149 Epoch: 12 Global Step: 205040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:23,195-Speed 5136.88 samples/sec Loss 1.5848 LearningRate 0.0149 Epoch: 12 Global Step: 205050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:25,184-Speed 5149.98 samples/sec Loss 1.6285 LearningRate 0.0149 Epoch: 12 Global Step: 205060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:27,191-Speed 5103.77 samples/sec Loss 1.5878 LearningRate 0.0149 Epoch: 12 Global Step: 205070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:29,173-Speed 5166.99 samples/sec Loss 1.5797 LearningRate 0.0149 Epoch: 12 Global Step: 205080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:31,163-Speed 5150.00 samples/sec Loss 1.5724 LearningRate 0.0149 Epoch: 12 Global Step: 205090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:33,134-Speed 5195.22 samples/sec Loss 1.5891 LearningRate 0.0149 Epoch: 12 Global Step: 205100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:35,116-Speed 5168.18 samples/sec Loss 1.5680 LearningRate 0.0149 Epoch: 12 Global Step: 205110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:37,110-Speed 5136.73 samples/sec Loss 1.5691 LearningRate 0.0149 Epoch: 12 Global Step: 205120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:39,095-Speed 5161.42 samples/sec Loss 1.5773 LearningRate 0.0149 Epoch: 12 Global Step: 205130 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:01:41,082-Speed 5154.92 samples/sec Loss 1.5832 LearningRate 0.0149 Epoch: 12 Global Step: 205140 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:01:43,064-Speed 5169.00 samples/sec Loss 1.5770 LearningRate 0.0149 Epoch: 12 Global 
Step: 205150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:45,055-Speed 5143.62 samples/sec Loss 1.6124 LearningRate 0.0149 Epoch: 12 Global Step: 205160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:47,035-Speed 5173.70 samples/sec Loss 1.5675 LearningRate 0.0149 Epoch: 12 Global Step: 205170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:49,027-Speed 5142.78 samples/sec Loss 1.6046 LearningRate 0.0149 Epoch: 12 Global Step: 205180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:51,007-Speed 5172.26 samples/sec Loss 1.5599 LearningRate 0.0148 Epoch: 12 Global Step: 205190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:52,993-Speed 5159.99 samples/sec Loss 1.5471 LearningRate 0.0148 Epoch: 12 Global Step: 205200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:54,967-Speed 5188.00 samples/sec Loss 1.6023 LearningRate 0.0148 Epoch: 12 Global Step: 205210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:56,951-Speed 5162.10 samples/sec Loss 1.5864 LearningRate 0.0148 Epoch: 12 Global Step: 205220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:01:58,928-Speed 5182.61 samples/sec Loss 1.6732 LearningRate 0.0148 Epoch: 12 Global Step: 205230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:02:00,921-Speed 5139.79 samples/sec Loss 1.6161 LearningRate 0.0148 Epoch: 12 Global Step: 205240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:02:02,909-Speed 5151.57 samples/sec Loss 1.5910 LearningRate 0.0148 Epoch: 12 Global Step: 205250 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:02:04,901-Speed 5142.30 samples/sec Loss 1.5669 LearningRate 0.0148 Epoch: 12 Global Step: 205260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:02:06,882-Speed 5171.79 samples/sec Loss 1.5530 LearningRate 0.0148 Epoch: 12 Global Step: 205270 Fp16 Grad Scale: 65536 
Required: 9 hours Training: 2022-04-11 13:02:08,877-Speed 5134.63 samples/sec Loss 1.5781 LearningRate 0.0148 Epoch: 12 Global Step: 205280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:02:10,871-Speed 5137.53 samples/sec Loss 1.5800 LearningRate 0.0148 Epoch: 12 Global Step: 205290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:02:12,860-Speed 5148.10 samples/sec Loss 1.5842 LearningRate 0.0148 Epoch: 12 Global Step: 205300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:02:14,838-Speed 5179.35 samples/sec Loss 1.5996 LearningRate 0.0148 Epoch: 12 Global Step: 205310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:02:16,817-Speed 5176.49 samples/sec Loss 1.5281 LearningRate 0.0148 Epoch: 12 Global Step: 205320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:02:18,812-Speed 5134.59 samples/sec Loss 1.5359 LearningRate 0.0148 Epoch: 12 Global Step: 205330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:02:20,785-Speed 5190.84 samples/sec Loss 1.5795 LearningRate 0.0148 Epoch: 12 Global Step: 205340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:02:22,779-Speed 5138.15 samples/sec Loss 1.6041 LearningRate 0.0148 Epoch: 12 Global Step: 205350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:02:24,772-Speed 5138.19 samples/sec Loss 1.5229 LearningRate 0.0148 Epoch: 12 Global Step: 205360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:02:26,810-Speed 5027.32 samples/sec Loss 1.6422 LearningRate 0.0148 Epoch: 12 Global Step: 205370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:02:28,794-Speed 5163.26 samples/sec Loss 1.5614 LearningRate 0.0148 Epoch: 12 Global Step: 205380 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:02:30,812-Speed 5075.91 samples/sec Loss 1.5654 LearningRate 0.0148 Epoch: 12 Global Step: 205390 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
13:02:32,786-Speed 5188.75 samples/sec Loss 1.5406 LearningRate 0.0148 Epoch: 12 Global Step: 205400 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:02:34,763-Speed 5181.78 samples/sec Loss 1.5677 LearningRate 0.0148 Epoch: 12 Global Step: 205410 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:02:36,741-Speed 5177.26 samples/sec Loss 1.5524 LearningRate 0.0148 Epoch: 12 Global Step: 205420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:02:38,728-Speed 5156.46 samples/sec Loss 1.6195 LearningRate 0.0148 Epoch: 12 Global Step: 205430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:02:40,717-Speed 5151.58 samples/sec Loss 1.6066 LearningRate 0.0148 Epoch: 12 Global Step: 205440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:02:42,711-Speed 5134.80 samples/sec Loss 1.5877 LearningRate 0.0148 Epoch: 12 Global Step: 205450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:02:44,687-Speed 5185.62 samples/sec Loss 1.5683 LearningRate 0.0148 Epoch: 12 Global Step: 205460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:02:46,674-Speed 5153.14 samples/sec Loss 1.6061 LearningRate 0.0148 Epoch: 12 Global Step: 205470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:02:48,661-Speed 5155.31 samples/sec Loss 1.5606 LearningRate 0.0148 Epoch: 12 Global Step: 205480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:02:50,640-Speed 5178.05 samples/sec Loss 1.5951 LearningRate 0.0148 Epoch: 12 Global Step: 205490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:02:52,626-Speed 5157.13 samples/sec Loss 1.5644 LearningRate 0.0148 Epoch: 12 Global Step: 205500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:02:54,602-Speed 5184.89 samples/sec Loss 1.6052 LearningRate 0.0148 Epoch: 12 Global Step: 205510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:02:56,579-Speed 5180.14 samples/sec 
Loss 1.5883 LearningRate 0.0148 Epoch: 12 Global Step: 205520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:02:58,591-Speed 5090.91 samples/sec Loss 1.5053 LearningRate 0.0148 Epoch: 12 Global Step: 205530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:03:00,580-Speed 5150.45 samples/sec Loss 1.6022 LearningRate 0.0148 Epoch: 12 Global Step: 205540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:03:02,579-Speed 5126.14 samples/sec Loss 1.6143 LearningRate 0.0148 Epoch: 12 Global Step: 205550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:03:04,562-Speed 5165.35 samples/sec Loss 1.6398 LearningRate 0.0148 Epoch: 12 Global Step: 205560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:03:06,556-Speed 5136.17 samples/sec Loss 1.5851 LearningRate 0.0148 Epoch: 12 Global Step: 205570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:08,533-Speed 5181.36 samples/sec Loss 1.5662 LearningRate 0.0148 Epoch: 12 Global Step: 205580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:10,519-Speed 5157.67 samples/sec Loss 1.5812 LearningRate 0.0148 Epoch: 12 Global Step: 205590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:12,508-Speed 5151.03 samples/sec Loss 1.6085 LearningRate 0.0148 Epoch: 12 Global Step: 205600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:14,506-Speed 5126.13 samples/sec Loss 1.5654 LearningRate 0.0148 Epoch: 12 Global Step: 205610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:16,496-Speed 5148.06 samples/sec Loss 1.6076 LearningRate 0.0147 Epoch: 12 Global Step: 205620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:18,478-Speed 5167.99 samples/sec Loss 1.5811 LearningRate 0.0147 Epoch: 12 Global Step: 205630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:20,469-Speed 5144.29 samples/sec Loss 1.6027 LearningRate 0.0147 Epoch: 12 
Global Step: 205640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:22,456-Speed 5155.97 samples/sec Loss 1.6067 LearningRate 0.0147 Epoch: 12 Global Step: 205650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:24,436-Speed 5173.78 samples/sec Loss 1.5484 LearningRate 0.0147 Epoch: 12 Global Step: 205660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:26,421-Speed 5160.15 samples/sec Loss 1.5996 LearningRate 0.0147 Epoch: 12 Global Step: 205670 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:03:28,407-Speed 5155.58 samples/sec Loss 1.6271 LearningRate 0.0147 Epoch: 12 Global Step: 205680 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:03:30,381-Speed 5191.76 samples/sec Loss 1.5902 LearningRate 0.0147 Epoch: 12 Global Step: 205690 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:03:32,380-Speed 5121.91 samples/sec Loss 1.6679 LearningRate 0.0147 Epoch: 12 Global Step: 205700 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:03:34,362-Speed 5168.80 samples/sec Loss 1.5438 LearningRate 0.0147 Epoch: 12 Global Step: 205710 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:03:36,357-Speed 5135.09 samples/sec Loss 1.5401 LearningRate 0.0147 Epoch: 12 Global Step: 205720 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:03:38,341-Speed 5163.87 samples/sec Loss 1.5770 LearningRate 0.0147 Epoch: 12 Global Step: 205730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:40,344-Speed 5113.44 samples/sec Loss 1.5690 LearningRate 0.0147 Epoch: 12 Global Step: 205740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:42,317-Speed 5191.32 samples/sec Loss 1.5839 LearningRate 0.0147 Epoch: 12 Global Step: 205750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:44,295-Speed 5178.94 samples/sec Loss 1.5618 LearningRate 0.0147 Epoch: 12 Global Step: 205760 Fp16 Grad Scale: 
65536 Required: 9 hours Training: 2022-04-11 13:03:46,276-Speed 5171.42 samples/sec Loss 1.5830 LearningRate 0.0147 Epoch: 12 Global Step: 205770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:48,268-Speed 5142.06 samples/sec Loss 1.6228 LearningRate 0.0147 Epoch: 12 Global Step: 205780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:50,247-Speed 5175.86 samples/sec Loss 1.5748 LearningRate 0.0147 Epoch: 12 Global Step: 205790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:52,220-Speed 5191.61 samples/sec Loss 1.5686 LearningRate 0.0147 Epoch: 12 Global Step: 205800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:54,192-Speed 5196.14 samples/sec Loss 1.6714 LearningRate 0.0147 Epoch: 12 Global Step: 205810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:56,186-Speed 5135.81 samples/sec Loss 1.5713 LearningRate 0.0147 Epoch: 12 Global Step: 205820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:03:58,175-Speed 5149.19 samples/sec Loss 1.5995 LearningRate 0.0147 Epoch: 12 Global Step: 205830 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:04:00,197-Speed 5067.06 samples/sec Loss 1.5936 LearningRate 0.0147 Epoch: 12 Global Step: 205840 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:04:02,176-Speed 5177.55 samples/sec Loss 1.5795 LearningRate 0.0147 Epoch: 12 Global Step: 205850 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:04:04,150-Speed 5187.18 samples/sec Loss 1.6259 LearningRate 0.0147 Epoch: 12 Global Step: 205860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:04:06,124-Speed 5191.16 samples/sec Loss 1.5662 LearningRate 0.0147 Epoch: 12 Global Step: 205870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:04:08,103-Speed 5173.84 samples/sec Loss 1.5840 LearningRate 0.0147 Epoch: 12 Global Step: 205880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-04-11 13:04:10,110-Speed 5104.44 samples/sec Loss 1.5551 LearningRate 0.0147 Epoch: 12 Global Step: 205890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:04:12,089-Speed 5175.53 samples/sec Loss 1.5942 LearningRate 0.0147 Epoch: 12 Global Step: 205900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:04:14,065-Speed 5183.45 samples/sec Loss 1.5664 LearningRate 0.0147 Epoch: 12 Global Step: 205910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:04:16,038-Speed 5194.11 samples/sec Loss 1.6111 LearningRate 0.0147 Epoch: 12 Global Step: 205920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:04:18,030-Speed 5141.20 samples/sec Loss 1.5767 LearningRate 0.0147 Epoch: 12 Global Step: 205930 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:04:20,003-Speed 5191.28 samples/sec Loss 1.6518 LearningRate 0.0147 Epoch: 12 Global Step: 205940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:04:21,979-Speed 5183.41 samples/sec Loss 1.6113 LearningRate 0.0147 Epoch: 12 Global Step: 205950 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:04:23,974-Speed 5136.16 samples/sec Loss 1.5827 LearningRate 0.0147 Epoch: 12 Global Step: 205960 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:04:25,961-Speed 5156.14 samples/sec Loss 1.5860 LearningRate 0.0147 Epoch: 12 Global Step: 205970 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:04:27,981-Speed 5070.61 samples/sec Loss 1.5226 LearningRate 0.0147 Epoch: 12 Global Step: 205980 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:04:29,968-Speed 5155.69 samples/sec Loss 1.5946 LearningRate 0.0147 Epoch: 12 Global Step: 205990 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:04:31,957-Speed 5148.77 samples/sec Loss 1.5249 LearningRate 0.0147 Epoch: 12 Global Step: 206000 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 
13:04:58,741-[lfw][206000]XNorm: 21.410842 Training: 2022-04-11 13:04:58,741-[lfw][206000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 13:04:58,742-[lfw][206000]Accuracy-Highest: 0.99833 Training: 2022-04-11 13:05:29,689-[cfp_fp][206000]XNorm: 20.774150 Training: 2022-04-11 13:05:29,690-[cfp_fp][206000]Accuracy-Flip: 0.98643+-0.00462 Training: 2022-04-11 13:05:29,690-[cfp_fp][206000]Accuracy-Highest: 0.98757 Training: 2022-04-11 13:05:56,454-[agedb_30][206000]XNorm: 22.048304 Training: 2022-04-11 13:05:56,454-[agedb_30][206000]Accuracy-Flip: 0.98083+-0.00814 Training: 2022-04-11 13:05:56,455-[agedb_30][206000]Accuracy-Highest: 0.98250 Training: 2022-04-11 13:05:58,436-Speed 118.41 samples/sec Loss 1.6127 LearningRate 0.0147 Epoch: 12 Global Step: 206010 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:06:00,405-Speed 5202.74 samples/sec Loss 1.5990 LearningRate 0.0147 Epoch: 12 Global Step: 206020 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:06:02,403-Speed 5127.06 samples/sec Loss 1.5653 LearningRate 0.0147 Epoch: 12 Global Step: 206030 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:06:04,392-Speed 5150.36 samples/sec Loss 1.5703 LearningRate 0.0147 Epoch: 12 Global Step: 206040 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:06:06,379-Speed 5153.31 samples/sec Loss 1.5732 LearningRate 0.0146 Epoch: 12 Global Step: 206050 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:06:08,337-Speed 5232.05 samples/sec Loss 1.5857 LearningRate 0.0146 Epoch: 12 Global Step: 206060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:06:10,314-Speed 5181.41 samples/sec Loss 1.5484 LearningRate 0.0146 Epoch: 12 Global Step: 206070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:06:12,285-Speed 5197.55 samples/sec Loss 1.5533 LearningRate 0.0146 Epoch: 12 Global Step: 206080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:06:14,263-Speed 
5179.39 samples/sec Loss 1.5652 LearningRate 0.0146 Epoch: 12 Global Step: 206090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:06:16,248-Speed 5159.65 samples/sec Loss 1.5840 LearningRate 0.0146 Epoch: 12 Global Step: 206100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:06:18,218-Speed 5199.11 samples/sec Loss 1.5797 LearningRate 0.0146 Epoch: 12 Global Step: 206110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:06:20,190-Speed 5195.04 samples/sec Loss 1.5915 LearningRate 0.0146 Epoch: 12 Global Step: 206120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:06:22,187-Speed 5129.15 samples/sec Loss 1.5974 LearningRate 0.0146 Epoch: 12 Global Step: 206130 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:06:24,180-Speed 5140.50 samples/sec Loss 1.5991 LearningRate 0.0146 Epoch: 12 Global Step: 206140 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:06:26,163-Speed 5163.93 samples/sec Loss 1.6448 LearningRate 0.0146 Epoch: 12 Global Step: 206150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:06:28,143-Speed 5174.21 samples/sec Loss 1.6660 LearningRate 0.0146 Epoch: 12 Global Step: 206160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:06:30,145-Speed 5118.58 samples/sec Loss 1.5783 LearningRate 0.0146 Epoch: 12 Global Step: 206170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:06:32,129-Speed 5162.07 samples/sec Loss 1.6190 LearningRate 0.0146 Epoch: 12 Global Step: 206180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:06:34,116-Speed 5153.61 samples/sec Loss 1.6093 LearningRate 0.0146 Epoch: 12 Global Step: 206190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:06:36,111-Speed 5137.43 samples/sec Loss 1.6265 LearningRate 0.0146 Epoch: 12 Global Step: 206200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:06:38,112-Speed 5117.64 samples/sec Loss 1.6055 
LearningRate 0.0146 Epoch: 12 Global Step: 206210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:06:40,098-Speed 5159.91 samples/sec Loss 1.5568 LearningRate 0.0146 Epoch: 12 Global Step: 206220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:06:42,071-Speed 5192.61 samples/sec Loss 1.5846 LearningRate 0.0146 Epoch: 12 Global Step: 206230 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:06:44,045-Speed 5188.15 samples/sec Loss 1.6132 LearningRate 0.0146 Epoch: 12 Global Step: 206240 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:06:46,030-Speed 5158.75 samples/sec Loss 1.5875 LearningRate 0.0146 Epoch: 12 Global Step: 206250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:06:48,006-Speed 5185.20 samples/sec Loss 1.6167 LearningRate 0.0146 Epoch: 12 Global Step: 206260 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:06:50,018-Speed 5094.06 samples/sec Loss 1.6039 LearningRate 0.0146 Epoch: 12 Global Step: 206270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:06:52,044-Speed 5054.95 samples/sec Loss 1.5779 LearningRate 0.0146 Epoch: 12 Global Step: 206280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:06:54,032-Speed 5153.37 samples/sec Loss 1.5762 LearningRate 0.0146 Epoch: 12 Global Step: 206290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:06:56,019-Speed 5153.71 samples/sec Loss 1.5143 LearningRate 0.0146 Epoch: 12 Global Step: 206300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:06:58,005-Speed 5159.29 samples/sec Loss 1.5827 LearningRate 0.0146 Epoch: 12 Global Step: 206310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:06:59,979-Speed 5189.58 samples/sec Loss 1.5722 LearningRate 0.0146 Epoch: 12 Global Step: 206320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:01,974-Speed 5134.37 samples/sec Loss 1.5911 LearningRate 0.0146 Epoch: 12 Global Step: 
206330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:03,953-Speed 5176.72 samples/sec Loss 1.6502 LearningRate 0.0146 Epoch: 12 Global Step: 206340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:05,928-Speed 5185.86 samples/sec Loss 1.6109 LearningRate 0.0146 Epoch: 12 Global Step: 206350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:07,930-Speed 5117.33 samples/sec Loss 1.5891 LearningRate 0.0146 Epoch: 12 Global Step: 206360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:09,909-Speed 5174.73 samples/sec Loss 1.5853 LearningRate 0.0146 Epoch: 12 Global Step: 206370 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:07:11,887-Speed 5180.00 samples/sec Loss 1.6218 LearningRate 0.0146 Epoch: 12 Global Step: 206380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:13,867-Speed 5172.37 samples/sec Loss 1.5684 LearningRate 0.0146 Epoch: 12 Global Step: 206390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:15,865-Speed 5127.33 samples/sec Loss 1.5927 LearningRate 0.0146 Epoch: 12 Global Step: 206400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:17,845-Speed 5173.10 samples/sec Loss 1.5769 LearningRate 0.0146 Epoch: 12 Global Step: 206410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:19,822-Speed 5181.76 samples/sec Loss 1.6157 LearningRate 0.0146 Epoch: 12 Global Step: 206420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:21,805-Speed 5164.56 samples/sec Loss 1.5644 LearningRate 0.0146 Epoch: 12 Global Step: 206430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:23,783-Speed 5180.56 samples/sec Loss 1.5873 LearningRate 0.0146 Epoch: 12 Global Step: 206440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:25,782-Speed 5123.30 samples/sec Loss 1.5776 LearningRate 0.0146 Epoch: 12 Global Step: 206450 Fp16 Grad Scale: 65536 Required: 9 
hours Training: 2022-04-11 13:07:27,765-Speed 5166.40 samples/sec Loss 1.5655 LearningRate 0.0146 Epoch: 12 Global Step: 206460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:29,764-Speed 5124.84 samples/sec Loss 1.5737 LearningRate 0.0146 Epoch: 12 Global Step: 206470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:31,735-Speed 5197.54 samples/sec Loss 1.5494 LearningRate 0.0146 Epoch: 12 Global Step: 206480 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:07:33,712-Speed 5179.40 samples/sec Loss 1.5901 LearningRate 0.0145 Epoch: 12 Global Step: 206490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:35,692-Speed 5173.76 samples/sec Loss 1.6127 LearningRate 0.0145 Epoch: 12 Global Step: 206500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:37,693-Speed 5119.65 samples/sec Loss 1.6254 LearningRate 0.0145 Epoch: 12 Global Step: 206510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:39,677-Speed 5162.30 samples/sec Loss 1.6053 LearningRate 0.0145 Epoch: 12 Global Step: 206520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:41,665-Speed 5151.58 samples/sec Loss 1.6631 LearningRate 0.0145 Epoch: 12 Global Step: 206530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:43,654-Speed 5149.08 samples/sec Loss 1.5870 LearningRate 0.0145 Epoch: 12 Global Step: 206540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:45,649-Speed 5135.55 samples/sec Loss 1.5617 LearningRate 0.0145 Epoch: 12 Global Step: 206550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:47,636-Speed 5154.55 samples/sec Loss 1.6317 LearningRate 0.0145 Epoch: 12 Global Step: 206560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:49,609-Speed 5192.84 samples/sec Loss 1.6016 LearningRate 0.0145 Epoch: 12 Global Step: 206570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
13:07:51,583-Speed 5188.75 samples/sec Loss 1.5693 LearningRate 0.0145 Epoch: 12 Global Step: 206580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:07:53,569-Speed 5158.22 samples/sec Loss 1.6200 LearningRate 0.0145 Epoch: 12 Global Step: 206590 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:07:55,548-Speed 5177.08 samples/sec Loss 1.6418 LearningRate 0.0145 Epoch: 12 Global Step: 206600 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:07:57,534-Speed 5157.24 samples/sec Loss 1.5736 LearningRate 0.0145 Epoch: 12 Global Step: 206610 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:07:59,526-Speed 5141.90 samples/sec Loss 1.5858 LearningRate 0.0145 Epoch: 12 Global Step: 206620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:01,518-Speed 5141.56 samples/sec Loss 1.6416 LearningRate 0.0145 Epoch: 12 Global Step: 206630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:03,500-Speed 5168.26 samples/sec Loss 1.6403 LearningRate 0.0145 Epoch: 12 Global Step: 206640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:08:05,501-Speed 5120.14 samples/sec Loss 1.6021 LearningRate 0.0145 Epoch: 12 Global Step: 206650 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:08:07,484-Speed 5165.61 samples/sec Loss 1.6716 LearningRate 0.0145 Epoch: 12 Global Step: 206660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:08:09,484-Speed 5122.85 samples/sec Loss 1.6303 LearningRate 0.0145 Epoch: 12 Global Step: 206670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:08:11,478-Speed 5136.41 samples/sec Loss 1.5689 LearningRate 0.0145 Epoch: 12 Global Step: 206680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:08:13,450-Speed 5194.52 samples/sec Loss 1.6355 LearningRate 0.0145 Epoch: 12 Global Step: 206690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:08:15,424-Speed 5188.61 samples/sec 
Loss 1.6412 LearningRate 0.0145 Epoch: 12 Global Step: 206700 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:08:17,392-Speed 5205.47 samples/sec Loss 1.6265 LearningRate 0.0145 Epoch: 12 Global Step: 206710 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:08:19,362-Speed 5198.82 samples/sec Loss 1.6224 LearningRate 0.0145 Epoch: 12 Global Step: 206720 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:08:21,351-Speed 5150.07 samples/sec Loss 1.6765 LearningRate 0.0145 Epoch: 12 Global Step: 206730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:08:23,334-Speed 5167.23 samples/sec Loss 1.5760 LearningRate 0.0145 Epoch: 12 Global Step: 206740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:25,319-Speed 5161.02 samples/sec Loss 1.6572 LearningRate 0.0145 Epoch: 12 Global Step: 206750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:27,322-Speed 5112.36 samples/sec Loss 1.6043 LearningRate 0.0145 Epoch: 12 Global Step: 206760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:29,296-Speed 5189.66 samples/sec Loss 1.6355 LearningRate 0.0145 Epoch: 12 Global Step: 206770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:31,283-Speed 5155.90 samples/sec Loss 1.5766 LearningRate 0.0145 Epoch: 12 Global Step: 206780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:33,265-Speed 5165.53 samples/sec Loss 1.6168 LearningRate 0.0145 Epoch: 12 Global Step: 206790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:35,278-Speed 5088.51 samples/sec Loss 1.6020 LearningRate 0.0145 Epoch: 12 Global Step: 206800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:37,281-Speed 5116.10 samples/sec Loss 1.6096 LearningRate 0.0145 Epoch: 12 Global Step: 206810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:39,277-Speed 5132.38 samples/sec Loss 1.6068 LearningRate 0.0145 Epoch: 12 
Global Step: 206820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:41,261-Speed 5162.09 samples/sec Loss 1.6317 LearningRate 0.0145 Epoch: 12 Global Step: 206830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:43,234-Speed 5192.72 samples/sec Loss 1.5731 LearningRate 0.0145 Epoch: 12 Global Step: 206840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:45,245-Speed 5093.27 samples/sec Loss 1.5998 LearningRate 0.0145 Epoch: 12 Global Step: 206850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:47,218-Speed 5191.95 samples/sec Loss 1.5869 LearningRate 0.0145 Epoch: 12 Global Step: 206860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:49,203-Speed 5160.11 samples/sec Loss 1.6186 LearningRate 0.0145 Epoch: 12 Global Step: 206870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:51,209-Speed 5106.73 samples/sec Loss 1.6065 LearningRate 0.0145 Epoch: 12 Global Step: 206880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:53,184-Speed 5186.25 samples/sec Loss 1.5914 LearningRate 0.0145 Epoch: 12 Global Step: 206890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:55,179-Speed 5134.64 samples/sec Loss 1.6185 LearningRate 0.0145 Epoch: 12 Global Step: 206900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:08:57,168-Speed 5148.30 samples/sec Loss 1.5508 LearningRate 0.0145 Epoch: 12 Global Step: 206910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:08:59,145-Speed 5182.98 samples/sec Loss 1.6049 LearningRate 0.0145 Epoch: 12 Global Step: 206920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:09:01,135-Speed 5147.50 samples/sec Loss 1.5585 LearningRate 0.0144 Epoch: 12 Global Step: 206930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:09:03,108-Speed 5192.66 samples/sec Loss 1.6327 LearningRate 0.0144 Epoch: 12 Global Step: 206940 Fp16 Grad Scale: 32768 
Required: 9 hours Training: 2022-04-11 13:09:05,107-Speed 5123.52 samples/sec Loss 1.5990 LearningRate 0.0144 Epoch: 12 Global Step: 206950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:09:07,083-Speed 5182.68 samples/sec Loss 1.5923 LearningRate 0.0144 Epoch: 12 Global Step: 206960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:09:09,072-Speed 5150.78 samples/sec Loss 1.6306 LearningRate 0.0144 Epoch: 12 Global Step: 206970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:09:11,061-Speed 5151.54 samples/sec Loss 1.6138 LearningRate 0.0144 Epoch: 12 Global Step: 206980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:09:13,042-Speed 5170.18 samples/sec Loss 1.5921 LearningRate 0.0144 Epoch: 12 Global Step: 206990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:09:15,025-Speed 5166.10 samples/sec Loss 1.6274 LearningRate 0.0144 Epoch: 12 Global Step: 207000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-11 13:09:17,015-Speed 5147.05 samples/sec Loss 1.5993 LearningRate 0.0144 Epoch: 12 Global Step: 207010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:09:18,998-Speed 5164.78 samples/sec Loss 1.6309 LearningRate 0.0144 Epoch: 12 Global Step: 207020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:09:21,000-Speed 5116.14 samples/sec Loss 1.6380 LearningRate 0.0144 Epoch: 12 Global Step: 207030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:09:22,989-Speed 5150.28 samples/sec Loss 1.6251 LearningRate 0.0144 Epoch: 12 Global Step: 207040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:09:24,998-Speed 5098.81 samples/sec Loss 1.6265 LearningRate 0.0144 Epoch: 12 Global Step: 207050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:09:27,005-Speed 5105.61 samples/sec Loss 1.5968 LearningRate 0.0144 Epoch: 12 Global Step: 207060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 
13:09:28,984-Speed 5176.01 samples/sec Loss 1.6155 LearningRate 0.0144 Epoch: 12 Global Step: 207070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:09:30,973-Speed 5148.51 samples/sec Loss 1.6164 LearningRate 0.0144 Epoch: 12 Global Step: 207080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:09:32,974-Speed 5119.18 samples/sec Loss 1.6073 LearningRate 0.0144 Epoch: 12 Global Step: 207090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:09:34,968-Speed 5137.42 samples/sec Loss 1.6523 LearningRate 0.0144 Epoch: 12 Global Step: 207100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:09:36,946-Speed 5176.83 samples/sec Loss 1.5902 LearningRate 0.0144 Epoch: 12 Global Step: 207110 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:09:38,952-Speed 5107.40 samples/sec Loss 1.6336 LearningRate 0.0144 Epoch: 12 Global Step: 207120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:09:40,928-Speed 5183.25 samples/sec Loss 1.6220 LearningRate 0.0144 Epoch: 12 Global Step: 207130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:09:42,904-Speed 5184.26 samples/sec Loss 1.5905 LearningRate 0.0144 Epoch: 12 Global Step: 207140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:09:44,890-Speed 5157.74 samples/sec Loss 1.6077 LearningRate 0.0144 Epoch: 12 Global Step: 207150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:09:46,868-Speed 5180.48 samples/sec Loss 1.5889 LearningRate 0.0144 Epoch: 12 Global Step: 207160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:09:48,852-Speed 5161.47 samples/sec Loss 1.6537 LearningRate 0.0144 Epoch: 12 Global Step: 207170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:09:50,826-Speed 5189.48 samples/sec Loss 1.6312 LearningRate 0.0144 Epoch: 12 Global Step: 207180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:09:52,809-Speed 5165.18 samples/sec 
Loss 1.5932 LearningRate 0.0144 Epoch: 12 Global Step: 207190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:09:54,795-Speed 5158.04 samples/sec Loss 1.6044 LearningRate 0.0144 Epoch: 12 Global Step: 207200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:09:56,786-Speed 5145.09 samples/sec Loss 1.6604 LearningRate 0.0144 Epoch: 12 Global Step: 207210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:09:58,772-Speed 5159.15 samples/sec Loss 1.5635 LearningRate 0.0144 Epoch: 12 Global Step: 207220 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-11 13:10:00,769-Speed 5128.85 samples/sec Loss 1.5591 LearningRate 0.0144 Epoch: 12 Global Step: 207230 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-11 13:10:02,780-Speed 5093.34 samples/sec Loss 1.6389 LearningRate 0.0144 Epoch: 12 Global Step: 207240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:10:04,773-Speed 5138.47 samples/sec Loss 1.5869 LearningRate 0.0144 Epoch: 12 Global Step: 207250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:06,763-Speed 5146.60 samples/sec Loss 1.6070 LearningRate 0.0144 Epoch: 12 Global Step: 207260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:08,738-Speed 5188.92 samples/sec Loss 1.6619 LearningRate 0.0144 Epoch: 12 Global Step: 207270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:10,708-Speed 5198.21 samples/sec Loss 1.6168 LearningRate 0.0144 Epoch: 12 Global Step: 207280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:12,716-Speed 5101.22 samples/sec Loss 1.6079 LearningRate 0.0144 Epoch: 12 Global Step: 207290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:14,725-Speed 5099.22 samples/sec Loss 1.6590 LearningRate 0.0144 Epoch: 12 Global Step: 207300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:16,696-Speed 5199.32 samples/sec Loss 1.5909 LearningRate 0.0144 Epoch: 12 
Global Step: 207310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:18,674-Speed 5178.09 samples/sec Loss 1.6459 LearningRate 0.0144 Epoch: 12 Global Step: 207320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:20,656-Speed 5167.49 samples/sec Loss 1.6428 LearningRate 0.0144 Epoch: 12 Global Step: 207330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:22,647-Speed 5143.19 samples/sec Loss 1.5937 LearningRate 0.0144 Epoch: 12 Global Step: 207340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:24,620-Speed 5194.31 samples/sec Loss 1.6220 LearningRate 0.0144 Epoch: 12 Global Step: 207350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:10:26,618-Speed 5126.22 samples/sec Loss 1.6201 LearningRate 0.0144 Epoch: 12 Global Step: 207360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:10:28,601-Speed 5163.76 samples/sec Loss 1.6314 LearningRate 0.0143 Epoch: 12 Global Step: 207370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:10:30,602-Speed 5121.03 samples/sec Loss 1.6171 LearningRate 0.0143 Epoch: 12 Global Step: 207380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:10:32,575-Speed 5191.64 samples/sec Loss 1.6435 LearningRate 0.0143 Epoch: 12 Global Step: 207390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:34,547-Speed 5194.14 samples/sec Loss 1.6469 LearningRate 0.0143 Epoch: 12 Global Step: 207400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:36,549-Speed 5115.91 samples/sec Loss 1.6118 LearningRate 0.0143 Epoch: 12 Global Step: 207410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:38,541-Speed 5143.07 samples/sec Loss 1.6097 LearningRate 0.0143 Epoch: 12 Global Step: 207420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:40,519-Speed 5178.25 samples/sec Loss 1.5787 LearningRate 0.0143 Epoch: 12 Global Step: 207430 Fp16 Grad Scale: 
65536 Required: 8 hours Training: 2022-04-11 13:10:42,507-Speed 5153.65 samples/sec Loss 1.5719 LearningRate 0.0143 Epoch: 12 Global Step: 207440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:44,500-Speed 5137.79 samples/sec Loss 1.6432 LearningRate 0.0143 Epoch: 12 Global Step: 207450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:46,476-Speed 5184.24 samples/sec Loss 1.6113 LearningRate 0.0143 Epoch: 12 Global Step: 207460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:48,475-Speed 5124.57 samples/sec Loss 1.6136 LearningRate 0.0143 Epoch: 12 Global Step: 207470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:50,467-Speed 5142.98 samples/sec Loss 1.6103 LearningRate 0.0143 Epoch: 12 Global Step: 207480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:52,444-Speed 5184.47 samples/sec Loss 1.6571 LearningRate 0.0143 Epoch: 12 Global Step: 207490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:10:54,430-Speed 5156.08 samples/sec Loss 1.5814 LearningRate 0.0143 Epoch: 12 Global Step: 207500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:10:56,414-Speed 5165.05 samples/sec Loss 1.6388 LearningRate 0.0143 Epoch: 12 Global Step: 207510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:10:58,409-Speed 5132.93 samples/sec Loss 1.6247 LearningRate 0.0143 Epoch: 12 Global Step: 207520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:11:00,391-Speed 5167.96 samples/sec Loss 1.5924 LearningRate 0.0143 Epoch: 12 Global Step: 207530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:11:02,375-Speed 5165.53 samples/sec Loss 1.5707 LearningRate 0.0143 Epoch: 12 Global Step: 207540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:11:04,352-Speed 5180.13 samples/sec Loss 1.6412 LearningRate 0.0143 Epoch: 12 Global Step: 207550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 
2022-04-11 13:11:06,332-Speed 5173.82 samples/sec Loss 1.6091 LearningRate 0.0143 Epoch: 12 Global Step: 207560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:11:08,321-Speed 5148.78 samples/sec Loss 1.6105 LearningRate 0.0143 Epoch: 12 Global Step: 207570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:11:10,305-Speed 5164.63 samples/sec Loss 1.6475 LearningRate 0.0143 Epoch: 12 Global Step: 207580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:11:12,289-Speed 5162.14 samples/sec Loss 1.6045 LearningRate 0.0143 Epoch: 12 Global Step: 207590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:11:14,277-Speed 5151.56 samples/sec Loss 1.6531 LearningRate 0.0143 Epoch: 12 Global Step: 207600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:11:16,275-Speed 5127.14 samples/sec Loss 1.6335 LearningRate 0.0143 Epoch: 12 Global Step: 207610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:18,254-Speed 5176.89 samples/sec Loss 1.5492 LearningRate 0.0143 Epoch: 12 Global Step: 207620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:20,249-Speed 5132.95 samples/sec Loss 1.6200 LearningRate 0.0143 Epoch: 12 Global Step: 207630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:22,224-Speed 5188.04 samples/sec Loss 1.6351 LearningRate 0.0143 Epoch: 12 Global Step: 207640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:24,209-Speed 5160.40 samples/sec Loss 1.6692 LearningRate 0.0143 Epoch: 12 Global Step: 207650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:26,206-Speed 5131.16 samples/sec Loss 1.6338 LearningRate 0.0143 Epoch: 12 Global Step: 207660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:28,184-Speed 5177.96 samples/sec Loss 1.6560 LearningRate 0.0143 Epoch: 12 Global Step: 207670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:30,160-Speed 5182.60 
samples/sec Loss 1.6597 LearningRate 0.0143 Epoch: 12 Global Step: 207680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:32,137-Speed 5182.42 samples/sec Loss 1.6445 LearningRate 0.0143 Epoch: 12 Global Step: 207690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:34,119-Speed 5166.47 samples/sec Loss 1.6588 LearningRate 0.0143 Epoch: 12 Global Step: 207700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:36,113-Speed 5137.59 samples/sec Loss 1.5727 LearningRate 0.0143 Epoch: 12 Global Step: 207710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:11:38,110-Speed 5129.37 samples/sec Loss 1.6209 LearningRate 0.0143 Epoch: 12 Global Step: 207720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:40,096-Speed 5157.81 samples/sec Loss 1.6141 LearningRate 0.0143 Epoch: 12 Global Step: 207730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:42,082-Speed 5158.52 samples/sec Loss 1.6463 LearningRate 0.0143 Epoch: 12 Global Step: 207740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:44,061-Speed 5173.77 samples/sec Loss 1.6197 LearningRate 0.0143 Epoch: 12 Global Step: 207750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:46,049-Speed 5154.53 samples/sec Loss 1.6162 LearningRate 0.0143 Epoch: 12 Global Step: 207760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:48,023-Speed 5189.97 samples/sec Loss 1.6495 LearningRate 0.0143 Epoch: 12 Global Step: 207770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:50,023-Speed 5122.37 samples/sec Loss 1.6780 LearningRate 0.0143 Epoch: 12 Global Step: 207780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:52,018-Speed 5132.34 samples/sec Loss 1.5940 LearningRate 0.0143 Epoch: 12 Global Step: 207790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:53,995-Speed 5182.33 samples/sec Loss 1.6531 LearningRate 
0.0143 Epoch: 12 Global Step: 207800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:55,980-Speed 5161.42 samples/sec Loss 1.6384 LearningRate 0.0142 Epoch: 12 Global Step: 207810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:11:57,960-Speed 5173.01 samples/sec Loss 1.6214 LearningRate 0.0142 Epoch: 12 Global Step: 207820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:11:59,935-Speed 5185.32 samples/sec Loss 1.6545 LearningRate 0.0142 Epoch: 12 Global Step: 207830 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:12:01,915-Speed 5173.18 samples/sec Loss 1.5637 LearningRate 0.0142 Epoch: 12 Global Step: 207840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:12:03,900-Speed 5159.72 samples/sec Loss 1.5922 LearningRate 0.0142 Epoch: 12 Global Step: 207850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:12:05,890-Speed 5149.50 samples/sec Loss 1.6115 LearningRate 0.0142 Epoch: 12 Global Step: 207860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:12:07,862-Speed 5193.16 samples/sec Loss 1.6077 LearningRate 0.0142 Epoch: 12 Global Step: 207870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:12:09,834-Speed 5195.79 samples/sec Loss 1.6051 LearningRate 0.0142 Epoch: 12 Global Step: 207880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:12:11,815-Speed 5170.09 samples/sec Loss 1.6599 LearningRate 0.0142 Epoch: 12 Global Step: 207890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:12:13,790-Speed 5186.19 samples/sec Loss 1.5759 LearningRate 0.0142 Epoch: 12 Global Step: 207900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:12:15,775-Speed 5162.52 samples/sec Loss 1.6389 LearningRate 0.0142 Epoch: 12 Global Step: 207910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:12:17,753-Speed 5177.39 samples/sec Loss 1.6598 LearningRate 0.0142 Epoch: 12 Global Step: 207920 Fp16 
Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:12:19,729-Speed 5182.89 samples/sec Loss 1.6203 LearningRate 0.0142 Epoch: 12 Global Step: 207930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:12:21,713-Speed 5163.87 samples/sec Loss 1.6350 LearningRate 0.0142 Epoch: 12 Global Step: 207940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:12:23,710-Speed 5128.56 samples/sec Loss 1.6294 LearningRate 0.0142 Epoch: 12 Global Step: 207950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:12:25,722-Speed 5092.21 samples/sec Loss 1.6402 LearningRate 0.0142 Epoch: 12 Global Step: 207960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:12:27,705-Speed 5166.75 samples/sec Loss 1.6259 LearningRate 0.0142 Epoch: 12 Global Step: 207970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:12:29,679-Speed 5188.37 samples/sec Loss 1.5797 LearningRate 0.0142 Epoch: 12 Global Step: 207980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:12:31,654-Speed 5184.76 samples/sec Loss 1.6620 LearningRate 0.0142 Epoch: 12 Global Step: 207990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:12:33,659-Speed 5111.55 samples/sec Loss 1.6555 LearningRate 0.0142 Epoch: 12 Global Step: 208000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:13:00,408-[lfw][208000]XNorm: 23.646676 Training: 2022-04-11 13:13:00,408-[lfw][208000]Accuracy-Flip: 0.99800+-0.00277 Training: 2022-04-11 13:13:00,409-[lfw][208000]Accuracy-Highest: 0.99833 Training: 2022-04-11 13:13:31,363-[cfp_fp][208000]XNorm: 22.600406 Training: 2022-04-11 13:13:31,363-[cfp_fp][208000]Accuracy-Flip: 0.98657+-0.00405 Training: 2022-04-11 13:13:31,364-[cfp_fp][208000]Accuracy-Highest: 0.98757 Training: 2022-04-11 13:13:58,074-[agedb_30][208000]XNorm: 23.917921 Training: 2022-04-11 13:13:58,075-[agedb_30][208000]Accuracy-Flip: 0.98100+-0.00757 Training: 2022-04-11 
13:13:58,075-[agedb_30][208000]Accuracy-Highest: 0.98250 Training: 2022-04-11 13:14:00,067-Speed 118.51 samples/sec Loss 1.6320 LearningRate 0.0142 Epoch: 12 Global Step: 208010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:14:02,038-Speed 5197.77 samples/sec Loss 1.6485 LearningRate 0.0142 Epoch: 12 Global Step: 208020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:14:04,034-Speed 5131.29 samples/sec Loss 1.5816 LearningRate 0.0142 Epoch: 12 Global Step: 208030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:14:05,990-Speed 5236.88 samples/sec Loss 1.6340 LearningRate 0.0142 Epoch: 12 Global Step: 208040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:14:07,971-Speed 5170.90 samples/sec Loss 1.6297 LearningRate 0.0142 Epoch: 12 Global Step: 208050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:14:09,949-Speed 5178.54 samples/sec Loss 1.6300 LearningRate 0.0142 Epoch: 12 Global Step: 208060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:14:11,940-Speed 5144.68 samples/sec Loss 1.6583 LearningRate 0.0142 Epoch: 12 Global Step: 208070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:14:13,911-Speed 5197.74 samples/sec Loss 1.6328 LearningRate 0.0142 Epoch: 12 Global Step: 208080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:14:15,878-Speed 5207.80 samples/sec Loss 1.6678 LearningRate 0.0142 Epoch: 12 Global Step: 208090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:14:17,849-Speed 5197.01 samples/sec Loss 1.6237 LearningRate 0.0142 Epoch: 12 Global Step: 208100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:14:19,827-Speed 5177.40 samples/sec Loss 1.6221 LearningRate 0.0142 Epoch: 12 Global Step: 208110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:14:21,797-Speed 5199.65 samples/sec Loss 1.6030 LearningRate 0.0142 Epoch: 12 Global Step: 208120 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-04-11 13:14:23,775-Speed 5177.45 samples/sec Loss 1.6137 LearningRate 0.0142 Epoch: 12 Global Step: 208130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:14:25,770-Speed 5136.33 samples/sec Loss 1.6757 LearningRate 0.0142 Epoch: 12 Global Step: 208140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:14:27,771-Speed 5118.70 samples/sec Loss 1.6011 LearningRate 0.0142 Epoch: 12 Global Step: 208150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:14:29,745-Speed 5189.22 samples/sec Loss 1.5929 LearningRate 0.0142 Epoch: 12 Global Step: 208160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:14:31,717-Speed 5194.33 samples/sec Loss 1.6231 LearningRate 0.0142 Epoch: 12 Global Step: 208170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:14:33,713-Speed 5133.26 samples/sec Loss 1.5976 LearningRate 0.0142 Epoch: 12 Global Step: 208180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:14:35,697-Speed 5163.08 samples/sec Loss 1.6555 LearningRate 0.0142 Epoch: 12 Global Step: 208190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:14:37,686-Speed 5147.94 samples/sec Loss 1.6018 LearningRate 0.0142 Epoch: 12 Global Step: 208200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:14:39,669-Speed 5165.74 samples/sec Loss 1.6029 LearningRate 0.0142 Epoch: 12 Global Step: 208210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:14:41,681-Speed 5091.52 samples/sec Loss 1.6434 LearningRate 0.0142 Epoch: 12 Global Step: 208220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:14:43,674-Speed 5140.06 samples/sec Loss 1.6963 LearningRate 0.0142 Epoch: 12 Global Step: 208230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:14:45,664-Speed 5146.24 samples/sec Loss 1.6404 LearningRate 0.0142 Epoch: 12 Global Step: 208240 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
13:14:47,645-Speed 5175.09 samples/sec Loss 1.6351 LearningRate 0.0141 Epoch: 12 Global Step: 208250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:14:49,630-Speed 5159.10 samples/sec Loss 1.6322 LearningRate 0.0141 Epoch: 12 Global Step: 208260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:14:51,637-Speed 5105.55 samples/sec Loss 1.6811 LearningRate 0.0141 Epoch: 12 Global Step: 208270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:14:53,632-Speed 5134.61 samples/sec Loss 1.6179 LearningRate 0.0141 Epoch: 12 Global Step: 208280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:14:55,623-Speed 5143.61 samples/sec Loss 1.6096 LearningRate 0.0141 Epoch: 12 Global Step: 208290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:14:57,594-Speed 5198.93 samples/sec Loss 1.6066 LearningRate 0.0141 Epoch: 12 Global Step: 208300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:14:59,577-Speed 5165.36 samples/sec Loss 1.6360 LearningRate 0.0141 Epoch: 12 Global Step: 208310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:01,564-Speed 5153.01 samples/sec Loss 1.6696 LearningRate 0.0141 Epoch: 12 Global Step: 208320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:03,555-Speed 5144.91 samples/sec Loss 1.6317 LearningRate 0.0141 Epoch: 12 Global Step: 208330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:05,544-Speed 5150.96 samples/sec Loss 1.6443 LearningRate 0.0141 Epoch: 12 Global Step: 208340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:07,525-Speed 5170.43 samples/sec Loss 1.6216 LearningRate 0.0141 Epoch: 12 Global Step: 208350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:15:09,516-Speed 5146.59 samples/sec Loss 1.6108 LearningRate 0.0141 Epoch: 12 Global Step: 208360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:15:11,493-Speed 5179.68 samples/sec 
Loss 1.6109 LearningRate 0.0141 Epoch: 12 Global Step: 208370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:15:13,499-Speed 5107.12 samples/sec Loss 1.6279 LearningRate 0.0141 Epoch: 12 Global Step: 208380 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:15:15,491-Speed 5141.64 samples/sec Loss 1.6752 LearningRate 0.0141 Epoch: 12 Global Step: 208390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:17,462-Speed 5199.40 samples/sec Loss 1.6597 LearningRate 0.0141 Epoch: 12 Global Step: 208400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:19,434-Speed 5194.25 samples/sec Loss 1.6353 LearningRate 0.0141 Epoch: 12 Global Step: 208410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:21,425-Speed 5144.43 samples/sec Loss 1.6291 LearningRate 0.0141 Epoch: 12 Global Step: 208420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:23,404-Speed 5176.00 samples/sec Loss 1.6268 LearningRate 0.0141 Epoch: 12 Global Step: 208430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:25,382-Speed 5177.26 samples/sec Loss 1.6193 LearningRate 0.0141 Epoch: 12 Global Step: 208440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:27,383-Speed 5119.59 samples/sec Loss 1.6043 LearningRate 0.0141 Epoch: 12 Global Step: 208450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:29,366-Speed 5165.33 samples/sec Loss 1.6292 LearningRate 0.0141 Epoch: 12 Global Step: 208460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:31,340-Speed 5190.29 samples/sec Loss 1.6866 LearningRate 0.0141 Epoch: 12 Global Step: 208470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:33,326-Speed 5155.59 samples/sec Loss 1.6097 LearningRate 0.0141 Epoch: 12 Global Step: 208480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:35,313-Speed 5157.16 samples/sec Loss 1.6135 LearningRate 0.0141 Epoch: 12 
Global Step: 208490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:15:37,292-Speed 5174.43 samples/sec Loss 1.6447 LearningRate 0.0141 Epoch: 12 Global Step: 208500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:15:39,284-Speed 5143.61 samples/sec Loss 1.6253 LearningRate 0.0141 Epoch: 12 Global Step: 208510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:41,272-Speed 5151.96 samples/sec Loss 1.6277 LearningRate 0.0141 Epoch: 12 Global Step: 208520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:43,240-Speed 5205.19 samples/sec Loss 1.5887 LearningRate 0.0141 Epoch: 12 Global Step: 208530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:45,231-Speed 5144.67 samples/sec Loss 1.6503 LearningRate 0.0141 Epoch: 12 Global Step: 208540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:47,199-Speed 5206.19 samples/sec Loss 1.6044 LearningRate 0.0141 Epoch: 12 Global Step: 208550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:49,175-Speed 5184.07 samples/sec Loss 1.5580 LearningRate 0.0141 Epoch: 12 Global Step: 208560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:51,166-Speed 5143.68 samples/sec Loss 1.6820 LearningRate 0.0141 Epoch: 12 Global Step: 208570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:53,152-Speed 5158.58 samples/sec Loss 1.6633 LearningRate 0.0141 Epoch: 12 Global Step: 208580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:55,142-Speed 5147.36 samples/sec Loss 1.6749 LearningRate 0.0141 Epoch: 12 Global Step: 208590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:57,131-Speed 5148.87 samples/sec Loss 1.5873 LearningRate 0.0141 Epoch: 12 Global Step: 208600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:15:59,103-Speed 5197.00 samples/sec Loss 1.6198 LearningRate 0.0141 Epoch: 12 Global Step: 208610 Fp16 Grad Scale: 
65536 Required: 8 hours Training: 2022-04-11 13:16:01,104-Speed 5119.13 samples/sec Loss 1.6040 LearningRate 0.0141 Epoch: 12 Global Step: 208620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:03,080-Speed 5182.22 samples/sec Loss 1.6479 LearningRate 0.0141 Epoch: 12 Global Step: 208630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:05,076-Speed 5132.94 samples/sec Loss 1.5994 LearningRate 0.0141 Epoch: 12 Global Step: 208640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:07,068-Speed 5141.61 samples/sec Loss 1.6227 LearningRate 0.0141 Epoch: 12 Global Step: 208650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:09,047-Speed 5176.58 samples/sec Loss 1.6800 LearningRate 0.0141 Epoch: 12 Global Step: 208660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:11,026-Speed 5174.72 samples/sec Loss 1.6419 LearningRate 0.0141 Epoch: 12 Global Step: 208670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:13,021-Speed 5134.89 samples/sec Loss 1.5903 LearningRate 0.0141 Epoch: 12 Global Step: 208680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:15,006-Speed 5160.87 samples/sec Loss 1.6808 LearningRate 0.0141 Epoch: 12 Global Step: 208690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:16,983-Speed 5180.38 samples/sec Loss 1.6448 LearningRate 0.0140 Epoch: 12 Global Step: 208700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:18,945-Speed 5220.69 samples/sec Loss 1.6262 LearningRate 0.0140 Epoch: 12 Global Step: 208710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:20,921-Speed 5185.47 samples/sec Loss 1.6018 LearningRate 0.0140 Epoch: 12 Global Step: 208720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:22,927-Speed 5106.68 samples/sec Loss 1.6141 LearningRate 0.0140 Epoch: 12 Global Step: 208730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 
2022-04-11 13:16:24,940-Speed 5089.76 samples/sec Loss 1.6592 LearningRate 0.0140 Epoch: 12 Global Step: 208740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:26,938-Speed 5124.20 samples/sec Loss 1.5778 LearningRate 0.0140 Epoch: 12 Global Step: 208750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:28,938-Speed 5122.24 samples/sec Loss 1.5855 LearningRate 0.0140 Epoch: 12 Global Step: 208760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:30,930-Speed 5142.70 samples/sec Loss 1.6402 LearningRate 0.0140 Epoch: 12 Global Step: 208770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:32,922-Speed 5142.16 samples/sec Loss 1.5997 LearningRate 0.0140 Epoch: 12 Global Step: 208780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:34,918-Speed 5132.46 samples/sec Loss 1.6508 LearningRate 0.0140 Epoch: 12 Global Step: 208790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:36,920-Speed 5115.74 samples/sec Loss 1.5996 LearningRate 0.0140 Epoch: 12 Global Step: 208800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:38,900-Speed 5174.78 samples/sec Loss 1.6212 LearningRate 0.0140 Epoch: 12 Global Step: 208810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:40,906-Speed 5106.61 samples/sec Loss 1.6489 LearningRate 0.0140 Epoch: 12 Global Step: 208820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:42,886-Speed 5172.44 samples/sec Loss 1.6295 LearningRate 0.0140 Epoch: 12 Global Step: 208830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:44,884-Speed 5128.10 samples/sec Loss 1.6201 LearningRate 0.0140 Epoch: 12 Global Step: 208840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:46,862-Speed 5176.48 samples/sec Loss 1.6816 LearningRate 0.0140 Epoch: 12 Global Step: 208850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:48,858-Speed 5133.63 
samples/sec Loss 1.6371 LearningRate 0.0140 Epoch: 12 Global Step: 208860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:50,854-Speed 5131.22 samples/sec Loss 1.6351 LearningRate 0.0140 Epoch: 12 Global Step: 208870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:52,845-Speed 5145.62 samples/sec Loss 1.6546 LearningRate 0.0140 Epoch: 12 Global Step: 208880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:54,824-Speed 5176.21 samples/sec Loss 1.6240 LearningRate 0.0140 Epoch: 12 Global Step: 208890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:56,810-Speed 5156.99 samples/sec Loss 1.6349 LearningRate 0.0140 Epoch: 12 Global Step: 208900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:16:58,788-Speed 5178.31 samples/sec Loss 1.6169 LearningRate 0.0140 Epoch: 12 Global Step: 208910 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:17:00,796-Speed 5101.56 samples/sec Loss 1.5744 LearningRate 0.0140 Epoch: 12 Global Step: 208920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:17:02,772-Speed 5185.50 samples/sec Loss 1.6605 LearningRate 0.0140 Epoch: 12 Global Step: 208930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:17:04,739-Speed 5207.75 samples/sec Loss 1.7122 LearningRate 0.0140 Epoch: 12 Global Step: 208940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:17:06,716-Speed 5180.12 samples/sec Loss 1.6261 LearningRate 0.0140 Epoch: 12 Global Step: 208950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:17:08,716-Speed 5121.10 samples/sec Loss 1.5893 LearningRate 0.0140 Epoch: 12 Global Step: 208960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:17:10,708-Speed 5141.62 samples/sec Loss 1.6726 LearningRate 0.0140 Epoch: 12 Global Step: 208970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:17:12,681-Speed 5191.29 samples/sec Loss 1.5935 LearningRate 
0.0140 Epoch: 12 Global Step: 208980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:17:14,657-Speed 5185.52 samples/sec Loss 1.6618 LearningRate 0.0140 Epoch: 12 Global Step: 208990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:17:16,649-Speed 5140.49 samples/sec Loss 1.6100 LearningRate 0.0140 Epoch: 12 Global Step: 209000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:17:18,632-Speed 5166.35 samples/sec Loss 1.5460 LearningRate 0.0140 Epoch: 12 Global Step: 209010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:17:20,619-Speed 5156.56 samples/sec Loss 1.6297 LearningRate 0.0140 Epoch: 12 Global Step: 209020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:17:22,631-Speed 5091.25 samples/sec Loss 1.6676 LearningRate 0.0140 Epoch: 12 Global Step: 209030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:17:24,647-Speed 5081.99 samples/sec Loss 1.6759 LearningRate 0.0140 Epoch: 12 Global Step: 209040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:17:26,625-Speed 5177.93 samples/sec Loss 1.6506 LearningRate 0.0140 Epoch: 12 Global Step: 209050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:17:28,629-Speed 5112.02 samples/sec Loss 1.6632 LearningRate 0.0140 Epoch: 12 Global Step: 209060 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:17:30,609-Speed 5173.27 samples/sec Loss 1.6179 LearningRate 0.0140 Epoch: 12 Global Step: 209070 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:17:32,583-Speed 5188.42 samples/sec Loss 1.5979 LearningRate 0.0140 Epoch: 12 Global Step: 209080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:17:34,567-Speed 5163.23 samples/sec Loss 1.6056 LearningRate 0.0140 Epoch: 12 Global Step: 209090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:17:36,564-Speed 5128.38 samples/sec Loss 1.6579 LearningRate 0.0140 Epoch: 12 Global Step: 209100 
Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:17:38,550-Speed 5158.26 samples/sec Loss 1.6346 LearningRate 0.0140 Epoch: 12 Global Step: 209110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:17:40,543-Speed 5139.67 samples/sec Loss 1.6116 LearningRate 0.0140 Epoch: 12 Global Step: 209120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:17:42,519-Speed 5186.81 samples/sec Loss 1.5802 LearningRate 0.0140 Epoch: 12 Global Step: 209130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:17:44,502-Speed 5166.30 samples/sec Loss 1.6288 LearningRate 0.0139 Epoch: 12 Global Step: 209140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:17:46,483-Speed 5170.73 samples/sec Loss 1.5878 LearningRate 0.0139 Epoch: 12 Global Step: 209150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:17:48,520-Speed 5028.54 samples/sec Loss 1.6639 LearningRate 0.0139 Epoch: 12 Global Step: 209160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:17:50,507-Speed 5155.68 samples/sec Loss 1.6140 LearningRate 0.0139 Epoch: 12 Global Step: 209170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:17:52,491-Speed 5163.45 samples/sec Loss 1.6167 LearningRate 0.0139 Epoch: 12 Global Step: 209180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:17:54,469-Speed 5179.19 samples/sec Loss 1.6378 LearningRate 0.0139 Epoch: 12 Global Step: 209190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:17:56,450-Speed 5170.14 samples/sec Loss 1.6183 LearningRate 0.0139 Epoch: 12 Global Step: 209200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:17:58,434-Speed 5163.19 samples/sec Loss 1.6253 LearningRate 0.0139 Epoch: 12 Global Step: 209210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:00,407-Speed 5192.96 samples/sec Loss 1.6727 LearningRate 0.0139 Epoch: 12 Global Step: 209220 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-04-11 13:18:02,379-Speed 5194.98 samples/sec Loss 1.6304 LearningRate 0.0139 Epoch: 12 Global Step: 209230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:04,372-Speed 5138.05 samples/sec Loss 1.6145 LearningRate 0.0139 Epoch: 12 Global Step: 209240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:06,350-Speed 5179.44 samples/sec Loss 1.5867 LearningRate 0.0139 Epoch: 12 Global Step: 209250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:08,323-Speed 5189.96 samples/sec Loss 1.6123 LearningRate 0.0139 Epoch: 12 Global Step: 209260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:10,324-Speed 5120.42 samples/sec Loss 1.6784 LearningRate 0.0139 Epoch: 12 Global Step: 209270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:12,303-Speed 5175.00 samples/sec Loss 1.6267 LearningRate 0.0139 Epoch: 12 Global Step: 209280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:14,287-Speed 5165.09 samples/sec Loss 1.6342 LearningRate 0.0139 Epoch: 12 Global Step: 209290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:18:16,283-Speed 5131.49 samples/sec Loss 1.6664 LearningRate 0.0139 Epoch: 12 Global Step: 209300 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:18:18,270-Speed 5155.69 samples/sec Loss 1.6808 LearningRate 0.0139 Epoch: 12 Global Step: 209310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:18:20,257-Speed 5155.72 samples/sec Loss 1.5832 LearningRate 0.0139 Epoch: 12 Global Step: 209320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:18:22,245-Speed 5152.48 samples/sec Loss 1.6300 LearningRate 0.0139 Epoch: 12 Global Step: 209330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:18:24,222-Speed 5181.46 samples/sec Loss 1.6368 LearningRate 0.0139 Epoch: 12 Global Step: 209340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 
13:18:26,202-Speed 5173.37 samples/sec Loss 1.6173 LearningRate 0.0139 Epoch: 12 Global Step: 209350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:28,198-Speed 5131.70 samples/sec Loss 1.6528 LearningRate 0.0139 Epoch: 12 Global Step: 209360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:30,186-Speed 5152.93 samples/sec Loss 1.6153 LearningRate 0.0139 Epoch: 12 Global Step: 209370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:32,161-Speed 5186.90 samples/sec Loss 1.6139 LearningRate 0.0139 Epoch: 12 Global Step: 209380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:34,146-Speed 5160.08 samples/sec Loss 1.6359 LearningRate 0.0139 Epoch: 12 Global Step: 209390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:36,128-Speed 5168.35 samples/sec Loss 1.6707 LearningRate 0.0139 Epoch: 12 Global Step: 209400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:38,117-Speed 5148.68 samples/sec Loss 1.6229 LearningRate 0.0139 Epoch: 12 Global Step: 209410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:40,094-Speed 5181.59 samples/sec Loss 1.6399 LearningRate 0.0139 Epoch: 12 Global Step: 209420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:42,079-Speed 5160.51 samples/sec Loss 1.6000 LearningRate 0.0139 Epoch: 12 Global Step: 209430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:44,064-Speed 5160.27 samples/sec Loss 1.5967 LearningRate 0.0139 Epoch: 12 Global Step: 209440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:46,060-Speed 5133.67 samples/sec Loss 1.6483 LearningRate 0.0139 Epoch: 12 Global Step: 209450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:18:48,028-Speed 5204.89 samples/sec Loss 1.6132 LearningRate 0.0139 Epoch: 12 Global Step: 209460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:50,003-Speed 5186.15 samples/sec 
Loss 1.5784 LearningRate 0.0139 Epoch: 12 Global Step: 209470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:51,995-Speed 5141.69 samples/sec Loss 1.6489 LearningRate 0.0139 Epoch: 12 Global Step: 209480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:53,985-Speed 5146.72 samples/sec Loss 1.6164 LearningRate 0.0139 Epoch: 12 Global Step: 209490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:55,962-Speed 5182.81 samples/sec Loss 1.6509 LearningRate 0.0139 Epoch: 12 Global Step: 209500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:57,934-Speed 5193.96 samples/sec Loss 1.5961 LearningRate 0.0139 Epoch: 12 Global Step: 209510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:18:59,921-Speed 5153.79 samples/sec Loss 1.6505 LearningRate 0.0139 Epoch: 12 Global Step: 209520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:01,910-Speed 5151.95 samples/sec Loss 1.6198 LearningRate 0.0139 Epoch: 12 Global Step: 209530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:03,888-Speed 5177.71 samples/sec Loss 1.6733 LearningRate 0.0139 Epoch: 12 Global Step: 209540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:05,863-Speed 5187.78 samples/sec Loss 1.6218 LearningRate 0.0139 Epoch: 12 Global Step: 209550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:07,835-Speed 5193.90 samples/sec Loss 1.6701 LearningRate 0.0139 Epoch: 12 Global Step: 209560 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:19:09,809-Speed 5188.32 samples/sec Loss 1.6635 LearningRate 0.0139 Epoch: 12 Global Step: 209570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:19:11,790-Speed 5171.83 samples/sec Loss 1.6574 LearningRate 0.0139 Epoch: 12 Global Step: 209580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:13,780-Speed 5150.85 samples/sec Loss 1.6496 LearningRate 0.0138 Epoch: 12 
Global Step: 209590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:15,758-Speed 5177.57 samples/sec Loss 1.6640 LearningRate 0.0138 Epoch: 12 Global Step: 209600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:17,732-Speed 5189.07 samples/sec Loss 1.7384 LearningRate 0.0138 Epoch: 12 Global Step: 209610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:19,711-Speed 5175.16 samples/sec Loss 1.6692 LearningRate 0.0138 Epoch: 12 Global Step: 209620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:21,694-Speed 5166.93 samples/sec Loss 1.6094 LearningRate 0.0138 Epoch: 12 Global Step: 209630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:23,680-Speed 5155.74 samples/sec Loss 1.6096 LearningRate 0.0138 Epoch: 12 Global Step: 209640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:25,669-Speed 5152.44 samples/sec Loss 1.5930 LearningRate 0.0138 Epoch: 12 Global Step: 209650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:27,664-Speed 5133.30 samples/sec Loss 1.6211 LearningRate 0.0138 Epoch: 12 Global Step: 209660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:29,649-Speed 5160.53 samples/sec Loss 1.6133 LearningRate 0.0138 Epoch: 12 Global Step: 209670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:31,621-Speed 5196.84 samples/sec Loss 1.6672 LearningRate 0.0138 Epoch: 12 Global Step: 209680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:19:33,600-Speed 5175.77 samples/sec Loss 1.6686 LearningRate 0.0138 Epoch: 12 Global Step: 209690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:19:35,587-Speed 5154.28 samples/sec Loss 1.6068 LearningRate 0.0138 Epoch: 12 Global Step: 209700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:19:37,561-Speed 5189.99 samples/sec Loss 1.6072 LearningRate 0.0138 Epoch: 12 Global Step: 209710 Fp16 Grad Scale: 
131072 Required: 8 hours Training: 2022-04-11 13:19:39,551-Speed 5146.82 samples/sec Loss 1.6432 LearningRate 0.0138 Epoch: 12 Global Step: 209720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:19:41,526-Speed 5185.74 samples/sec Loss 1.5999 LearningRate 0.0138 Epoch: 12 Global Step: 209730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:43,516-Speed 5146.00 samples/sec Loss 1.6510 LearningRate 0.0138 Epoch: 12 Global Step: 209740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:45,502-Speed 5161.25 samples/sec Loss 1.5719 LearningRate 0.0138 Epoch: 12 Global Step: 209750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:47,485-Speed 5163.38 samples/sec Loss 1.7027 LearningRate 0.0138 Epoch: 12 Global Step: 209760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:49,475-Speed 5148.10 samples/sec Loss 1.6035 LearningRate 0.0138 Epoch: 12 Global Step: 209770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:51,462-Speed 5154.74 samples/sec Loss 1.6209 LearningRate 0.0138 Epoch: 12 Global Step: 209780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:53,442-Speed 5175.48 samples/sec Loss 1.6495 LearningRate 0.0138 Epoch: 12 Global Step: 209790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:55,429-Speed 5154.01 samples/sec Loss 1.6293 LearningRate 0.0138 Epoch: 12 Global Step: 209800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:57,425-Speed 5130.43 samples/sec Loss 1.6270 LearningRate 0.0138 Epoch: 12 Global Step: 209810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:19:59,420-Speed 5135.87 samples/sec Loss 1.6306 LearningRate 0.0138 Epoch: 12 Global Step: 209820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:20:01,422-Speed 5115.40 samples/sec Loss 1.6352 LearningRate 0.0138 Epoch: 12 Global Step: 209830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-04-11 13:20:03,420-Speed 5128.28 samples/sec Loss 1.6471 LearningRate 0.0138 Epoch: 12 Global Step: 209840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:20:05,399-Speed 5175.25 samples/sec Loss 1.6211 LearningRate 0.0138 Epoch: 12 Global Step: 209850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:20:07,384-Speed 5160.48 samples/sec Loss 1.6337 LearningRate 0.0138 Epoch: 12 Global Step: 209860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:20:09,354-Speed 5200.26 samples/sec Loss 1.6394 LearningRate 0.0138 Epoch: 12 Global Step: 209870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:20:11,356-Speed 5116.30 samples/sec Loss 1.6314 LearningRate 0.0138 Epoch: 12 Global Step: 209880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:20:13,350-Speed 5135.85 samples/sec Loss 1.6452 LearningRate 0.0138 Epoch: 12 Global Step: 209890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:20:15,341-Speed 5147.01 samples/sec Loss 1.6302 LearningRate 0.0138 Epoch: 12 Global Step: 209900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:20:17,322-Speed 5169.42 samples/sec Loss 1.6434 LearningRate 0.0138 Epoch: 12 Global Step: 209910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:20:19,301-Speed 5176.99 samples/sec Loss 1.6272 LearningRate 0.0138 Epoch: 12 Global Step: 209920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:20:21,296-Speed 5134.25 samples/sec Loss 1.6699 LearningRate 0.0138 Epoch: 12 Global Step: 209930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:20:23,268-Speed 5197.13 samples/sec Loss 1.6156 LearningRate 0.0138 Epoch: 12 Global Step: 209940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:20:25,244-Speed 5182.76 samples/sec Loss 1.6188 LearningRate 0.0138 Epoch: 12 Global Step: 209950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:20:27,229-Speed 5160.23 
samples/sec Loss 1.6522 LearningRate 0.0138 Epoch: 12 Global Step: 209960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:20:29,204-Speed 5185.74 samples/sec Loss 1.6324 LearningRate 0.0138 Epoch: 12 Global Step: 209970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:20:31,189-Speed 5164.00 samples/sec Loss 1.6463 LearningRate 0.0138 Epoch: 12 Global Step: 209980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:20:33,166-Speed 5181.94 samples/sec Loss 1.6235 LearningRate 0.0138 Epoch: 12 Global Step: 209990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:20:35,156-Speed 5149.04 samples/sec Loss 1.6549 LearningRate 0.0138 Epoch: 12 Global Step: 210000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:21:01,963-[lfw][210000]XNorm: 24.202201 Training: 2022-04-11 13:21:01,964-[lfw][210000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-04-11 13:21:01,964-[lfw][210000]Accuracy-Highest: 0.99833 Training: 2022-04-11 13:21:32,872-[cfp_fp][210000]XNorm: 22.920613 Training: 2022-04-11 13:21:32,872-[cfp_fp][210000]Accuracy-Flip: 0.98771+-0.00439 Training: 2022-04-11 13:21:32,873-[cfp_fp][210000]Accuracy-Highest: 0.98771 Training: 2022-04-11 13:21:59,531-[agedb_30][210000]XNorm: 24.032433 Training: 2022-04-11 13:21:59,532-[agedb_30][210000]Accuracy-Flip: 0.98133+-0.00948 Training: 2022-04-11 13:21:59,532-[agedb_30][210000]Accuracy-Highest: 0.98250 Training: 2022-04-11 13:22:01,529-Speed 118.56 samples/sec Loss 1.6123 LearningRate 0.0138 Epoch: 12 Global Step: 210010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:22:03,493-Speed 5215.59 samples/sec Loss 1.6668 LearningRate 0.0138 Epoch: 12 Global Step: 210020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:22:05,499-Speed 5104.92 samples/sec Loss 1.6483 LearningRate 0.0138 Epoch: 12 Global Step: 210030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:22:07,465-Speed 5209.91 samples/sec Loss 1.6222 
LearningRate 0.0137 Epoch: 12 Global Step: 210040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:09,435-Speed 5200.08 samples/sec Loss 1.6503 LearningRate 0.0137 Epoch: 12 Global Step: 210050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:11,407-Speed 5196.00 samples/sec Loss 1.6435 LearningRate 0.0137 Epoch: 12 Global Step: 210060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:13,416-Speed 5096.99 samples/sec Loss 1.6229 LearningRate 0.0137 Epoch: 12 Global Step: 210070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:15,386-Speed 5200.49 samples/sec Loss 1.6465 LearningRate 0.0137 Epoch: 12 Global Step: 210080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:17,370-Speed 5163.07 samples/sec Loss 1.7054 LearningRate 0.0137 Epoch: 12 Global Step: 210090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:19,339-Speed 5200.81 samples/sec Loss 1.6124 LearningRate 0.0137 Epoch: 12 Global Step: 210100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:21,322-Speed 5165.99 samples/sec Loss 1.6176 LearningRate 0.0137 Epoch: 12 Global Step: 210110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:23,301-Speed 5176.81 samples/sec Loss 1.6483 LearningRate 0.0137 Epoch: 12 Global Step: 210120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:25,302-Speed 5119.03 samples/sec Loss 1.6080 LearningRate 0.0137 Epoch: 12 Global Step: 210130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:27,294-Speed 5143.12 samples/sec Loss 1.5955 LearningRate 0.0137 Epoch: 12 Global Step: 210140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:22:29,272-Speed 5177.43 samples/sec Loss 1.6329 LearningRate 0.0137 Epoch: 12 Global Step: 210150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:31,244-Speed 5196.93 samples/sec Loss 1.6005 LearningRate 0.0137 Epoch: 12 Global Step: 
210160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:33,225-Speed 5169.62 samples/sec Loss 1.6768 LearningRate 0.0137 Epoch: 12 Global Step: 210170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:35,218-Speed 5139.73 samples/sec Loss 1.6744 LearningRate 0.0137 Epoch: 12 Global Step: 210180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:37,200-Speed 5168.53 samples/sec Loss 1.6424 LearningRate 0.0137 Epoch: 12 Global Step: 210190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:39,204-Speed 5111.37 samples/sec Loss 1.6611 LearningRate 0.0137 Epoch: 12 Global Step: 210200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:41,189-Speed 5159.02 samples/sec Loss 1.6133 LearningRate 0.0137 Epoch: 12 Global Step: 210210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:43,165-Speed 5183.52 samples/sec Loss 1.6188 LearningRate 0.0137 Epoch: 12 Global Step: 210220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:45,165-Speed 5122.73 samples/sec Loss 1.6591 LearningRate 0.0137 Epoch: 12 Global Step: 210230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:47,156-Speed 5146.64 samples/sec Loss 1.6204 LearningRate 0.0137 Epoch: 12 Global Step: 210240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:49,160-Speed 5110.71 samples/sec Loss 1.6732 LearningRate 0.0137 Epoch: 12 Global Step: 210250 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:22:51,138-Speed 5177.96 samples/sec Loss 1.6472 LearningRate 0.0137 Epoch: 12 Global Step: 210260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:53,110-Speed 5196.55 samples/sec Loss 1.6565 LearningRate 0.0137 Epoch: 12 Global Step: 210270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:55,094-Speed 5163.14 samples/sec Loss 1.6127 LearningRate 0.0137 Epoch: 12 Global Step: 210280 Fp16 Grad Scale: 65536 Required: 8 
hours Training: 2022-04-11 13:22:57,065-Speed 5196.47 samples/sec Loss 1.6228 LearningRate 0.0137 Epoch: 12 Global Step: 210290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:22:59,034-Speed 5203.12 samples/sec Loss 1.6365 LearningRate 0.0137 Epoch: 12 Global Step: 210300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:23:01,024-Speed 5146.75 samples/sec Loss 1.7098 LearningRate 0.0137 Epoch: 12 Global Step: 210310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:23:03,000-Speed 5184.17 samples/sec Loss 1.6338 LearningRate 0.0137 Epoch: 12 Global Step: 210320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:23:04,982-Speed 5167.65 samples/sec Loss 1.6407 LearningRate 0.0137 Epoch: 12 Global Step: 210330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:23:06,965-Speed 5165.09 samples/sec Loss 1.6676 LearningRate 0.0137 Epoch: 12 Global Step: 210340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:23:08,955-Speed 5147.13 samples/sec Loss 1.6486 LearningRate 0.0137 Epoch: 12 Global Step: 210350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:23:10,942-Speed 5155.95 samples/sec Loss 1.6147 LearningRate 0.0137 Epoch: 12 Global Step: 210360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:23:12,927-Speed 5160.49 samples/sec Loss 1.6173 LearningRate 0.0137 Epoch: 12 Global Step: 210370 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:23:14,904-Speed 5181.41 samples/sec Loss 1.6833 LearningRate 0.0137 Epoch: 12 Global Step: 210380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:23:16,880-Speed 5184.56 samples/sec Loss 1.6733 LearningRate 0.0137 Epoch: 12 Global Step: 210390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:23:18,848-Speed 5204.04 samples/sec Loss 1.5962 LearningRate 0.0137 Epoch: 12 Global Step: 210400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
13:23:20,823-Speed 5188.16 samples/sec Loss 1.6244 LearningRate 0.0137 Epoch: 12 Global Step: 210410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:23:22,837-Speed 5084.21 samples/sec Loss 1.6444 LearningRate 0.0137 Epoch: 12 Global Step: 210420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:23:24,819-Speed 5168.85 samples/sec Loss 1.6777 LearningRate 0.0137 Epoch: 12 Global Step: 210430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:23:26,791-Speed 5194.00 samples/sec Loss 1.6435 LearningRate 0.0137 Epoch: 12 Global Step: 210440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:23:28,776-Speed 5162.48 samples/sec Loss 1.6128 LearningRate 0.0137 Epoch: 12 Global Step: 210450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:23:30,757-Speed 5170.34 samples/sec Loss 1.6311 LearningRate 0.0137 Epoch: 12 Global Step: 210460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:23:32,743-Speed 5157.35 samples/sec Loss 1.7206 LearningRate 0.0137 Epoch: 12 Global Step: 210470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:23:34,754-Speed 5092.14 samples/sec Loss 1.6285 LearningRate 0.0137 Epoch: 12 Global Step: 210480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:23:36,742-Speed 5152.79 samples/sec Loss 1.6700 LearningRate 0.0136 Epoch: 12 Global Step: 210490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:23:38,725-Speed 5165.90 samples/sec Loss 1.6247 LearningRate 0.0136 Epoch: 12 Global Step: 210500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:23:40,703-Speed 5180.12 samples/sec Loss 1.7061 LearningRate 0.0136 Epoch: 12 Global Step: 210510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:23:42,658-Speed 5239.93 samples/sec Loss 1.6140 LearningRate 0.0136 Epoch: 12 Global Step: 210520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:23:44,623-Speed 5211.40 samples/sec 
Loss 1.7172 LearningRate 0.0136 Epoch: 12 Global Step: 210530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:23:46,600-Speed 5182.50 samples/sec Loss 1.6554 LearningRate 0.0136 Epoch: 12 Global Step: 210540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:23:48,573-Speed 5190.30 samples/sec Loss 1.6625 LearningRate 0.0136 Epoch: 12 Global Step: 210550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:23:50,582-Speed 5099.22 samples/sec Loss 1.6124 LearningRate 0.0136 Epoch: 12 Global Step: 210560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:23:52,557-Speed 5187.46 samples/sec Loss 1.6598 LearningRate 0.0136 Epoch: 12 Global Step: 210570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:23:54,539-Speed 5168.29 samples/sec Loss 1.6488 LearningRate 0.0136 Epoch: 12 Global Step: 210580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:23:56,521-Speed 5166.96 samples/sec Loss 1.6403 LearningRate 0.0136 Epoch: 12 Global Step: 210590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:23:58,511-Speed 5147.88 samples/sec Loss 1.6718 LearningRate 0.0136 Epoch: 12 Global Step: 210600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:24:00,509-Speed 5126.98 samples/sec Loss 1.6194 LearningRate 0.0136 Epoch: 12 Global Step: 210610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:24:02,491-Speed 5168.98 samples/sec Loss 1.5570 LearningRate 0.0136 Epoch: 12 Global Step: 210620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:24:04,468-Speed 5180.12 samples/sec Loss 1.6617 LearningRate 0.0136 Epoch: 12 Global Step: 210630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:24:06,434-Speed 5211.82 samples/sec Loss 1.6161 LearningRate 0.0136 Epoch: 12 Global Step: 210640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:24:08,408-Speed 5188.14 samples/sec Loss 1.6108 LearningRate 0.0136 Epoch: 12 
Global Step: 210650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:24:10,377-Speed 5203.72 samples/sec Loss 1.6679 LearningRate 0.0136 Epoch: 12 Global Step: 210660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:24:12,344-Speed 5207.06 samples/sec Loss 1.6614 LearningRate 0.0136 Epoch: 12 Global Step: 210670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:24:14,354-Speed 5095.05 samples/sec Loss 1.6379 LearningRate 0.0136 Epoch: 12 Global Step: 210680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:24:16,328-Speed 5188.39 samples/sec Loss 1.6468 LearningRate 0.0136 Epoch: 12 Global Step: 210690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:24:18,320-Speed 5143.49 samples/sec Loss 1.6392 LearningRate 0.0136 Epoch: 12 Global Step: 210700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:24:20,319-Speed 5123.34 samples/sec Loss 1.6106 LearningRate 0.0136 Epoch: 12 Global Step: 210710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:24:22,294-Speed 5187.56 samples/sec Loss 1.6358 LearningRate 0.0136 Epoch: 12 Global Step: 210720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:24:24,275-Speed 5170.12 samples/sec Loss 1.6922 LearningRate 0.0136 Epoch: 12 Global Step: 210730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:24:26,266-Speed 5144.89 samples/sec Loss 1.6537 LearningRate 0.0136 Epoch: 12 Global Step: 210740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:24:28,228-Speed 5223.39 samples/sec Loss 1.6373 LearningRate 0.0136 Epoch: 12 Global Step: 210750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:24:30,225-Speed 5128.13 samples/sec Loss 1.6170 LearningRate 0.0136 Epoch: 12 Global Step: 210760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:24:32,195-Speed 5201.68 samples/sec Loss 1.6529 LearningRate 0.0136 Epoch: 12 Global Step: 210770 Fp16 Grad Scale: 
65536 Required: 8 hours Training: 2022-04-11 13:24:34,164-Speed 5201.64 samples/sec Loss 1.6407 LearningRate 0.0136 Epoch: 12 Global Step: 210780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:24:36,142-Speed 5177.60 samples/sec Loss 1.6441 LearningRate 0.0136 Epoch: 12 Global Step: 210790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:24:38,128-Speed 5157.08 samples/sec Loss 1.6741 LearningRate 0.0136 Epoch: 12 Global Step: 210800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:24:40,094-Speed 5212.49 samples/sec Loss 1.6624 LearningRate 0.0136 Epoch: 12 Global Step: 210810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:24:42,066-Speed 5192.59 samples/sec Loss 1.6697 LearningRate 0.0136 Epoch: 12 Global Step: 210820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:24:44,034-Speed 5204.77 samples/sec Loss 1.7089 LearningRate 0.0136 Epoch: 12 Global Step: 210830 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:24:46,020-Speed 5159.27 samples/sec Loss 1.6713 LearningRate 0.0136 Epoch: 12 Global Step: 210840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:24:48,017-Speed 5128.67 samples/sec Loss 1.7100 LearningRate 0.0136 Epoch: 12 Global Step: 210850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:24:49,990-Speed 5193.69 samples/sec Loss 1.6474 LearningRate 0.0136 Epoch: 12 Global Step: 210860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:24:51,960-Speed 5198.09 samples/sec Loss 1.6677 LearningRate 0.0136 Epoch: 12 Global Step: 210870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:24:53,932-Speed 5196.28 samples/sec Loss 1.6374 LearningRate 0.0136 Epoch: 12 Global Step: 210880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:24:55,949-Speed 5076.96 samples/sec Loss 1.6329 LearningRate 0.0136 Epoch: 12 Global Step: 210890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 
2022-04-11 13:24:57,927-Speed 5180.67 samples/sec Loss 1.6869 LearningRate 0.0136 Epoch: 12 Global Step: 210900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:24:59,916-Speed 5149.78 samples/sec Loss 1.6691 LearningRate 0.0136 Epoch: 12 Global Step: 210910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:01,887-Speed 5196.82 samples/sec Loss 1.7077 LearningRate 0.0136 Epoch: 12 Global Step: 210920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:03,869-Speed 5166.36 samples/sec Loss 1.7148 LearningRate 0.0136 Epoch: 12 Global Step: 210930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:05,846-Speed 5181.77 samples/sec Loss 1.6418 LearningRate 0.0135 Epoch: 12 Global Step: 210940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:07,831-Speed 5161.53 samples/sec Loss 1.6025 LearningRate 0.0135 Epoch: 12 Global Step: 210950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:09,818-Speed 5153.15 samples/sec Loss 1.6987 LearningRate 0.0135 Epoch: 12 Global Step: 210960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:11,811-Speed 5141.43 samples/sec Loss 1.6167 LearningRate 0.0135 Epoch: 12 Global Step: 210970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:13,786-Speed 5185.60 samples/sec Loss 1.6395 LearningRate 0.0135 Epoch: 12 Global Step: 210980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:15,758-Speed 5193.92 samples/sec Loss 1.6754 LearningRate 0.0135 Epoch: 12 Global Step: 210990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:17,729-Speed 5197.21 samples/sec Loss 1.6869 LearningRate 0.0135 Epoch: 12 Global Step: 211000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:19,693-Speed 5217.20 samples/sec Loss 1.6710 LearningRate 0.0135 Epoch: 12 Global Step: 211010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:21,668-Speed 5186.31 
samples/sec Loss 1.6387 LearningRate 0.0135 Epoch: 12 Global Step: 211020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:23,661-Speed 5138.09 samples/sec Loss 1.6605 LearningRate 0.0135 Epoch: 12 Global Step: 211030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:25,634-Speed 5192.52 samples/sec Loss 1.6159 LearningRate 0.0135 Epoch: 12 Global Step: 211040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:27,621-Speed 5159.44 samples/sec Loss 1.6598 LearningRate 0.0135 Epoch: 12 Global Step: 211050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:29,592-Speed 5196.96 samples/sec Loss 1.6522 LearningRate 0.0135 Epoch: 12 Global Step: 211060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:31,561-Speed 5202.56 samples/sec Loss 1.6278 LearningRate 0.0135 Epoch: 12 Global Step: 211070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:33,533-Speed 5192.33 samples/sec Loss 1.6596 LearningRate 0.0135 Epoch: 12 Global Step: 211080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:35,509-Speed 5186.33 samples/sec Loss 1.6490 LearningRate 0.0135 Epoch: 12 Global Step: 211090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:25:37,530-Speed 5069.36 samples/sec Loss 1.6797 LearningRate 0.0135 Epoch: 12 Global Step: 211100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:25:39,522-Speed 5140.56 samples/sec Loss 1.6847 LearningRate 0.0135 Epoch: 12 Global Step: 211110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:25:41,496-Speed 5189.87 samples/sec Loss 1.6417 LearningRate 0.0135 Epoch: 12 Global Step: 211120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:25:43,466-Speed 5198.45 samples/sec Loss 1.5784 LearningRate 0.0135 Epoch: 12 Global Step: 211130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:25:45,450-Speed 5162.83 samples/sec Loss 1.6481 LearningRate 0.0135 
Epoch: 12 Global Step: 211140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:25:47,443-Speed 5140.82 samples/sec Loss 1.6579 LearningRate 0.0135 Epoch: 12 Global Step: 211150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:25:49,433-Speed 5146.22 samples/sec Loss 1.5850 LearningRate 0.0135 Epoch: 12 Global Step: 211160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:25:51,418-Speed 5160.48 samples/sec Loss 1.6095 LearningRate 0.0135 Epoch: 12 Global Step: 211170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:25:53,393-Speed 5187.52 samples/sec Loss 1.6527 LearningRate 0.0135 Epoch: 12 Global Step: 211180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:25:55,379-Speed 5158.19 samples/sec Loss 1.6332 LearningRate 0.0135 Epoch: 12 Global Step: 211190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:57,349-Speed 5198.98 samples/sec Loss 1.6471 LearningRate 0.0135 Epoch: 12 Global Step: 211200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:25:59,324-Speed 5186.00 samples/sec Loss 1.6674 LearningRate 0.0135 Epoch: 12 Global Step: 211210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:26:01,310-Speed 5160.47 samples/sec Loss 1.6727 LearningRate 0.0135 Epoch: 12 Global Step: 211220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:26:03,284-Speed 5188.78 samples/sec Loss 1.6318 LearningRate 0.0135 Epoch: 12 Global Step: 211230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:26:05,274-Speed 5145.06 samples/sec Loss 1.6462 LearningRate 0.0135 Epoch: 12 Global Step: 211240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:26:07,246-Speed 5194.83 samples/sec Loss 1.6383 LearningRate 0.0135 Epoch: 12 Global Step: 211250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:26:09,212-Speed 5209.81 samples/sec Loss 1.6487 LearningRate 0.0135 Epoch: 12 Global Step: 211260 Fp16 Grad 
Scale: 32768 Required: 8 hours Training: 2022-04-11 13:26:11,197-Speed 5162.70 samples/sec Loss 1.6278 LearningRate 0.0135 Epoch: 12 Global Step: 211270 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:26:13,187-Speed 5145.66 samples/sec Loss 1.6911 LearningRate 0.0135 Epoch: 12 Global Step: 211280 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:26:15,161-Speed 5188.25 samples/sec Loss 1.6118 LearningRate 0.0135 Epoch: 12 Global Step: 211290 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:26:17,134-Speed 5193.19 samples/sec Loss 1.6179 LearningRate 0.0135 Epoch: 12 Global Step: 211300 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:26:19,123-Speed 5149.41 samples/sec Loss 1.6717 LearningRate 0.0135 Epoch: 12 Global Step: 211310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:26:21,095-Speed 5195.35 samples/sec Loss 1.6622 LearningRate 0.0135 Epoch: 12 Global Step: 211320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:26:23,072-Speed 5182.08 samples/sec Loss 1.6261 LearningRate 0.0135 Epoch: 12 Global Step: 211330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:26:25,059-Speed 5153.73 samples/sec Loss 1.6609 LearningRate 0.0135 Epoch: 12 Global Step: 211340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:26:27,037-Speed 5178.48 samples/sec Loss 1.6086 LearningRate 0.0135 Epoch: 12 Global Step: 211350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:26:29,015-Speed 5180.61 samples/sec Loss 1.6328 LearningRate 0.0135 Epoch: 12 Global Step: 211360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:26:30,986-Speed 5196.09 samples/sec Loss 1.6404 LearningRate 0.0135 Epoch: 12 Global Step: 211370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:26:32,959-Speed 5191.48 samples/sec Loss 1.5923 LearningRate 0.0135 Epoch: 12 Global Step: 211380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 
2022-04-11 13:26:34,935-Speed 5185.49 samples/sec Loss 1.6485 LearningRate 0.0135 Epoch: 12 Global Step: 211390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:26:36,910-Speed 5186.08 samples/sec Loss 1.7227 LearningRate 0.0134 Epoch: 12 Global Step: 211400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:26:38,886-Speed 5182.47 samples/sec Loss 1.6466 LearningRate 0.0134 Epoch: 12 Global Step: 211410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:26:40,878-Speed 5142.88 samples/sec Loss 1.6260 LearningRate 0.0134 Epoch: 12 Global Step: 211420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:26:42,866-Speed 5153.79 samples/sec Loss 1.6539 LearningRate 0.0134 Epoch: 12 Global Step: 211430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:26:44,847-Speed 5168.25 samples/sec Loss 1.6510 LearningRate 0.0134 Epoch: 12 Global Step: 211440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:26:46,831-Speed 5166.01 samples/sec Loss 1.6641 LearningRate 0.0134 Epoch: 12 Global Step: 211450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:26:48,826-Speed 5134.98 samples/sec Loss 1.6301 LearningRate 0.0134 Epoch: 12 Global Step: 211460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:26:50,800-Speed 5189.25 samples/sec Loss 1.6733 LearningRate 0.0134 Epoch: 12 Global Step: 211470 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:26:52,776-Speed 5183.57 samples/sec Loss 1.6773 LearningRate 0.0134 Epoch: 12 Global Step: 211480 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:26:54,746-Speed 5198.00 samples/sec Loss 1.6173 LearningRate 0.0134 Epoch: 12 Global Step: 211490 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:26:56,719-Speed 5191.98 samples/sec Loss 1.6579 LearningRate 0.0134 Epoch: 12 Global Step: 211500 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:26:58,692-Speed 5193.14 
samples/sec Loss 1.6974 LearningRate 0.0134 Epoch: 12 Global Step: 211510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:27:01,553-Speed 3579.84 samples/sec Loss 1.6653 LearningRate 0.0134 Epoch: 12 Global Step: 211520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:27:03,536-Speed 5165.33 samples/sec Loss 1.6545 LearningRate 0.0134 Epoch: 12 Global Step: 211530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:27:05,517-Speed 5169.78 samples/sec Loss 1.6578 LearningRate 0.0134 Epoch: 12 Global Step: 211540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:27:07,498-Speed 5173.14 samples/sec Loss 1.5798 LearningRate 0.0134 Epoch: 12 Global Step: 211550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:27:09,481-Speed 5162.97 samples/sec Loss 1.6345 LearningRate 0.0134 Epoch: 12 Global Step: 211560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:27:11,458-Speed 5183.53 samples/sec Loss 1.6494 LearningRate 0.0134 Epoch: 12 Global Step: 211570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:27:13,438-Speed 5172.51 samples/sec Loss 1.6133 LearningRate 0.0134 Epoch: 12 Global Step: 211580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:27:15,418-Speed 5173.44 samples/sec Loss 1.6030 LearningRate 0.0134 Epoch: 12 Global Step: 211590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:27:17,404-Speed 5157.09 samples/sec Loss 1.6176 LearningRate 0.0134 Epoch: 12 Global Step: 211600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:27:19,379-Speed 5188.78 samples/sec Loss 1.6065 LearningRate 0.0134 Epoch: 12 Global Step: 211610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:27:21,350-Speed 5196.73 samples/sec Loss 1.6129 LearningRate 0.0134 Epoch: 12 Global Step: 211620 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:27:23,324-Speed 5189.46 samples/sec Loss 1.6556 LearningRate 
0.0134 Epoch: 12 Global Step: 211630 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:27:25,317-Speed 5137.62 samples/sec Loss 1.6739 LearningRate 0.0134 Epoch: 12 Global Step: 211640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:27:27,293-Speed 5184.58 samples/sec Loss 1.6889 LearningRate 0.0134 Epoch: 12 Global Step: 211650 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:27:29,276-Speed 5166.29 samples/sec Loss 1.6514 LearningRate 0.0134 Epoch: 12 Global Step: 211660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:27:31,254-Speed 5178.68 samples/sec Loss 1.6071 LearningRate 0.0134 Epoch: 12 Global Step: 211670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:27:33,234-Speed 5174.15 samples/sec Loss 1.6444 LearningRate 0.0134 Epoch: 12 Global Step: 211680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:27:35,214-Speed 5173.37 samples/sec Loss 1.6130 LearningRate 0.0134 Epoch: 12 Global Step: 211690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:27:37,189-Speed 5186.25 samples/sec Loss 1.6778 LearningRate 0.0134 Epoch: 12 Global Step: 211700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:27:39,207-Speed 5076.02 samples/sec Loss 1.6281 LearningRate 0.0134 Epoch: 12 Global Step: 211710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:27:41,187-Speed 5174.45 samples/sec Loss 1.6767 LearningRate 0.0134 Epoch: 12 Global Step: 211720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:27:43,159-Speed 5192.28 samples/sec Loss 1.7030 LearningRate 0.0134 Epoch: 12 Global Step: 211730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:27:45,134-Speed 5188.52 samples/sec Loss 1.6060 LearningRate 0.0134 Epoch: 12 Global Step: 211740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:27:47,111-Speed 5178.89 samples/sec Loss 1.7034 LearningRate 0.0134 Epoch: 12 Global Step: 211750 
Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:27:49,090-Speed 5176.20 samples/sec Loss 1.6245 LearningRate 0.0134 Epoch: 12 Global Step: 211760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:27:51,088-Speed 5127.49 samples/sec Loss 1.6614 LearningRate 0.0134 Epoch: 12 Global Step: 211770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:27:53,091-Speed 5114.81 samples/sec Loss 1.6407 LearningRate 0.0134 Epoch: 12 Global Step: 211780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:27:55,078-Speed 5154.13 samples/sec Loss 1.5910 LearningRate 0.0134 Epoch: 12 Global Step: 211790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:27:57,064-Speed 5157.62 samples/sec Loss 1.6306 LearningRate 0.0134 Epoch: 12 Global Step: 211800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:27:59,052-Speed 5153.18 samples/sec Loss 1.6431 LearningRate 0.0134 Epoch: 12 Global Step: 211810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:28:01,026-Speed 5189.00 samples/sec Loss 1.6345 LearningRate 0.0134 Epoch: 12 Global Step: 211820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:28:03,006-Speed 5176.48 samples/sec Loss 1.6785 LearningRate 0.0134 Epoch: 12 Global Step: 211830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:28:04,979-Speed 5190.90 samples/sec Loss 1.6145 LearningRate 0.0134 Epoch: 12 Global Step: 211840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:28:06,950-Speed 5197.03 samples/sec Loss 1.6874 LearningRate 0.0134 Epoch: 12 Global Step: 211850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:28:08,930-Speed 5171.84 samples/sec Loss 1.6378 LearningRate 0.0133 Epoch: 12 Global Step: 211860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:28:10,916-Speed 5159.91 samples/sec Loss 1.6551 LearningRate 0.0133 Epoch: 12 Global Step: 211870 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-04-11 13:28:12,904-Speed 5152.02 samples/sec Loss 1.6718 LearningRate 0.0133 Epoch: 12 Global Step: 211880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:28:14,881-Speed 5179.39 samples/sec Loss 1.6840 LearningRate 0.0133 Epoch: 12 Global Step: 211890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:28:16,877-Speed 5133.52 samples/sec Loss 1.6659 LearningRate 0.0133 Epoch: 12 Global Step: 211900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:28:18,872-Speed 5132.46 samples/sec Loss 1.6297 LearningRate 0.0133 Epoch: 12 Global Step: 211910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:28:20,852-Speed 5173.54 samples/sec Loss 1.6688 LearningRate 0.0133 Epoch: 12 Global Step: 211920 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:28:22,832-Speed 5176.10 samples/sec Loss 1.6832 LearningRate 0.0133 Epoch: 12 Global Step: 211930 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:28:24,812-Speed 5172.83 samples/sec Loss 1.6547 LearningRate 0.0133 Epoch: 12 Global Step: 211940 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:28:26,789-Speed 5180.55 samples/sec Loss 1.6158 LearningRate 0.0133 Epoch: 12 Global Step: 211950 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:28:28,786-Speed 5129.42 samples/sec Loss 1.6664 LearningRate 0.0133 Epoch: 12 Global Step: 211960 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:28:30,772-Speed 5159.29 samples/sec Loss 1.6845 LearningRate 0.0133 Epoch: 12 Global Step: 211970 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:28:32,752-Speed 5173.20 samples/sec Loss 1.6428 LearningRate 0.0133 Epoch: 12 Global Step: 211980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:28:34,734-Speed 5168.87 samples/sec Loss 1.7008 LearningRate 0.0133 Epoch: 12 Global Step: 211990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
13:28:36,726-Speed 5140.34 samples/sec Loss 1.6998 LearningRate 0.0133 Epoch: 12 Global Step: 212000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:29:03,526-[lfw][212000]XNorm: 22.518164 Training: 2022-04-11 13:29:03,526-[lfw][212000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 13:29:03,527-[lfw][212000]Accuracy-Highest: 0.99833 Training: 2022-04-11 13:29:34,412-[cfp_fp][212000]XNorm: 21.457144 Training: 2022-04-11 13:29:34,413-[cfp_fp][212000]Accuracy-Flip: 0.98700+-0.00467 Training: 2022-04-11 13:29:34,413-[cfp_fp][212000]Accuracy-Highest: 0.98771 Training: 2022-04-11 13:30:01,049-[agedb_30][212000]XNorm: 22.633600 Training: 2022-04-11 13:30:01,050-[agedb_30][212000]Accuracy-Flip: 0.98183+-0.00697 Training: 2022-04-11 13:30:01,050-[agedb_30][212000]Accuracy-Highest: 0.98250 Training: 2022-04-11 13:30:03,036-Speed 118.64 samples/sec Loss 1.7111 LearningRate 0.0133 Epoch: 12 Global Step: 212010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:05,002-Speed 5208.77 samples/sec Loss 1.6323 LearningRate 0.0133 Epoch: 12 Global Step: 212020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:06,966-Speed 5217.07 samples/sec Loss 1.5971 LearningRate 0.0133 Epoch: 12 Global Step: 212030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:08,944-Speed 5179.46 samples/sec Loss 1.6876 LearningRate 0.0133 Epoch: 12 Global Step: 212040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:10,914-Speed 5197.83 samples/sec Loss 1.7084 LearningRate 0.0133 Epoch: 12 Global Step: 212050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:12,891-Speed 5182.19 samples/sec Loss 1.6467 LearningRate 0.0133 Epoch: 12 Global Step: 212060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:14,862-Speed 5197.07 samples/sec Loss 1.6712 LearningRate 0.0133 Epoch: 12 Global Step: 212070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:16,831-Speed 
5203.56 samples/sec Loss 1.7003 LearningRate 0.0133 Epoch: 12 Global Step: 212080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:30:18,802-Speed 5195.74 samples/sec Loss 1.6757 LearningRate 0.0133 Epoch: 12 Global Step: 212090 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:30:20,764-Speed 5221.34 samples/sec Loss 1.6384 LearningRate 0.0133 Epoch: 12 Global Step: 212100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:22,754-Speed 5148.21 samples/sec Loss 1.7414 LearningRate 0.0133 Epoch: 12 Global Step: 212110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:24,727-Speed 5192.39 samples/sec Loss 1.6812 LearningRate 0.0133 Epoch: 12 Global Step: 212120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:26,698-Speed 5196.73 samples/sec Loss 1.7210 LearningRate 0.0133 Epoch: 12 Global Step: 212130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:28,673-Speed 5185.46 samples/sec Loss 1.6682 LearningRate 0.0133 Epoch: 12 Global Step: 212140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:30,665-Speed 5142.21 samples/sec Loss 1.6595 LearningRate 0.0133 Epoch: 12 Global Step: 212150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:32,643-Speed 5180.11 samples/sec Loss 1.6400 LearningRate 0.0133 Epoch: 12 Global Step: 212160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:34,638-Speed 5133.95 samples/sec Loss 1.6640 LearningRate 0.0133 Epoch: 12 Global Step: 212170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:36,631-Speed 5139.47 samples/sec Loss 1.6100 LearningRate 0.0133 Epoch: 12 Global Step: 212180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:38,603-Speed 5193.59 samples/sec Loss 1.6925 LearningRate 0.0133 Epoch: 12 Global Step: 212190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:40,586-Speed 5167.55 samples/sec Loss 1.6161 
LearningRate 0.0133 Epoch: 12 Global Step: 212200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:30:42,564-Speed 5179.37 samples/sec Loss 1.6462 LearningRate 0.0133 Epoch: 12 Global Step: 212210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:44,538-Speed 5189.55 samples/sec Loss 1.6655 LearningRate 0.0133 Epoch: 12 Global Step: 212220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:46,522-Speed 5162.21 samples/sec Loss 1.6724 LearningRate 0.0133 Epoch: 12 Global Step: 212230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:48,504-Speed 5166.71 samples/sec Loss 1.6985 LearningRate 0.0133 Epoch: 12 Global Step: 212240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:50,489-Speed 5160.67 samples/sec Loss 1.6446 LearningRate 0.0133 Epoch: 12 Global Step: 212250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:52,470-Speed 5172.21 samples/sec Loss 1.6151 LearningRate 0.0133 Epoch: 12 Global Step: 212260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:54,462-Speed 5142.30 samples/sec Loss 1.6772 LearningRate 0.0133 Epoch: 12 Global Step: 212270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:56,437-Speed 5185.72 samples/sec Loss 1.6333 LearningRate 0.0133 Epoch: 12 Global Step: 212280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:30:58,421-Speed 5162.19 samples/sec Loss 1.5765 LearningRate 0.0133 Epoch: 12 Global Step: 212290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:00,414-Speed 5140.40 samples/sec Loss 1.6000 LearningRate 0.0133 Epoch: 12 Global Step: 212300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:02,404-Speed 5145.95 samples/sec Loss 1.6623 LearningRate 0.0132 Epoch: 12 Global Step: 212310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:31:04,383-Speed 5176.92 samples/sec Loss 1.6333 LearningRate 0.0132 Epoch: 12 Global 
Step: 212320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:31:06,356-Speed 5193.82 samples/sec Loss 1.6630 LearningRate 0.0132 Epoch: 12 Global Step: 212330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:31:08,330-Speed 5189.12 samples/sec Loss 1.6648 LearningRate 0.0132 Epoch: 12 Global Step: 212340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:31:10,309-Speed 5174.33 samples/sec Loss 1.6118 LearningRate 0.0132 Epoch: 12 Global Step: 212350 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:31:12,270-Speed 5224.39 samples/sec Loss 1.6517 LearningRate 0.0132 Epoch: 12 Global Step: 212360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:14,251-Speed 5170.32 samples/sec Loss 1.6753 LearningRate 0.0132 Epoch: 12 Global Step: 212370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:16,249-Speed 5128.13 samples/sec Loss 1.6659 LearningRate 0.0132 Epoch: 12 Global Step: 212380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:18,229-Speed 5172.79 samples/sec Loss 1.6544 LearningRate 0.0132 Epoch: 12 Global Step: 212390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:20,208-Speed 5176.33 samples/sec Loss 1.7012 LearningRate 0.0132 Epoch: 12 Global Step: 212400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:22,220-Speed 5090.73 samples/sec Loss 1.6465 LearningRate 0.0132 Epoch: 12 Global Step: 212410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:24,192-Speed 5193.31 samples/sec Loss 1.5684 LearningRate 0.0132 Epoch: 12 Global Step: 212420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:26,163-Speed 5198.27 samples/sec Loss 1.7042 LearningRate 0.0132 Epoch: 12 Global Step: 212430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:28,157-Speed 5137.68 samples/sec Loss 1.6468 LearningRate 0.0132 Epoch: 12 Global Step: 212440 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-11 13:31:30,133-Speed 5182.72 samples/sec Loss 1.6743 LearningRate 0.0132 Epoch: 12 Global Step: 212450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:32,095-Speed 5222.74 samples/sec Loss 1.6252 LearningRate 0.0132 Epoch: 12 Global Step: 212460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:34,070-Speed 5185.56 samples/sec Loss 1.6193 LearningRate 0.0132 Epoch: 12 Global Step: 212470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:36,039-Speed 5201.96 samples/sec Loss 1.6855 LearningRate 0.0132 Epoch: 12 Global Step: 212480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:38,022-Speed 5166.40 samples/sec Loss 1.6696 LearningRate 0.0132 Epoch: 12 Global Step: 212490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:40,011-Speed 5148.80 samples/sec Loss 1.6518 LearningRate 0.0132 Epoch: 12 Global Step: 212500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:41,981-Speed 5201.12 samples/sec Loss 1.6427 LearningRate 0.0132 Epoch: 12 Global Step: 212510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:43,948-Speed 5206.12 samples/sec Loss 1.6742 LearningRate 0.0132 Epoch: 12 Global Step: 212520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:45,924-Speed 5185.65 samples/sec Loss 1.6199 LearningRate 0.0132 Epoch: 12 Global Step: 212530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:31:47,904-Speed 5174.46 samples/sec Loss 1.6972 LearningRate 0.0132 Epoch: 12 Global Step: 212540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:31:49,905-Speed 5118.65 samples/sec Loss 1.7048 LearningRate 0.0132 Epoch: 12 Global Step: 212550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:31:51,885-Speed 5174.72 samples/sec Loss 1.6487 LearningRate 0.0132 Epoch: 12 Global Step: 212560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 
13:31:53,852-Speed 5206.14 samples/sec Loss 1.7128 LearningRate 0.0132 Epoch: 12 Global Step: 212570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:31:55,827-Speed 5186.27 samples/sec Loss 1.6982 LearningRate 0.0132 Epoch: 12 Global Step: 212580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:31:57,801-Speed 5189.48 samples/sec Loss 1.6825 LearningRate 0.0132 Epoch: 12 Global Step: 212590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:31:59,782-Speed 5169.42 samples/sec Loss 1.6727 LearningRate 0.0132 Epoch: 12 Global Step: 212600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:32:01,756-Speed 5189.30 samples/sec Loss 1.6317 LearningRate 0.0132 Epoch: 12 Global Step: 212610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:32:03,736-Speed 5174.46 samples/sec Loss 1.6168 LearningRate 0.0132 Epoch: 12 Global Step: 212620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:32:05,727-Speed 5144.16 samples/sec Loss 1.6304 LearningRate 0.0132 Epoch: 12 Global Step: 212630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:32:07,693-Speed 5211.37 samples/sec Loss 1.6133 LearningRate 0.0132 Epoch: 12 Global Step: 212640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:09,684-Speed 5144.08 samples/sec Loss 1.6635 LearningRate 0.0132 Epoch: 12 Global Step: 212650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:11,655-Speed 5196.93 samples/sec Loss 1.6440 LearningRate 0.0132 Epoch: 12 Global Step: 212660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:13,632-Speed 5181.71 samples/sec Loss 1.6505 LearningRate 0.0132 Epoch: 12 Global Step: 212670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:15,607-Speed 5187.85 samples/sec Loss 1.6371 LearningRate 0.0132 Epoch: 12 Global Step: 212680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:17,599-Speed 5140.55 samples/sec Loss 
1.6428 LearningRate 0.0132 Epoch: 12 Global Step: 212690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:19,571-Speed 5194.09 samples/sec Loss 1.6287 LearningRate 0.0132 Epoch: 12 Global Step: 212700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:21,538-Speed 5209.37 samples/sec Loss 1.6695 LearningRate 0.0132 Epoch: 12 Global Step: 212710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:23,522-Speed 5162.97 samples/sec Loss 1.6740 LearningRate 0.0132 Epoch: 12 Global Step: 212720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:25,521-Speed 5124.39 samples/sec Loss 1.6862 LearningRate 0.0132 Epoch: 12 Global Step: 212730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:27,522-Speed 5118.49 samples/sec Loss 1.6803 LearningRate 0.0132 Epoch: 12 Global Step: 212740 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:32:29,488-Speed 5209.77 samples/sec Loss 1.6919 LearningRate 0.0132 Epoch: 12 Global Step: 212750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:32:31,452-Speed 5217.90 samples/sec Loss 1.6677 LearningRate 0.0132 Epoch: 12 Global Step: 212760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:33,419-Speed 5207.64 samples/sec Loss 1.6283 LearningRate 0.0131 Epoch: 12 Global Step: 212770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:35,418-Speed 5122.52 samples/sec Loss 1.6359 LearningRate 0.0131 Epoch: 12 Global Step: 212780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:37,411-Speed 5141.60 samples/sec Loss 1.6261 LearningRate 0.0131 Epoch: 12 Global Step: 212790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:39,399-Speed 5154.94 samples/sec Loss 1.6730 LearningRate 0.0131 Epoch: 12 Global Step: 212800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:41,380-Speed 5171.21 samples/sec Loss 1.6857 LearningRate 0.0131 Epoch: 12 
Global Step: 212810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:43,365-Speed 5161.59 samples/sec Loss 1.6111 LearningRate 0.0131 Epoch: 12 Global Step: 212820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:45,340-Speed 5187.11 samples/sec Loss 1.6439 LearningRate 0.0131 Epoch: 12 Global Step: 212830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:47,312-Speed 5194.29 samples/sec Loss 1.6431 LearningRate 0.0131 Epoch: 12 Global Step: 212840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:49,316-Speed 5111.86 samples/sec Loss 1.6395 LearningRate 0.0131 Epoch: 12 Global Step: 212850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:51,282-Speed 5208.39 samples/sec Loss 1.6863 LearningRate 0.0131 Epoch: 12 Global Step: 212860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:53,272-Speed 5148.54 samples/sec Loss 1.6357 LearningRate 0.0131 Epoch: 12 Global Step: 212870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:55,240-Speed 5205.65 samples/sec Loss 1.6108 LearningRate 0.0131 Epoch: 12 Global Step: 212880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:57,212-Speed 5192.00 samples/sec Loss 1.6435 LearningRate 0.0131 Epoch: 12 Global Step: 212890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:32:59,185-Speed 5192.71 samples/sec Loss 1.6137 LearningRate 0.0131 Epoch: 12 Global Step: 212900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:33:01,185-Speed 5120.26 samples/sec Loss 1.6552 LearningRate 0.0131 Epoch: 12 Global Step: 212910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:33:03,177-Speed 5144.11 samples/sec Loss 1.6905 LearningRate 0.0131 Epoch: 12 Global Step: 212920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:33:05,164-Speed 5155.77 samples/sec Loss 1.6592 LearningRate 0.0131 Epoch: 12 Global Step: 212930 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-11 13:33:07,134-Speed 5199.22 samples/sec Loss 1.6245 LearningRate 0.0131 Epoch: 12 Global Step: 212940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:33:09,105-Speed 5196.85 samples/sec Loss 1.6892 LearningRate 0.0131 Epoch: 12 Global Step: 212950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:33:11,074-Speed 5202.87 samples/sec Loss 1.6282 LearningRate 0.0131 Epoch: 12 Global Step: 212960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:33:13,075-Speed 5117.74 samples/sec Loss 1.6712 LearningRate 0.0131 Epoch: 12 Global Step: 212970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:33:15,058-Speed 5166.22 samples/sec Loss 1.6897 LearningRate 0.0131 Epoch: 12 Global Step: 212980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:33:17,028-Speed 5199.53 samples/sec Loss 1.6245 LearningRate 0.0131 Epoch: 12 Global Step: 212990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:33:19,000-Speed 5194.56 samples/sec Loss 1.7268 LearningRate 0.0131 Epoch: 12 Global Step: 213000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:33:20,981-Speed 5171.12 samples/sec Loss 1.5897 LearningRate 0.0131 Epoch: 12 Global Step: 213010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:33:22,966-Speed 5159.63 samples/sec Loss 1.6788 LearningRate 0.0131 Epoch: 12 Global Step: 213020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:33:24,951-Speed 5161.72 samples/sec Loss 1.6773 LearningRate 0.0131 Epoch: 12 Global Step: 213030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:33:26,919-Speed 5203.38 samples/sec Loss 1.6461 LearningRate 0.0131 Epoch: 12 Global Step: 213040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:33:28,893-Speed 5191.37 samples/sec Loss 1.6879 LearningRate 0.0131 Epoch: 12 Global Step: 213050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 
13:33:30,873-Speed 5172.62 samples/sec Loss 1.6311 LearningRate 0.0131 Epoch: 12 Global Step: 213060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:33:32,840-Speed 5208.22 samples/sec Loss 1.6636 LearningRate 0.0131 Epoch: 12 Global Step: 213070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:33:34,824-Speed 5163.18 samples/sec Loss 1.7476 LearningRate 0.0131 Epoch: 12 Global Step: 213080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:33:36,805-Speed 5170.97 samples/sec Loss 1.6610 LearningRate 0.0131 Epoch: 12 Global Step: 213090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:33:38,785-Speed 5173.07 samples/sec Loss 1.6389 LearningRate 0.0131 Epoch: 12 Global Step: 213100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:33:40,770-Speed 5159.06 samples/sec Loss 1.7075 LearningRate 0.0131 Epoch: 12 Global Step: 213110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:33:42,737-Speed 5207.60 samples/sec Loss 1.6434 LearningRate 0.0131 Epoch: 12 Global Step: 213120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:33:44,732-Speed 5136.98 samples/sec Loss 1.6840 LearningRate 0.0131 Epoch: 12 Global Step: 213130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:33:46,712-Speed 5173.17 samples/sec Loss 1.6292 LearningRate 0.0131 Epoch: 12 Global Step: 213140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:33:48,685-Speed 5189.70 samples/sec Loss 1.6803 LearningRate 0.0131 Epoch: 12 Global Step: 213150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:33:50,656-Speed 5197.38 samples/sec Loss 1.6906 LearningRate 0.0131 Epoch: 12 Global Step: 213160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:33:52,657-Speed 5121.54 samples/sec Loss 1.7041 LearningRate 0.0131 Epoch: 12 Global Step: 213170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:33:54,629-Speed 5194.22 samples/sec 
Loss 1.6274 LearningRate 0.0131 Epoch: 12 Global Step: 213180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:33:56,609-Speed 5173.41 samples/sec Loss 1.6603 LearningRate 0.0131 Epoch: 12 Global Step: 213190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:33:58,581-Speed 5193.40 samples/sec Loss 1.6103 LearningRate 0.0131 Epoch: 12 Global Step: 213200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:34:00,558-Speed 5181.62 samples/sec Loss 1.6458 LearningRate 0.0131 Epoch: 12 Global Step: 213210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:02,557-Speed 5124.63 samples/sec Loss 1.6158 LearningRate 0.0131 Epoch: 12 Global Step: 213220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:04,560-Speed 5113.92 samples/sec Loss 1.6608 LearningRate 0.0130 Epoch: 12 Global Step: 213230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:06,532-Speed 5196.64 samples/sec Loss 1.6404 LearningRate 0.0130 Epoch: 12 Global Step: 213240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:08,520-Speed 5151.67 samples/sec Loss 1.6675 LearningRate 0.0130 Epoch: 12 Global Step: 213250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:10,507-Speed 5154.63 samples/sec Loss 1.6145 LearningRate 0.0130 Epoch: 12 Global Step: 213260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:12,490-Speed 5166.50 samples/sec Loss 1.6252 LearningRate 0.0130 Epoch: 12 Global Step: 213270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:14,486-Speed 5132.39 samples/sec Loss 1.6512 LearningRate 0.0130 Epoch: 12 Global Step: 213280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:16,466-Speed 5172.90 samples/sec Loss 1.6122 LearningRate 0.0130 Epoch: 12 Global Step: 213290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:18,442-Speed 5182.74 samples/sec Loss 1.5756 LearningRate 0.0130 Epoch: 
12 Global Step: 213300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:20,416-Speed 5188.92 samples/sec Loss 1.6424 LearningRate 0.0130 Epoch: 12 Global Step: 213310 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:34:22,399-Speed 5166.78 samples/sec Loss 1.6422 LearningRate 0.0130 Epoch: 12 Global Step: 213320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:34:24,376-Speed 5180.87 samples/sec Loss 1.6249 LearningRate 0.0130 Epoch: 12 Global Step: 213330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:26,343-Speed 5208.08 samples/sec Loss 1.6049 LearningRate 0.0130 Epoch: 12 Global Step: 213340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:28,322-Speed 5175.44 samples/sec Loss 1.6554 LearningRate 0.0130 Epoch: 12 Global Step: 213350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:30,305-Speed 5165.12 samples/sec Loss 1.5485 LearningRate 0.0130 Epoch: 12 Global Step: 213360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:32,274-Speed 5202.11 samples/sec Loss 1.6062 LearningRate 0.0130 Epoch: 12 Global Step: 213370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:34,262-Speed 5152.05 samples/sec Loss 1.6266 LearningRate 0.0130 Epoch: 12 Global Step: 213380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:36,236-Speed 5190.75 samples/sec Loss 1.6152 LearningRate 0.0130 Epoch: 12 Global Step: 213390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:38,211-Speed 5186.85 samples/sec Loss 1.6265 LearningRate 0.0130 Epoch: 12 Global Step: 213400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:40,184-Speed 5191.62 samples/sec Loss 1.6673 LearningRate 0.0130 Epoch: 12 Global Step: 213410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:42,164-Speed 5173.17 samples/sec Loss 1.6122 LearningRate 0.0130 Epoch: 12 Global Step: 213420 Fp16 Grad Scale: 
65536 Required: 8 hours Training: 2022-04-11 13:34:44,140-Speed 5184.01 samples/sec Loss 1.6427 LearningRate 0.0130 Epoch: 12 Global Step: 213430 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:34:46,113-Speed 5191.67 samples/sec Loss 1.6536 LearningRate 0.0130 Epoch: 12 Global Step: 213440 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:34:48,095-Speed 5167.61 samples/sec Loss 1.6163 LearningRate 0.0130 Epoch: 12 Global Step: 213450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:50,083-Speed 5154.80 samples/sec Loss 1.6747 LearningRate 0.0130 Epoch: 12 Global Step: 213460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:52,063-Speed 5173.00 samples/sec Loss 1.6448 LearningRate 0.0130 Epoch: 12 Global Step: 213470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:54,033-Speed 5199.59 samples/sec Loss 1.6650 LearningRate 0.0130 Epoch: 12 Global Step: 213480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:56,008-Speed 5186.44 samples/sec Loss 1.6717 LearningRate 0.0130 Epoch: 12 Global Step: 213490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:34:57,988-Speed 5173.05 samples/sec Loss 1.6419 LearningRate 0.0130 Epoch: 12 Global Step: 213500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:35:00,003-Speed 5082.71 samples/sec Loss 1.6372 LearningRate 0.0130 Epoch: 12 Global Step: 213510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:35:01,982-Speed 5175.65 samples/sec Loss 1.6054 LearningRate 0.0130 Epoch: 12 Global Step: 213520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:35:03,953-Speed 5196.38 samples/sec Loss 1.6438 LearningRate 0.0130 Epoch: 12 Global Step: 213530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:35:05,929-Speed 5185.14 samples/sec Loss 1.6241 LearningRate 0.0130 Epoch: 12 Global Step: 213540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 
2022-04-11 13:35:07,899-Speed 5199.69 samples/sec Loss 1.6318 LearningRate 0.0130 Epoch: 12 Global Step: 213550 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:35:09,864-Speed 5214.72 samples/sec Loss 1.6472 LearningRate 0.0130 Epoch: 12 Global Step: 213560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:35:11,864-Speed 5120.61 samples/sec Loss 1.6302 LearningRate 0.0130 Epoch: 12 Global Step: 213570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:35:13,847-Speed 5164.97 samples/sec Loss 1.7162 LearningRate 0.0130 Epoch: 12 Global Step: 213580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:35:15,826-Speed 5176.50 samples/sec Loss 1.6528 LearningRate 0.0130 Epoch: 12 Global Step: 213590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:35:17,796-Speed 5199.19 samples/sec Loss 1.6819 LearningRate 0.0130 Epoch: 12 Global Step: 213600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:35:19,770-Speed 5189.61 samples/sec Loss 1.5669 LearningRate 0.0130 Epoch: 12 Global Step: 213610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:35:21,761-Speed 5146.14 samples/sec Loss 1.6373 LearningRate 0.0130 Epoch: 12 Global Step: 213620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:35:23,736-Speed 5184.93 samples/sec Loss 1.7006 LearningRate 0.0130 Epoch: 12 Global Step: 213630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:35:25,714-Speed 5179.57 samples/sec Loss 1.6805 LearningRate 0.0130 Epoch: 12 Global Step: 213640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:35:27,689-Speed 5187.29 samples/sec Loss 1.6272 LearningRate 0.0130 Epoch: 12 Global Step: 213650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:35:29,666-Speed 5181.37 samples/sec Loss 1.6284 LearningRate 0.0130 Epoch: 12 Global Step: 213660 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:35:31,632-Speed 5208.07 
samples/sec Loss 1.6161 LearningRate 0.0130 Epoch: 12 Global Step: 213670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:35:33,626-Speed 5138.15 samples/sec Loss 1.6353 LearningRate 0.0130 Epoch: 12 Global Step: 213680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:35:35,619-Speed 5139.63 samples/sec Loss 1.6960 LearningRate 0.0130 Epoch: 12 Global Step: 213690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:35:37,614-Speed 5133.69 samples/sec Loss 1.6333 LearningRate 0.0129 Epoch: 12 Global Step: 213700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:35:39,587-Speed 5192.74 samples/sec Loss 1.7391 LearningRate 0.0129 Epoch: 12 Global Step: 213710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:35:41,571-Speed 5162.39 samples/sec Loss 1.6169 LearningRate 0.0129 Epoch: 12 Global Step: 213720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:35:43,555-Speed 5162.61 samples/sec Loss 1.6232 LearningRate 0.0129 Epoch: 12 Global Step: 213730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:35:45,566-Speed 5095.49 samples/sec Loss 1.6520 LearningRate 0.0129 Epoch: 12 Global Step: 213740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:35:47,567-Speed 5118.03 samples/sec Loss 1.6418 LearningRate 0.0129 Epoch: 12 Global Step: 213750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:35:49,569-Speed 5116.47 samples/sec Loss 1.5871 LearningRate 0.0129 Epoch: 12 Global Step: 213760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:35:51,537-Speed 5205.75 samples/sec Loss 1.6767 LearningRate 0.0129 Epoch: 12 Global Step: 213770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:35:53,509-Speed 5194.97 samples/sec Loss 1.6457 LearningRate 0.0129 Epoch: 12 Global Step: 213780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:35:55,480-Speed 5195.61 samples/sec Loss 1.7315 LearningRate 0.0129 
Epoch: 12 Global Step: 213790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:35:57,472-Speed 5144.11 samples/sec Loss 1.6602 LearningRate 0.0129 Epoch: 12 Global Step: 213800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:35:59,449-Speed 5180.35 samples/sec Loss 1.6578 LearningRate 0.0129 Epoch: 12 Global Step: 213810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:36:01,432-Speed 5165.86 samples/sec Loss 1.6802 LearningRate 0.0129 Epoch: 12 Global Step: 213820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:36:03,412-Speed 5172.86 samples/sec Loss 1.6714 LearningRate 0.0129 Epoch: 12 Global Step: 213830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:36:05,420-Speed 5100.88 samples/sec Loss 1.6802 LearningRate 0.0129 Epoch: 12 Global Step: 213840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:36:07,401-Speed 5170.97 samples/sec Loss 1.6354 LearningRate 0.0129 Epoch: 12 Global Step: 213850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:36:09,382-Speed 5170.22 samples/sec Loss 1.6450 LearningRate 0.0129 Epoch: 12 Global Step: 213860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:36:11,368-Speed 5157.36 samples/sec Loss 1.6706 LearningRate 0.0129 Epoch: 12 Global Step: 213870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:36:13,370-Speed 5118.10 samples/sec Loss 1.6523 LearningRate 0.0129 Epoch: 12 Global Step: 213880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:36:15,345-Speed 5185.36 samples/sec Loss 1.6618 LearningRate 0.0129 Epoch: 12 Global Step: 213890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:36:17,318-Speed 5193.83 samples/sec Loss 1.6239 LearningRate 0.0129 Epoch: 12 Global Step: 213900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:36:19,288-Speed 5198.80 samples/sec Loss 1.6221 LearningRate 0.0129 Epoch: 12 Global Step: 213910 Fp16 Grad 
Scale: 131072 Required: 8 hours Training: 2022-04-11 13:36:21,252-Speed 5216.26 samples/sec Loss 1.6521 LearningRate 0.0129 Epoch: 12 Global Step: 213920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:36:23,226-Speed 5189.45 samples/sec Loss 1.6240 LearningRate 0.0129 Epoch: 12 Global Step: 213930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:36:25,203-Speed 5179.98 samples/sec Loss 1.6329 LearningRate 0.0129 Epoch: 12 Global Step: 213940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:36:27,186-Speed 5166.88 samples/sec Loss 1.6409 LearningRate 0.0129 Epoch: 12 Global Step: 213950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:36:29,168-Speed 5167.73 samples/sec Loss 1.6540 LearningRate 0.0129 Epoch: 12 Global Step: 213960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:36:31,146-Speed 5178.06 samples/sec Loss 1.6589 LearningRate 0.0129 Epoch: 12 Global Step: 213970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:36:33,149-Speed 5114.16 samples/sec Loss 1.6456 LearningRate 0.0129 Epoch: 12 Global Step: 213980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:36:35,142-Speed 5138.25 samples/sec Loss 1.6817 LearningRate 0.0129 Epoch: 12 Global Step: 213990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:36:37,134-Speed 5143.01 samples/sec Loss 1.6475 LearningRate 0.0129 Epoch: 12 Global Step: 214000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:37:04,002-[lfw][214000]XNorm: 21.314871 Training: 2022-04-11 13:37:04,002-[lfw][214000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-04-11 13:37:04,003-[lfw][214000]Accuracy-Highest: 0.99833 Training: 2022-04-11 13:37:35,149-[cfp_fp][214000]XNorm: 20.893671 Training: 2022-04-11 13:37:35,149-[cfp_fp][214000]Accuracy-Flip: 0.98586+-0.00544 Training: 2022-04-11 13:37:35,150-[cfp_fp][214000]Accuracy-Highest: 0.98771 Training: 2022-04-11 13:38:02,059-[agedb_30][214000]XNorm: 
21.965192 Training: 2022-04-11 13:38:02,060-[agedb_30][214000]Accuracy-Flip: 0.98033+-0.00748 Training: 2022-04-11 13:38:02,060-[agedb_30][214000]Accuracy-Highest: 0.98250 Training: 2022-04-11 13:38:04,049-Speed 117.82 samples/sec Loss 1.6714 LearningRate 0.0129 Epoch: 12 Global Step: 214010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:06,003-Speed 5240.16 samples/sec Loss 1.6632 LearningRate 0.0129 Epoch: 12 Global Step: 214020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:38:07,967-Speed 5215.96 samples/sec Loss 1.6801 LearningRate 0.0129 Epoch: 12 Global Step: 214030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:38:09,940-Speed 5192.89 samples/sec Loss 1.6913 LearningRate 0.0129 Epoch: 12 Global Step: 214040 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:38:11,904-Speed 5215.09 samples/sec Loss 1.6211 LearningRate 0.0129 Epoch: 12 Global Step: 214050 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:38:13,858-Speed 5241.50 samples/sec Loss 1.6521 LearningRate 0.0129 Epoch: 12 Global Step: 214060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:15,847-Speed 5149.99 samples/sec Loss 1.6263 LearningRate 0.0129 Epoch: 12 Global Step: 214070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:17,817-Speed 5201.35 samples/sec Loss 1.5556 LearningRate 0.0129 Epoch: 12 Global Step: 214080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:19,793-Speed 5183.39 samples/sec Loss 1.6351 LearningRate 0.0129 Epoch: 12 Global Step: 214090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:21,758-Speed 5213.15 samples/sec Loss 1.6172 LearningRate 0.0129 Epoch: 12 Global Step: 214100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:23,731-Speed 5190.47 samples/sec Loss 1.6807 LearningRate 0.0129 Epoch: 12 Global Step: 214110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
13:38:25,707-Speed 5184.59 samples/sec Loss 1.6515 LearningRate 0.0129 Epoch: 12 Global Step: 214120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:27,686-Speed 5174.90 samples/sec Loss 1.6305 LearningRate 0.0129 Epoch: 12 Global Step: 214130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:29,669-Speed 5165.80 samples/sec Loss 1.6515 LearningRate 0.0129 Epoch: 12 Global Step: 214140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:31,646-Speed 5181.52 samples/sec Loss 1.6595 LearningRate 0.0129 Epoch: 12 Global Step: 214150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:33,613-Speed 5209.38 samples/sec Loss 1.6690 LearningRate 0.0128 Epoch: 12 Global Step: 214160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:38:35,583-Speed 5199.87 samples/sec Loss 1.6804 LearningRate 0.0128 Epoch: 12 Global Step: 214170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:38:37,561-Speed 5176.62 samples/sec Loss 1.7169 LearningRate 0.0128 Epoch: 12 Global Step: 214180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:38:39,531-Speed 5199.92 samples/sec Loss 1.6172 LearningRate 0.0128 Epoch: 12 Global Step: 214190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:38:41,493-Speed 5221.23 samples/sec Loss 1.6288 LearningRate 0.0128 Epoch: 12 Global Step: 214200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:43,460-Speed 5207.61 samples/sec Loss 1.6308 LearningRate 0.0128 Epoch: 12 Global Step: 214210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:45,459-Speed 5123.62 samples/sec Loss 1.6984 LearningRate 0.0128 Epoch: 12 Global Step: 214220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:47,438-Speed 5176.56 samples/sec Loss 1.6699 LearningRate 0.0128 Epoch: 12 Global Step: 214230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:49,413-Speed 5185.73 samples/sec 
Loss 1.6291 LearningRate 0.0128 Epoch: 12 Global Step: 214240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:51,388-Speed 5187.70 samples/sec Loss 1.5627 LearningRate 0.0128 Epoch: 12 Global Step: 214250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:53,360-Speed 5195.75 samples/sec Loss 1.6543 LearningRate 0.0128 Epoch: 12 Global Step: 214260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:55,332-Speed 5192.47 samples/sec Loss 1.6058 LearningRate 0.0128 Epoch: 12 Global Step: 214270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:57,299-Speed 5208.33 samples/sec Loss 1.6494 LearningRate 0.0128 Epoch: 12 Global Step: 214280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:38:59,267-Speed 5206.70 samples/sec Loss 1.6207 LearningRate 0.0128 Epoch: 12 Global Step: 214290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:39:01,234-Speed 5205.55 samples/sec Loss 1.6770 LearningRate 0.0128 Epoch: 12 Global Step: 214300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:39:03,205-Speed 5198.45 samples/sec Loss 1.6107 LearningRate 0.0128 Epoch: 12 Global Step: 214310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:39:05,179-Speed 5186.99 samples/sec Loss 1.6404 LearningRate 0.0128 Epoch: 12 Global Step: 214320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:39:07,147-Speed 5207.59 samples/sec Loss 1.6544 LearningRate 0.0128 Epoch: 12 Global Step: 214330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:39:09,118-Speed 5195.37 samples/sec Loss 1.6858 LearningRate 0.0128 Epoch: 12 Global Step: 214340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:39:11,102-Speed 5164.43 samples/sec Loss 1.6164 LearningRate 0.0128 Epoch: 12 Global Step: 214350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:39:13,067-Speed 5210.65 samples/sec Loss 1.6539 LearningRate 0.0128 Epoch: 12 
Global Step: 214360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:39:15,027-Speed 5226.45 samples/sec Loss 1.6338 LearningRate 0.0128 Epoch: 12 Global Step: 214370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:39:17,006-Speed 5177.76 samples/sec Loss 1.6547 LearningRate 0.0128 Epoch: 12 Global Step: 214380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:39:18,989-Speed 5167.34 samples/sec Loss 1.6741 LearningRate 0.0128 Epoch: 12 Global Step: 214390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:39:20,958-Speed 5201.02 samples/sec Loss 1.6343 LearningRate 0.0128 Epoch: 12 Global Step: 214400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:39:22,961-Speed 5115.28 samples/sec Loss 1.6207 LearningRate 0.0128 Epoch: 12 Global Step: 214410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:39:24,933-Speed 5193.66 samples/sec Loss 1.6224 LearningRate 0.0128 Epoch: 12 Global Step: 214420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:39:26,920-Speed 5153.84 samples/sec Loss 1.6952 LearningRate 0.0128 Epoch: 12 Global Step: 214430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:39:28,898-Speed 5179.79 samples/sec Loss 1.6742 LearningRate 0.0128 Epoch: 12 Global Step: 214440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:39:30,877-Speed 5175.78 samples/sec Loss 1.6390 LearningRate 0.0128 Epoch: 12 Global Step: 214450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:39:32,850-Speed 5190.54 samples/sec Loss 1.6186 LearningRate 0.0128 Epoch: 12 Global Step: 214460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:39:34,834-Speed 5163.78 samples/sec Loss 1.6832 LearningRate 0.0128 Epoch: 12 Global Step: 214470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:39:36,813-Speed 5174.52 samples/sec Loss 1.6475 LearningRate 0.0128 Epoch: 12 Global Step: 214480 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-11 13:39:38,785-Speed 5195.78 samples/sec Loss 1.6136 LearningRate 0.0128 Epoch: 12 Global Step: 214490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:39:40,755-Speed 5200.92 samples/sec Loss 1.6619 LearningRate 0.0128 Epoch: 12 Global Step: 214500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:39:42,720-Speed 5212.77 samples/sec Loss 1.6512 LearningRate 0.0128 Epoch: 12 Global Step: 214510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:39:44,686-Speed 5210.61 samples/sec Loss 1.6206 LearningRate 0.0128 Epoch: 12 Global Step: 214520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:39:46,675-Speed 5148.75 samples/sec Loss 1.6222 LearningRate 0.0128 Epoch: 12 Global Step: 214530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:39:48,647-Speed 5195.76 samples/sec Loss 1.6225 LearningRate 0.0128 Epoch: 12 Global Step: 214540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:39:50,619-Speed 5194.68 samples/sec Loss 1.6599 LearningRate 0.0128 Epoch: 12 Global Step: 214550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:39:52,583-Speed 5213.44 samples/sec Loss 1.6602 LearningRate 0.0128 Epoch: 12 Global Step: 214560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:39:54,563-Speed 5173.13 samples/sec Loss 1.6421 LearningRate 0.0128 Epoch: 12 Global Step: 214570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:39:56,530-Speed 5209.43 samples/sec Loss 1.6437 LearningRate 0.0128 Epoch: 12 Global Step: 214580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:39:58,504-Speed 5189.80 samples/sec Loss 1.6358 LearningRate 0.0128 Epoch: 12 Global Step: 214590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:00,506-Speed 5115.46 samples/sec Loss 1.6837 LearningRate 0.0128 Epoch: 12 Global Step: 214600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
13:40:02,495-Speed 5150.45 samples/sec Loss 1.6482 LearningRate 0.0128 Epoch: 12 Global Step: 214610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:04,462-Speed 5207.12 samples/sec Loss 1.6660 LearningRate 0.0128 Epoch: 12 Global Step: 214620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:06,453-Speed 5145.21 samples/sec Loss 1.6980 LearningRate 0.0127 Epoch: 12 Global Step: 214630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:08,436-Speed 5166.67 samples/sec Loss 1.6630 LearningRate 0.0127 Epoch: 12 Global Step: 214640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:10,428-Speed 5140.07 samples/sec Loss 1.6461 LearningRate 0.0127 Epoch: 12 Global Step: 214650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:12,410-Speed 5169.40 samples/sec Loss 1.6519 LearningRate 0.0127 Epoch: 12 Global Step: 214660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:14,402-Speed 5140.75 samples/sec Loss 1.5936 LearningRate 0.0127 Epoch: 12 Global Step: 214670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:16,373-Speed 5197.13 samples/sec Loss 1.6299 LearningRate 0.0127 Epoch: 12 Global Step: 214680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:18,348-Speed 5187.48 samples/sec Loss 1.6294 LearningRate 0.0127 Epoch: 12 Global Step: 214690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:40:20,312-Speed 5216.50 samples/sec Loss 1.6541 LearningRate 0.0127 Epoch: 12 Global Step: 214700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:40:22,274-Speed 5221.43 samples/sec Loss 1.6684 LearningRate 0.0127 Epoch: 12 Global Step: 214710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:40:24,263-Speed 5150.18 samples/sec Loss 1.6113 LearningRate 0.0127 Epoch: 12 Global Step: 214720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:40:26,256-Speed 5139.52 samples/sec 
Loss 1.6945 LearningRate 0.0127 Epoch: 12 Global Step: 214730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:40:28,220-Speed 5214.68 samples/sec Loss 1.6720 LearningRate 0.0127 Epoch: 12 Global Step: 214740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:30,188-Speed 5205.35 samples/sec Loss 1.6703 LearningRate 0.0127 Epoch: 12 Global Step: 214750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:32,173-Speed 5161.93 samples/sec Loss 1.6593 LearningRate 0.0127 Epoch: 12 Global Step: 214760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:34,148-Speed 5185.18 samples/sec Loss 1.6616 LearningRate 0.0127 Epoch: 12 Global Step: 214770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:36,118-Speed 5198.98 samples/sec Loss 1.6004 LearningRate 0.0127 Epoch: 12 Global Step: 214780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:38,104-Speed 5158.67 samples/sec Loss 1.6751 LearningRate 0.0127 Epoch: 12 Global Step: 214790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:40,084-Speed 5172.33 samples/sec Loss 1.6806 LearningRate 0.0127 Epoch: 12 Global Step: 214800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:42,061-Speed 5181.50 samples/sec Loss 1.6032 LearningRate 0.0127 Epoch: 12 Global Step: 214810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:44,064-Speed 5115.24 samples/sec Loss 1.6647 LearningRate 0.0127 Epoch: 12 Global Step: 214820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:46,041-Speed 5179.08 samples/sec Loss 1.6719 LearningRate 0.0127 Epoch: 12 Global Step: 214830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:48,016-Speed 5187.56 samples/sec Loss 1.6318 LearningRate 0.0127 Epoch: 12 Global Step: 214840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:40:49,976-Speed 5229.09 samples/sec Loss 1.6767 LearningRate 0.0127 Epoch: 12 
Global Step: 214850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:51,941-Speed 5212.65 samples/sec Loss 1.6670 LearningRate 0.0127 Epoch: 12 Global Step: 214860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:53,905-Speed 5215.34 samples/sec Loss 1.6194 LearningRate 0.0127 Epoch: 12 Global Step: 214870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:55,866-Speed 5221.02 samples/sec Loss 1.6992 LearningRate 0.0127 Epoch: 12 Global Step: 214880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:57,847-Speed 5172.80 samples/sec Loss 1.6475 LearningRate 0.0127 Epoch: 12 Global Step: 214890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:40:59,814-Speed 5207.56 samples/sec Loss 1.6538 LearningRate 0.0127 Epoch: 12 Global Step: 214900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:41:01,778-Speed 5214.67 samples/sec Loss 1.6853 LearningRate 0.0127 Epoch: 12 Global Step: 214910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:41:03,748-Speed 5198.94 samples/sec Loss 1.6899 LearningRate 0.0127 Epoch: 12 Global Step: 214920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:41:05,715-Speed 5208.79 samples/sec Loss 1.6256 LearningRate 0.0127 Epoch: 12 Global Step: 214930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:41:07,685-Speed 5198.62 samples/sec Loss 1.6226 LearningRate 0.0127 Epoch: 12 Global Step: 214940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:41:09,664-Speed 5176.65 samples/sec Loss 1.6496 LearningRate 0.0127 Epoch: 12 Global Step: 214950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:41:11,634-Speed 5207.41 samples/sec Loss 1.6128 LearningRate 0.0127 Epoch: 12 Global Step: 214960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:41:13,606-Speed 5192.65 samples/sec Loss 1.6170 LearningRate 0.0127 Epoch: 12 Global Step: 214970 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-04-11 13:41:15,577-Speed 5198.26 samples/sec Loss 1.6060 LearningRate 0.0127 Epoch: 12 Global Step: 214980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:41:17,545-Speed 5203.63 samples/sec Loss 1.6209 LearningRate 0.0127 Epoch: 12 Global Step: 214990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:41:19,516-Speed 5197.27 samples/sec Loss 1.6346 LearningRate 0.0127 Epoch: 12 Global Step: 215000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:41:21,490-Speed 5190.31 samples/sec Loss 1.6180 LearningRate 0.0127 Epoch: 12 Global Step: 215010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:41:23,455-Speed 5211.86 samples/sec Loss 1.6013 LearningRate 0.0127 Epoch: 12 Global Step: 215020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:41:25,429-Speed 5189.96 samples/sec Loss 1.6738 LearningRate 0.0127 Epoch: 12 Global Step: 215030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:41:27,401-Speed 5194.30 samples/sec Loss 1.6249 LearningRate 0.0127 Epoch: 12 Global Step: 215040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:41:29,382-Speed 5169.27 samples/sec Loss 1.6223 LearningRate 0.0127 Epoch: 12 Global Step: 215050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:41:31,341-Speed 5230.58 samples/sec Loss 1.6342 LearningRate 0.0127 Epoch: 12 Global Step: 215060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:41:33,307-Speed 5210.83 samples/sec Loss 1.6503 LearningRate 0.0127 Epoch: 12 Global Step: 215070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:41:35,274-Speed 5208.13 samples/sec Loss 1.6414 LearningRate 0.0127 Epoch: 12 Global Step: 215080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:41:37,261-Speed 5154.98 samples/sec Loss 1.7070 LearningRate 0.0127 Epoch: 12 Global Step: 215090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 
13:41:39,228-Speed 5207.17 samples/sec Loss 1.6549 LearningRate 0.0126 Epoch: 12 Global Step: 215100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:41:41,192-Speed 5215.11 samples/sec Loss 1.6510 LearningRate 0.0126 Epoch: 12 Global Step: 215110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:41:43,153-Speed 5223.82 samples/sec Loss 1.6830 LearningRate 0.0126 Epoch: 12 Global Step: 215120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:41:45,121-Speed 5205.49 samples/sec Loss 1.6820 LearningRate 0.0126 Epoch: 12 Global Step: 215130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:41:47,096-Speed 5186.70 samples/sec Loss 1.6221 LearningRate 0.0126 Epoch: 12 Global Step: 215140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:41:49,097-Speed 5117.54 samples/sec Loss 1.6368 LearningRate 0.0126 Epoch: 12 Global Step: 215150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:41:51,090-Speed 5139.87 samples/sec Loss 1.6085 LearningRate 0.0126 Epoch: 12 Global Step: 215160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:41:53,059-Speed 5202.59 samples/sec Loss 1.6442 LearningRate 0.0126 Epoch: 12 Global Step: 215170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:41:55,050-Speed 5144.46 samples/sec Loss 1.6380 LearningRate 0.0126 Epoch: 12 Global Step: 215180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:41:57,020-Speed 5201.98 samples/sec Loss 1.6181 LearningRate 0.0126 Epoch: 12 Global Step: 215190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:41:58,979-Speed 5229.21 samples/sec Loss 1.6044 LearningRate 0.0126 Epoch: 12 Global Step: 215200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:00,961-Speed 5166.96 samples/sec Loss 1.5855 LearningRate 0.0126 Epoch: 12 Global Step: 215210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:02,923-Speed 5219.95 samples/sec Loss 
1.6807 LearningRate 0.0126 Epoch: 12 Global Step: 215220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:04,888-Speed 5212.57 samples/sec Loss 1.6409 LearningRate 0.0126 Epoch: 12 Global Step: 215230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:06,869-Speed 5172.51 samples/sec Loss 1.6613 LearningRate 0.0126 Epoch: 12 Global Step: 215240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:08,846-Speed 5179.49 samples/sec Loss 1.6976 LearningRate 0.0126 Epoch: 12 Global Step: 215250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:10,815-Speed 5203.00 samples/sec Loss 1.6028 LearningRate 0.0126 Epoch: 12 Global Step: 215260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:12,796-Speed 5170.82 samples/sec Loss 1.6520 LearningRate 0.0126 Epoch: 12 Global Step: 215270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:14,764-Speed 5204.39 samples/sec Loss 1.6623 LearningRate 0.0126 Epoch: 12 Global Step: 215280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:16,765-Speed 5119.32 samples/sec Loss 1.6587 LearningRate 0.0126 Epoch: 12 Global Step: 215290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:18,741-Speed 5185.70 samples/sec Loss 1.6428 LearningRate 0.0126 Epoch: 12 Global Step: 215300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:20,704-Speed 5219.10 samples/sec Loss 1.6036 LearningRate 0.0126 Epoch: 12 Global Step: 215310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:22,677-Speed 5190.43 samples/sec Loss 1.6461 LearningRate 0.0126 Epoch: 12 Global Step: 215320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:24,647-Speed 5199.68 samples/sec Loss 1.6517 LearningRate 0.0126 Epoch: 12 Global Step: 215330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:26,632-Speed 5160.11 samples/sec Loss 1.6327 LearningRate 0.0126 Epoch: 12 Global 
Step: 215340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:28,605-Speed 5193.26 samples/sec Loss 1.6290 LearningRate 0.0126 Epoch: 12 Global Step: 215350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:30,590-Speed 5158.67 samples/sec Loss 1.6422 LearningRate 0.0126 Epoch: 12 Global Step: 215360 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:42:32,559-Speed 5203.76 samples/sec Loss 1.6631 LearningRate 0.0126 Epoch: 12 Global Step: 215370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:34,538-Speed 5174.51 samples/sec Loss 1.6453 LearningRate 0.0126 Epoch: 12 Global Step: 215380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:36,535-Speed 5130.30 samples/sec Loss 1.6541 LearningRate 0.0126 Epoch: 12 Global Step: 215390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:38,510-Speed 5186.09 samples/sec Loss 1.6395 LearningRate 0.0126 Epoch: 12 Global Step: 215400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:40,480-Speed 5198.48 samples/sec Loss 1.6625 LearningRate 0.0126 Epoch: 12 Global Step: 215410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:42,447-Speed 5209.66 samples/sec Loss 1.5925 LearningRate 0.0126 Epoch: 12 Global Step: 215420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:44,414-Speed 5206.59 samples/sec Loss 1.6407 LearningRate 0.0126 Epoch: 12 Global Step: 215430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:46,436-Speed 5067.00 samples/sec Loss 1.6428 LearningRate 0.0126 Epoch: 12 Global Step: 215440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:48,405-Speed 5201.20 samples/sec Loss 1.7005 LearningRate 0.0126 Epoch: 12 Global Step: 215450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:50,382-Speed 5183.30 samples/sec Loss 1.6095 LearningRate 0.0126 Epoch: 12 Global Step: 215460 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-11 13:42:52,341-Speed 5227.04 samples/sec Loss 1.6484 LearningRate 0.0126 Epoch: 12 Global Step: 215470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:54,309-Speed 5205.75 samples/sec Loss 1.6013 LearningRate 0.0126 Epoch: 12 Global Step: 215480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:56,281-Speed 5193.09 samples/sec Loss 1.6016 LearningRate 0.0126 Epoch: 12 Global Step: 215490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:42:58,257-Speed 5186.05 samples/sec Loss 1.6644 LearningRate 0.0126 Epoch: 12 Global Step: 215500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:00,238-Speed 5170.17 samples/sec Loss 1.6732 LearningRate 0.0126 Epoch: 12 Global Step: 215510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:02,222-Speed 5163.42 samples/sec Loss 1.6343 LearningRate 0.0126 Epoch: 12 Global Step: 215520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:04,198-Speed 5181.56 samples/sec Loss 1.6497 LearningRate 0.0126 Epoch: 12 Global Step: 215530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:06,173-Speed 5187.08 samples/sec Loss 1.7162 LearningRate 0.0126 Epoch: 12 Global Step: 215540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:08,149-Speed 5184.77 samples/sec Loss 1.6682 LearningRate 0.0126 Epoch: 12 Global Step: 215550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:10,121-Speed 5195.09 samples/sec Loss 1.6942 LearningRate 0.0126 Epoch: 12 Global Step: 215560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:12,096-Speed 5186.57 samples/sec Loss 1.6842 LearningRate 0.0125 Epoch: 12 Global Step: 215570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:14,088-Speed 5141.50 samples/sec Loss 1.6709 LearningRate 0.0125 Epoch: 12 Global Step: 215580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
13:43:16,065-Speed 5180.94 samples/sec Loss 1.6630 LearningRate 0.0125 Epoch: 12 Global Step: 215590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:18,033-Speed 5206.10 samples/sec Loss 1.6666 LearningRate 0.0125 Epoch: 12 Global Step: 215600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:20,004-Speed 5196.91 samples/sec Loss 1.6060 LearningRate 0.0125 Epoch: 12 Global Step: 215610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:21,991-Speed 5155.33 samples/sec Loss 1.5849 LearningRate 0.0125 Epoch: 12 Global Step: 215620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:23,990-Speed 5124.01 samples/sec Loss 1.6117 LearningRate 0.0125 Epoch: 12 Global Step: 215630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:25,980-Speed 5147.00 samples/sec Loss 1.6423 LearningRate 0.0125 Epoch: 12 Global Step: 215640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:27,947-Speed 5207.07 samples/sec Loss 1.6780 LearningRate 0.0125 Epoch: 12 Global Step: 215650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:29,932-Speed 5162.52 samples/sec Loss 1.6802 LearningRate 0.0125 Epoch: 12 Global Step: 215660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:31,929-Speed 5130.27 samples/sec Loss 1.5780 LearningRate 0.0125 Epoch: 12 Global Step: 215670 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:43:33,909-Speed 5173.02 samples/sec Loss 1.6531 LearningRate 0.0125 Epoch: 12 Global Step: 215680 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:43:35,884-Speed 5185.54 samples/sec Loss 1.6670 LearningRate 0.0125 Epoch: 12 Global Step: 215690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:43:37,869-Speed 5161.98 samples/sec Loss 1.6613 LearningRate 0.0125 Epoch: 12 Global Step: 215700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:43:39,883-Speed 5085.99 samples/sec 
Loss 1.6265 LearningRate 0.0125 Epoch: 12 Global Step: 215710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:41,877-Speed 5137.93 samples/sec Loss 1.6600 LearningRate 0.0125 Epoch: 12 Global Step: 215720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:43,847-Speed 5198.91 samples/sec Loss 1.6214 LearningRate 0.0125 Epoch: 12 Global Step: 215730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:45,819-Speed 5193.97 samples/sec Loss 1.6208 LearningRate 0.0125 Epoch: 12 Global Step: 215740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:47,796-Speed 5182.31 samples/sec Loss 1.6569 LearningRate 0.0125 Epoch: 12 Global Step: 215750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:49,790-Speed 5138.57 samples/sec Loss 1.6395 LearningRate 0.0125 Epoch: 12 Global Step: 215760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:51,772-Speed 5167.15 samples/sec Loss 1.5972 LearningRate 0.0125 Epoch: 12 Global Step: 215770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:53,744-Speed 5195.73 samples/sec Loss 1.6144 LearningRate 0.0125 Epoch: 12 Global Step: 215780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:55,713-Speed 5201.99 samples/sec Loss 1.5836 LearningRate 0.0125 Epoch: 12 Global Step: 215790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:57,684-Speed 5197.75 samples/sec Loss 1.6950 LearningRate 0.0125 Epoch: 12 Global Step: 215800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:43:59,716-Speed 5041.09 samples/sec Loss 1.6025 LearningRate 0.0125 Epoch: 12 Global Step: 215810 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:44:01,696-Speed 5173.15 samples/sec Loss 1.7551 LearningRate 0.0125 Epoch: 12 Global Step: 215820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:44:03,673-Speed 5181.09 samples/sec Loss 1.7147 LearningRate 0.0125 Epoch: 12 
Global Step: 215830 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:44:05,640-Speed 5207.28 samples/sec Loss 1.6559 LearningRate 0.0125 Epoch: 12 Global Step: 215840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:44:07,602-Speed 5220.45 samples/sec Loss 1.6542 LearningRate 0.0125 Epoch: 12 Global Step: 215850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:44:09,596-Speed 5136.22 samples/sec Loss 1.6528 LearningRate 0.0125 Epoch: 12 Global Step: 215860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:44:11,571-Speed 5186.81 samples/sec Loss 1.6424 LearningRate 0.0125 Epoch: 12 Global Step: 215870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:44:13,544-Speed 5191.96 samples/sec Loss 1.6546 LearningRate 0.0125 Epoch: 12 Global Step: 215880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:44:15,511-Speed 5207.51 samples/sec Loss 1.6552 LearningRate 0.0125 Epoch: 12 Global Step: 215890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:44:17,496-Speed 5160.76 samples/sec Loss 1.6787 LearningRate 0.0125 Epoch: 12 Global Step: 215900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:44:19,475-Speed 5177.76 samples/sec Loss 1.5886 LearningRate 0.0125 Epoch: 12 Global Step: 215910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:44:21,444-Speed 5199.62 samples/sec Loss 1.6870 LearningRate 0.0125 Epoch: 12 Global Step: 215920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:44:23,430-Speed 5158.38 samples/sec Loss 1.6164 LearningRate 0.0125 Epoch: 12 Global Step: 215930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:44:25,414-Speed 5163.75 samples/sec Loss 1.6994 LearningRate 0.0125 Epoch: 12 Global Step: 215940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:44:27,392-Speed 5178.30 samples/sec Loss 1.6124 LearningRate 0.0125 Epoch: 12 Global Step: 215950 Fp16 Grad Scale: 
32768 Required: 8 hours Training: 2022-04-11 13:44:29,375-Speed 5165.06 samples/sec Loss 1.6442 LearningRate 0.0125 Epoch: 12 Global Step: 215960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:44:31,356-Speed 5170.54 samples/sec Loss 1.6920 LearningRate 0.0125 Epoch: 12 Global Step: 215970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:44:33,375-Speed 5074.17 samples/sec Loss 1.6515 LearningRate 0.0125 Epoch: 12 Global Step: 215980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:44:35,375-Speed 5122.07 samples/sec Loss 1.6694 LearningRate 0.0125 Epoch: 12 Global Step: 215990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:44:37,349-Speed 5189.28 samples/sec Loss 1.6292 LearningRate 0.0125 Epoch: 12 Global Step: 216000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:45:04,076-[lfw][216000]XNorm: 22.108111 Training: 2022-04-11 13:45:04,076-[lfw][216000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 13:45:04,077-[lfw][216000]Accuracy-Highest: 0.99833 Training: 2022-04-11 13:45:34,813-[cfp_fp][216000]XNorm: 21.317389 Training: 2022-04-11 13:45:34,814-[cfp_fp][216000]Accuracy-Flip: 0.98700+-0.00421 Training: 2022-04-11 13:45:34,814-[cfp_fp][216000]Accuracy-Highest: 0.98771 Training: 2022-04-11 13:46:01,324-[agedb_30][216000]XNorm: 22.684711 Training: 2022-04-11 13:46:01,324-[agedb_30][216000]Accuracy-Flip: 0.98150+-0.00787 Training: 2022-04-11 13:46:01,325-[agedb_30][216000]Accuracy-Highest: 0.98250 Training: 2022-04-11 13:46:03,325-Speed 119.10 samples/sec Loss 1.6415 LearningRate 0.0125 Epoch: 12 Global Step: 216010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:05,304-Speed 5173.81 samples/sec Loss 1.5944 LearningRate 0.0125 Epoch: 12 Global Step: 216020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:07,273-Speed 5202.75 samples/sec Loss 1.6061 LearningRate 0.0125 Epoch: 12 Global Step: 216030 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-04-11 13:46:09,268-Speed 5134.66 samples/sec Loss 1.6845 LearningRate 0.0124 Epoch: 12 Global Step: 216040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:11,237-Speed 5203.71 samples/sec Loss 1.6208 LearningRate 0.0124 Epoch: 12 Global Step: 216050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:13,213-Speed 5183.04 samples/sec Loss 1.6386 LearningRate 0.0124 Epoch: 12 Global Step: 216060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:15,182-Speed 5201.18 samples/sec Loss 1.6799 LearningRate 0.0124 Epoch: 12 Global Step: 216070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:17,163-Speed 5170.86 samples/sec Loss 1.6552 LearningRate 0.0124 Epoch: 12 Global Step: 216080 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:46:19,131-Speed 5204.90 samples/sec Loss 1.6416 LearningRate 0.0124 Epoch: 12 Global Step: 216090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:21,116-Speed 5162.12 samples/sec Loss 1.6459 LearningRate 0.0124 Epoch: 12 Global Step: 216100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:23,108-Speed 5142.12 samples/sec Loss 1.5837 LearningRate 0.0124 Epoch: 12 Global Step: 216110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:25,086-Speed 5177.18 samples/sec Loss 1.6922 LearningRate 0.0124 Epoch: 12 Global Step: 216120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:27,067-Speed 5170.77 samples/sec Loss 1.6279 LearningRate 0.0124 Epoch: 12 Global Step: 216130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:29,063-Speed 5133.63 samples/sec Loss 1.6129 LearningRate 0.0124 Epoch: 12 Global Step: 216140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:31,038-Speed 5187.74 samples/sec Loss 1.6357 LearningRate 0.0124 Epoch: 12 Global Step: 216150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:33,023-Speed 
5160.34 samples/sec Loss 1.6177 LearningRate 0.0124 Epoch: 12 Global Step: 216160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:35,001-Speed 5177.58 samples/sec Loss 1.6101 LearningRate 0.0124 Epoch: 12 Global Step: 216170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:36,980-Speed 5176.30 samples/sec Loss 1.6827 LearningRate 0.0124 Epoch: 12 Global Step: 216180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:39,008-Speed 5051.86 samples/sec Loss 1.6692 LearningRate 0.0124 Epoch: 12 Global Step: 216190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:41,013-Speed 5106.78 samples/sec Loss 1.6256 LearningRate 0.0124 Epoch: 12 Global Step: 216200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:43,003-Speed 5149.18 samples/sec Loss 1.6931 LearningRate 0.0124 Epoch: 12 Global Step: 216210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:44,997-Speed 5135.70 samples/sec Loss 1.6626 LearningRate 0.0124 Epoch: 12 Global Step: 216220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:46,983-Speed 5157.60 samples/sec Loss 1.7210 LearningRate 0.0124 Epoch: 12 Global Step: 216230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:48,969-Speed 5158.90 samples/sec Loss 1.6592 LearningRate 0.0124 Epoch: 12 Global Step: 216240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:50,966-Speed 5129.82 samples/sec Loss 1.6475 LearningRate 0.0124 Epoch: 12 Global Step: 216250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:52,961-Speed 5134.69 samples/sec Loss 1.6227 LearningRate 0.0124 Epoch: 12 Global Step: 216260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:54,963-Speed 5116.37 samples/sec Loss 1.6780 LearningRate 0.0124 Epoch: 12 Global Step: 216270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:56,941-Speed 5178.91 samples/sec Loss 1.7009 
LearningRate 0.0124 Epoch: 12 Global Step: 216280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:46:58,941-Speed 5122.69 samples/sec Loss 1.6067 LearningRate 0.0124 Epoch: 12 Global Step: 216290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:47:00,939-Speed 5126.53 samples/sec Loss 1.6787 LearningRate 0.0124 Epoch: 12 Global Step: 216300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:02,929-Speed 5147.40 samples/sec Loss 1.5935 LearningRate 0.0124 Epoch: 12 Global Step: 216310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:04,911-Speed 5167.08 samples/sec Loss 1.6416 LearningRate 0.0124 Epoch: 12 Global Step: 216320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:06,888-Speed 5181.12 samples/sec Loss 1.6405 LearningRate 0.0124 Epoch: 12 Global Step: 216330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:08,872-Speed 5162.44 samples/sec Loss 1.5869 LearningRate 0.0124 Epoch: 12 Global Step: 216340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:10,847-Speed 5185.50 samples/sec Loss 1.6588 LearningRate 0.0124 Epoch: 12 Global Step: 216350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:12,821-Speed 5192.03 samples/sec Loss 1.5944 LearningRate 0.0124 Epoch: 12 Global Step: 216360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:14,809-Speed 5150.70 samples/sec Loss 1.6115 LearningRate 0.0124 Epoch: 12 Global Step: 216370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:16,792-Speed 5168.09 samples/sec Loss 1.6612 LearningRate 0.0124 Epoch: 12 Global Step: 216380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:18,766-Speed 5188.15 samples/sec Loss 1.5939 LearningRate 0.0124 Epoch: 12 Global Step: 216390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:20,738-Speed 5194.24 samples/sec Loss 1.6738 LearningRate 0.0124 Epoch: 12 Global Step: 
216400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:22,747-Speed 5099.69 samples/sec Loss 1.6671 LearningRate 0.0124 Epoch: 12 Global Step: 216410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:24,748-Speed 5118.89 samples/sec Loss 1.6376 LearningRate 0.0124 Epoch: 12 Global Step: 216420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:26,731-Speed 5164.37 samples/sec Loss 1.6222 LearningRate 0.0124 Epoch: 12 Global Step: 216430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:28,702-Speed 5197.15 samples/sec Loss 1.6493 LearningRate 0.0124 Epoch: 12 Global Step: 216440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:30,677-Speed 5186.45 samples/sec Loss 1.6694 LearningRate 0.0124 Epoch: 12 Global Step: 216450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:32,667-Speed 5148.33 samples/sec Loss 1.6211 LearningRate 0.0124 Epoch: 12 Global Step: 216460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:34,655-Speed 5151.96 samples/sec Loss 1.6277 LearningRate 0.0124 Epoch: 12 Global Step: 216470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:36,635-Speed 5173.09 samples/sec Loss 1.6818 LearningRate 0.0124 Epoch: 12 Global Step: 216480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:47:38,641-Speed 5106.40 samples/sec Loss 1.6520 LearningRate 0.0124 Epoch: 12 Global Step: 216490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:47:40,622-Speed 5172.69 samples/sec Loss 1.6323 LearningRate 0.0124 Epoch: 12 Global Step: 216500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:47:42,592-Speed 5199.74 samples/sec Loss 1.6453 LearningRate 0.0123 Epoch: 12 Global Step: 216510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:47:44,563-Speed 5196.93 samples/sec Loss 1.6086 LearningRate 0.0123 Epoch: 12 Global Step: 216520 Fp16 Grad Scale: 32768 Required: 8 
hours Training: 2022-04-11 13:47:46,535-Speed 5194.68 samples/sec Loss 1.7050 LearningRate 0.0123 Epoch: 12 Global Step: 216530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:47:48,506-Speed 5195.21 samples/sec Loss 1.6545 LearningRate 0.0123 Epoch: 12 Global Step: 216540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:47:50,479-Speed 5191.71 samples/sec Loss 1.6896 LearningRate 0.0123 Epoch: 12 Global Step: 216550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:47:52,457-Speed 5178.99 samples/sec Loss 1.6192 LearningRate 0.0123 Epoch: 12 Global Step: 216560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:47:54,437-Speed 5172.45 samples/sec Loss 1.6643 LearningRate 0.0123 Epoch: 12 Global Step: 216570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:47:56,430-Speed 5141.99 samples/sec Loss 1.6081 LearningRate 0.0123 Epoch: 12 Global Step: 216580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:47:58,432-Speed 5116.55 samples/sec Loss 1.6409 LearningRate 0.0123 Epoch: 12 Global Step: 216590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:00,403-Speed 5196.27 samples/sec Loss 1.6126 LearningRate 0.0123 Epoch: 12 Global Step: 216600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:02,377-Speed 5189.53 samples/sec Loss 1.6890 LearningRate 0.0123 Epoch: 12 Global Step: 216610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:04,362-Speed 5161.21 samples/sec Loss 1.6663 LearningRate 0.0123 Epoch: 12 Global Step: 216620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:06,339-Speed 5179.51 samples/sec Loss 1.7185 LearningRate 0.0123 Epoch: 12 Global Step: 216630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:08,306-Speed 5207.19 samples/sec Loss 1.6247 LearningRate 0.0123 Epoch: 12 Global Step: 216640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
13:48:10,283-Speed 5181.72 samples/sec Loss 1.6493 LearningRate 0.0123 Epoch: 12 Global Step: 216650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:12,282-Speed 5125.65 samples/sec Loss 1.6433 LearningRate 0.0123 Epoch: 12 Global Step: 216660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:14,249-Speed 5206.34 samples/sec Loss 1.6593 LearningRate 0.0123 Epoch: 12 Global Step: 216670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:16,217-Speed 5206.70 samples/sec Loss 1.6844 LearningRate 0.0123 Epoch: 12 Global Step: 216680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:18,204-Speed 5154.19 samples/sec Loss 1.6031 LearningRate 0.0123 Epoch: 12 Global Step: 216690 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:48:20,182-Speed 5178.01 samples/sec Loss 1.6808 LearningRate 0.0123 Epoch: 12 Global Step: 216700 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:48:22,152-Speed 5199.52 samples/sec Loss 1.6486 LearningRate 0.0123 Epoch: 12 Global Step: 216710 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:48:24,130-Speed 5180.43 samples/sec Loss 1.5999 LearningRate 0.0123 Epoch: 12 Global Step: 216720 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:48:26,099-Speed 5200.95 samples/sec Loss 1.6037 LearningRate 0.0123 Epoch: 12 Global Step: 216730 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:48:28,082-Speed 5166.63 samples/sec Loss 1.6690 LearningRate 0.0123 Epoch: 12 Global Step: 216740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:30,060-Speed 5177.77 samples/sec Loss 1.6328 LearningRate 0.0123 Epoch: 12 Global Step: 216750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:32,038-Speed 5180.12 samples/sec Loss 1.6257 LearningRate 0.0123 Epoch: 12 Global Step: 216760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:34,008-Speed 5198.59 samples/sec 
Loss 1.6039 LearningRate 0.0123 Epoch: 12 Global Step: 216770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:35,986-Speed 5178.95 samples/sec Loss 1.6280 LearningRate 0.0123 Epoch: 12 Global Step: 216780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:37,974-Speed 5153.02 samples/sec Loss 1.6576 LearningRate 0.0123 Epoch: 12 Global Step: 216790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:39,960-Speed 5156.23 samples/sec Loss 1.5826 LearningRate 0.0123 Epoch: 12 Global Step: 216800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:41,933-Speed 5193.07 samples/sec Loss 1.6807 LearningRate 0.0123 Epoch: 12 Global Step: 216810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:43,927-Speed 5137.77 samples/sec Loss 1.6311 LearningRate 0.0123 Epoch: 12 Global Step: 216820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:45,909-Speed 5167.12 samples/sec Loss 1.6494 LearningRate 0.0123 Epoch: 12 Global Step: 216830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:47,915-Speed 5106.66 samples/sec Loss 1.6830 LearningRate 0.0123 Epoch: 12 Global Step: 216840 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:48:49,900-Speed 5161.44 samples/sec Loss 1.6438 LearningRate 0.0123 Epoch: 12 Global Step: 216850 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:48:51,889-Speed 5150.54 samples/sec Loss 1.6158 LearningRate 0.0123 Epoch: 12 Global Step: 216860 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:48:53,860-Speed 5195.21 samples/sec Loss 1.6780 LearningRate 0.0123 Epoch: 12 Global Step: 216870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:55,831-Speed 5198.15 samples/sec Loss 1.5801 LearningRate 0.0123 Epoch: 12 Global Step: 216880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:57,803-Speed 5193.53 samples/sec Loss 1.5819 LearningRate 0.0123 Epoch: 
12 Global Step: 216890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:48:59,792-Speed 5151.23 samples/sec Loss 1.6029 LearningRate 0.0123 Epoch: 12 Global Step: 216900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:49:01,781-Speed 5148.35 samples/sec Loss 1.6343 LearningRate 0.0123 Epoch: 12 Global Step: 216910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:49:03,781-Speed 5123.39 samples/sec Loss 1.6302 LearningRate 0.0123 Epoch: 12 Global Step: 216920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:49:05,752-Speed 5196.43 samples/sec Loss 1.6507 LearningRate 0.0123 Epoch: 12 Global Step: 216930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:49:07,733-Speed 5170.05 samples/sec Loss 1.6336 LearningRate 0.0123 Epoch: 12 Global Step: 216940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:49:09,705-Speed 5195.33 samples/sec Loss 1.6174 LearningRate 0.0123 Epoch: 12 Global Step: 216950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:49:11,675-Speed 5199.75 samples/sec Loss 1.6050 LearningRate 0.0123 Epoch: 12 Global Step: 216960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:49:13,657-Speed 5168.68 samples/sec Loss 1.6437 LearningRate 0.0123 Epoch: 12 Global Step: 216970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:49:15,852-Speed 4666.27 samples/sec Loss 1.6605 LearningRate 0.0123 Epoch: 12 Global Step: 216980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:49:45,238-Speed 348.47 samples/sec Loss 1.3063 LearningRate 0.0122 Epoch: 13 Global Step: 216990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:49:47,219-Speed 5172.56 samples/sec Loss 1.1898 LearningRate 0.0122 Epoch: 13 Global Step: 217000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:49:49,500-Speed 4488.80 samples/sec Loss 1.1841 LearningRate 0.0122 Epoch: 13 Global Step: 217010 Fp16 Grad Scale: 
32768 Required: 8 hours Training: 2022-04-11 13:49:51,764-Speed 4526.35 samples/sec Loss 1.1702 LearningRate 0.0122 Epoch: 13 Global Step: 217020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:49:54,173-Speed 4250.25 samples/sec Loss 1.1611 LearningRate 0.0122 Epoch: 13 Global Step: 217030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:49:56,136-Speed 5219.34 samples/sec Loss 1.1804 LearningRate 0.0122 Epoch: 13 Global Step: 217040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:49:58,114-Speed 5178.20 samples/sec Loss 1.1706 LearningRate 0.0122 Epoch: 13 Global Step: 217050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:00,102-Speed 5153.17 samples/sec Loss 1.1804 LearningRate 0.0122 Epoch: 13 Global Step: 217060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:02,089-Speed 5156.70 samples/sec Loss 1.1624 LearningRate 0.0122 Epoch: 13 Global Step: 217070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:04,077-Speed 5151.69 samples/sec Loss 1.1971 LearningRate 0.0122 Epoch: 13 Global Step: 217080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:06,045-Speed 5205.14 samples/sec Loss 1.1968 LearningRate 0.0122 Epoch: 13 Global Step: 217090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:08,010-Speed 5212.91 samples/sec Loss 1.1819 LearningRate 0.0122 Epoch: 13 Global Step: 217100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:09,981-Speed 5196.32 samples/sec Loss 1.1274 LearningRate 0.0122 Epoch: 13 Global Step: 217110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:11,967-Speed 5158.58 samples/sec Loss 1.1792 LearningRate 0.0122 Epoch: 13 Global Step: 217120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:13,944-Speed 5181.91 samples/sec Loss 1.1629 LearningRate 0.0122 Epoch: 13 Global Step: 217130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 
2022-04-11 13:50:15,949-Speed 5108.32 samples/sec Loss 1.1781 LearningRate 0.0122 Epoch: 13 Global Step: 217140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:50:17,933-Speed 5162.23 samples/sec Loss 1.1539 LearningRate 0.0122 Epoch: 13 Global Step: 217150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:50:19,899-Speed 5209.92 samples/sec Loss 1.1272 LearningRate 0.0122 Epoch: 13 Global Step: 217160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:21,889-Speed 5148.01 samples/sec Loss 1.1727 LearningRate 0.0122 Epoch: 13 Global Step: 217170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:23,868-Speed 5176.38 samples/sec Loss 1.1880 LearningRate 0.0122 Epoch: 13 Global Step: 217180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:25,864-Speed 5131.42 samples/sec Loss 1.1463 LearningRate 0.0122 Epoch: 13 Global Step: 217190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:27,866-Speed 5117.08 samples/sec Loss 1.1253 LearningRate 0.0122 Epoch: 13 Global Step: 217200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:29,838-Speed 5194.37 samples/sec Loss 1.1597 LearningRate 0.0122 Epoch: 13 Global Step: 217210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:31,808-Speed 5199.74 samples/sec Loss 1.1499 LearningRate 0.0122 Epoch: 13 Global Step: 217220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:33,789-Speed 5170.54 samples/sec Loss 1.1444 LearningRate 0.0122 Epoch: 13 Global Step: 217230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:35,770-Speed 5171.78 samples/sec Loss 1.1981 LearningRate 0.0122 Epoch: 13 Global Step: 217240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:37,773-Speed 5114.62 samples/sec Loss 1.1910 LearningRate 0.0122 Epoch: 13 Global Step: 217250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:39,752-Speed 5174.35 
samples/sec Loss 1.1895 LearningRate 0.0122 Epoch: 13 Global Step: 217260 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:50:41,746-Speed 5137.74 samples/sec Loss 1.1944 LearningRate 0.0122 Epoch: 13 Global Step: 217270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:50:43,717-Speed 5198.92 samples/sec Loss 1.1990 LearningRate 0.0122 Epoch: 13 Global Step: 217280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:45,709-Speed 5141.21 samples/sec Loss 1.1748 LearningRate 0.0122 Epoch: 13 Global Step: 217290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:47,701-Speed 5142.67 samples/sec Loss 1.1162 LearningRate 0.0122 Epoch: 13 Global Step: 217300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:49,688-Speed 5154.89 samples/sec Loss 1.1966 LearningRate 0.0122 Epoch: 13 Global Step: 217310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:51,814-Speed 4818.81 samples/sec Loss 1.1914 LearningRate 0.0122 Epoch: 13 Global Step: 217320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:50:53,785-Speed 5196.61 samples/sec Loss 1.1289 LearningRate 0.0122 Epoch: 13 Global Step: 217330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:50:55,772-Speed 5153.23 samples/sec Loss 1.1665 LearningRate 0.0122 Epoch: 13 Global Step: 217340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:50:57,755-Speed 5166.47 samples/sec Loss 1.1894 LearningRate 0.0122 Epoch: 13 Global Step: 217350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:50:59,736-Speed 5172.47 samples/sec Loss 1.2111 LearningRate 0.0122 Epoch: 13 Global Step: 217360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:51:01,756-Speed 5069.83 samples/sec Loss 1.1673 LearningRate 0.0122 Epoch: 13 Global Step: 217370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:51:03,726-Speed 5200.57 samples/sec Loss 1.1646 LearningRate 
0.0122 Epoch: 13 Global Step: 217380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:51:05,726-Speed 5121.05 samples/sec Loss 1.1675 LearningRate 0.0122 Epoch: 13 Global Step: 217390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:51:07,713-Speed 5155.55 samples/sec Loss 1.1580 LearningRate 0.0122 Epoch: 13 Global Step: 217400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:51:09,681-Speed 5206.17 samples/sec Loss 1.1757 LearningRate 0.0122 Epoch: 13 Global Step: 217410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:51:11,652-Speed 5197.66 samples/sec Loss 1.1954 LearningRate 0.0122 Epoch: 13 Global Step: 217420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:51:13,631-Speed 5175.17 samples/sec Loss 1.1683 LearningRate 0.0122 Epoch: 13 Global Step: 217430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:15,618-Speed 5153.58 samples/sec Loss 1.1467 LearningRate 0.0122 Epoch: 13 Global Step: 217440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:17,615-Speed 5131.28 samples/sec Loss 1.1313 LearningRate 0.0122 Epoch: 13 Global Step: 217450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:19,579-Speed 5214.77 samples/sec Loss 1.1593 LearningRate 0.0122 Epoch: 13 Global Step: 217460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:22,252-Speed 3832.44 samples/sec Loss 1.1750 LearningRate 0.0121 Epoch: 13 Global Step: 217470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:24,233-Speed 5168.54 samples/sec Loss 1.2023 LearningRate 0.0121 Epoch: 13 Global Step: 217480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:26,202-Speed 5204.78 samples/sec Loss 1.1713 LearningRate 0.0121 Epoch: 13 Global Step: 217490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:28,185-Speed 5164.35 samples/sec Loss 1.1868 LearningRate 0.0121 Epoch: 13 Global Step: 217500 Fp16 
Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:30,159-Speed 5190.40 samples/sec Loss 1.1416 LearningRate 0.0121 Epoch: 13 Global Step: 217510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:32,140-Speed 5170.62 samples/sec Loss 1.2234 LearningRate 0.0121 Epoch: 13 Global Step: 217520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:34,113-Speed 5190.34 samples/sec Loss 1.1842 LearningRate 0.0121 Epoch: 13 Global Step: 217530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:51:36,077-Speed 5215.41 samples/sec Loss 1.2130 LearningRate 0.0121 Epoch: 13 Global Step: 217540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:38,065-Speed 5153.74 samples/sec Loss 1.1885 LearningRate 0.0121 Epoch: 13 Global Step: 217550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:40,042-Speed 5180.96 samples/sec Loss 1.1677 LearningRate 0.0121 Epoch: 13 Global Step: 217560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:42,010-Speed 5204.97 samples/sec Loss 1.1884 LearningRate 0.0121 Epoch: 13 Global Step: 217570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:43,979-Speed 5202.31 samples/sec Loss 1.1746 LearningRate 0.0121 Epoch: 13 Global Step: 217580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:45,960-Speed 5170.99 samples/sec Loss 1.1773 LearningRate 0.0121 Epoch: 13 Global Step: 217590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:47,952-Speed 5143.30 samples/sec Loss 1.1682 LearningRate 0.0121 Epoch: 13 Global Step: 217600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:49,944-Speed 5142.68 samples/sec Loss 1.1779 LearningRate 0.0121 Epoch: 13 Global Step: 217610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:51,920-Speed 5182.43 samples/sec Loss 1.1654 LearningRate 0.0121 Epoch: 13 Global Step: 217620 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-04-11 13:51:53,905-Speed 5162.01 samples/sec Loss 1.1676 LearningRate 0.0121 Epoch: 13 Global Step: 217630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:55,876-Speed 5197.38 samples/sec Loss 1.1334 LearningRate 0.0121 Epoch: 13 Global Step: 217640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:51:57,845-Speed 5201.70 samples/sec Loss 1.2058 LearningRate 0.0121 Epoch: 13 Global Step: 217650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:51:59,834-Speed 5149.70 samples/sec Loss 1.2176 LearningRate 0.0121 Epoch: 13 Global Step: 217660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:01,823-Speed 5151.12 samples/sec Loss 1.1682 LearningRate 0.0121 Epoch: 13 Global Step: 217670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:03,818-Speed 5132.95 samples/sec Loss 1.1955 LearningRate 0.0121 Epoch: 13 Global Step: 217680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:05,788-Speed 5201.73 samples/sec Loss 1.2326 LearningRate 0.0121 Epoch: 13 Global Step: 217690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:07,768-Speed 5172.26 samples/sec Loss 1.2110 LearningRate 0.0121 Epoch: 13 Global Step: 217700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:09,738-Speed 5199.89 samples/sec Loss 1.2162 LearningRate 0.0121 Epoch: 13 Global Step: 217710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:11,715-Speed 5180.77 samples/sec Loss 1.1912 LearningRate 0.0121 Epoch: 13 Global Step: 217720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:13,715-Speed 5122.93 samples/sec Loss 1.1634 LearningRate 0.0121 Epoch: 13 Global Step: 217730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:15,686-Speed 5198.01 samples/sec Loss 1.1880 LearningRate 0.0121 Epoch: 13 Global Step: 217740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:17,671-Speed 
5160.21 samples/sec Loss 1.2204 LearningRate 0.0121 Epoch: 13 Global Step: 217750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:52:19,641-Speed 5199.06 samples/sec Loss 1.1880 LearningRate 0.0121 Epoch: 13 Global Step: 217760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:52:21,609-Speed 5204.06 samples/sec Loss 1.1942 LearningRate 0.0121 Epoch: 13 Global Step: 217770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:23,593-Speed 5162.33 samples/sec Loss 1.1909 LearningRate 0.0121 Epoch: 13 Global Step: 217780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:25,601-Speed 5101.33 samples/sec Loss 1.1825 LearningRate 0.0121 Epoch: 13 Global Step: 217790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:27,572-Speed 5198.06 samples/sec Loss 1.1963 LearningRate 0.0121 Epoch: 13 Global Step: 217800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:29,579-Speed 5104.87 samples/sec Loss 1.2466 LearningRate 0.0121 Epoch: 13 Global Step: 217810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:31,551-Speed 5193.45 samples/sec Loss 1.1866 LearningRate 0.0121 Epoch: 13 Global Step: 217820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:33,549-Speed 5127.18 samples/sec Loss 1.2474 LearningRate 0.0121 Epoch: 13 Global Step: 217830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:35,529-Speed 5172.89 samples/sec Loss 1.2395 LearningRate 0.0121 Epoch: 13 Global Step: 217840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:37,505-Speed 5184.79 samples/sec Loss 1.1622 LearningRate 0.0121 Epoch: 13 Global Step: 217850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:39,491-Speed 5156.06 samples/sec Loss 1.1618 LearningRate 0.0121 Epoch: 13 Global Step: 217860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:41,476-Speed 5161.87 samples/sec Loss 1.1541 
LearningRate 0.0121 Epoch: 13 Global Step: 217870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:52:43,449-Speed 5191.60 samples/sec Loss 1.2009 LearningRate 0.0121 Epoch: 13 Global Step: 217880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:52:45,421-Speed 5193.39 samples/sec Loss 1.1797 LearningRate 0.0121 Epoch: 13 Global Step: 217890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:47,423-Speed 5117.45 samples/sec Loss 1.1999 LearningRate 0.0121 Epoch: 13 Global Step: 217900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:49,429-Speed 5105.61 samples/sec Loss 1.2025 LearningRate 0.0121 Epoch: 13 Global Step: 217910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:51,448-Speed 5075.54 samples/sec Loss 1.2247 LearningRate 0.0121 Epoch: 13 Global Step: 217920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:53,431-Speed 5165.51 samples/sec Loss 1.1741 LearningRate 0.0121 Epoch: 13 Global Step: 217930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:55,414-Speed 5164.35 samples/sec Loss 1.1781 LearningRate 0.0121 Epoch: 13 Global Step: 217940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:57,401-Speed 5156.67 samples/sec Loss 1.1826 LearningRate 0.0120 Epoch: 13 Global Step: 217950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:52:59,383-Speed 5169.35 samples/sec Loss 1.1823 LearningRate 0.0120 Epoch: 13 Global Step: 217960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:53:01,354-Speed 5195.24 samples/sec Loss 1.1840 LearningRate 0.0120 Epoch: 13 Global Step: 217970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:53:03,342-Speed 5153.93 samples/sec Loss 1.2285 LearningRate 0.0120 Epoch: 13 Global Step: 217980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:53:05,315-Speed 5190.79 samples/sec Loss 1.2002 LearningRate 0.0120 Epoch: 13 Global 
Step: 217990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:53:07,287-Speed 5193.56 samples/sec Loss 1.1705 LearningRate 0.0120 Epoch: 13 Global Step: 218000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:53:33,972-[lfw][218000]XNorm: 21.487123 Training: 2022-04-11 13:53:33,973-[lfw][218000]Accuracy-Flip: 0.99783+-0.00259 Training: 2022-04-11 13:53:33,973-[lfw][218000]Accuracy-Highest: 0.99833 Training: 2022-04-11 13:54:04,998-[cfp_fp][218000]XNorm: 20.741070 Training: 2022-04-11 13:54:04,998-[cfp_fp][218000]Accuracy-Flip: 0.98757+-0.00550 Training: 2022-04-11 13:54:04,999-[cfp_fp][218000]Accuracy-Highest: 0.98771 Training: 2022-04-11 13:54:31,898-[agedb_30][218000]XNorm: 21.847891 Training: 2022-04-11 13:54:31,899-[agedb_30][218000]Accuracy-Flip: 0.98233+-0.00800 Training: 2022-04-11 13:54:31,899-[agedb_30][218000]Accuracy-Highest: 0.98250 Training: 2022-04-11 13:54:33,888-Speed 118.24 samples/sec Loss 1.1960 LearningRate 0.0120 Epoch: 13 Global Step: 218010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:54:35,843-Speed 5238.90 samples/sec Loss 1.2181 LearningRate 0.0120 Epoch: 13 Global Step: 218020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:54:37,823-Speed 5173.04 samples/sec Loss 1.2552 LearningRate 0.0120 Epoch: 13 Global Step: 218030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:54:39,794-Speed 5196.62 samples/sec Loss 1.1875 LearningRate 0.0120 Epoch: 13 Global Step: 218040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:54:41,774-Speed 5174.06 samples/sec Loss 1.1977 LearningRate 0.0120 Epoch: 13 Global Step: 218050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:54:43,744-Speed 5200.32 samples/sec Loss 1.1733 LearningRate 0.0120 Epoch: 13 Global Step: 218060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:54:45,710-Speed 5210.66 samples/sec Loss 1.2140 LearningRate 0.0120 Epoch: 13 Global Step: 218070 Fp16 
Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:54:47,681-Speed 5197.36 samples/sec Loss 1.2204 LearningRate 0.0120 Epoch: 13 Global Step: 218080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:54:49,707-Speed 5054.49 samples/sec Loss 1.1930 LearningRate 0.0120 Epoch: 13 Global Step: 218090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:54:51,691-Speed 5164.95 samples/sec Loss 1.2000 LearningRate 0.0120 Epoch: 13 Global Step: 218100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:54:53,669-Speed 5177.88 samples/sec Loss 1.2085 LearningRate 0.0120 Epoch: 13 Global Step: 218110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:54:55,648-Speed 5175.83 samples/sec Loss 1.2341 LearningRate 0.0120 Epoch: 13 Global Step: 218120 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:54:57,614-Speed 5209.58 samples/sec Loss 1.1781 LearningRate 0.0120 Epoch: 13 Global Step: 218130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:54:59,605-Speed 5146.52 samples/sec Loss 1.2284 LearningRate 0.0120 Epoch: 13 Global Step: 218140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:01,580-Speed 5185.78 samples/sec Loss 1.1942 LearningRate 0.0120 Epoch: 13 Global Step: 218150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:03,554-Speed 5188.93 samples/sec Loss 1.1969 LearningRate 0.0120 Epoch: 13 Global Step: 218160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:05,539-Speed 5158.86 samples/sec Loss 1.2006 LearningRate 0.0120 Epoch: 13 Global Step: 218170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:07,510-Speed 5199.61 samples/sec Loss 1.2343 LearningRate 0.0120 Epoch: 13 Global Step: 218180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:09,481-Speed 5196.54 samples/sec Loss 1.2491 LearningRate 0.0120 Epoch: 13 Global Step: 218190 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-04-11 13:55:11,455-Speed 5190.02 samples/sec Loss 1.1903 LearningRate 0.0120 Epoch: 13 Global Step: 218200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:16,162-Speed 2175.67 samples/sec Loss 1.1865 LearningRate 0.0120 Epoch: 13 Global Step: 218210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:19,059-Speed 3535.65 samples/sec Loss 1.2097 LearningRate 0.0120 Epoch: 13 Global Step: 218220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:21,047-Speed 5151.43 samples/sec Loss 1.1997 LearningRate 0.0120 Epoch: 13 Global Step: 218230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:23,033-Speed 5157.79 samples/sec Loss 1.1710 LearningRate 0.0120 Epoch: 13 Global Step: 218240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:25,024-Speed 5144.82 samples/sec Loss 1.1995 LearningRate 0.0120 Epoch: 13 Global Step: 218250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:27,016-Speed 5144.06 samples/sec Loss 1.2143 LearningRate 0.0120 Epoch: 13 Global Step: 218260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:28,995-Speed 5176.49 samples/sec Loss 1.2457 LearningRate 0.0120 Epoch: 13 Global Step: 218270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:30,972-Speed 5180.73 samples/sec Loss 1.2140 LearningRate 0.0120 Epoch: 13 Global Step: 218280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:32,937-Speed 5211.78 samples/sec Loss 1.2628 LearningRate 0.0120 Epoch: 13 Global Step: 218290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:34,939-Speed 5116.23 samples/sec Loss 1.1901 LearningRate 0.0120 Epoch: 13 Global Step: 218300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:36,920-Speed 5171.46 samples/sec Loss 1.1904 LearningRate 0.0120 Epoch: 13 Global Step: 218310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:38,900-Speed 
5174.31 samples/sec Loss 1.2411 LearningRate 0.0120 Epoch: 13 Global Step: 218320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:40,878-Speed 5179.24 samples/sec Loss 1.2248 LearningRate 0.0120 Epoch: 13 Global Step: 218330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:55:42,848-Speed 5199.12 samples/sec Loss 1.1794 LearningRate 0.0120 Epoch: 13 Global Step: 218340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:55:44,837-Speed 5149.06 samples/sec Loss 1.2496 LearningRate 0.0120 Epoch: 13 Global Step: 218350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:46,826-Speed 5150.81 samples/sec Loss 1.2304 LearningRate 0.0120 Epoch: 13 Global Step: 218360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:55:48,832-Speed 5107.01 samples/sec Loss 1.2182 LearningRate 0.0120 Epoch: 13 Global Step: 218370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:55:50,836-Speed 5111.73 samples/sec Loss 1.1466 LearningRate 0.0120 Epoch: 13 Global Step: 218380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:55:52,817-Speed 5171.11 samples/sec Loss 1.2210 LearningRate 0.0120 Epoch: 13 Global Step: 218390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:55:54,791-Speed 5189.66 samples/sec Loss 1.2087 LearningRate 0.0120 Epoch: 13 Global Step: 218400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:55:56,765-Speed 5188.93 samples/sec Loss 1.2239 LearningRate 0.0120 Epoch: 13 Global Step: 218410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:55:58,768-Speed 5112.21 samples/sec Loss 1.2388 LearningRate 0.0120 Epoch: 13 Global Step: 218420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:56:00,739-Speed 5197.94 samples/sec Loss 1.2569 LearningRate 0.0119 Epoch: 13 Global Step: 218430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:56:02,756-Speed 5078.05 samples/sec Loss 1.1919 
LearningRate 0.0119 Epoch: 13 Global Step: 218440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:56:04,733-Speed 5181.20 samples/sec Loss 1.2112 LearningRate 0.0119 Epoch: 13 Global Step: 218450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:56:06,713-Speed 5173.89 samples/sec Loss 1.2125 LearningRate 0.0119 Epoch: 13 Global Step: 218460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:56:08,687-Speed 5189.80 samples/sec Loss 1.2506 LearningRate 0.0119 Epoch: 13 Global Step: 218470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:10,664-Speed 5181.69 samples/sec Loss 1.2291 LearningRate 0.0119 Epoch: 13 Global Step: 218480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:12,674-Speed 5097.09 samples/sec Loss 1.2119 LearningRate 0.0119 Epoch: 13 Global Step: 218490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:14,682-Speed 5100.81 samples/sec Loss 1.1925 LearningRate 0.0119 Epoch: 13 Global Step: 218500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:16,662-Speed 5172.91 samples/sec Loss 1.2362 LearningRate 0.0119 Epoch: 13 Global Step: 218510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:18,638-Speed 5183.92 samples/sec Loss 1.2422 LearningRate 0.0119 Epoch: 13 Global Step: 218520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:20,606-Speed 5204.30 samples/sec Loss 1.1582 LearningRate 0.0119 Epoch: 13 Global Step: 218530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:22,578-Speed 5194.52 samples/sec Loss 1.2502 LearningRate 0.0119 Epoch: 13 Global Step: 218540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:24,569-Speed 5146.45 samples/sec Loss 1.2305 LearningRate 0.0119 Epoch: 13 Global Step: 218550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:26,544-Speed 5185.82 samples/sec Loss 1.1860 LearningRate 0.0119 Epoch: 13 Global Step: 
218560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:28,522-Speed 5179.09 samples/sec Loss 1.2174 LearningRate 0.0119 Epoch: 13 Global Step: 218570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:56:30,497-Speed 5185.62 samples/sec Loss 1.2218 LearningRate 0.0119 Epoch: 13 Global Step: 218580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:56:32,473-Speed 5183.32 samples/sec Loss 1.2006 LearningRate 0.0119 Epoch: 13 Global Step: 218590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:34,475-Speed 5116.79 samples/sec Loss 1.2105 LearningRate 0.0119 Epoch: 13 Global Step: 218600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:36,457-Speed 5170.40 samples/sec Loss 1.2094 LearningRate 0.0119 Epoch: 13 Global Step: 218610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:38,441-Speed 5162.25 samples/sec Loss 1.2536 LearningRate 0.0119 Epoch: 13 Global Step: 218620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:40,428-Speed 5156.08 samples/sec Loss 1.2229 LearningRate 0.0119 Epoch: 13 Global Step: 218630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:42,394-Speed 5208.64 samples/sec Loss 1.1964 LearningRate 0.0119 Epoch: 13 Global Step: 218640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:44,373-Speed 5176.46 samples/sec Loss 1.2787 LearningRate 0.0119 Epoch: 13 Global Step: 218650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:46,382-Speed 5098.90 samples/sec Loss 1.2485 LearningRate 0.0119 Epoch: 13 Global Step: 218660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:48,398-Speed 5081.27 samples/sec Loss 1.2626 LearningRate 0.0119 Epoch: 13 Global Step: 218670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:56:50,361-Speed 5217.96 samples/sec Loss 1.2185 LearningRate 0.0119 Epoch: 13 Global Step: 218680 Fp16 Grad Scale: 32768 Required: 8 
hours Training: 2022-04-11 13:56:52,338-Speed 5181.31 samples/sec Loss 1.2640 LearningRate 0.0119 Epoch: 13 Global Step: 218690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:56:54,305-Speed 5206.04 samples/sec Loss 1.2198 LearningRate 0.0119 Epoch: 13 Global Step: 218700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:56:56,277-Speed 5196.06 samples/sec Loss 1.2283 LearningRate 0.0119 Epoch: 13 Global Step: 218710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:56:58,278-Speed 5119.05 samples/sec Loss 1.2581 LearningRate 0.0119 Epoch: 13 Global Step: 218720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:57:00,254-Speed 5189.23 samples/sec Loss 1.2520 LearningRate 0.0119 Epoch: 13 Global Step: 218730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:57:02,229-Speed 5186.55 samples/sec Loss 1.2029 LearningRate 0.0119 Epoch: 13 Global Step: 218740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:57:04,200-Speed 5195.67 samples/sec Loss 1.2321 LearningRate 0.0119 Epoch: 13 Global Step: 218750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:57:06,179-Speed 5175.22 samples/sec Loss 1.2484 LearningRate 0.0119 Epoch: 13 Global Step: 218760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:57:08,148-Speed 5204.14 samples/sec Loss 1.2397 LearningRate 0.0119 Epoch: 13 Global Step: 218770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 13:57:10,130-Speed 5168.67 samples/sec Loss 1.2069 LearningRate 0.0119 Epoch: 13 Global Step: 218780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:12,096-Speed 5208.78 samples/sec Loss 1.2618 LearningRate 0.0119 Epoch: 13 Global Step: 218790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:14,064-Speed 5205.61 samples/sec Loss 1.2388 LearningRate 0.0119 Epoch: 13 Global Step: 218800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
13:57:16,101-Speed 5029.33 samples/sec Loss 1.1999 LearningRate 0.0119 Epoch: 13 Global Step: 218810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:18,101-Speed 5122.63 samples/sec Loss 1.1905 LearningRate 0.0119 Epoch: 13 Global Step: 218820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:20,072-Speed 5195.32 samples/sec Loss 1.1976 LearningRate 0.0119 Epoch: 13 Global Step: 218830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:22,056-Speed 5163.47 samples/sec Loss 1.2234 LearningRate 0.0119 Epoch: 13 Global Step: 218840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:24,036-Speed 5173.57 samples/sec Loss 1.2174 LearningRate 0.0119 Epoch: 13 Global Step: 218850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:26,026-Speed 5145.60 samples/sec Loss 1.1830 LearningRate 0.0119 Epoch: 13 Global Step: 218860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:28,058-Speed 5043.16 samples/sec Loss 1.2073 LearningRate 0.0119 Epoch: 13 Global Step: 218870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:30,080-Speed 5065.67 samples/sec Loss 1.2626 LearningRate 0.0119 Epoch: 13 Global Step: 218880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:57:32,050-Speed 5197.68 samples/sec Loss 1.2310 LearningRate 0.0119 Epoch: 13 Global Step: 218890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:57:34,034-Speed 5164.63 samples/sec Loss 1.2348 LearningRate 0.0119 Epoch: 13 Global Step: 218900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:57:36,007-Speed 5192.04 samples/sec Loss 1.2224 LearningRate 0.0118 Epoch: 13 Global Step: 218910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:37,997-Speed 5147.25 samples/sec Loss 1.1926 LearningRate 0.0118 Epoch: 13 Global Step: 218920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:39,989-Speed 5143.83 samples/sec 
Loss 1.2177 LearningRate 0.0118 Epoch: 13 Global Step: 218930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:41,977-Speed 5150.67 samples/sec Loss 1.2779 LearningRate 0.0118 Epoch: 13 Global Step: 218940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:43,948-Speed 5197.72 samples/sec Loss 1.2306 LearningRate 0.0118 Epoch: 13 Global Step: 218950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:45,930-Speed 5168.04 samples/sec Loss 1.2378 LearningRate 0.0118 Epoch: 13 Global Step: 218960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:47,905-Speed 5186.58 samples/sec Loss 1.2854 LearningRate 0.0118 Epoch: 13 Global Step: 218970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:49,895-Speed 5147.95 samples/sec Loss 1.2026 LearningRate 0.0118 Epoch: 13 Global Step: 218980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:51,881-Speed 5156.67 samples/sec Loss 1.2134 LearningRate 0.0118 Epoch: 13 Global Step: 218990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:53,854-Speed 5192.12 samples/sec Loss 1.2290 LearningRate 0.0118 Epoch: 13 Global Step: 219000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:57:55,850-Speed 5130.61 samples/sec Loss 1.2391 LearningRate 0.0118 Epoch: 13 Global Step: 219010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:57:57,826-Speed 5185.56 samples/sec Loss 1.2682 LearningRate 0.0118 Epoch: 13 Global Step: 219020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:57:59,797-Speed 5196.71 samples/sec Loss 1.2272 LearningRate 0.0118 Epoch: 13 Global Step: 219030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:01,839-Speed 5018.30 samples/sec Loss 1.2422 LearningRate 0.0118 Epoch: 13 Global Step: 219040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:03,822-Speed 5164.25 samples/sec Loss 1.2251 LearningRate 0.0118 Epoch: 13 
Global Step: 219050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:05,793-Speed 5198.37 samples/sec Loss 1.2858 LearningRate 0.0118 Epoch: 13 Global Step: 219060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:07,776-Speed 5165.33 samples/sec Loss 1.3080 LearningRate 0.0118 Epoch: 13 Global Step: 219070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:09,748-Speed 5193.74 samples/sec Loss 1.1761 LearningRate 0.0118 Epoch: 13 Global Step: 219080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:11,736-Speed 5152.15 samples/sec Loss 1.2472 LearningRate 0.0118 Epoch: 13 Global Step: 219090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:13,720-Speed 5163.08 samples/sec Loss 1.2142 LearningRate 0.0118 Epoch: 13 Global Step: 219100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:15,706-Speed 5159.11 samples/sec Loss 1.2656 LearningRate 0.0118 Epoch: 13 Global Step: 219110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:17,699-Speed 5138.58 samples/sec Loss 1.2420 LearningRate 0.0118 Epoch: 13 Global Step: 219120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:19,681-Speed 5168.69 samples/sec Loss 1.2429 LearningRate 0.0118 Epoch: 13 Global Step: 219130 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:58:21,666-Speed 5159.59 samples/sec Loss 1.1954 LearningRate 0.0118 Epoch: 13 Global Step: 219140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:58:23,659-Speed 5140.30 samples/sec Loss 1.3099 LearningRate 0.0118 Epoch: 13 Global Step: 219150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:58:25,672-Speed 5090.40 samples/sec Loss 1.2509 LearningRate 0.0118 Epoch: 13 Global Step: 219160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:58:27,644-Speed 5194.23 samples/sec Loss 1.2120 LearningRate 0.0118 Epoch: 13 Global Step: 219170 Fp16 Grad Scale: 
65536 Required: 8 hours Training: 2022-04-11 13:58:29,633-Speed 5149.86 samples/sec Loss 1.2340 LearningRate 0.0118 Epoch: 13 Global Step: 219180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:31,602-Speed 5200.25 samples/sec Loss 1.2617 LearningRate 0.0118 Epoch: 13 Global Step: 219190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:33,573-Speed 5198.10 samples/sec Loss 1.2030 LearningRate 0.0118 Epoch: 13 Global Step: 219200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:35,547-Speed 5188.00 samples/sec Loss 1.2523 LearningRate 0.0118 Epoch: 13 Global Step: 219210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:37,538-Speed 5146.66 samples/sec Loss 1.2148 LearningRate 0.0118 Epoch: 13 Global Step: 219220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:39,517-Speed 5175.85 samples/sec Loss 1.2360 LearningRate 0.0118 Epoch: 13 Global Step: 219230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:41,496-Speed 5176.70 samples/sec Loss 1.2216 LearningRate 0.0118 Epoch: 13 Global Step: 219240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:43,471-Speed 5185.23 samples/sec Loss 1.2678 LearningRate 0.0118 Epoch: 13 Global Step: 219250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:45,484-Speed 5091.01 samples/sec Loss 1.2660 LearningRate 0.0118 Epoch: 13 Global Step: 219260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:47,502-Speed 5075.09 samples/sec Loss 1.2113 LearningRate 0.0118 Epoch: 13 Global Step: 219270 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:58:49,482-Speed 5172.33 samples/sec Loss 1.2776 LearningRate 0.0118 Epoch: 13 Global Step: 219280 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:58:51,517-Speed 5033.75 samples/sec Loss 1.2073 LearningRate 0.0118 Epoch: 13 Global Step: 219290 Fp16 Grad Scale: 131072 Required: 8 hours Training: 
2022-04-11 13:58:53,505-Speed 5153.82 samples/sec Loss 1.2931 LearningRate 0.0118 Epoch: 13 Global Step: 219300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:55,475-Speed 5197.77 samples/sec Loss 1.2591 LearningRate 0.0118 Epoch: 13 Global Step: 219310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:57,455-Speed 5175.47 samples/sec Loss 1.2812 LearningRate 0.0118 Epoch: 13 Global Step: 219320 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:58:59,441-Speed 5156.55 samples/sec Loss 1.2841 LearningRate 0.0118 Epoch: 13 Global Step: 219330 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:01,429-Speed 5153.52 samples/sec Loss 1.2600 LearningRate 0.0118 Epoch: 13 Global Step: 219340 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:03,402-Speed 5189.77 samples/sec Loss 1.2510 LearningRate 0.0118 Epoch: 13 Global Step: 219350 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:05,378-Speed 5185.79 samples/sec Loss 1.2508 LearningRate 0.0118 Epoch: 13 Global Step: 219360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:07,357-Speed 5176.83 samples/sec Loss 1.2386 LearningRate 0.0118 Epoch: 13 Global Step: 219370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:09,344-Speed 5154.86 samples/sec Loss 1.2434 LearningRate 0.0118 Epoch: 13 Global Step: 219380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:11,320-Speed 5182.59 samples/sec Loss 1.2874 LearningRate 0.0118 Epoch: 13 Global Step: 219390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:13,300-Speed 5174.28 samples/sec Loss 1.2558 LearningRate 0.0117 Epoch: 13 Global Step: 219400 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:59:15,298-Speed 5127.17 samples/sec Loss 1.2642 LearningRate 0.0117 Epoch: 13 Global Step: 219410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:17,272-Speed 5188.76 
samples/sec Loss 1.2349 LearningRate 0.0117 Epoch: 13 Global Step: 219420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:19,250-Speed 5178.23 samples/sec Loss 1.2266 LearningRate 0.0117 Epoch: 13 Global Step: 219430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:21,221-Speed 5198.22 samples/sec Loss 1.2548 LearningRate 0.0117 Epoch: 13 Global Step: 219440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:23,209-Speed 5152.39 samples/sec Loss 1.2332 LearningRate 0.0117 Epoch: 13 Global Step: 219450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:25,200-Speed 5143.75 samples/sec Loss 1.2366 LearningRate 0.0117 Epoch: 13 Global Step: 219460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:27,176-Speed 5184.83 samples/sec Loss 1.2332 LearningRate 0.0117 Epoch: 13 Global Step: 219470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:29,153-Speed 5182.16 samples/sec Loss 1.2608 LearningRate 0.0117 Epoch: 13 Global Step: 219480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:31,127-Speed 5189.21 samples/sec Loss 1.2846 LearningRate 0.0117 Epoch: 13 Global Step: 219490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:33,126-Speed 5123.14 samples/sec Loss 1.2861 LearningRate 0.0117 Epoch: 13 Global Step: 219500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:35,121-Speed 5135.86 samples/sec Loss 1.2979 LearningRate 0.0117 Epoch: 13 Global Step: 219510 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:59:37,107-Speed 5157.51 samples/sec Loss 1.2658 LearningRate 0.0117 Epoch: 13 Global Step: 219520 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:59:39,102-Speed 5134.52 samples/sec Loss 1.2753 LearningRate 0.0117 Epoch: 13 Global Step: 219530 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 13:59:41,083-Speed 5171.20 samples/sec Loss 1.2746 LearningRate 
0.0117 Epoch: 13 Global Step: 219540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:43,050-Speed 5206.25 samples/sec Loss 1.2693 LearningRate 0.0117 Epoch: 13 Global Step: 219550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:45,022-Speed 5195.07 samples/sec Loss 1.3006 LearningRate 0.0117 Epoch: 13 Global Step: 219560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:47,012-Speed 5145.52 samples/sec Loss 1.2523 LearningRate 0.0117 Epoch: 13 Global Step: 219570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:49,004-Speed 5143.69 samples/sec Loss 1.2757 LearningRate 0.0117 Epoch: 13 Global Step: 219580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:50,994-Speed 5147.97 samples/sec Loss 1.2657 LearningRate 0.0117 Epoch: 13 Global Step: 219590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:52,962-Speed 5205.05 samples/sec Loss 1.2623 LearningRate 0.0117 Epoch: 13 Global Step: 219600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:54,941-Speed 5175.04 samples/sec Loss 1.2751 LearningRate 0.0117 Epoch: 13 Global Step: 219610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:56,912-Speed 5197.86 samples/sec Loss 1.2746 LearningRate 0.0117 Epoch: 13 Global Step: 219620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 13:59:58,894-Speed 5169.22 samples/sec Loss 1.2504 LearningRate 0.0117 Epoch: 13 Global Step: 219630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:00,898-Speed 5109.71 samples/sec Loss 1.2711 LearningRate 0.0117 Epoch: 13 Global Step: 219640 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:00:02,870-Speed 5196.60 samples/sec Loss 1.3238 LearningRate 0.0117 Epoch: 13 Global Step: 219650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:04,852-Speed 5166.68 samples/sec Loss 1.2848 LearningRate 0.0117 Epoch: 13 Global Step: 219660 Fp16 
Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:06,839-Speed 5155.57 samples/sec Loss 1.2696 LearningRate 0.0117 Epoch: 13 Global Step: 219670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:08,819-Speed 5174.26 samples/sec Loss 1.3002 LearningRate 0.0117 Epoch: 13 Global Step: 219680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:10,798-Speed 5175.14 samples/sec Loss 1.2853 LearningRate 0.0117 Epoch: 13 Global Step: 219690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:12,788-Speed 5149.24 samples/sec Loss 1.3118 LearningRate 0.0117 Epoch: 13 Global Step: 219700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:14,762-Speed 5187.08 samples/sec Loss 1.2640 LearningRate 0.0117 Epoch: 13 Global Step: 219710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:16,740-Speed 5179.23 samples/sec Loss 1.3149 LearningRate 0.0117 Epoch: 13 Global Step: 219720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:18,710-Speed 5199.16 samples/sec Loss 1.2803 LearningRate 0.0117 Epoch: 13 Global Step: 219730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:20,681-Speed 5198.58 samples/sec Loss 1.2182 LearningRate 0.0117 Epoch: 13 Global Step: 219740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:22,659-Speed 5176.95 samples/sec Loss 1.2905 LearningRate 0.0117 Epoch: 13 Global Step: 219750 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:00:24,644-Speed 5160.28 samples/sec Loss 1.2835 LearningRate 0.0117 Epoch: 13 Global Step: 219760 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:00:26,629-Speed 5161.44 samples/sec Loss 1.3013 LearningRate 0.0117 Epoch: 13 Global Step: 219770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:28,630-Speed 5119.50 samples/sec Loss 1.2561 LearningRate 0.0117 Epoch: 13 Global Step: 219780 Fp16 Grad Scale: 65536 Required: 8 hours 
Training: 2022-04-11 14:00:30,594-Speed 5214.28 samples/sec Loss 1.2674 LearningRate 0.0117 Epoch: 13 Global Step: 219790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:32,574-Speed 5175.83 samples/sec Loss 1.2652 LearningRate 0.0117 Epoch: 13 Global Step: 219800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:34,567-Speed 5139.80 samples/sec Loss 1.2234 LearningRate 0.0117 Epoch: 13 Global Step: 219810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:36,596-Speed 5048.23 samples/sec Loss 1.2906 LearningRate 0.0117 Epoch: 13 Global Step: 219820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:38,580-Speed 5161.55 samples/sec Loss 1.2128 LearningRate 0.0117 Epoch: 13 Global Step: 219830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:40,579-Speed 5125.97 samples/sec Loss 1.2534 LearningRate 0.0117 Epoch: 13 Global Step: 219840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:42,555-Speed 5183.82 samples/sec Loss 1.2521 LearningRate 0.0117 Epoch: 13 Global Step: 219850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:44,542-Speed 5153.56 samples/sec Loss 1.2920 LearningRate 0.0117 Epoch: 13 Global Step: 219860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:46,530-Speed 5154.24 samples/sec Loss 1.2958 LearningRate 0.0117 Epoch: 13 Global Step: 219870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:00:48,510-Speed 5171.64 samples/sec Loss 1.2921 LearningRate 0.0117 Epoch: 13 Global Step: 219880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:00:50,517-Speed 5105.26 samples/sec Loss 1.2837 LearningRate 0.0116 Epoch: 13 Global Step: 219890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:52,498-Speed 5170.78 samples/sec Loss 1.2803 LearningRate 0.0116 Epoch: 13 Global Step: 219900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:54,485-Speed 
5155.89 samples/sec Loss 1.2800 LearningRate 0.0116 Epoch: 13 Global Step: 219910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:56,455-Speed 5197.61 samples/sec Loss 1.2672 LearningRate 0.0116 Epoch: 13 Global Step: 219920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:00:58,466-Speed 5093.59 samples/sec Loss 1.2857 LearningRate 0.0116 Epoch: 13 Global Step: 219930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:01:00,470-Speed 5112.45 samples/sec Loss 1.3033 LearningRate 0.0116 Epoch: 13 Global Step: 219940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:01:02,462-Speed 5143.08 samples/sec Loss 1.2980 LearningRate 0.0116 Epoch: 13 Global Step: 219950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:01:04,446-Speed 5162.25 samples/sec Loss 1.2685 LearningRate 0.0116 Epoch: 13 Global Step: 219960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:01:06,441-Speed 5135.55 samples/sec Loss 1.2511 LearningRate 0.0116 Epoch: 13 Global Step: 219970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:01:08,418-Speed 5181.89 samples/sec Loss 1.2378 LearningRate 0.0116 Epoch: 13 Global Step: 219980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:01:10,404-Speed 5157.26 samples/sec Loss 1.2294 LearningRate 0.0116 Epoch: 13 Global Step: 219990 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:01:12,413-Speed 5097.68 samples/sec Loss 1.2880 LearningRate 0.0116 Epoch: 13 Global Step: 220000 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:01:38,857-[lfw][220000]XNorm: 22.906456 Training: 2022-04-11 14:01:38,858-[lfw][220000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 14:01:38,858-[lfw][220000]Accuracy-Highest: 0.99833 Training: 2022-04-11 14:02:09,462-[cfp_fp][220000]XNorm: 22.072527 Training: 2022-04-11 14:02:09,462-[cfp_fp][220000]Accuracy-Flip: 0.98771+-0.00508 Training: 2022-04-11 
14:02:09,463-[cfp_fp][220000]Accuracy-Highest: 0.98771 Training: 2022-04-11 14:02:35,934-[agedb_30][220000]XNorm: 23.399875 Training: 2022-04-11 14:02:35,935-[agedb_30][220000]Accuracy-Flip: 0.98083+-0.00757 Training: 2022-04-11 14:02:35,935-[agedb_30][220000]Accuracy-Highest: 0.98250 Training: 2022-04-11 14:02:37,940-Speed 119.73 samples/sec Loss 1.2677 LearningRate 0.0116 Epoch: 13 Global Step: 220010 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:02:39,920-Speed 5170.79 samples/sec Loss 1.3235 LearningRate 0.0116 Epoch: 13 Global Step: 220020 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:02:41,893-Speed 5193.57 samples/sec Loss 1.2639 LearningRate 0.0116 Epoch: 13 Global Step: 220030 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:02:43,868-Speed 5187.28 samples/sec Loss 1.3410 LearningRate 0.0116 Epoch: 13 Global Step: 220040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:02:45,843-Speed 5186.19 samples/sec Loss 1.2910 LearningRate 0.0116 Epoch: 13 Global Step: 220050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:02:47,832-Speed 5149.23 samples/sec Loss 1.2890 LearningRate 0.0116 Epoch: 13 Global Step: 220060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:02:49,806-Speed 5191.09 samples/sec Loss 1.3141 LearningRate 0.0116 Epoch: 13 Global Step: 220070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:02:51,811-Speed 5108.28 samples/sec Loss 1.2958 LearningRate 0.0116 Epoch: 13 Global Step: 220080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:02:53,821-Speed 5095.42 samples/sec Loss 1.2589 LearningRate 0.0116 Epoch: 13 Global Step: 220090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:02:55,792-Speed 5197.56 samples/sec Loss 1.3170 LearningRate 0.0116 Epoch: 13 Global Step: 220100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:02:57,765-Speed 5191.92 samples/sec Loss 1.2684 
LearningRate 0.0116 Epoch: 13 Global Step: 220110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:02:59,773-Speed 5101.26 samples/sec Loss 1.2931 LearningRate 0.0116 Epoch: 13 Global Step: 220120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:03:01,741-Speed 5204.74 samples/sec Loss 1.2858 LearningRate 0.0116 Epoch: 13 Global Step: 220130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:03:03,711-Speed 5199.65 samples/sec Loss 1.2911 LearningRate 0.0116 Epoch: 13 Global Step: 220140 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:03:05,684-Speed 5190.81 samples/sec Loss 1.2359 LearningRate 0.0116 Epoch: 13 Global Step: 220150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:03:07,650-Speed 5209.24 samples/sec Loss 1.3169 LearningRate 0.0116 Epoch: 13 Global Step: 220160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:03:09,637-Speed 5157.03 samples/sec Loss 1.2533 LearningRate 0.0116 Epoch: 13 Global Step: 220170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:03:11,608-Speed 5196.58 samples/sec Loss 1.2872 LearningRate 0.0116 Epoch: 13 Global Step: 220180 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:03:13,584-Speed 5185.36 samples/sec Loss 1.3198 LearningRate 0.0116 Epoch: 13 Global Step: 220190 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:03:15,561-Speed 5180.54 samples/sec Loss 1.2728 LearningRate 0.0116 Epoch: 13 Global Step: 220200 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:03:17,558-Speed 5129.99 samples/sec Loss 1.3255 LearningRate 0.0116 Epoch: 13 Global Step: 220210 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:03:19,543-Speed 5160.91 samples/sec Loss 1.2536 LearningRate 0.0116 Epoch: 13 Global Step: 220220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:03:21,516-Speed 5191.34 samples/sec Loss 1.3103 LearningRate 0.0116 Epoch: 13 
Global Step: 220230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:03:23,510-Speed 5136.77 samples/sec Loss 1.2878 LearningRate 0.0116 Epoch: 13 Global Step: 220240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:03:25,495-Speed 5159.26 samples/sec Loss 1.2710 LearningRate 0.0116 Epoch: 13 Global Step: 220250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:03:27,517-Speed 5067.21 samples/sec Loss 1.2957 LearningRate 0.0116 Epoch: 13 Global Step: 220260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:03:29,509-Speed 5142.11 samples/sec Loss 1.2805 LearningRate 0.0116 Epoch: 13 Global Step: 220270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:03:31,476-Speed 5206.67 samples/sec Loss 1.3146 LearningRate 0.0116 Epoch: 13 Global Step: 220280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:03:33,446-Speed 5200.61 samples/sec Loss 1.2794 LearningRate 0.0116 Epoch: 13 Global Step: 220290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:03:35,438-Speed 5143.58 samples/sec Loss 1.2777 LearningRate 0.0116 Epoch: 13 Global Step: 220300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:03:37,426-Speed 5151.69 samples/sec Loss 1.2638 LearningRate 0.0116 Epoch: 13 Global Step: 220310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:03:39,398-Speed 5193.80 samples/sec Loss 1.2657 LearningRate 0.0116 Epoch: 13 Global Step: 220320 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:03:41,387-Speed 5150.45 samples/sec Loss 1.3562 LearningRate 0.0116 Epoch: 13 Global Step: 220330 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:03:43,358-Speed 5196.85 samples/sec Loss 1.2963 LearningRate 0.0116 Epoch: 13 Global Step: 220340 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:03:45,334-Speed 5184.67 samples/sec Loss 1.3043 LearningRate 0.0116 Epoch: 13 Global Step: 220350 Fp16 Grad Scale: 
65536 Required: 8 hours Training: 2022-04-11 14:03:47,324-Speed 5148.01 samples/sec Loss 1.3274 LearningRate 0.0116 Epoch: 13 Global Step: 220360 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:03:49,303-Speed 5175.71 samples/sec Loss 1.2842 LearningRate 0.0116 Epoch: 13 Global Step: 220370 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:03:51,294-Speed 5145.41 samples/sec Loss 1.2850 LearningRate 0.0115 Epoch: 13 Global Step: 220380 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:03:53,281-Speed 5155.72 samples/sec Loss 1.2941 LearningRate 0.0115 Epoch: 13 Global Step: 220390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:03:55,253-Speed 5194.83 samples/sec Loss 1.2806 LearningRate 0.0115 Epoch: 13 Global Step: 220400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:03:57,230-Speed 5180.96 samples/sec Loss 1.2796 LearningRate 0.0115 Epoch: 13 Global Step: 220410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:03:59,222-Speed 5140.46 samples/sec Loss 1.3250 LearningRate 0.0115 Epoch: 13 Global Step: 220420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:01,224-Speed 5117.69 samples/sec Loss 1.2839 LearningRate 0.0115 Epoch: 13 Global Step: 220430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:03,218-Speed 5138.48 samples/sec Loss 1.3017 LearningRate 0.0115 Epoch: 13 Global Step: 220440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:05,210-Speed 5141.92 samples/sec Loss 1.2828 LearningRate 0.0115 Epoch: 13 Global Step: 220450 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:04:07,175-Speed 5211.81 samples/sec Loss 1.2638 LearningRate 0.0115 Epoch: 13 Global Step: 220460 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:04:09,147-Speed 5193.10 samples/sec Loss 1.3622 LearningRate 0.0115 Epoch: 13 Global Step: 220470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 
2022-04-11 14:04:11,117-Speed 5199.62 samples/sec Loss 1.2723 LearningRate 0.0115 Epoch: 13 Global Step: 220480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:13,106-Speed 5150.70 samples/sec Loss 1.2336 LearningRate 0.0115 Epoch: 13 Global Step: 220490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:15,091-Speed 5162.19 samples/sec Loss 1.2941 LearningRate 0.0115 Epoch: 13 Global Step: 220500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:17,067-Speed 5181.70 samples/sec Loss 1.3252 LearningRate 0.0115 Epoch: 13 Global Step: 220510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:19,050-Speed 5166.53 samples/sec Loss 1.2847 LearningRate 0.0115 Epoch: 13 Global Step: 220520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:21,020-Speed 5201.33 samples/sec Loss 1.2992 LearningRate 0.0115 Epoch: 13 Global Step: 220530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:22,983-Speed 5217.44 samples/sec Loss 1.2748 LearningRate 0.0115 Epoch: 13 Global Step: 220540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:24,977-Speed 5136.07 samples/sec Loss 1.2965 LearningRate 0.0115 Epoch: 13 Global Step: 220550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:26,992-Speed 5085.33 samples/sec Loss 1.2789 LearningRate 0.0115 Epoch: 13 Global Step: 220560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:28,974-Speed 5168.35 samples/sec Loss 1.2783 LearningRate 0.0115 Epoch: 13 Global Step: 220570 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:04:30,941-Speed 5208.18 samples/sec Loss 1.2737 LearningRate 0.0115 Epoch: 13 Global Step: 220580 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:04:32,906-Speed 5211.51 samples/sec Loss 1.2910 LearningRate 0.0115 Epoch: 13 Global Step: 220590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:04:34,875-Speed 5201.74 
samples/sec Loss 1.3026 LearningRate 0.0115 Epoch: 13 Global Step: 220600 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:04:36,847-Speed 5195.59 samples/sec Loss 1.2761 LearningRate 0.0115 Epoch: 13 Global Step: 220610 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:04:38,840-Speed 5139.87 samples/sec Loss 1.3156 LearningRate 0.0115 Epoch: 13 Global Step: 220620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:40,805-Speed 5213.03 samples/sec Loss 1.2674 LearningRate 0.0115 Epoch: 13 Global Step: 220630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:42,770-Speed 5212.61 samples/sec Loss 1.2955 LearningRate 0.0115 Epoch: 13 Global Step: 220640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:44,747-Speed 5182.87 samples/sec Loss 1.2584 LearningRate 0.0115 Epoch: 13 Global Step: 220650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:46,735-Speed 5151.30 samples/sec Loss 1.3134 LearningRate 0.0115 Epoch: 13 Global Step: 220660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:48,729-Speed 5135.51 samples/sec Loss 1.2952 LearningRate 0.0115 Epoch: 13 Global Step: 220670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:50,699-Speed 5201.24 samples/sec Loss 1.2988 LearningRate 0.0115 Epoch: 13 Global Step: 220680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:52,663-Speed 5213.86 samples/sec Loss 1.3111 LearningRate 0.0115 Epoch: 13 Global Step: 220690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:04:54,640-Speed 5183.87 samples/sec Loss 1.3545 LearningRate 0.0115 Epoch: 13 Global Step: 220700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:04:56,620-Speed 5173.26 samples/sec Loss 1.3123 LearningRate 0.0115 Epoch: 13 Global Step: 220710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:04:58,606-Speed 5156.83 samples/sec Loss 1.3366 LearningRate 
0.0115 Epoch: 13 Global Step: 220720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:05:00,584-Speed 5178.84 samples/sec Loss 1.2910 LearningRate 0.0115 Epoch: 13 Global Step: 220730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:05:02,559-Speed 5187.93 samples/sec Loss 1.2883 LearningRate 0.0115 Epoch: 13 Global Step: 220740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:05:04,532-Speed 5191.80 samples/sec Loss 1.3074 LearningRate 0.0115 Epoch: 13 Global Step: 220750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:05:06,505-Speed 5191.80 samples/sec Loss 1.2950 LearningRate 0.0115 Epoch: 13 Global Step: 220760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:05:08,476-Speed 5196.14 samples/sec Loss 1.3250 LearningRate 0.0115 Epoch: 13 Global Step: 220770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:05:10,457-Speed 5171.09 samples/sec Loss 1.2901 LearningRate 0.0115 Epoch: 13 Global Step: 220780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:05:12,423-Speed 5210.38 samples/sec Loss 1.2836 LearningRate 0.0115 Epoch: 13 Global Step: 220790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:05:14,416-Speed 5138.31 samples/sec Loss 1.2868 LearningRate 0.0115 Epoch: 13 Global Step: 220800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:05:16,390-Speed 5190.12 samples/sec Loss 1.2809 LearningRate 0.0115 Epoch: 13 Global Step: 220810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:05:18,369-Speed 5176.67 samples/sec Loss 1.2759 LearningRate 0.0115 Epoch: 13 Global Step: 220820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:05:20,350-Speed 5169.59 samples/sec Loss 1.3202 LearningRate 0.0115 Epoch: 13 Global Step: 220830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:05:22,359-Speed 5098.39 samples/sec Loss 1.3162 LearningRate 0.0115 Epoch: 13 Global Step: 220840 Fp16 
Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:05:24,352-Speed 5142.17 samples/sec Loss 1.3414 LearningRate 0.0115 Epoch: 13 Global Step: 220850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:05:26,330-Speed 5178.92 samples/sec Loss 1.2505 LearningRate 0.0115 Epoch: 13 Global Step: 220860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:05:28,293-Speed 5216.45 samples/sec Loss 1.3112 LearningRate 0.0114 Epoch: 13 Global Step: 220870 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:05:30,282-Speed 5149.36 samples/sec Loss 1.2606 LearningRate 0.0114 Epoch: 13 Global Step: 220880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:05:32,244-Speed 5222.62 samples/sec Loss 1.2756 LearningRate 0.0114 Epoch: 13 Global Step: 220890 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:05:34,235-Speed 5144.08 samples/sec Loss 1.2700 LearningRate 0.0114 Epoch: 13 Global Step: 220900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:05:36,208-Speed 5191.04 samples/sec Loss 1.3215 LearningRate 0.0114 Epoch: 13 Global Step: 220910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:05:38,190-Speed 5167.97 samples/sec Loss 1.3222 LearningRate 0.0114 Epoch: 13 Global Step: 220920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:05:40,198-Speed 5103.62 samples/sec Loss 1.2876 LearningRate 0.0114 Epoch: 13 Global Step: 220930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:05:42,171-Speed 5191.00 samples/sec Loss 1.3396 LearningRate 0.0114 Epoch: 13 Global Step: 220940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:05:44,128-Speed 5233.77 samples/sec Loss 1.3271 LearningRate 0.0114 Epoch: 13 Global Step: 220950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:05:46,130-Speed 5118.51 samples/sec Loss 1.3470 LearningRate 0.0114 Epoch: 13 Global Step: 220960 Fp16 Grad Scale: 32768 Required: 8 hours 
Training: 2022-04-11 14:05:48,117-Speed 5153.89 samples/sec Loss 1.3094 LearningRate 0.0114 Epoch: 13 Global Step: 220970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:05:50,092-Speed 5185.97 samples/sec Loss 1.2770 LearningRate 0.0114 Epoch: 13 Global Step: 220980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:05:52,060-Speed 5205.70 samples/sec Loss 1.3248 LearningRate 0.0114 Epoch: 13 Global Step: 220990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:05:54,022-Speed 5220.93 samples/sec Loss 1.2954 LearningRate 0.0114 Epoch: 13 Global Step: 221000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:05:55,986-Speed 5214.98 samples/sec Loss 1.2991 LearningRate 0.0114 Epoch: 13 Global Step: 221010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:05:57,953-Speed 5209.15 samples/sec Loss 1.3099 LearningRate 0.0114 Epoch: 13 Global Step: 221020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:05:59,916-Speed 5217.61 samples/sec Loss 1.3005 LearningRate 0.0114 Epoch: 13 Global Step: 221030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:06:01,883-Speed 5207.51 samples/sec Loss 1.3259 LearningRate 0.0114 Epoch: 13 Global Step: 221040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:06:03,861-Speed 5179.17 samples/sec Loss 1.3043 LearningRate 0.0114 Epoch: 13 Global Step: 221050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:05,862-Speed 5119.06 samples/sec Loss 1.3494 LearningRate 0.0114 Epoch: 13 Global Step: 221060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:07,827-Speed 5214.42 samples/sec Loss 1.2978 LearningRate 0.0114 Epoch: 13 Global Step: 221070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:09,795-Speed 5203.04 samples/sec Loss 1.3223 LearningRate 0.0114 Epoch: 13 Global Step: 221080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:11,768-Speed 
5193.01 samples/sec Loss 1.3190 LearningRate 0.0114 Epoch: 13 Global Step: 221090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:13,738-Speed 5197.68 samples/sec Loss 1.3084 LearningRate 0.0114 Epoch: 13 Global Step: 221100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:15,722-Speed 5163.47 samples/sec Loss 1.3621 LearningRate 0.0114 Epoch: 13 Global Step: 221110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:17,720-Speed 5128.60 samples/sec Loss 1.3058 LearningRate 0.0114 Epoch: 13 Global Step: 221120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:19,696-Speed 5182.71 samples/sec Loss 1.2911 LearningRate 0.0114 Epoch: 13 Global Step: 221130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:21,683-Speed 5155.37 samples/sec Loss 1.3339 LearningRate 0.0114 Epoch: 13 Global Step: 221140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:23,672-Speed 5151.03 samples/sec Loss 1.3030 LearningRate 0.0114 Epoch: 13 Global Step: 221150 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:06:25,634-Speed 5220.00 samples/sec Loss 1.3516 LearningRate 0.0114 Epoch: 13 Global Step: 221160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:06:27,604-Speed 5201.89 samples/sec Loss 1.3308 LearningRate 0.0114 Epoch: 13 Global Step: 221170 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:06:29,600-Speed 5130.98 samples/sec Loss 1.3318 LearningRate 0.0114 Epoch: 13 Global Step: 221180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:31,584-Speed 5163.66 samples/sec Loss 1.3039 LearningRate 0.0114 Epoch: 13 Global Step: 221190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:33,550-Speed 5208.23 samples/sec Loss 1.2668 LearningRate 0.0114 Epoch: 13 Global Step: 221200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:35,522-Speed 5194.34 samples/sec Loss 1.3314 
LearningRate 0.0114 Epoch: 13 Global Step: 221210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:37,500-Speed 5180.78 samples/sec Loss 1.3277 LearningRate 0.0114 Epoch: 13 Global Step: 221220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:39,475-Speed 5184.48 samples/sec Loss 1.2999 LearningRate 0.0114 Epoch: 13 Global Step: 221230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:41,489-Speed 5086.46 samples/sec Loss 1.2688 LearningRate 0.0114 Epoch: 13 Global Step: 221240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:43,484-Speed 5134.77 samples/sec Loss 1.2767 LearningRate 0.0114 Epoch: 13 Global Step: 221250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:45,468-Speed 5164.51 samples/sec Loss 1.3002 LearningRate 0.0114 Epoch: 13 Global Step: 221260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:47,429-Speed 5222.16 samples/sec Loss 1.3614 LearningRate 0.0114 Epoch: 13 Global Step: 221270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:49,417-Speed 5152.58 samples/sec Loss 1.3298 LearningRate 0.0114 Epoch: 13 Global Step: 221280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:06:51,447-Speed 5047.64 samples/sec Loss 1.3377 LearningRate 0.0114 Epoch: 13 Global Step: 221290 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:06:53,412-Speed 5211.36 samples/sec Loss 1.3653 LearningRate 0.0114 Epoch: 13 Global Step: 221300 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:06:55,393-Speed 5172.26 samples/sec Loss 1.3025 LearningRate 0.0114 Epoch: 13 Global Step: 221310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:06:57,357-Speed 5213.59 samples/sec Loss 1.2949 LearningRate 0.0114 Epoch: 13 Global Step: 221320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:06:59,325-Speed 5204.65 samples/sec Loss 1.2975 LearningRate 0.0114 Epoch: 13 Global Step: 
221330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:07:01,310-Speed 5161.36 samples/sec Loss 1.2816 LearningRate 0.0114 Epoch: 13 Global Step: 221340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:07:03,290-Speed 5174.89 samples/sec Loss 1.3007 LearningRate 0.0114 Epoch: 13 Global Step: 221350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:07:05,278-Speed 5152.19 samples/sec Loss 1.3385 LearningRate 0.0113 Epoch: 13 Global Step: 221360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:07:07,255-Speed 5181.34 samples/sec Loss 1.3518 LearningRate 0.0113 Epoch: 13 Global Step: 221370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:07:09,219-Speed 5216.68 samples/sec Loss 1.2964 LearningRate 0.0113 Epoch: 13 Global Step: 221380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:07:11,215-Speed 5129.55 samples/sec Loss 1.3127 LearningRate 0.0113 Epoch: 13 Global Step: 221390 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:13,193-Speed 5180.60 samples/sec Loss 1.3142 LearningRate 0.0113 Epoch: 13 Global Step: 221400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:15,162-Speed 5201.59 samples/sec Loss 1.2991 LearningRate 0.0113 Epoch: 13 Global Step: 221410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:17,141-Speed 5174.73 samples/sec Loss 1.3077 LearningRate 0.0113 Epoch: 13 Global Step: 221420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:19,108-Speed 5209.01 samples/sec Loss 1.3059 LearningRate 0.0113 Epoch: 13 Global Step: 221430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:21,076-Speed 5205.12 samples/sec Loss 1.3098 LearningRate 0.0113 Epoch: 13 Global Step: 221440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:23,077-Speed 5118.75 samples/sec Loss 1.3311 LearningRate 0.0113 Epoch: 13 Global Step: 221450 Fp16 Grad Scale: 65536 Required: 8 
hours Training: 2022-04-11 14:07:25,053-Speed 5183.02 samples/sec Loss 1.3379 LearningRate 0.0113 Epoch: 13 Global Step: 221460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:27,042-Speed 5150.33 samples/sec Loss 1.3168 LearningRate 0.0113 Epoch: 13 Global Step: 221470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:29,010-Speed 5205.69 samples/sec Loss 1.3122 LearningRate 0.0113 Epoch: 13 Global Step: 221480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:30,969-Speed 5229.09 samples/sec Loss 1.2905 LearningRate 0.0113 Epoch: 13 Global Step: 221490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:32,937-Speed 5206.25 samples/sec Loss 1.3352 LearningRate 0.0113 Epoch: 13 Global Step: 221500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:34,903-Speed 5210.36 samples/sec Loss 1.2216 LearningRate 0.0113 Epoch: 13 Global Step: 221510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:36,935-Speed 5040.77 samples/sec Loss 1.2874 LearningRate 0.0113 Epoch: 13 Global Step: 221520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:38,929-Speed 5135.88 samples/sec Loss 1.2881 LearningRate 0.0113 Epoch: 13 Global Step: 221530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:40,924-Speed 5133.77 samples/sec Loss 1.3350 LearningRate 0.0113 Epoch: 13 Global Step: 221540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:42,888-Speed 5217.48 samples/sec Loss 1.3282 LearningRate 0.0113 Epoch: 13 Global Step: 221550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:44,861-Speed 5190.79 samples/sec Loss 1.2929 LearningRate 0.0113 Epoch: 13 Global Step: 221560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:46,863-Speed 5117.50 samples/sec Loss 1.3066 LearningRate 0.0113 Epoch: 13 Global Step: 221570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 
14:07:48,841-Speed 5177.40 samples/sec Loss 1.3320 LearningRate 0.0113 Epoch: 13 Global Step: 221580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:50,810-Speed 5203.43 samples/sec Loss 1.2869 LearningRate 0.0113 Epoch: 13 Global Step: 221590 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:07:52,808-Speed 5126.36 samples/sec Loss 1.3785 LearningRate 0.0113 Epoch: 13 Global Step: 221600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:54,774-Speed 5211.08 samples/sec Loss 1.3280 LearningRate 0.0113 Epoch: 13 Global Step: 221610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:56,745-Speed 5197.29 samples/sec Loss 1.3099 LearningRate 0.0113 Epoch: 13 Global Step: 221620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:07:58,721-Speed 5184.24 samples/sec Loss 1.3717 LearningRate 0.0113 Epoch: 13 Global Step: 221630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:08:00,691-Speed 5200.09 samples/sec Loss 1.2961 LearningRate 0.0113 Epoch: 13 Global Step: 221640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:08:02,661-Speed 5197.94 samples/sec Loss 1.3277 LearningRate 0.0113 Epoch: 13 Global Step: 221650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:08:04,627-Speed 5212.33 samples/sec Loss 1.3567 LearningRate 0.0113 Epoch: 13 Global Step: 221660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:08:06,586-Speed 5227.76 samples/sec Loss 1.3230 LearningRate 0.0113 Epoch: 13 Global Step: 221670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:08:08,556-Speed 5199.25 samples/sec Loss 1.3046 LearningRate 0.0113 Epoch: 13 Global Step: 221680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:08:10,526-Speed 5202.03 samples/sec Loss 1.2951 LearningRate 0.0113 Epoch: 13 Global Step: 221690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:08:12,514-Speed 5152.51 samples/sec 
Loss 1.2963 LearningRate 0.0113 Epoch: 13 Global Step: 221700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:08:14,483-Speed 5201.11 samples/sec Loss 1.2955 LearningRate 0.0113 Epoch: 13 Global Step: 221710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:08:16,451-Speed 5205.32 samples/sec Loss 1.3212 LearningRate 0.0113 Epoch: 13 Global Step: 221720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:08:18,420-Speed 5203.41 samples/sec Loss 1.3372 LearningRate 0.0113 Epoch: 13 Global Step: 221730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:08:20,385-Speed 5211.77 samples/sec Loss 1.3232 LearningRate 0.0113 Epoch: 13 Global Step: 221740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:08:22,380-Speed 5135.01 samples/sec Loss 1.2877 LearningRate 0.0113 Epoch: 13 Global Step: 221750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:08:24,349-Speed 5201.82 samples/sec Loss 1.2831 LearningRate 0.0113 Epoch: 13 Global Step: 221760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:08:26,337-Speed 5153.43 samples/sec Loss 1.3387 LearningRate 0.0113 Epoch: 13 Global Step: 221770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:08:28,343-Speed 5105.56 samples/sec Loss 1.2908 LearningRate 0.0113 Epoch: 13 Global Step: 221780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:08:30,310-Speed 5207.01 samples/sec Loss 1.2737 LearningRate 0.0113 Epoch: 13 Global Step: 221790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:08:32,280-Speed 5200.45 samples/sec Loss 1.3250 LearningRate 0.0113 Epoch: 13 Global Step: 221800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:08:34,251-Speed 5198.49 samples/sec Loss 1.2736 LearningRate 0.0113 Epoch: 13 Global Step: 221810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:08:36,234-Speed 5165.72 samples/sec Loss 1.3405 LearningRate 0.0113 Epoch: 13 
Global Step: 221820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:08:38,224-Speed 5147.30 samples/sec Loss 1.3270 LearningRate 0.0113 Epoch: 13 Global Step: 221830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:08:40,204-Speed 5171.28 samples/sec Loss 1.3040 LearningRate 0.0113 Epoch: 13 Global Step: 221840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:08:42,173-Speed 5203.09 samples/sec Loss 1.3209 LearningRate 0.0113 Epoch: 13 Global Step: 221850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:08:44,137-Speed 5216.30 samples/sec Loss 1.3294 LearningRate 0.0112 Epoch: 13 Global Step: 221860 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:08:46,104-Speed 5207.93 samples/sec Loss 1.3424 LearningRate 0.0112 Epoch: 13 Global Step: 221870 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:08:48,091-Speed 5153.91 samples/sec Loss 1.3434 LearningRate 0.0112 Epoch: 13 Global Step: 221880 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:08:50,073-Speed 5169.16 samples/sec Loss 1.3496 LearningRate 0.0112 Epoch: 13 Global Step: 221890 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:08:52,097-Speed 5060.34 samples/sec Loss 1.3186 LearningRate 0.0112 Epoch: 13 Global Step: 221900 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:08:54,077-Speed 5175.09 samples/sec Loss 1.4358 LearningRate 0.0112 Epoch: 13 Global Step: 221910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:08:56,055-Speed 5177.84 samples/sec Loss 1.3219 LearningRate 0.0112 Epoch: 13 Global Step: 221920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:08:58,022-Speed 5206.79 samples/sec Loss 1.3795 LearningRate 0.0112 Epoch: 13 Global Step: 221930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:09:00,023-Speed 5119.06 samples/sec Loss 1.2923 LearningRate 0.0112 Epoch: 13 Global Step: 221940 Fp16 Grad Scale: 
65536 Required: 8 hours Training: 2022-04-11 14:09:01,991-Speed 5207.21 samples/sec Loss 1.3477 LearningRate 0.0112 Epoch: 13 Global Step: 221950 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:09:03,958-Speed 5206.46 samples/sec Loss 1.3252 LearningRate 0.0112 Epoch: 13 Global Step: 221960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:09:05,927-Speed 5201.93 samples/sec Loss 1.3529 LearningRate 0.0112 Epoch: 13 Global Step: 221970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:09:07,893-Speed 5209.44 samples/sec Loss 1.3650 LearningRate 0.0112 Epoch: 13 Global Step: 221980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:09:09,861-Speed 5205.12 samples/sec Loss 1.3806 LearningRate 0.0112 Epoch: 13 Global Step: 221990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:09:11,830-Speed 5202.98 samples/sec Loss 1.3517 LearningRate 0.0112 Epoch: 13 Global Step: 222000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:09:38,422-[lfw][222000]XNorm: 22.735518 Training: 2022-04-11 14:09:38,423-[lfw][222000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 14:09:38,424-[lfw][222000]Accuracy-Highest: 0.99833 Training: 2022-04-11 14:10:09,129-[cfp_fp][222000]XNorm: 22.221958 Training: 2022-04-11 14:10:09,129-[cfp_fp][222000]Accuracy-Flip: 0.98729+-0.00401 Training: 2022-04-11 14:10:09,130-[cfp_fp][222000]Accuracy-Highest: 0.98771 Training: 2022-04-11 14:10:35,561-[agedb_30][222000]XNorm: 23.193164 Training: 2022-04-11 14:10:35,561-[agedb_30][222000]Accuracy-Flip: 0.98200+-0.00809 Training: 2022-04-11 14:10:35,562-[agedb_30][222000]Accuracy-Highest: 0.98250 Training: 2022-04-11 14:10:37,563-Speed 119.44 samples/sec Loss 1.3476 LearningRate 0.0112 Epoch: 13 Global Step: 222010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:10:39,546-Speed 5164.58 samples/sec Loss 1.2841 LearningRate 0.0112 Epoch: 13 Global Step: 222020 Fp16 Grad Scale: 32768 Required: 8 hours 
Training: 2022-04-11 14:10:41,525-Speed 5174.51 samples/sec Loss 1.2832 LearningRate 0.0112 Epoch: 13 Global Step: 222030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:10:43,504-Speed 5176.99 samples/sec Loss 1.2744 LearningRate 0.0112 Epoch: 13 Global Step: 222040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:10:45,559-Speed 4985.56 samples/sec Loss 1.3573 LearningRate 0.0112 Epoch: 13 Global Step: 222050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-11 14:10:47,537-Speed 5177.55 samples/sec Loss 1.3546 LearningRate 0.0112 Epoch: 13 Global Step: 222060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:10:49,542-Speed 5109.21 samples/sec Loss 1.3442 LearningRate 0.0112 Epoch: 13 Global Step: 222070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:10:51,536-Speed 5137.14 samples/sec Loss 1.2967 LearningRate 0.0112 Epoch: 13 Global Step: 222080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:10:53,501-Speed 5211.85 samples/sec Loss 1.3130 LearningRate 0.0112 Epoch: 13 Global Step: 222090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:10:55,470-Speed 5203.53 samples/sec Loss 1.2925 LearningRate 0.0112 Epoch: 13 Global Step: 222100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:10:57,440-Speed 5200.32 samples/sec Loss 1.3143 LearningRate 0.0112 Epoch: 13 Global Step: 222110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:10:59,422-Speed 5166.30 samples/sec Loss 1.3689 LearningRate 0.0112 Epoch: 13 Global Step: 222120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:11:01,390-Speed 5206.14 samples/sec Loss 1.3573 LearningRate 0.0112 Epoch: 13 Global Step: 222130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:11:03,356-Speed 5211.73 samples/sec Loss 1.3554 LearningRate 0.0112 Epoch: 13 Global Step: 222140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:11:05,337-Speed 
5170.43 samples/sec Loss 1.2906 LearningRate 0.0112 Epoch: 13 Global Step: 222150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:11:07,304-Speed 5207.18 samples/sec Loss 1.3368 LearningRate 0.0112 Epoch: 13 Global Step: 222160 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-11 14:11:09,316-Speed 5090.71 samples/sec Loss 1.3371 LearningRate 0.0112 Epoch: 13 Global Step: 222170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:11:11,306-Speed 5147.93 samples/sec Loss 1.3070 LearningRate 0.0112 Epoch: 13 Global Step: 222180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:11:13,304-Speed 5127.19 samples/sec Loss 1.3341 LearningRate 0.0112 Epoch: 13 Global Step: 222190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:11:15,290-Speed 5160.95 samples/sec Loss 1.2947 LearningRate 0.0112 Epoch: 13 Global Step: 222200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:11:17,265-Speed 5184.34 samples/sec Loss 1.3282 LearningRate 0.0112 Epoch: 13 Global Step: 222210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:11:19,249-Speed 5164.19 samples/sec Loss 1.3358 LearningRate 0.0112 Epoch: 13 Global Step: 222220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:11:21,234-Speed 5159.78 samples/sec Loss 1.2915 LearningRate 0.0112 Epoch: 13 Global Step: 222230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-11 14:11:23,211-Speed 5182.90 samples/sec Loss 1.3060 LearningRate 0.0112 Epoch: 13 Global Step: 222240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:11:25,182-Speed 5196.06 samples/sec Loss 1.3251 LearningRate 0.0112 Epoch: 13 Global Step: 222250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:11:27,172-Speed 5147.89 samples/sec Loss 1.3471 LearningRate 0.0112 Epoch: 13 Global Step: 222260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:11:29,151-Speed 5174.53 samples/sec Loss 1.3238 
LearningRate 0.0112 Epoch: 13 Global Step: 222270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:11:31,112-Speed 5224.24 samples/sec Loss 1.2939 LearningRate 0.0112 Epoch: 13 Global Step: 222280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:11:33,080-Speed 5204.95 samples/sec Loss 1.3192 LearningRate 0.0112 Epoch: 13 Global Step: 222290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:11:35,057-Speed 5181.56 samples/sec Loss 1.3412 LearningRate 0.0112 Epoch: 13 Global Step: 222300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:11:37,053-Speed 5132.49 samples/sec Loss 1.3092 LearningRate 0.0112 Epoch: 13 Global Step: 222310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:11:39,020-Speed 5206.11 samples/sec Loss 1.3295 LearningRate 0.0112 Epoch: 13 Global Step: 222320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:11:41,019-Speed 5126.05 samples/sec Loss 1.2985 LearningRate 0.0112 Epoch: 13 Global Step: 222330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:11:42,980-Speed 5222.35 samples/sec Loss 1.3087 LearningRate 0.0112 Epoch: 13 Global Step: 222340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:11:44,944-Speed 5217.13 samples/sec Loss 1.3131 LearningRate 0.0112 Epoch: 13 Global Step: 222350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:11:46,928-Speed 5161.84 samples/sec Loss 1.3476 LearningRate 0.0111 Epoch: 13 Global Step: 222360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:11:48,910-Speed 5168.15 samples/sec Loss 1.2736 LearningRate 0.0111 Epoch: 13 Global Step: 222370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:11:50,877-Speed 5208.60 samples/sec Loss 1.3375 LearningRate 0.0111 Epoch: 13 Global Step: 222380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:11:52,842-Speed 5211.94 samples/sec Loss 1.3281 LearningRate 0.0111 Epoch: 13 Global Step: 
222390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:11:54,807-Speed 5213.52 samples/sec Loss 1.3290 LearningRate 0.0111 Epoch: 13 Global Step: 222400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:11:56,784-Speed 5182.62 samples/sec Loss 1.3112 LearningRate 0.0111 Epoch: 13 Global Step: 222410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:11:58,779-Speed 5134.25 samples/sec Loss 1.3306 LearningRate 0.0111 Epoch: 13 Global Step: 222420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:00,750-Speed 5195.83 samples/sec Loss 1.3494 LearningRate 0.0111 Epoch: 13 Global Step: 222430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:02,762-Speed 5093.79 samples/sec Loss 1.3195 LearningRate 0.0111 Epoch: 13 Global Step: 222440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:04,738-Speed 5181.72 samples/sec Loss 1.3821 LearningRate 0.0111 Epoch: 13 Global Step: 222450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:06,699-Speed 5225.01 samples/sec Loss 1.3611 LearningRate 0.0111 Epoch: 13 Global Step: 222460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:08,664-Speed 5211.66 samples/sec Loss 1.4418 LearningRate 0.0111 Epoch: 13 Global Step: 222470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:10,635-Speed 5196.86 samples/sec Loss 1.3390 LearningRate 0.0111 Epoch: 13 Global Step: 222480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:12:12,590-Speed 5241.21 samples/sec Loss 1.3819 LearningRate 0.0111 Epoch: 13 Global Step: 222490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:14,557-Speed 5206.74 samples/sec Loss 1.3536 LearningRate 0.0111 Epoch: 13 Global Step: 222500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:16,521-Speed 5214.84 samples/sec Loss 1.3275 LearningRate 0.0111 Epoch: 13 Global Step: 222510 Fp16 Grad Scale: 65536 Required: 7 
hours Training: 2022-04-11 14:12:18,499-Speed 5178.48 samples/sec Loss 1.3967 LearningRate 0.0111 Epoch: 13 Global Step: 222520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:20,462-Speed 5220.74 samples/sec Loss 1.3313 LearningRate 0.0111 Epoch: 13 Global Step: 222530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:22,471-Speed 5097.88 samples/sec Loss 1.3229 LearningRate 0.0111 Epoch: 13 Global Step: 222540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:24,472-Speed 5120.85 samples/sec Loss 1.3154 LearningRate 0.0111 Epoch: 13 Global Step: 222550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:26,441-Speed 5200.70 samples/sec Loss 1.3796 LearningRate 0.0111 Epoch: 13 Global Step: 222560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:28,417-Speed 5185.85 samples/sec Loss 1.3653 LearningRate 0.0111 Epoch: 13 Global Step: 222570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:30,379-Speed 5219.58 samples/sec Loss 1.3570 LearningRate 0.0111 Epoch: 13 Global Step: 222580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:32,336-Speed 5234.96 samples/sec Loss 1.3472 LearningRate 0.0111 Epoch: 13 Global Step: 222590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:34,294-Speed 5230.21 samples/sec Loss 1.3475 LearningRate 0.0111 Epoch: 13 Global Step: 222600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:12:36,269-Speed 5188.12 samples/sec Loss 1.3337 LearningRate 0.0111 Epoch: 13 Global Step: 222610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:12:38,247-Speed 5178.33 samples/sec Loss 1.3212 LearningRate 0.0111 Epoch: 13 Global Step: 222620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:12:40,215-Speed 5203.88 samples/sec Loss 1.3227 LearningRate 0.0111 Epoch: 13 Global Step: 222630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 
14:12:42,186-Speed 5196.78 samples/sec Loss 1.3209 LearningRate 0.0111 Epoch: 13 Global Step: 222640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:12:44,174-Speed 5152.34 samples/sec Loss 1.3597 LearningRate 0.0111 Epoch: 13 Global Step: 222650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:12:46,154-Speed 5175.56 samples/sec Loss 1.4013 LearningRate 0.0111 Epoch: 13 Global Step: 222660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:12:48,130-Speed 5184.87 samples/sec Loss 1.3639 LearningRate 0.0111 Epoch: 13 Global Step: 222670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:12:50,098-Speed 5203.11 samples/sec Loss 1.3529 LearningRate 0.0111 Epoch: 13 Global Step: 222680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:12:52,059-Speed 5225.26 samples/sec Loss 1.3623 LearningRate 0.0111 Epoch: 13 Global Step: 222690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:12:54,024-Speed 5213.04 samples/sec Loss 1.3596 LearningRate 0.0111 Epoch: 13 Global Step: 222700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:55,986-Speed 5219.58 samples/sec Loss 1.4042 LearningRate 0.0111 Epoch: 13 Global Step: 222710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:57,949-Speed 5218.35 samples/sec Loss 1.3693 LearningRate 0.0111 Epoch: 13 Global Step: 222720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:12:59,959-Speed 5097.31 samples/sec Loss 1.3595 LearningRate 0.0111 Epoch: 13 Global Step: 222730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:01,939-Speed 5173.15 samples/sec Loss 1.3481 LearningRate 0.0111 Epoch: 13 Global Step: 222740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:03,910-Speed 5196.74 samples/sec Loss 1.3331 LearningRate 0.0111 Epoch: 13 Global Step: 222750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:05,900-Speed 5147.05 samples/sec Loss 
1.3675 LearningRate 0.0111 Epoch: 13 Global Step: 222760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:07,876-Speed 5185.04 samples/sec Loss 1.3454 LearningRate 0.0111 Epoch: 13 Global Step: 222770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:09,843-Speed 5207.04 samples/sec Loss 1.2992 LearningRate 0.0111 Epoch: 13 Global Step: 222780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:11,815-Speed 5195.07 samples/sec Loss 1.3493 LearningRate 0.0111 Epoch: 13 Global Step: 222790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:13,789-Speed 5190.30 samples/sec Loss 1.3036 LearningRate 0.0111 Epoch: 13 Global Step: 222800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:15,754-Speed 5212.18 samples/sec Loss 1.3768 LearningRate 0.0111 Epoch: 13 Global Step: 222810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:17,741-Speed 5155.35 samples/sec Loss 1.3372 LearningRate 0.0111 Epoch: 13 Global Step: 222820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:19,703-Speed 5219.24 samples/sec Loss 1.3111 LearningRate 0.0111 Epoch: 13 Global Step: 222830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:21,668-Speed 5212.94 samples/sec Loss 1.3301 LearningRate 0.0111 Epoch: 13 Global Step: 222840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:23,639-Speed 5198.29 samples/sec Loss 1.3073 LearningRate 0.0111 Epoch: 13 Global Step: 222850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:25,603-Speed 5215.89 samples/sec Loss 1.3317 LearningRate 0.0110 Epoch: 13 Global Step: 222860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:27,589-Speed 5157.24 samples/sec Loss 1.3397 LearningRate 0.0110 Epoch: 13 Global Step: 222870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:29,559-Speed 5201.23 samples/sec Loss 1.3738 LearningRate 0.0110 Epoch: 13 Global 
Step: 222880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:31,525-Speed 5211.24 samples/sec Loss 1.3449 LearningRate 0.0110 Epoch: 13 Global Step: 222890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:33,496-Speed 5195.10 samples/sec Loss 1.3321 LearningRate 0.0110 Epoch: 13 Global Step: 222900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:35,475-Speed 5176.52 samples/sec Loss 1.3638 LearningRate 0.0110 Epoch: 13 Global Step: 222910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:37,457-Speed 5169.99 samples/sec Loss 1.3584 LearningRate 0.0110 Epoch: 13 Global Step: 222920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:39,443-Speed 5156.53 samples/sec Loss 1.3257 LearningRate 0.0110 Epoch: 13 Global Step: 222930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:41,410-Speed 5208.27 samples/sec Loss 1.3037 LearningRate 0.0110 Epoch: 13 Global Step: 222940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:43,382-Speed 5193.81 samples/sec Loss 1.3703 LearningRate 0.0110 Epoch: 13 Global Step: 222950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:45,352-Speed 5201.18 samples/sec Loss 1.3191 LearningRate 0.0110 Epoch: 13 Global Step: 222960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:47,354-Speed 5115.32 samples/sec Loss 1.3754 LearningRate 0.0110 Epoch: 13 Global Step: 222970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:49,325-Speed 5197.28 samples/sec Loss 1.3666 LearningRate 0.0110 Epoch: 13 Global Step: 222980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:51,306-Speed 5172.59 samples/sec Loss 1.3246 LearningRate 0.0110 Epoch: 13 Global Step: 222990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:53,280-Speed 5189.57 samples/sec Loss 1.3559 LearningRate 0.0110 Epoch: 13 Global Step: 223000 Fp16 Grad Scale: 131072 
Required: 7 hours Training: 2022-04-11 14:13:55,244-Speed 5215.22 samples/sec Loss 1.3395 LearningRate 0.0110 Epoch: 13 Global Step: 223010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:13:57,202-Speed 5231.34 samples/sec Loss 1.3788 LearningRate 0.0110 Epoch: 13 Global Step: 223020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:13:59,181-Speed 5175.47 samples/sec Loss 1.3588 LearningRate 0.0110 Epoch: 13 Global Step: 223030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:01,157-Speed 5183.85 samples/sec Loss 1.3703 LearningRate 0.0110 Epoch: 13 Global Step: 223040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:03,138-Speed 5170.31 samples/sec Loss 1.3200 LearningRate 0.0110 Epoch: 13 Global Step: 223050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:05,108-Speed 5199.36 samples/sec Loss 1.4173 LearningRate 0.0110 Epoch: 13 Global Step: 223060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:07,077-Speed 5202.58 samples/sec Loss 1.3824 LearningRate 0.0110 Epoch: 13 Global Step: 223070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:09,064-Speed 5156.40 samples/sec Loss 1.3667 LearningRate 0.0110 Epoch: 13 Global Step: 223080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:11,079-Speed 5084.05 samples/sec Loss 1.4157 LearningRate 0.0110 Epoch: 13 Global Step: 223090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:13,056-Speed 5181.14 samples/sec Loss 1.3352 LearningRate 0.0110 Epoch: 13 Global Step: 223100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:15,039-Speed 5164.93 samples/sec Loss 1.3339 LearningRate 0.0110 Epoch: 13 Global Step: 223110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:17,013-Speed 5189.16 samples/sec Loss 1.3180 LearningRate 0.0110 Epoch: 13 Global Step: 223120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 
14:14:18,978-Speed 5212.42 samples/sec Loss 1.3651 LearningRate 0.0110 Epoch: 13 Global Step: 223130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:14:20,948-Speed 5202.27 samples/sec Loss 1.3749 LearningRate 0.0110 Epoch: 13 Global Step: 223140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:14:22,945-Speed 5127.51 samples/sec Loss 1.3310 LearningRate 0.0110 Epoch: 13 Global Step: 223150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:24,978-Speed 5038.20 samples/sec Loss 1.3428 LearningRate 0.0110 Epoch: 13 Global Step: 223160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:26,964-Speed 5157.88 samples/sec Loss 1.3339 LearningRate 0.0110 Epoch: 13 Global Step: 223170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:28,937-Speed 5192.33 samples/sec Loss 1.3472 LearningRate 0.0110 Epoch: 13 Global Step: 223180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:30,902-Speed 5213.46 samples/sec Loss 1.3806 LearningRate 0.0110 Epoch: 13 Global Step: 223190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:32,868-Speed 5210.26 samples/sec Loss 1.3645 LearningRate 0.0110 Epoch: 13 Global Step: 223200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:34,838-Speed 5200.30 samples/sec Loss 1.3758 LearningRate 0.0110 Epoch: 13 Global Step: 223210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:36,827-Speed 5149.49 samples/sec Loss 1.3317 LearningRate 0.0110 Epoch: 13 Global Step: 223220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:38,810-Speed 5166.39 samples/sec Loss 1.3447 LearningRate 0.0110 Epoch: 13 Global Step: 223230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:40,795-Speed 5161.49 samples/sec Loss 1.3661 LearningRate 0.0110 Epoch: 13 Global Step: 223240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:42,756-Speed 5221.01 samples/sec 
Loss 1.2998 LearningRate 0.0110 Epoch: 13 Global Step: 223250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:14:44,712-Speed 5238.58 samples/sec Loss 1.3477 LearningRate 0.0110 Epoch: 13 Global Step: 223260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:46,736-Speed 5061.22 samples/sec Loss 1.3484 LearningRate 0.0110 Epoch: 13 Global Step: 223270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:48,732-Speed 5131.61 samples/sec Loss 1.3092 LearningRate 0.0110 Epoch: 13 Global Step: 223280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:50,733-Speed 5119.44 samples/sec Loss 1.3534 LearningRate 0.0110 Epoch: 13 Global Step: 223290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:52,726-Speed 5137.40 samples/sec Loss 1.3835 LearningRate 0.0110 Epoch: 13 Global Step: 223300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:54,706-Speed 5176.31 samples/sec Loss 1.3451 LearningRate 0.0110 Epoch: 13 Global Step: 223310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:56,674-Speed 5204.28 samples/sec Loss 1.4109 LearningRate 0.0110 Epoch: 13 Global Step: 223320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:14:58,653-Speed 5175.30 samples/sec Loss 1.3141 LearningRate 0.0110 Epoch: 13 Global Step: 223330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:00,621-Speed 5205.07 samples/sec Loss 1.3257 LearningRate 0.0110 Epoch: 13 Global Step: 223340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:02,588-Speed 5209.06 samples/sec Loss 1.3602 LearningRate 0.0110 Epoch: 13 Global Step: 223350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:04,560-Speed 5192.70 samples/sec Loss 1.3471 LearningRate 0.0109 Epoch: 13 Global Step: 223360 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:15:06,521-Speed 5225.40 samples/sec Loss 1.3773 LearningRate 0.0109 Epoch: 13 
Global Step: 223370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:08,491-Speed 5199.50 samples/sec Loss 1.3434 LearningRate 0.0109 Epoch: 13 Global Step: 223380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:10,476-Speed 5160.28 samples/sec Loss 1.3534 LearningRate 0.0109 Epoch: 13 Global Step: 223390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:12,453-Speed 5181.96 samples/sec Loss 1.3690 LearningRate 0.0109 Epoch: 13 Global Step: 223400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:14,446-Speed 5139.65 samples/sec Loss 1.3260 LearningRate 0.0109 Epoch: 13 Global Step: 223410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:16,417-Speed 5196.38 samples/sec Loss 1.3730 LearningRate 0.0109 Epoch: 13 Global Step: 223420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:18,380-Speed 5217.61 samples/sec Loss 1.2776 LearningRate 0.0109 Epoch: 13 Global Step: 223430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:20,348-Speed 5206.42 samples/sec Loss 1.3344 LearningRate 0.0109 Epoch: 13 Global Step: 223440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:22,328-Speed 5174.23 samples/sec Loss 1.3405 LearningRate 0.0109 Epoch: 13 Global Step: 223450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:24,305-Speed 5181.26 samples/sec Loss 1.3163 LearningRate 0.0109 Epoch: 13 Global Step: 223460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:26,296-Speed 5142.95 samples/sec Loss 1.4100 LearningRate 0.0109 Epoch: 13 Global Step: 223470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:15:28,277-Speed 5171.86 samples/sec Loss 1.3875 LearningRate 0.0109 Epoch: 13 Global Step: 223480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:15:30,266-Speed 5150.24 samples/sec Loss 1.3451 LearningRate 0.0109 Epoch: 13 Global Step: 223490 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-11 14:15:32,244-Speed 5179.09 samples/sec Loss 1.3323 LearningRate 0.0109 Epoch: 13 Global Step: 223500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:34,213-Speed 5199.73 samples/sec Loss 1.3130 LearningRate 0.0109 Epoch: 13 Global Step: 223510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:36,201-Speed 5154.51 samples/sec Loss 1.3809 LearningRate 0.0109 Epoch: 13 Global Step: 223520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:38,186-Speed 5160.84 samples/sec Loss 1.3862 LearningRate 0.0109 Epoch: 13 Global Step: 223530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:40,167-Speed 5171.25 samples/sec Loss 1.3004 LearningRate 0.0109 Epoch: 13 Global Step: 223540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:42,145-Speed 5179.34 samples/sec Loss 1.3264 LearningRate 0.0109 Epoch: 13 Global Step: 223550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:44,111-Speed 5208.20 samples/sec Loss 1.3518 LearningRate 0.0109 Epoch: 13 Global Step: 223560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:46,082-Speed 5198.89 samples/sec Loss 1.3640 LearningRate 0.0109 Epoch: 13 Global Step: 223570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:48,058-Speed 5182.77 samples/sec Loss 1.3418 LearningRate 0.0109 Epoch: 13 Global Step: 223580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:50,034-Speed 5182.52 samples/sec Loss 1.3153 LearningRate 0.0109 Epoch: 13 Global Step: 223590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:15:52,003-Speed 5203.80 samples/sec Loss 1.3087 LearningRate 0.0109 Epoch: 13 Global Step: 223600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:15:53,964-Speed 5224.67 samples/sec Loss 1.3844 LearningRate 0.0109 Epoch: 13 Global Step: 223610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-11 14:15:55,947-Speed 5163.11 samples/sec Loss 1.3579 LearningRate 0.0109 Epoch: 13 Global Step: 223620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:57,930-Speed 5167.48 samples/sec Loss 1.3419 LearningRate 0.0109 Epoch: 13 Global Step: 223630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:15:59,912-Speed 5169.93 samples/sec Loss 1.3166 LearningRate 0.0109 Epoch: 13 Global Step: 223640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:01,893-Speed 5170.28 samples/sec Loss 1.3960 LearningRate 0.0109 Epoch: 13 Global Step: 223650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:03,861-Speed 5205.08 samples/sec Loss 1.3054 LearningRate 0.0109 Epoch: 13 Global Step: 223660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:05,839-Speed 5177.56 samples/sec Loss 1.3312 LearningRate 0.0109 Epoch: 13 Global Step: 223670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:07,805-Speed 5211.32 samples/sec Loss 1.3452 LearningRate 0.0109 Epoch: 13 Global Step: 223680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:09,787-Speed 5168.01 samples/sec Loss 1.3719 LearningRate 0.0109 Epoch: 13 Global Step: 223690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:11,791-Speed 5109.92 samples/sec Loss 1.3829 LearningRate 0.0109 Epoch: 13 Global Step: 223700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:13,772-Speed 5171.08 samples/sec Loss 1.3535 LearningRate 0.0109 Epoch: 13 Global Step: 223710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:15,784-Speed 5090.71 samples/sec Loss 1.3677 LearningRate 0.0109 Epoch: 13 Global Step: 223720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:17,760-Speed 5184.89 samples/sec Loss 1.3351 LearningRate 0.0109 Epoch: 13 Global Step: 223730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:19,725-Speed 5211.80 
samples/sec Loss 1.3827 LearningRate 0.0109 Epoch: 13 Global Step: 223740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:21,766-Speed 5021.04 samples/sec Loss 1.4052 LearningRate 0.0109 Epoch: 13 Global Step: 223750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:23,739-Speed 5191.98 samples/sec Loss 1.3624 LearningRate 0.0109 Epoch: 13 Global Step: 223760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:25,747-Speed 5100.30 samples/sec Loss 1.3594 LearningRate 0.0109 Epoch: 13 Global Step: 223770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:27,718-Speed 5198.07 samples/sec Loss 1.3436 LearningRate 0.0109 Epoch: 13 Global Step: 223780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:29,696-Speed 5178.39 samples/sec Loss 1.3954 LearningRate 0.0109 Epoch: 13 Global Step: 223790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:31,664-Speed 5205.91 samples/sec Loss 1.3789 LearningRate 0.0109 Epoch: 13 Global Step: 223800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:33,650-Speed 5157.05 samples/sec Loss 1.3798 LearningRate 0.0109 Epoch: 13 Global Step: 223810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:16:35,638-Speed 5151.40 samples/sec Loss 1.3229 LearningRate 0.0109 Epoch: 13 Global Step: 223820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:37,646-Speed 5100.91 samples/sec Loss 1.3675 LearningRate 0.0109 Epoch: 13 Global Step: 223830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:39,631-Speed 5160.48 samples/sec Loss 1.3655 LearningRate 0.0109 Epoch: 13 Global Step: 223840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:41,612-Speed 5172.64 samples/sec Loss 1.3795 LearningRate 0.0109 Epoch: 13 Global Step: 223850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:43,591-Speed 5175.36 samples/sec Loss 1.3250 LearningRate 
0.0109 Epoch: 13 Global Step: 223860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:45,578-Speed 5155.21 samples/sec Loss 1.3301 LearningRate 0.0108 Epoch: 13 Global Step: 223870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:47,561-Speed 5165.97 samples/sec Loss 1.4011 LearningRate 0.0108 Epoch: 13 Global Step: 223880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:49,614-Speed 4989.33 samples/sec Loss 1.3617 LearningRate 0.0108 Epoch: 13 Global Step: 223890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:51,586-Speed 5195.22 samples/sec Loss 1.3724 LearningRate 0.0108 Epoch: 13 Global Step: 223900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:53,557-Speed 5196.00 samples/sec Loss 1.4313 LearningRate 0.0108 Epoch: 13 Global Step: 223910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:55,526-Speed 5203.66 samples/sec Loss 1.3133 LearningRate 0.0108 Epoch: 13 Global Step: 223920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:16:57,487-Speed 5221.61 samples/sec Loss 1.3520 LearningRate 0.0108 Epoch: 13 Global Step: 223930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:16:59,503-Speed 5081.98 samples/sec Loss 1.4368 LearningRate 0.0108 Epoch: 13 Global Step: 223940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:17:01,513-Speed 5095.16 samples/sec Loss 1.4011 LearningRate 0.0108 Epoch: 13 Global Step: 223950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:17:03,485-Speed 5196.17 samples/sec Loss 1.3286 LearningRate 0.0108 Epoch: 13 Global Step: 223960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:17:05,480-Speed 5134.81 samples/sec Loss 1.3659 LearningRate 0.0108 Epoch: 13 Global Step: 223970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:17:07,466-Speed 5159.49 samples/sec Loss 1.3384 LearningRate 0.0108 Epoch: 13 Global Step: 223980 Fp16 
Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:17:09,442-Speed 5182.81 samples/sec Loss 1.3091 LearningRate 0.0108 Epoch: 13 Global Step: 223990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:17:11,424-Speed 5169.13 samples/sec Loss 1.3851 LearningRate 0.0108 Epoch: 13 Global Step: 224000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:17:38,617-[lfw][224000]XNorm: 20.924469 Training: 2022-04-11 14:17:38,617-[lfw][224000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 14:17:38,618-[lfw][224000]Accuracy-Highest: 0.99833 Training: 2022-04-11 14:18:09,442-[cfp_fp][224000]XNorm: 20.605251 Training: 2022-04-11 14:18:09,443-[cfp_fp][224000]Accuracy-Flip: 0.98771+-0.00434 Training: 2022-04-11 14:18:09,443-[cfp_fp][224000]Accuracy-Highest: 0.98771 Training: 2022-04-11 14:18:36,268-[agedb_30][224000]XNorm: 21.752406 Training: 2022-04-11 14:18:36,268-[agedb_30][224000]Accuracy-Flip: 0.98233+-0.00797 Training: 2022-04-11 14:18:36,269-[agedb_30][224000]Accuracy-Highest: 0.98250 Training: 2022-04-11 14:18:38,251-Speed 117.94 samples/sec Loss 1.3741 LearningRate 0.0108 Epoch: 13 Global Step: 224010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:18:40,242-Speed 5143.59 samples/sec Loss 1.3488 LearningRate 0.0108 Epoch: 13 Global Step: 224020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:18:42,208-Speed 5209.32 samples/sec Loss 1.3465 LearningRate 0.0108 Epoch: 13 Global Step: 224030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:18:44,178-Speed 5200.41 samples/sec Loss 1.3742 LearningRate 0.0108 Epoch: 13 Global Step: 224040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:18:46,154-Speed 5184.93 samples/sec Loss 1.3152 LearningRate 0.0108 Epoch: 13 Global Step: 224050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:18:48,141-Speed 5155.66 samples/sec Loss 1.3813 LearningRate 0.0108 Epoch: 13 Global Step: 224060 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-11 14:18:50,142-Speed 5118.64 samples/sec Loss 1.3541 LearningRate 0.0108 Epoch: 13 Global Step: 224070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:18:52,116-Speed 5190.27 samples/sec Loss 1.3408 LearningRate 0.0108 Epoch: 13 Global Step: 224080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:18:54,076-Speed 5225.21 samples/sec Loss 1.3792 LearningRate 0.0108 Epoch: 13 Global Step: 224090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:18:56,046-Speed 5200.32 samples/sec Loss 1.3785 LearningRate 0.0108 Epoch: 13 Global Step: 224100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:18:58,009-Speed 5217.79 samples/sec Loss 1.3820 LearningRate 0.0108 Epoch: 13 Global Step: 224110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:18:59,974-Speed 5212.15 samples/sec Loss 1.3714 LearningRate 0.0108 Epoch: 13 Global Step: 224120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:19:01,954-Speed 5174.10 samples/sec Loss 1.3257 LearningRate 0.0108 Epoch: 13 Global Step: 224130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:19:03,947-Speed 5139.72 samples/sec Loss 1.3810 LearningRate 0.0108 Epoch: 13 Global Step: 224140 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:19:05,917-Speed 5199.14 samples/sec Loss 1.3296 LearningRate 0.0108 Epoch: 13 Global Step: 224150 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:19:07,877-Speed 5228.81 samples/sec Loss 1.4239 LearningRate 0.0108 Epoch: 13 Global Step: 224160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:19:09,871-Speed 5135.86 samples/sec Loss 1.3155 LearningRate 0.0108 Epoch: 13 Global Step: 224170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:19:11,853-Speed 5168.76 samples/sec Loss 1.3855 LearningRate 0.0108 Epoch: 13 Global Step: 224180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
14:19:13,842-Speed 5149.28 samples/sec Loss 1.3436 LearningRate 0.0108 Epoch: 13 Global Step: 224190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:19:15,823-Speed 5171.70 samples/sec Loss 1.4018 LearningRate 0.0108 Epoch: 13 Global Step: 224200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:19:17,791-Speed 5204.99 samples/sec Loss 1.3371 LearningRate 0.0108 Epoch: 13 Global Step: 224210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:19:19,766-Speed 5184.76 samples/sec Loss 1.3914 LearningRate 0.0108 Epoch: 13 Global Step: 224220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:19:21,755-Speed 5152.42 samples/sec Loss 1.3647 LearningRate 0.0108 Epoch: 13 Global Step: 224230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:19:23,729-Speed 5188.53 samples/sec Loss 1.3885 LearningRate 0.0108 Epoch: 13 Global Step: 224240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:19:25,705-Speed 5182.32 samples/sec Loss 1.3619 LearningRate 0.0108 Epoch: 13 Global Step: 224250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:19:27,685-Speed 5175.05 samples/sec Loss 1.3350 LearningRate 0.0108 Epoch: 13 Global Step: 224260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:19:29,673-Speed 5152.48 samples/sec Loss 1.3636 LearningRate 0.0108 Epoch: 13 Global Step: 224270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:19:31,665-Speed 5142.44 samples/sec Loss 1.3460 LearningRate 0.0108 Epoch: 13 Global Step: 224280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:19:33,641-Speed 5184.28 samples/sec Loss 1.3758 LearningRate 0.0108 Epoch: 13 Global Step: 224290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:19:35,674-Speed 5037.28 samples/sec Loss 1.3467 LearningRate 0.0108 Epoch: 13 Global Step: 224300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:19:37,650-Speed 5184.32 samples/sec Loss 
1.3639 LearningRate 0.0108 Epoch: 13 Global Step: 224310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:19:39,626-Speed 5184.68 samples/sec Loss 1.3703 LearningRate 0.0108 Epoch: 13 Global Step: 224320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:19:41,595-Speed 5202.30 samples/sec Loss 1.3839 LearningRate 0.0108 Epoch: 13 Global Step: 224330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:19:43,566-Speed 5195.63 samples/sec Loss 1.3712 LearningRate 0.0108 Epoch: 13 Global Step: 224340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:19:45,565-Speed 5125.77 samples/sec Loss 1.3649 LearningRate 0.0108 Epoch: 13 Global Step: 224350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:19:47,577-Speed 5090.64 samples/sec Loss 1.3594 LearningRate 0.0108 Epoch: 13 Global Step: 224360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:19:49,549-Speed 5195.85 samples/sec Loss 1.3309 LearningRate 0.0107 Epoch: 13 Global Step: 224370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:19:51,544-Speed 5135.75 samples/sec Loss 1.3729 LearningRate 0.0107 Epoch: 13 Global Step: 224380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:19:53,517-Speed 5190.07 samples/sec Loss 1.3810 LearningRate 0.0107 Epoch: 13 Global Step: 224390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:19:55,481-Speed 5215.89 samples/sec Loss 1.3564 LearningRate 0.0107 Epoch: 13 Global Step: 224400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:19:57,442-Speed 5222.88 samples/sec Loss 1.3397 LearningRate 0.0107 Epoch: 13 Global Step: 224410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:19:59,423-Speed 5171.26 samples/sec Loss 1.3575 LearningRate 0.0107 Epoch: 13 Global Step: 224420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:01,404-Speed 5172.40 samples/sec Loss 1.3897 LearningRate 0.0107 Epoch: 13 Global 
Step: 224430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:03,368-Speed 5213.03 samples/sec Loss 1.3523 LearningRate 0.0107 Epoch: 13 Global Step: 224440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:05,364-Speed 5133.07 samples/sec Loss 1.3277 LearningRate 0.0107 Epoch: 13 Global Step: 224450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:07,336-Speed 5194.28 samples/sec Loss 1.3551 LearningRate 0.0107 Epoch: 13 Global Step: 224460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:09,318-Speed 5169.56 samples/sec Loss 1.4045 LearningRate 0.0107 Epoch: 13 Global Step: 224470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:11,330-Speed 5090.82 samples/sec Loss 1.3305 LearningRate 0.0107 Epoch: 13 Global Step: 224480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:13,315-Speed 5160.83 samples/sec Loss 1.2929 LearningRate 0.0107 Epoch: 13 Global Step: 224490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:15,289-Speed 5188.17 samples/sec Loss 1.3402 LearningRate 0.0107 Epoch: 13 Global Step: 224500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:17,273-Speed 5163.18 samples/sec Loss 1.3828 LearningRate 0.0107 Epoch: 13 Global Step: 224510 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:20:19,235-Speed 5222.49 samples/sec Loss 1.4407 LearningRate 0.0107 Epoch: 13 Global Step: 224520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:21,218-Speed 5164.05 samples/sec Loss 1.3521 LearningRate 0.0107 Epoch: 13 Global Step: 224530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:23,205-Speed 5155.11 samples/sec Loss 1.4093 LearningRate 0.0107 Epoch: 13 Global Step: 224540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:25,170-Speed 5212.33 samples/sec Loss 1.3513 LearningRate 0.0107 Epoch: 13 Global Step: 224550 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-11 14:20:27,194-Speed 5061.31 samples/sec Loss 1.3501 LearningRate 0.0107 Epoch: 13 Global Step: 224560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:29,159-Speed 5214.06 samples/sec Loss 1.3507 LearningRate 0.0107 Epoch: 13 Global Step: 224570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:31,129-Speed 5199.03 samples/sec Loss 1.3544 LearningRate 0.0107 Epoch: 13 Global Step: 224580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:33,094-Speed 5212.38 samples/sec Loss 1.3946 LearningRate 0.0107 Epoch: 13 Global Step: 224590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:35,062-Speed 5205.91 samples/sec Loss 1.2986 LearningRate 0.0107 Epoch: 13 Global Step: 224600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:37,036-Speed 5189.79 samples/sec Loss 1.3804 LearningRate 0.0107 Epoch: 13 Global Step: 224610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:39,005-Speed 5201.36 samples/sec Loss 1.3361 LearningRate 0.0107 Epoch: 13 Global Step: 224620 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:20:40,970-Speed 5214.55 samples/sec Loss 1.3542 LearningRate 0.0107 Epoch: 13 Global Step: 224630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:42,934-Speed 5216.09 samples/sec Loss 1.4210 LearningRate 0.0107 Epoch: 13 Global Step: 224640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:44,900-Speed 5209.72 samples/sec Loss 1.3626 LearningRate 0.0107 Epoch: 13 Global Step: 224650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:46,884-Speed 5163.65 samples/sec Loss 1.3707 LearningRate 0.0107 Epoch: 13 Global Step: 224660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:48,853-Speed 5200.72 samples/sec Loss 1.3643 LearningRate 0.0107 Epoch: 13 Global Step: 224670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
14:20:50,819-Speed 5211.57 samples/sec Loss 1.3731 LearningRate 0.0107 Epoch: 13 Global Step: 224680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:52,781-Speed 5218.42 samples/sec Loss 1.3579 LearningRate 0.0107 Epoch: 13 Global Step: 224690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:54,746-Speed 5215.07 samples/sec Loss 1.4119 LearningRate 0.0107 Epoch: 13 Global Step: 224700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:56,712-Speed 5211.67 samples/sec Loss 1.3477 LearningRate 0.0107 Epoch: 13 Global Step: 224710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:20:58,686-Speed 5188.53 samples/sec Loss 1.3391 LearningRate 0.0107 Epoch: 13 Global Step: 224720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:21:00,648-Speed 5219.32 samples/sec Loss 1.2796 LearningRate 0.0107 Epoch: 13 Global Step: 224730 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:21:02,619-Speed 5197.96 samples/sec Loss 1.4048 LearningRate 0.0107 Epoch: 13 Global Step: 224740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:21:04,601-Speed 5167.82 samples/sec Loss 1.3697 LearningRate 0.0107 Epoch: 13 Global Step: 224750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:21:06,574-Speed 5191.30 samples/sec Loss 1.3232 LearningRate 0.0107 Epoch: 13 Global Step: 224760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:21:08,530-Speed 5236.50 samples/sec Loss 1.4308 LearningRate 0.0107 Epoch: 13 Global Step: 224770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:21:10,550-Speed 5071.48 samples/sec Loss 1.3564 LearningRate 0.0107 Epoch: 13 Global Step: 224780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:21:12,523-Speed 5194.67 samples/sec Loss 1.3763 LearningRate 0.0107 Epoch: 13 Global Step: 224790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:21:14,519-Speed 5130.93 samples/sec 
Loss 1.3548 LearningRate 0.0107 Epoch: 13 Global Step: 224800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:21:16,481-Speed 5221.77 samples/sec Loss 1.3740 LearningRate 0.0107 Epoch: 13 Global Step: 224810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:21:18,442-Speed 5222.43 samples/sec Loss 1.3498 LearningRate 0.0107 Epoch: 13 Global Step: 224820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:21:20,413-Speed 5197.71 samples/sec Loss 1.3402 LearningRate 0.0107 Epoch: 13 Global Step: 224830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:21:22,388-Speed 5186.67 samples/sec Loss 1.3527 LearningRate 0.0107 Epoch: 13 Global Step: 224840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:21:24,357-Speed 5201.72 samples/sec Loss 1.4260 LearningRate 0.0107 Epoch: 13 Global Step: 224850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:21:26,325-Speed 5206.12 samples/sec Loss 1.4225 LearningRate 0.0107 Epoch: 13 Global Step: 224860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:21:28,292-Speed 5208.02 samples/sec Loss 1.3296 LearningRate 0.0107 Epoch: 13 Global Step: 224870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:21:30,262-Speed 5199.21 samples/sec Loss 1.4146 LearningRate 0.0107 Epoch: 13 Global Step: 224880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:21:32,226-Speed 5215.15 samples/sec Loss 1.3569 LearningRate 0.0106 Epoch: 13 Global Step: 224890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:21:34,218-Speed 5143.18 samples/sec Loss 1.3590 LearningRate 0.0106 Epoch: 13 Global Step: 224900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:21:36,220-Speed 5115.26 samples/sec Loss 1.3927 LearningRate 0.0106 Epoch: 13 Global Step: 224910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:21:38,205-Speed 5161.72 samples/sec Loss 1.3532 LearningRate 0.0106 Epoch: 13 
Global Step: 224920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:21:40,168-Speed 5216.06 samples/sec Loss 1.3955 LearningRate 0.0106 Epoch: 13 Global Step: 224930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:21:42,132-Speed 5216.98 samples/sec Loss 1.3711 LearningRate 0.0106 Epoch: 13 Global Step: 224940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:21:44,094-Speed 5221.88 samples/sec Loss 1.3926 LearningRate 0.0106 Epoch: 13 Global Step: 224950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:21:46,056-Speed 5221.88 samples/sec Loss 1.3888 LearningRate 0.0106 Epoch: 13 Global Step: 224960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:21:48,018-Speed 5218.92 samples/sec Loss 1.3305 LearningRate 0.0106 Epoch: 13 Global Step: 224970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:21:50,000-Speed 5168.80 samples/sec Loss 1.3680 LearningRate 0.0106 Epoch: 13 Global Step: 224980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:21:51,996-Speed 5132.24 samples/sec Loss 1.3767 LearningRate 0.0106 Epoch: 13 Global Step: 224990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:21:53,959-Speed 5217.70 samples/sec Loss 1.4012 LearningRate 0.0106 Epoch: 13 Global Step: 225000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:21:55,936-Speed 5181.02 samples/sec Loss 1.3484 LearningRate 0.0106 Epoch: 13 Global Step: 225010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:21:57,932-Speed 5132.52 samples/sec Loss 1.3505 LearningRate 0.0106 Epoch: 13 Global Step: 225020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:21:59,903-Speed 5199.60 samples/sec Loss 1.3812 LearningRate 0.0106 Epoch: 13 Global Step: 225030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:01,871-Speed 5203.19 samples/sec Loss 1.3667 LearningRate 0.0106 Epoch: 13 Global Step: 225040 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-11 14:22:03,868-Speed 5130.39 samples/sec Loss 1.3636 LearningRate 0.0106 Epoch: 13 Global Step: 225050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:05,854-Speed 5157.30 samples/sec Loss 1.4074 LearningRate 0.0106 Epoch: 13 Global Step: 225060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:07,819-Speed 5211.91 samples/sec Loss 1.3619 LearningRate 0.0106 Epoch: 13 Global Step: 225070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:09,784-Speed 5212.46 samples/sec Loss 1.3530 LearningRate 0.0106 Epoch: 13 Global Step: 225080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:11,766-Speed 5169.33 samples/sec Loss 1.4061 LearningRate 0.0106 Epoch: 13 Global Step: 225090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:13,770-Speed 5111.66 samples/sec Loss 1.3684 LearningRate 0.0106 Epoch: 13 Global Step: 225100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:15,742-Speed 5193.14 samples/sec Loss 1.4030 LearningRate 0.0106 Epoch: 13 Global Step: 225110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:22:17,725-Speed 5167.40 samples/sec Loss 1.3922 LearningRate 0.0106 Epoch: 13 Global Step: 225120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:19,694-Speed 5201.91 samples/sec Loss 1.3563 LearningRate 0.0106 Epoch: 13 Global Step: 225130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:21,673-Speed 5175.73 samples/sec Loss 1.3932 LearningRate 0.0106 Epoch: 13 Global Step: 225140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:23,697-Speed 5062.29 samples/sec Loss 1.3793 LearningRate 0.0106 Epoch: 13 Global Step: 225150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:25,696-Speed 5122.39 samples/sec Loss 1.4069 LearningRate 0.0106 Epoch: 13 Global Step: 225160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-11 14:22:27,669-Speed 5192.83 samples/sec Loss 1.3781 LearningRate 0.0106 Epoch: 13 Global Step: 225170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:29,639-Speed 5200.06 samples/sec Loss 1.3570 LearningRate 0.0106 Epoch: 13 Global Step: 225180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:31,618-Speed 5177.40 samples/sec Loss 1.3921 LearningRate 0.0106 Epoch: 13 Global Step: 225190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:33,582-Speed 5212.92 samples/sec Loss 1.3562 LearningRate 0.0106 Epoch: 13 Global Step: 225200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:35,593-Speed 5095.59 samples/sec Loss 1.3971 LearningRate 0.0106 Epoch: 13 Global Step: 225210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:37,567-Speed 5188.82 samples/sec Loss 1.3232 LearningRate 0.0106 Epoch: 13 Global Step: 225220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:22:39,544-Speed 5181.25 samples/sec Loss 1.3662 LearningRate 0.0106 Epoch: 13 Global Step: 225230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:22:41,513-Speed 5202.96 samples/sec Loss 1.3792 LearningRate 0.0106 Epoch: 13 Global Step: 225240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:22:43,475-Speed 5221.93 samples/sec Loss 1.3651 LearningRate 0.0106 Epoch: 13 Global Step: 225250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:45,477-Speed 5117.10 samples/sec Loss 1.3478 LearningRate 0.0106 Epoch: 13 Global Step: 225260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:47,446-Speed 5199.79 samples/sec Loss 1.3821 LearningRate 0.0106 Epoch: 13 Global Step: 225270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:49,498-Speed 4992.73 samples/sec Loss 1.3785 LearningRate 0.0106 Epoch: 13 Global Step: 225280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:51,540-Speed 5015.99 
samples/sec Loss 1.3805 LearningRate 0.0106 Epoch: 13 Global Step: 225290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:53,510-Speed 5198.53 samples/sec Loss 1.3906 LearningRate 0.0106 Epoch: 13 Global Step: 225300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:55,477-Speed 5207.64 samples/sec Loss 1.4117 LearningRate 0.0106 Epoch: 13 Global Step: 225310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:57,457-Speed 5173.75 samples/sec Loss 1.3626 LearningRate 0.0106 Epoch: 13 Global Step: 225320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:22:59,463-Speed 5108.06 samples/sec Loss 1.3651 LearningRate 0.0106 Epoch: 13 Global Step: 225330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:23:01,442-Speed 5176.72 samples/sec Loss 1.3874 LearningRate 0.0106 Epoch: 13 Global Step: 225340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:23:03,419-Speed 5181.82 samples/sec Loss 1.3605 LearningRate 0.0106 Epoch: 13 Global Step: 225350 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:23:05,379-Speed 5225.18 samples/sec Loss 1.3494 LearningRate 0.0106 Epoch: 13 Global Step: 225360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:23:07,344-Speed 5212.23 samples/sec Loss 1.3078 LearningRate 0.0106 Epoch: 13 Global Step: 225370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:23:09,314-Speed 5200.51 samples/sec Loss 1.3635 LearningRate 0.0106 Epoch: 13 Global Step: 225380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:23:11,281-Speed 5206.51 samples/sec Loss 1.3697 LearningRate 0.0106 Epoch: 13 Global Step: 225390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:23:13,261-Speed 5173.77 samples/sec Loss 1.3537 LearningRate 0.0105 Epoch: 13 Global Step: 225400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:23:15,240-Speed 5175.96 samples/sec Loss 1.4263 LearningRate 
0.0105 Epoch: 13 Global Step: 225410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:23:17,243-Speed 5113.78 samples/sec Loss 1.4015 LearningRate 0.0105 Epoch: 13 Global Step: 225420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:23:19,223-Speed 5174.96 samples/sec Loss 1.3717 LearningRate 0.0105 Epoch: 13 Global Step: 225430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:23:21,194-Speed 5198.23 samples/sec Loss 1.3898 LearningRate 0.0105 Epoch: 13 Global Step: 225440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:23:23,162-Speed 5202.73 samples/sec Loss 1.3464 LearningRate 0.0105 Epoch: 13 Global Step: 225450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:23:25,131-Speed 5204.41 samples/sec Loss 1.3284 LearningRate 0.0105 Epoch: 13 Global Step: 225460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:23:27,118-Speed 5155.52 samples/sec Loss 1.3629 LearningRate 0.0105 Epoch: 13 Global Step: 225470 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:23:29,113-Speed 5132.94 samples/sec Loss 1.3937 LearningRate 0.0105 Epoch: 13 Global Step: 225480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:23:31,084-Speed 5197.38 samples/sec Loss 1.4137 LearningRate 0.0105 Epoch: 13 Global Step: 225490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:23:33,049-Speed 5212.51 samples/sec Loss 1.4321 LearningRate 0.0105 Epoch: 13 Global Step: 225500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:23:35,046-Speed 5129.81 samples/sec Loss 1.4016 LearningRate 0.0105 Epoch: 13 Global Step: 225510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:23:37,028-Speed 5168.13 samples/sec Loss 1.3590 LearningRate 0.0105 Epoch: 13 Global Step: 225520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:23:39,008-Speed 5172.24 samples/sec Loss 1.4029 LearningRate 0.0105 Epoch: 13 Global Step: 225530 Fp16 
Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:23:40,990-Speed 5169.62 samples/sec Loss 1.3582 LearningRate 0.0105 Epoch: 13 Global Step: 225540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:23:42,957-Speed 5208.63 samples/sec Loss 1.4074 LearningRate 0.0105 Epoch: 13 Global Step: 225550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:23:44,921-Speed 5215.53 samples/sec Loss 1.3420 LearningRate 0.0105 Epoch: 13 Global Step: 225560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:23:46,883-Speed 5220.02 samples/sec Loss 1.3585 LearningRate 0.0105 Epoch: 13 Global Step: 225570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:23:48,847-Speed 5217.30 samples/sec Loss 1.3949 LearningRate 0.0105 Epoch: 13 Global Step: 225580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:23:50,812-Speed 5212.57 samples/sec Loss 1.3594 LearningRate 0.0105 Epoch: 13 Global Step: 225590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:23:52,777-Speed 5212.32 samples/sec Loss 1.3185 LearningRate 0.0105 Epoch: 13 Global Step: 225600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:23:54,749-Speed 5194.10 samples/sec Loss 1.3924 LearningRate 0.0105 Epoch: 13 Global Step: 225610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:23:56,716-Speed 5207.68 samples/sec Loss 1.3956 LearningRate 0.0105 Epoch: 13 Global Step: 225620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:23:58,691-Speed 5185.76 samples/sec Loss 1.3951 LearningRate 0.0105 Epoch: 13 Global Step: 225630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:24:00,682-Speed 5145.95 samples/sec Loss 1.3701 LearningRate 0.0105 Epoch: 13 Global Step: 225640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:24:02,670-Speed 5152.81 samples/sec Loss 1.3458 LearningRate 0.0105 Epoch: 13 Global Step: 225650 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-04-11 14:24:04,644-Speed 5188.14 samples/sec Loss 1.3338 LearningRate 0.0105 Epoch: 13 Global Step: 225660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:24:06,611-Speed 5209.25 samples/sec Loss 1.3953 LearningRate 0.0105 Epoch: 13 Global Step: 225670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:24:08,604-Speed 5138.69 samples/sec Loss 1.3362 LearningRate 0.0105 Epoch: 13 Global Step: 225680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:24:10,567-Speed 5218.56 samples/sec Loss 1.4034 LearningRate 0.0105 Epoch: 13 Global Step: 225690 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:24:12,539-Speed 5192.81 samples/sec Loss 1.3346 LearningRate 0.0105 Epoch: 13 Global Step: 225700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:24:14,543-Speed 5114.49 samples/sec Loss 1.3674 LearningRate 0.0105 Epoch: 13 Global Step: 225710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:24:16,534-Speed 5142.59 samples/sec Loss 1.3745 LearningRate 0.0105 Epoch: 13 Global Step: 225720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:24:18,512-Speed 5180.51 samples/sec Loss 1.4052 LearningRate 0.0105 Epoch: 13 Global Step: 225730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:24:20,482-Speed 5199.45 samples/sec Loss 1.3362 LearningRate 0.0105 Epoch: 13 Global Step: 225740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:24:22,465-Speed 5165.44 samples/sec Loss 1.3593 LearningRate 0.0105 Epoch: 13 Global Step: 225750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:24:24,451-Speed 5155.82 samples/sec Loss 1.3501 LearningRate 0.0105 Epoch: 13 Global Step: 225760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:24:26,463-Speed 5091.81 samples/sec Loss 1.3743 LearningRate 0.0105 Epoch: 13 Global Step: 225770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:24:28,442-Speed 
5177.97 samples/sec Loss 1.3838 LearningRate 0.0105 Epoch: 13 Global Step: 225780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:24:30,427-Speed 5160.04 samples/sec Loss 1.3901 LearningRate 0.0105 Epoch: 13 Global Step: 225790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:24:32,391-Speed 5214.67 samples/sec Loss 1.3256 LearningRate 0.0105 Epoch: 13 Global Step: 225800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:24:34,371-Speed 5173.37 samples/sec Loss 1.4337 LearningRate 0.0105 Epoch: 13 Global Step: 225810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:24:36,343-Speed 5195.61 samples/sec Loss 1.3595 LearningRate 0.0105 Epoch: 13 Global Step: 225820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:24:38,335-Speed 5140.92 samples/sec Loss 1.3990 LearningRate 0.0105 Epoch: 13 Global Step: 225830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:24:40,310-Speed 5187.94 samples/sec Loss 1.3650 LearningRate 0.0105 Epoch: 13 Global Step: 225840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:24:42,279-Speed 5201.65 samples/sec Loss 1.3939 LearningRate 0.0105 Epoch: 13 Global Step: 225850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:24:44,247-Speed 5205.26 samples/sec Loss 1.3638 LearningRate 0.0105 Epoch: 13 Global Step: 225860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:24:46,241-Speed 5137.48 samples/sec Loss 1.3963 LearningRate 0.0105 Epoch: 13 Global Step: 225870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:24:48,247-Speed 5107.16 samples/sec Loss 1.3594 LearningRate 0.0105 Epoch: 13 Global Step: 225880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:24:50,249-Speed 5114.84 samples/sec Loss 1.3752 LearningRate 0.0105 Epoch: 13 Global Step: 225890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:24:52,233-Speed 5163.33 samples/sec Loss 1.3752 
LearningRate 0.0105 Epoch: 13 Global Step: 225900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:24:54,204-Speed 5197.81 samples/sec Loss 1.4220 LearningRate 0.0104 Epoch: 13 Global Step: 225910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:24:56,170-Speed 5209.88 samples/sec Loss 1.3314 LearningRate 0.0104 Epoch: 13 Global Step: 225920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:24:58,135-Speed 5214.37 samples/sec Loss 1.4071 LearningRate 0.0104 Epoch: 13 Global Step: 225930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:25:00,114-Speed 5175.06 samples/sec Loss 1.3796 LearningRate 0.0104 Epoch: 13 Global Step: 225940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:25:02,084-Speed 5198.10 samples/sec Loss 1.3437 LearningRate 0.0104 Epoch: 13 Global Step: 225950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:25:04,053-Speed 5204.53 samples/sec Loss 1.3844 LearningRate 0.0104 Epoch: 13 Global Step: 225960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:25:06,047-Speed 5135.88 samples/sec Loss 1.3859 LearningRate 0.0104 Epoch: 13 Global Step: 225970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:25:08,029-Speed 5169.96 samples/sec Loss 1.3505 LearningRate 0.0104 Epoch: 13 Global Step: 225980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:25:09,998-Speed 5200.79 samples/sec Loss 1.3770 LearningRate 0.0104 Epoch: 13 Global Step: 225990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:25:11,984-Speed 5158.54 samples/sec Loss 1.3832 LearningRate 0.0104 Epoch: 13 Global Step: 226000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:25:38,620-[lfw][226000]XNorm: 22.386542 Training: 2022-04-11 14:25:38,621-[lfw][226000]Accuracy-Flip: 0.99817+-0.00229 Training: 2022-04-11 14:25:38,621-[lfw][226000]Accuracy-Highest: 0.99833 Training: 2022-04-11 14:26:09,620-[cfp_fp][226000]XNorm: 
21.614340 Training: 2022-04-11 14:26:09,620-[cfp_fp][226000]Accuracy-Flip: 0.98800+-0.00395 Training: 2022-04-11 14:26:09,621-[cfp_fp][226000]Accuracy-Highest: 0.98800 Training: 2022-04-11 14:26:36,503-[agedb_30][226000]XNorm: 22.827590 Training: 2022-04-11 14:26:36,504-[agedb_30][226000]Accuracy-Flip: 0.98217+-0.00806 Training: 2022-04-11 14:26:36,504-[agedb_30][226000]Accuracy-Highest: 0.98250 Training: 2022-04-11 14:26:38,517-Speed 118.34 samples/sec Loss 1.3886 LearningRate 0.0104 Epoch: 13 Global Step: 226010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:26:40,492-Speed 5185.87 samples/sec Loss 1.4182 LearningRate 0.0104 Epoch: 13 Global Step: 226020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:26:42,459-Speed 5206.99 samples/sec Loss 1.3829 LearningRate 0.0104 Epoch: 13 Global Step: 226030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:26:44,427-Speed 5207.14 samples/sec Loss 1.3305 LearningRate 0.0104 Epoch: 13 Global Step: 226040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:26:46,417-Speed 5148.38 samples/sec Loss 1.3562 LearningRate 0.0104 Epoch: 13 Global Step: 226050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:26:48,392-Speed 5185.05 samples/sec Loss 1.3721 LearningRate 0.0104 Epoch: 13 Global Step: 226060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:26:50,388-Speed 5131.95 samples/sec Loss 1.4437 LearningRate 0.0104 Epoch: 13 Global Step: 226070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:26:52,365-Speed 5181.84 samples/sec Loss 1.3481 LearningRate 0.0104 Epoch: 13 Global Step: 226080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:26:54,345-Speed 5173.32 samples/sec Loss 1.3670 LearningRate 0.0104 Epoch: 13 Global Step: 226090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:26:56,314-Speed 5202.28 samples/sec Loss 1.3715 LearningRate 0.0104 Epoch: 13 Global Step: 226100 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-11 14:26:58,285-Speed 5196.66 samples/sec Loss 1.3869 LearningRate 0.0104 Epoch: 13 Global Step: 226110 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:27:00,273-Speed 5151.09 samples/sec Loss 1.3329 LearningRate 0.0104 Epoch: 13 Global Step: 226120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:02,250-Speed 5183.38 samples/sec Loss 1.4042 LearningRate 0.0104 Epoch: 13 Global Step: 226130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:04,234-Speed 5162.89 samples/sec Loss 1.3746 LearningRate 0.0104 Epoch: 13 Global Step: 226140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:06,201-Speed 5207.16 samples/sec Loss 1.3907 LearningRate 0.0104 Epoch: 13 Global Step: 226150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:08,168-Speed 5207.31 samples/sec Loss 1.4122 LearningRate 0.0104 Epoch: 13 Global Step: 226160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:10,156-Speed 5152.61 samples/sec Loss 1.3889 LearningRate 0.0104 Epoch: 13 Global Step: 226170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:12,141-Speed 5160.37 samples/sec Loss 1.3719 LearningRate 0.0104 Epoch: 13 Global Step: 226180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:14,134-Speed 5140.65 samples/sec Loss 1.3830 LearningRate 0.0104 Epoch: 13 Global Step: 226190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:16,117-Speed 5164.41 samples/sec Loss 1.3389 LearningRate 0.0104 Epoch: 13 Global Step: 226200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:18,087-Speed 5199.44 samples/sec Loss 1.3358 LearningRate 0.0104 Epoch: 13 Global Step: 226210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:20,060-Speed 5194.09 samples/sec Loss 1.3775 LearningRate 0.0104 Epoch: 13 Global Step: 226220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-11 14:27:22,025-Speed 5211.79 samples/sec Loss 1.3744 LearningRate 0.0104 Epoch: 13 Global Step: 226230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:24,003-Speed 5179.85 samples/sec Loss 1.4283 LearningRate 0.0104 Epoch: 13 Global Step: 226240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:25,978-Speed 5186.39 samples/sec Loss 1.3764 LearningRate 0.0104 Epoch: 13 Global Step: 226250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:27,955-Speed 5181.38 samples/sec Loss 1.4473 LearningRate 0.0104 Epoch: 13 Global Step: 226260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:29,926-Speed 5195.75 samples/sec Loss 1.4135 LearningRate 0.0104 Epoch: 13 Global Step: 226270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:27:31,898-Speed 5195.24 samples/sec Loss 1.3699 LearningRate 0.0104 Epoch: 13 Global Step: 226280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:27:33,871-Speed 5191.41 samples/sec Loss 1.3659 LearningRate 0.0104 Epoch: 13 Global Step: 226290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:27:35,850-Speed 5176.73 samples/sec Loss 1.4855 LearningRate 0.0104 Epoch: 13 Global Step: 226300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:27:37,838-Speed 5153.47 samples/sec Loss 1.3525 LearningRate 0.0104 Epoch: 13 Global Step: 226310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:27:39,819-Speed 5168.82 samples/sec Loss 1.3899 LearningRate 0.0104 Epoch: 13 Global Step: 226320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:27:41,814-Speed 5135.19 samples/sec Loss 1.3823 LearningRate 0.0104 Epoch: 13 Global Step: 226330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:27:43,801-Speed 5154.01 samples/sec Loss 1.3964 LearningRate 0.0104 Epoch: 13 Global Step: 226340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:27:45,791-Speed 5149.71 
samples/sec Loss 1.3639 LearningRate 0.0104 Epoch: 13 Global Step: 226350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:27:47,779-Speed 5153.66 samples/sec Loss 1.3976 LearningRate 0.0104 Epoch: 13 Global Step: 226360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:27:49,749-Speed 5199.58 samples/sec Loss 1.3442 LearningRate 0.0104 Epoch: 13 Global Step: 226370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:51,737-Speed 5151.97 samples/sec Loss 1.3807 LearningRate 0.0104 Epoch: 13 Global Step: 226380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:53,715-Speed 5177.94 samples/sec Loss 1.3956 LearningRate 0.0104 Epoch: 13 Global Step: 226390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:55,697-Speed 5170.23 samples/sec Loss 1.3916 LearningRate 0.0104 Epoch: 13 Global Step: 226400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:57,711-Speed 5083.75 samples/sec Loss 1.3323 LearningRate 0.0104 Epoch: 13 Global Step: 226410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:27:59,707-Speed 5133.95 samples/sec Loss 1.4060 LearningRate 0.0104 Epoch: 13 Global Step: 226420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:28:01,678-Speed 5195.12 samples/sec Loss 1.3901 LearningRate 0.0103 Epoch: 13 Global Step: 226430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:28:03,739-Speed 4970.01 samples/sec Loss 1.4012 LearningRate 0.0103 Epoch: 13 Global Step: 226440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:28:05,746-Speed 5105.57 samples/sec Loss 1.2999 LearningRate 0.0103 Epoch: 13 Global Step: 226450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:28:07,712-Speed 5209.20 samples/sec Loss 1.3959 LearningRate 0.0103 Epoch: 13 Global Step: 226460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:28:09,693-Speed 5171.75 samples/sec Loss 1.3675 LearningRate 0.0103 
Epoch: 13 Global Step: 226470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:28:11,695-Speed 5116.44 samples/sec Loss 1.3941 LearningRate 0.0103 Epoch: 13 Global Step: 226480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:28:13,687-Speed 5140.96 samples/sec Loss 1.3481 LearningRate 0.0103 Epoch: 13 Global Step: 226490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:28:15,653-Speed 5211.28 samples/sec Loss 1.3400 LearningRate 0.0103 Epoch: 13 Global Step: 226500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:28:17,622-Speed 5203.71 samples/sec Loss 1.3633 LearningRate 0.0103 Epoch: 13 Global Step: 226510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:28:19,587-Speed 5212.92 samples/sec Loss 1.3950 LearningRate 0.0103 Epoch: 13 Global Step: 226520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:21,563-Speed 5181.88 samples/sec Loss 1.3440 LearningRate 0.0103 Epoch: 13 Global Step: 226530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:23,550-Speed 5156.82 samples/sec Loss 1.3715 LearningRate 0.0103 Epoch: 13 Global Step: 226540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:25,537-Speed 5155.11 samples/sec Loss 1.4156 LearningRate 0.0103 Epoch: 13 Global Step: 226550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:27,526-Speed 5150.40 samples/sec Loss 1.3553 LearningRate 0.0103 Epoch: 13 Global Step: 226560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:29,498-Speed 5193.33 samples/sec Loss 1.3563 LearningRate 0.0103 Epoch: 13 Global Step: 226570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:31,469-Speed 5198.09 samples/sec Loss 1.4060 LearningRate 0.0103 Epoch: 13 Global Step: 226580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:33,438-Speed 5202.31 samples/sec Loss 1.4253 LearningRate 0.0103 Epoch: 13 Global Step: 226590 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:35,425-Speed 5154.18 samples/sec Loss 1.3787 LearningRate 0.0103 Epoch: 13 Global Step: 226600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:37,395-Speed 5199.56 samples/sec Loss 1.4028 LearningRate 0.0103 Epoch: 13 Global Step: 226610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:39,354-Speed 5228.39 samples/sec Loss 1.3627 LearningRate 0.0103 Epoch: 13 Global Step: 226620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:41,318-Speed 5217.80 samples/sec Loss 1.3735 LearningRate 0.0103 Epoch: 13 Global Step: 226630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:43,281-Speed 5216.62 samples/sec Loss 1.3815 LearningRate 0.0103 Epoch: 13 Global Step: 226640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:45,247-Speed 5210.32 samples/sec Loss 1.3719 LearningRate 0.0103 Epoch: 13 Global Step: 226650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:47,224-Speed 5182.82 samples/sec Loss 1.3957 LearningRate 0.0103 Epoch: 13 Global Step: 226660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:49,193-Speed 5200.60 samples/sec Loss 1.4120 LearningRate 0.0103 Epoch: 13 Global Step: 226670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:51,166-Speed 5193.82 samples/sec Loss 1.3980 LearningRate 0.0103 Epoch: 13 Global Step: 226680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:53,131-Speed 5213.84 samples/sec Loss 1.3514 LearningRate 0.0103 Epoch: 13 Global Step: 226690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:55,097-Speed 5208.30 samples/sec Loss 1.4271 LearningRate 0.0103 Epoch: 13 Global Step: 226700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:28:57,053-Speed 5236.49 samples/sec Loss 1.3857 LearningRate 0.0103 Epoch: 13 Global Step: 226710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 
2022-04-11 14:28:59,029-Speed 5184.97 samples/sec Loss 1.3881 LearningRate 0.0103 Epoch: 13 Global Step: 226720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:00,992-Speed 5217.78 samples/sec Loss 1.3214 LearningRate 0.0103 Epoch: 13 Global Step: 226730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:02,979-Speed 5153.95 samples/sec Loss 1.3656 LearningRate 0.0103 Epoch: 13 Global Step: 226740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:04,981-Speed 5118.38 samples/sec Loss 1.3553 LearningRate 0.0103 Epoch: 13 Global Step: 226750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:06,943-Speed 5221.33 samples/sec Loss 1.3678 LearningRate 0.0103 Epoch: 13 Global Step: 226760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:08,909-Speed 5209.51 samples/sec Loss 1.3996 LearningRate 0.0103 Epoch: 13 Global Step: 226770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:10,881-Speed 5194.94 samples/sec Loss 1.3794 LearningRate 0.0103 Epoch: 13 Global Step: 226780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:12,845-Speed 5216.16 samples/sec Loss 1.4118 LearningRate 0.0103 Epoch: 13 Global Step: 226790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:14,816-Speed 5195.73 samples/sec Loss 1.3841 LearningRate 0.0103 Epoch: 13 Global Step: 226800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:16,798-Speed 5170.02 samples/sec Loss 1.3765 LearningRate 0.0103 Epoch: 13 Global Step: 226810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:29:18,774-Speed 5184.25 samples/sec Loss 1.4539 LearningRate 0.0103 Epoch: 13 Global Step: 226820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:29:20,747-Speed 5190.28 samples/sec Loss 1.3851 LearningRate 0.0103 Epoch: 13 Global Step: 226830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:29:22,712-Speed 5211.89 
samples/sec Loss 1.4284 LearningRate 0.0103 Epoch: 13 Global Step: 226840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:29:24,687-Speed 5187.11 samples/sec Loss 1.3543 LearningRate 0.0103 Epoch: 13 Global Step: 226850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:26,653-Speed 5211.57 samples/sec Loss 1.3876 LearningRate 0.0103 Epoch: 13 Global Step: 226860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:28,630-Speed 5180.22 samples/sec Loss 1.3721 LearningRate 0.0103 Epoch: 13 Global Step: 226870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:30,617-Speed 5154.89 samples/sec Loss 1.3806 LearningRate 0.0103 Epoch: 13 Global Step: 226880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:32,584-Speed 5209.33 samples/sec Loss 1.3551 LearningRate 0.0103 Epoch: 13 Global Step: 226890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:34,563-Speed 5175.28 samples/sec Loss 1.3619 LearningRate 0.0103 Epoch: 13 Global Step: 226900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:36,551-Speed 5152.65 samples/sec Loss 1.4331 LearningRate 0.0103 Epoch: 13 Global Step: 226910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:38,539-Speed 5154.24 samples/sec Loss 1.3770 LearningRate 0.0103 Epoch: 13 Global Step: 226920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:40,522-Speed 5165.78 samples/sec Loss 1.3932 LearningRate 0.0103 Epoch: 13 Global Step: 226930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:42,493-Speed 5197.06 samples/sec Loss 1.3656 LearningRate 0.0103 Epoch: 13 Global Step: 226940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:29:44,455-Speed 5219.88 samples/sec Loss 1.4176 LearningRate 0.0102 Epoch: 13 Global Step: 226950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:29:46,418-Speed 5218.90 samples/sec Loss 1.3882 LearningRate 0.0102 
Epoch: 13 Global Step: 226960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:29:48,387-Speed 5201.74 samples/sec Loss 1.3704 LearningRate 0.0102 Epoch: 13 Global Step: 226970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:29:50,349-Speed 5220.82 samples/sec Loss 1.4001 LearningRate 0.0102 Epoch: 13 Global Step: 226980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:29:52,325-Speed 5183.00 samples/sec Loss 1.4340 LearningRate 0.0102 Epoch: 13 Global Step: 226990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:29:54,300-Speed 5187.87 samples/sec Loss 1.3687 LearningRate 0.0102 Epoch: 13 Global Step: 227000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:29:56,278-Speed 5179.48 samples/sec Loss 1.3863 LearningRate 0.0102 Epoch: 13 Global Step: 227010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:29:58,254-Speed 5183.44 samples/sec Loss 1.3263 LearningRate 0.0102 Epoch: 13 Global Step: 227020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:00,220-Speed 5212.04 samples/sec Loss 1.3802 LearningRate 0.0102 Epoch: 13 Global Step: 227030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:02,203-Speed 5165.40 samples/sec Loss 1.3381 LearningRate 0.0102 Epoch: 13 Global Step: 227040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:04,190-Speed 5155.42 samples/sec Loss 1.4276 LearningRate 0.0102 Epoch: 13 Global Step: 227050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:30:06,164-Speed 5188.84 samples/sec Loss 1.3930 LearningRate 0.0102 Epoch: 13 Global Step: 227060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:30:08,121-Speed 5232.63 samples/sec Loss 1.3958 LearningRate 0.0102 Epoch: 13 Global Step: 227070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:10,084-Speed 5218.16 samples/sec Loss 1.3644 LearningRate 0.0102 Epoch: 13 Global Step: 227080 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:12,056-Speed 5195.97 samples/sec Loss 1.3982 LearningRate 0.0102 Epoch: 13 Global Step: 227090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:14,025-Speed 5201.63 samples/sec Loss 1.3809 LearningRate 0.0102 Epoch: 13 Global Step: 227100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:15,996-Speed 5196.81 samples/sec Loss 1.3718 LearningRate 0.0102 Epoch: 13 Global Step: 227110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:17,964-Speed 5204.47 samples/sec Loss 1.3602 LearningRate 0.0102 Epoch: 13 Global Step: 227120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:19,930-Speed 5212.18 samples/sec Loss 1.4089 LearningRate 0.0102 Epoch: 13 Global Step: 227130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:21,902-Speed 5193.50 samples/sec Loss 1.3705 LearningRate 0.0102 Epoch: 13 Global Step: 227140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:23,883-Speed 5170.60 samples/sec Loss 1.4109 LearningRate 0.0102 Epoch: 13 Global Step: 227150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:25,868-Speed 5160.89 samples/sec Loss 1.3547 LearningRate 0.0102 Epoch: 13 Global Step: 227160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:27,855-Speed 5155.18 samples/sec Loss 1.4002 LearningRate 0.0102 Epoch: 13 Global Step: 227170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:30:29,815-Speed 5226.03 samples/sec Loss 1.3949 LearningRate 0.0102 Epoch: 13 Global Step: 227180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:31,778-Speed 5218.06 samples/sec Loss 1.3830 LearningRate 0.0102 Epoch: 13 Global Step: 227190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:33,812-Speed 5036.11 samples/sec Loss 1.4004 LearningRate 0.0102 Epoch: 13 Global Step: 227200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-11 14:30:35,812-Speed 5122.19 samples/sec Loss 1.3239 LearningRate 0.0102 Epoch: 13 Global Step: 227210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:37,800-Speed 5152.19 samples/sec Loss 1.3611 LearningRate 0.0102 Epoch: 13 Global Step: 227220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:39,766-Speed 5211.03 samples/sec Loss 1.4259 LearningRate 0.0102 Epoch: 13 Global Step: 227230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:41,733-Speed 5207.81 samples/sec Loss 1.3646 LearningRate 0.0102 Epoch: 13 Global Step: 227240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:43,719-Speed 5158.16 samples/sec Loss 1.3951 LearningRate 0.0102 Epoch: 13 Global Step: 227250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:45,698-Speed 5177.04 samples/sec Loss 1.4069 LearningRate 0.0102 Epoch: 13 Global Step: 227260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:47,663-Speed 5211.39 samples/sec Loss 1.3706 LearningRate 0.0102 Epoch: 13 Global Step: 227270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:49,652-Speed 5151.57 samples/sec Loss 1.3931 LearningRate 0.0102 Epoch: 13 Global Step: 227280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:30:51,618-Speed 5208.16 samples/sec Loss 1.4002 LearningRate 0.0102 Epoch: 13 Global Step: 227290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:53,592-Speed 5190.11 samples/sec Loss 1.3747 LearningRate 0.0102 Epoch: 13 Global Step: 227300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:55,572-Speed 5174.78 samples/sec Loss 1.4115 LearningRate 0.0102 Epoch: 13 Global Step: 227310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:57,544-Speed 5194.03 samples/sec Loss 1.3963 LearningRate 0.0102 Epoch: 13 Global Step: 227320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:30:59,519-Speed 5186.10 
samples/sec Loss 1.3764 LearningRate 0.0102 Epoch: 13 Global Step: 227330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:31:01,490-Speed 5196.60 samples/sec Loss 1.3878 LearningRate 0.0102 Epoch: 13 Global Step: 227340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:31:03,454-Speed 5215.04 samples/sec Loss 1.3368 LearningRate 0.0102 Epoch: 13 Global Step: 227350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:31:05,457-Speed 5115.70 samples/sec Loss 1.3856 LearningRate 0.0102 Epoch: 13 Global Step: 227360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:31:07,422-Speed 5211.82 samples/sec Loss 1.3296 LearningRate 0.0102 Epoch: 13 Global Step: 227370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:31:09,391-Speed 5203.06 samples/sec Loss 1.3646 LearningRate 0.0102 Epoch: 13 Global Step: 227380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:31:11,403-Speed 5089.81 samples/sec Loss 1.3594 LearningRate 0.0102 Epoch: 13 Global Step: 227390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:31:13,380-Speed 5182.98 samples/sec Loss 1.3967 LearningRate 0.0102 Epoch: 13 Global Step: 227400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:31:15,389-Speed 5097.77 samples/sec Loss 1.3815 LearningRate 0.0102 Epoch: 13 Global Step: 227410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:31:17,379-Speed 5147.93 samples/sec Loss 1.4012 LearningRate 0.0102 Epoch: 13 Global Step: 227420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:31:19,341-Speed 5221.45 samples/sec Loss 1.3885 LearningRate 0.0102 Epoch: 13 Global Step: 227430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:21,338-Speed 5128.50 samples/sec Loss 1.3988 LearningRate 0.0102 Epoch: 13 Global Step: 227440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:23,308-Speed 5201.99 samples/sec Loss 1.4356 LearningRate 0.0102 
Epoch: 13 Global Step: 227450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:25,323-Speed 5082.17 samples/sec Loss 1.4095 LearningRate 0.0102 Epoch: 13 Global Step: 227460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:27,309-Speed 5157.49 samples/sec Loss 1.3471 LearningRate 0.0101 Epoch: 13 Global Step: 227470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:29,285-Speed 5182.98 samples/sec Loss 1.3181 LearningRate 0.0101 Epoch: 13 Global Step: 227480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:31,250-Speed 5213.56 samples/sec Loss 1.3963 LearningRate 0.0101 Epoch: 13 Global Step: 227490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:33,218-Speed 5204.63 samples/sec Loss 1.3767 LearningRate 0.0101 Epoch: 13 Global Step: 227500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:35,226-Speed 5101.63 samples/sec Loss 1.3653 LearningRate 0.0101 Epoch: 13 Global Step: 227510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:37,214-Speed 5152.73 samples/sec Loss 1.3342 LearningRate 0.0101 Epoch: 13 Global Step: 227520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:39,213-Speed 5123.68 samples/sec Loss 1.4114 LearningRate 0.0101 Epoch: 13 Global Step: 227530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:31:41,208-Speed 5133.83 samples/sec Loss 1.3784 LearningRate 0.0101 Epoch: 13 Global Step: 227540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:43,177-Speed 5204.86 samples/sec Loss 1.3746 LearningRate 0.0101 Epoch: 13 Global Step: 227550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:45,141-Speed 5215.42 samples/sec Loss 1.4077 LearningRate 0.0101 Epoch: 13 Global Step: 227560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:47,108-Speed 5208.62 samples/sec Loss 1.3915 LearningRate 0.0101 Epoch: 13 Global Step: 227570 Fp16 Grad 
Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:49,085-Speed 5181.65 samples/sec Loss 1.3990 LearningRate 0.0101 Epoch: 13 Global Step: 227580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:51,055-Speed 5199.23 samples/sec Loss 1.3679 LearningRate 0.0101 Epoch: 13 Global Step: 227590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:53,030-Speed 5184.75 samples/sec Loss 1.3745 LearningRate 0.0101 Epoch: 13 Global Step: 227600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:54,999-Speed 5202.91 samples/sec Loss 1.3824 LearningRate 0.0101 Epoch: 13 Global Step: 227610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:56,968-Speed 5203.03 samples/sec Loss 1.3749 LearningRate 0.0101 Epoch: 13 Global Step: 227620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:31:58,951-Speed 5165.38 samples/sec Loss 1.4384 LearningRate 0.0101 Epoch: 13 Global Step: 227630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:32:00,944-Speed 5140.25 samples/sec Loss 1.3910 LearningRate 0.0101 Epoch: 13 Global Step: 227640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:32:02,942-Speed 5124.77 samples/sec Loss 1.4461 LearningRate 0.0101 Epoch: 13 Global Step: 227650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:32:04,927-Speed 5162.13 samples/sec Loss 1.3969 LearningRate 0.0101 Epoch: 13 Global Step: 227660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:32:06,887-Speed 5225.48 samples/sec Loss 1.3859 LearningRate 0.0101 Epoch: 13 Global Step: 227670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:32:08,857-Speed 5199.89 samples/sec Loss 1.3555 LearningRate 0.0101 Epoch: 13 Global Step: 227680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:32:10,835-Speed 5178.29 samples/sec Loss 1.4590 LearningRate 0.0101 Epoch: 13 Global Step: 227690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 
2022-04-11 14:32:12,802-Speed 5207.27 samples/sec Loss 1.4414 LearningRate 0.0101 Epoch: 13 Global Step: 227700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:32:14,772-Speed 5199.32 samples/sec Loss 1.3757 LearningRate 0.0101 Epoch: 13 Global Step: 227710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:32:16,752-Speed 5175.43 samples/sec Loss 1.4125 LearningRate 0.0101 Epoch: 13 Global Step: 227720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:32:18,739-Speed 5153.57 samples/sec Loss 1.3805 LearningRate 0.0101 Epoch: 13 Global Step: 227730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:32:20,706-Speed 5209.62 samples/sec Loss 1.4036 LearningRate 0.0101 Epoch: 13 Global Step: 227740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:32:22,681-Speed 5185.89 samples/sec Loss 1.3868 LearningRate 0.0101 Epoch: 13 Global Step: 227750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:32:24,669-Speed 5153.69 samples/sec Loss 1.3586 LearningRate 0.0101 Epoch: 13 Global Step: 227760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:32:26,652-Speed 5164.57 samples/sec Loss 1.4292 LearningRate 0.0101 Epoch: 13 Global Step: 227770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:32:28,649-Speed 5131.26 samples/sec Loss 1.3967 LearningRate 0.0101 Epoch: 13 Global Step: 227780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:32:30,626-Speed 5180.30 samples/sec Loss 1.3578 LearningRate 0.0101 Epoch: 13 Global Step: 227790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:32:32,595-Speed 5200.96 samples/sec Loss 1.4112 LearningRate 0.0101 Epoch: 13 Global Step: 227800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:32:34,571-Speed 5183.60 samples/sec Loss 1.4232 LearningRate 0.0101 Epoch: 13 Global Step: 227810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:32:36,590-Speed 5075.32 
samples/sec Loss 1.4349 LearningRate 0.0101 Epoch: 13 Global Step: 227820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:32:38,588-Speed 5127.50 samples/sec Loss 1.3017 LearningRate 0.0101 Epoch: 13 Global Step: 227830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:32:40,557-Speed 5200.53 samples/sec Loss 1.3927 LearningRate 0.0101 Epoch: 13 Global Step: 227840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:32:42,540-Speed 5166.20 samples/sec Loss 1.4004 LearningRate 0.0101 Epoch: 13 Global Step: 227850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:32:44,530-Speed 5146.42 samples/sec Loss 1.3481 LearningRate 0.0101 Epoch: 13 Global Step: 227860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:32:46,533-Speed 5113.65 samples/sec Loss 1.3612 LearningRate 0.0101 Epoch: 13 Global Step: 227870 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:32:48,536-Speed 5114.90 samples/sec Loss 1.4104 LearningRate 0.0101 Epoch: 13 Global Step: 227880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:32:50,526-Speed 5147.09 samples/sec Loss 1.3713 LearningRate 0.0101 Epoch: 13 Global Step: 227890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:32:52,496-Speed 5200.71 samples/sec Loss 1.4207 LearningRate 0.0101 Epoch: 13 Global Step: 227900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:32:54,466-Speed 5200.03 samples/sec Loss 1.4108 LearningRate 0.0101 Epoch: 13 Global Step: 227910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:32:56,435-Speed 5201.09 samples/sec Loss 1.4155 LearningRate 0.0101 Epoch: 13 Global Step: 227920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:32:58,405-Speed 5200.51 samples/sec Loss 1.4144 LearningRate 0.0101 Epoch: 13 Global Step: 227930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:33:00,411-Speed 5105.83 samples/sec Loss 1.4108 LearningRate 
0.0101 Epoch: 13 Global Step: 227940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:33:02,423-Speed 5091.40 samples/sec Loss 1.3769 LearningRate 0.0101 Epoch: 13 Global Step: 227950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:33:04,409-Speed 5158.57 samples/sec Loss 1.3880 LearningRate 0.0101 Epoch: 13 Global Step: 227960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:33:06,388-Speed 5176.11 samples/sec Loss 1.3529 LearningRate 0.0101 Epoch: 13 Global Step: 227970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:33:08,359-Speed 5196.86 samples/sec Loss 1.3641 LearningRate 0.0101 Epoch: 13 Global Step: 227980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:33:10,358-Speed 5124.97 samples/sec Loss 1.4585 LearningRate 0.0101 Epoch: 13 Global Step: 227990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:33:12,388-Speed 5045.54 samples/sec Loss 1.3801 LearningRate 0.0100 Epoch: 13 Global Step: 228000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:33:39,076-[lfw][228000]XNorm: 21.539478 Training: 2022-04-11 14:33:39,076-[lfw][228000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-04-11 14:33:39,077-[lfw][228000]Accuracy-Highest: 0.99833 Training: 2022-04-11 14:34:09,725-[cfp_fp][228000]XNorm: 20.700428 Training: 2022-04-11 14:34:09,726-[cfp_fp][228000]Accuracy-Flip: 0.98729+-0.00548 Training: 2022-04-11 14:34:09,726-[cfp_fp][228000]Accuracy-Highest: 0.98800 Training: 2022-04-11 14:34:36,168-[agedb_30][228000]XNorm: 22.070685 Training: 2022-04-11 14:34:36,168-[agedb_30][228000]Accuracy-Flip: 0.98200+-0.00884 Training: 2022-04-11 14:34:36,169-[agedb_30][228000]Accuracy-Highest: 0.98250 Training: 2022-04-11 14:34:38,154-Speed 119.40 samples/sec Loss 1.3653 LearningRate 0.0100 Epoch: 13 Global Step: 228010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:34:40,148-Speed 5136.79 samples/sec Loss 1.3469 LearningRate 0.0100 Epoch: 13 Global 
Step: 228020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:34:42,140-Speed 5141.31 samples/sec Loss 1.4065 LearningRate 0.0100 Epoch: 13 Global Step: 228030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:34:44,108-Speed 5206.98 samples/sec Loss 1.4028 LearningRate 0.0100 Epoch: 13 Global Step: 228040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:34:46,081-Speed 5191.76 samples/sec Loss 1.3710 LearningRate 0.0100 Epoch: 13 Global Step: 228050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:34:48,081-Speed 5121.75 samples/sec Loss 1.4060 LearningRate 0.0100 Epoch: 13 Global Step: 228060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:34:50,109-Speed 5050.74 samples/sec Loss 1.4097 LearningRate 0.0100 Epoch: 13 Global Step: 228070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:34:52,130-Speed 5067.43 samples/sec Loss 1.3582 LearningRate 0.0100 Epoch: 13 Global Step: 228080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:34:54,090-Speed 5226.46 samples/sec Loss 1.4269 LearningRate 0.0100 Epoch: 13 Global Step: 228090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:34:56,058-Speed 5204.98 samples/sec Loss 1.4197 LearningRate 0.0100 Epoch: 13 Global Step: 228100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:34:58,029-Speed 5197.86 samples/sec Loss 1.3975 LearningRate 0.0100 Epoch: 13 Global Step: 228110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:35:00,006-Speed 5180.89 samples/sec Loss 1.4149 LearningRate 0.0100 Epoch: 13 Global Step: 228120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:35:01,979-Speed 5193.28 samples/sec Loss 1.4124 LearningRate 0.0100 Epoch: 13 Global Step: 228130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:35:03,961-Speed 5167.35 samples/sec Loss 1.3694 LearningRate 0.0100 Epoch: 13 Global Step: 228140 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-11 14:35:05,947-Speed 5157.71 samples/sec Loss 1.4315 LearningRate 0.0100 Epoch: 13 Global Step: 228150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:35:07,934-Speed 5156.03 samples/sec Loss 1.3759 LearningRate 0.0100 Epoch: 13 Global Step: 228160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:35:09,913-Speed 5174.23 samples/sec Loss 1.3736 LearningRate 0.0100 Epoch: 13 Global Step: 228170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:35:11,916-Speed 5114.44 samples/sec Loss 1.3962 LearningRate 0.0100 Epoch: 13 Global Step: 228180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:35:13,910-Speed 5138.27 samples/sec Loss 1.4051 LearningRate 0.0100 Epoch: 13 Global Step: 228190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:35:15,900-Speed 5148.10 samples/sec Loss 1.4477 LearningRate 0.0100 Epoch: 13 Global Step: 228200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:35:17,880-Speed 5170.84 samples/sec Loss 1.4252 LearningRate 0.0100 Epoch: 13 Global Step: 228210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:35:19,870-Speed 5150.31 samples/sec Loss 1.3464 LearningRate 0.0100 Epoch: 13 Global Step: 228220 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:35:21,848-Speed 5177.16 samples/sec Loss 1.4627 LearningRate 0.0100 Epoch: 13 Global Step: 228230 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:35:23,826-Speed 5180.47 samples/sec Loss 1.4165 LearningRate 0.0100 Epoch: 13 Global Step: 228240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:35:25,794-Speed 5203.37 samples/sec Loss 1.4201 LearningRate 0.0100 Epoch: 13 Global Step: 228250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:35:27,777-Speed 5166.88 samples/sec Loss 1.4028 LearningRate 0.0100 Epoch: 13 Global Step: 228260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 
14:35:29,755-Speed 5178.99 samples/sec Loss 1.4122 LearningRate 0.0100 Epoch: 13 Global Step: 228270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:35:31,732-Speed 5179.11 samples/sec Loss 1.3684 LearningRate 0.0100 Epoch: 13 Global Step: 228280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:35:33,724-Speed 5142.20 samples/sec Loss 1.3942 LearningRate 0.0100 Epoch: 13 Global Step: 228290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:35:35,755-Speed 5043.06 samples/sec Loss 1.3536 LearningRate 0.0100 Epoch: 13 Global Step: 228300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:35:37,749-Speed 5139.58 samples/sec Loss 1.3723 LearningRate 0.0100 Epoch: 13 Global Step: 228310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:35:39,742-Speed 5138.15 samples/sec Loss 1.4189 LearningRate 0.0100 Epoch: 13 Global Step: 228320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:35:41,739-Speed 5129.75 samples/sec Loss 1.4247 LearningRate 0.0100 Epoch: 13 Global Step: 228330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:35:43,731-Speed 5143.03 samples/sec Loss 1.4159 LearningRate 0.0100 Epoch: 13 Global Step: 228340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:35:45,715-Speed 5162.50 samples/sec Loss 1.3992 LearningRate 0.0100 Epoch: 13 Global Step: 228350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:35:47,710-Speed 5136.52 samples/sec Loss 1.3501 LearningRate 0.0100 Epoch: 13 Global Step: 228360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:35:49,727-Speed 5076.37 samples/sec Loss 1.4084 LearningRate 0.0100 Epoch: 13 Global Step: 228370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:35:51,725-Speed 5126.18 samples/sec Loss 1.3411 LearningRate 0.0100 Epoch: 13 Global Step: 228380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:35:53,718-Speed 5140.76 samples/sec Loss 
1.4091 LearningRate 0.0100 Epoch: 13 Global Step: 228390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:35:55,692-Speed 5190.91 samples/sec Loss 1.3581 LearningRate 0.0100 Epoch: 13 Global Step: 228400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:35:57,669-Speed 5179.36 samples/sec Loss 1.3855 LearningRate 0.0100 Epoch: 13 Global Step: 228410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:35:59,674-Speed 5110.14 samples/sec Loss 1.4333 LearningRate 0.0100 Epoch: 13 Global Step: 228420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:01,656-Speed 5167.72 samples/sec Loss 1.3770 LearningRate 0.0100 Epoch: 13 Global Step: 228430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:03,633-Speed 5182.25 samples/sec Loss 1.4093 LearningRate 0.0100 Epoch: 13 Global Step: 228440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:05,607-Speed 5189.39 samples/sec Loss 1.3636 LearningRate 0.0100 Epoch: 13 Global Step: 228450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:36:07,572-Speed 5213.33 samples/sec Loss 1.3612 LearningRate 0.0100 Epoch: 13 Global Step: 228460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:09,543-Speed 5196.71 samples/sec Loss 1.3651 LearningRate 0.0100 Epoch: 13 Global Step: 228470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:11,539-Speed 5131.92 samples/sec Loss 1.4288 LearningRate 0.0100 Epoch: 13 Global Step: 228480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:13,525-Speed 5157.05 samples/sec Loss 1.4193 LearningRate 0.0100 Epoch: 13 Global Step: 228490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:15,495-Speed 5200.77 samples/sec Loss 1.3884 LearningRate 0.0100 Epoch: 13 Global Step: 228500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:17,468-Speed 5191.60 samples/sec Loss 1.3271 LearningRate 0.0100 Epoch: 13 
Global Step: 228510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:19,436-Speed 5203.43 samples/sec Loss 1.4045 LearningRate 0.0100 Epoch: 13 Global Step: 228520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:21,464-Speed 5051.14 samples/sec Loss 1.3919 LearningRate 0.0099 Epoch: 13 Global Step: 228530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:23,482-Speed 5078.04 samples/sec Loss 1.3776 LearningRate 0.0099 Epoch: 13 Global Step: 228540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:25,454-Speed 5195.16 samples/sec Loss 1.4009 LearningRate 0.0099 Epoch: 13 Global Step: 228550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:27,425-Speed 5195.83 samples/sec Loss 1.3869 LearningRate 0.0099 Epoch: 13 Global Step: 228560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:29,392-Speed 5208.43 samples/sec Loss 1.3876 LearningRate 0.0099 Epoch: 13 Global Step: 228570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:31,360-Speed 5203.55 samples/sec Loss 1.3937 LearningRate 0.0099 Epoch: 13 Global Step: 228580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:33,323-Speed 5217.98 samples/sec Loss 1.3487 LearningRate 0.0099 Epoch: 13 Global Step: 228590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:35,356-Speed 5039.78 samples/sec Loss 1.3966 LearningRate 0.0099 Epoch: 13 Global Step: 228600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:37,363-Speed 5102.98 samples/sec Loss 1.4211 LearningRate 0.0099 Epoch: 13 Global Step: 228610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:39,330-Speed 5208.91 samples/sec Loss 1.3806 LearningRate 0.0099 Epoch: 13 Global Step: 228620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:41,313-Speed 5163.13 samples/sec Loss 1.3514 LearningRate 0.0099 Epoch: 13 Global Step: 228630 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-11 14:36:43,284-Speed 5198.76 samples/sec Loss 1.3866 LearningRate 0.0099 Epoch: 13 Global Step: 228640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:45,285-Speed 5118.74 samples/sec Loss 1.3638 LearningRate 0.0099 Epoch: 13 Global Step: 228650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:36:47,275-Speed 5149.36 samples/sec Loss 1.3903 LearningRate 0.0099 Epoch: 13 Global Step: 228660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:36:49,270-Speed 5134.33 samples/sec Loss 1.3679 LearningRate 0.0099 Epoch: 13 Global Step: 228670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:36:51,290-Speed 5070.33 samples/sec Loss 1.3717 LearningRate 0.0099 Epoch: 13 Global Step: 228680 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:36:53,257-Speed 5207.46 samples/sec Loss 1.3852 LearningRate 0.0099 Epoch: 13 Global Step: 228690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:36:55,218-Speed 5224.15 samples/sec Loss 1.3998 LearningRate 0.0099 Epoch: 13 Global Step: 228700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:36:57,184-Speed 5210.91 samples/sec Loss 1.3618 LearningRate 0.0099 Epoch: 13 Global Step: 228710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:36:59,183-Speed 5122.04 samples/sec Loss 1.3372 LearningRate 0.0099 Epoch: 13 Global Step: 228720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:37:01,158-Speed 5186.42 samples/sec Loss 1.4023 LearningRate 0.0099 Epoch: 13 Global Step: 228730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:37:03,132-Speed 5188.75 samples/sec Loss 1.4084 LearningRate 0.0099 Epoch: 13 Global Step: 228740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:37:05,122-Speed 5148.50 samples/sec Loss 1.4202 LearningRate 0.0099 Epoch: 13 Global Step: 228750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 
14:37:07,095-Speed 5192.05 samples/sec Loss 1.3835 LearningRate 0.0099 Epoch: 13 Global Step: 228760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:37:09,061-Speed 5212.31 samples/sec Loss 1.3825 LearningRate 0.0099 Epoch: 13 Global Step: 228770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:37:11,064-Speed 5113.60 samples/sec Loss 1.3721 LearningRate 0.0099 Epoch: 13 Global Step: 228780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:37:13,026-Speed 5220.26 samples/sec Loss 1.3559 LearningRate 0.0099 Epoch: 13 Global Step: 228790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:14,990-Speed 5215.20 samples/sec Loss 1.3910 LearningRate 0.0099 Epoch: 13 Global Step: 228800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:16,950-Speed 5225.60 samples/sec Loss 1.3939 LearningRate 0.0099 Epoch: 13 Global Step: 228810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:18,918-Speed 5207.02 samples/sec Loss 1.3898 LearningRate 0.0099 Epoch: 13 Global Step: 228820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:20,881-Speed 5215.68 samples/sec Loss 1.3301 LearningRate 0.0099 Epoch: 13 Global Step: 228830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:22,860-Speed 5176.51 samples/sec Loss 1.4257 LearningRate 0.0099 Epoch: 13 Global Step: 228840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:24,839-Speed 5177.98 samples/sec Loss 1.3684 LearningRate 0.0099 Epoch: 13 Global Step: 228850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:26,810-Speed 5195.58 samples/sec Loss 1.4094 LearningRate 0.0099 Epoch: 13 Global Step: 228860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:28,803-Speed 5139.60 samples/sec Loss 1.3777 LearningRate 0.0099 Epoch: 13 Global Step: 228870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:30,770-Speed 5207.68 samples/sec Loss 
1.4136 LearningRate 0.0099 Epoch: 13 Global Step: 228880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:32,745-Speed 5187.13 samples/sec Loss 1.3818 LearningRate 0.0099 Epoch: 13 Global Step: 228890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:34,724-Speed 5176.25 samples/sec Loss 1.3364 LearningRate 0.0099 Epoch: 13 Global Step: 228900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:36,693-Speed 5202.87 samples/sec Loss 1.3604 LearningRate 0.0099 Epoch: 13 Global Step: 228910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:38,665-Speed 5195.47 samples/sec Loss 1.3693 LearningRate 0.0099 Epoch: 13 Global Step: 228920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:40,645-Speed 5171.32 samples/sec Loss 1.3743 LearningRate 0.0099 Epoch: 13 Global Step: 228930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:42,611-Speed 5210.66 samples/sec Loss 1.4067 LearningRate 0.0099 Epoch: 13 Global Step: 228940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:44,579-Speed 5204.44 samples/sec Loss 1.3966 LearningRate 0.0099 Epoch: 13 Global Step: 228950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:46,546-Speed 5209.79 samples/sec Loss 1.3792 LearningRate 0.0099 Epoch: 13 Global Step: 228960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:48,511-Speed 5213.00 samples/sec Loss 1.3658 LearningRate 0.0099 Epoch: 13 Global Step: 228970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:50,489-Speed 5179.67 samples/sec Loss 1.3849 LearningRate 0.0099 Epoch: 13 Global Step: 228980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:37:52,459-Speed 5197.82 samples/sec Loss 1.4224 LearningRate 0.0099 Epoch: 13 Global Step: 228990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:37:54,426-Speed 5208.90 samples/sec Loss 1.3402 LearningRate 0.0099 Epoch: 13 
Global Step: 229000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:37:56,392-Speed 5208.36 samples/sec Loss 1.3610 LearningRate 0.0099 Epoch: 13 Global Step: 229010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:37:58,376-Speed 5164.09 samples/sec Loss 1.3826 LearningRate 0.0099 Epoch: 13 Global Step: 229020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:00,344-Speed 5204.97 samples/sec Loss 1.3923 LearningRate 0.0099 Epoch: 13 Global Step: 229030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:02,338-Speed 5137.26 samples/sec Loss 1.3726 LearningRate 0.0099 Epoch: 13 Global Step: 229040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:04,305-Speed 5205.69 samples/sec Loss 1.3557 LearningRate 0.0099 Epoch: 13 Global Step: 229050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:06,283-Speed 5180.57 samples/sec Loss 1.4371 LearningRate 0.0098 Epoch: 13 Global Step: 229060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:08,262-Speed 5175.08 samples/sec Loss 1.4053 LearningRate 0.0098 Epoch: 13 Global Step: 229070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:10,257-Speed 5135.06 samples/sec Loss 1.4173 LearningRate 0.0098 Epoch: 13 Global Step: 229080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:12,233-Speed 5185.11 samples/sec Loss 1.3844 LearningRate 0.0098 Epoch: 13 Global Step: 229090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:14,216-Speed 5165.41 samples/sec Loss 1.3682 LearningRate 0.0098 Epoch: 13 Global Step: 229100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:16,199-Speed 5166.64 samples/sec Loss 1.4154 LearningRate 0.0098 Epoch: 13 Global Step: 229110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:18,175-Speed 5181.81 samples/sec Loss 1.4451 LearningRate 0.0098 Epoch: 13 Global Step: 229120 Fp16 Grad Scale: 
131072 Required: 7 hours Training: 2022-04-11 14:38:20,143-Speed 5205.90 samples/sec Loss 1.4164 LearningRate 0.0098 Epoch: 13 Global Step: 229130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:22,135-Speed 5143.19 samples/sec Loss 1.3722 LearningRate 0.0098 Epoch: 13 Global Step: 229140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:24,105-Speed 5198.74 samples/sec Loss 1.3872 LearningRate 0.0098 Epoch: 13 Global Step: 229150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:26,077-Speed 5194.83 samples/sec Loss 1.3963 LearningRate 0.0098 Epoch: 13 Global Step: 229160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:28,061-Speed 5161.44 samples/sec Loss 1.4262 LearningRate 0.0098 Epoch: 13 Global Step: 229170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:30,040-Speed 5176.85 samples/sec Loss 1.3561 LearningRate 0.0098 Epoch: 13 Global Step: 229180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:32,026-Speed 5159.76 samples/sec Loss 1.4249 LearningRate 0.0098 Epoch: 13 Global Step: 229190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:34,005-Speed 5173.97 samples/sec Loss 1.3652 LearningRate 0.0098 Epoch: 13 Global Step: 229200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:35,972-Speed 5208.76 samples/sec Loss 1.3896 LearningRate 0.0098 Epoch: 13 Global Step: 229210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:37,953-Speed 5171.38 samples/sec Loss 1.3732 LearningRate 0.0098 Epoch: 13 Global Step: 229220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:38:39,921-Speed 5204.54 samples/sec Loss 1.3773 LearningRate 0.0098 Epoch: 13 Global Step: 229230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:38:41,896-Speed 5185.67 samples/sec Loss 1.3332 LearningRate 0.0098 Epoch: 13 Global Step: 229240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 
2022-04-11 14:38:43,880-Speed 5164.77 samples/sec Loss 1.3756 LearningRate 0.0098 Epoch: 13 Global Step: 229250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:38:45,878-Speed 5126.53 samples/sec Loss 1.3947 LearningRate 0.0098 Epoch: 13 Global Step: 229260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:38:47,897-Speed 5073.28 samples/sec Loss 1.3722 LearningRate 0.0098 Epoch: 13 Global Step: 229270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:38:49,913-Speed 5081.84 samples/sec Loss 1.4275 LearningRate 0.0098 Epoch: 13 Global Step: 229280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:38:51,891-Speed 5179.62 samples/sec Loss 1.3925 LearningRate 0.0098 Epoch: 13 Global Step: 229290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:38:53,872-Speed 5169.53 samples/sec Loss 1.3678 LearningRate 0.0098 Epoch: 13 Global Step: 229300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:38:55,839-Speed 5207.34 samples/sec Loss 1.3803 LearningRate 0.0098 Epoch: 13 Global Step: 229310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:38:57,833-Speed 5136.75 samples/sec Loss 1.3991 LearningRate 0.0098 Epoch: 13 Global Step: 229320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:38:59,830-Speed 5130.51 samples/sec Loss 1.3663 LearningRate 0.0098 Epoch: 13 Global Step: 229330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:01,791-Speed 5222.47 samples/sec Loss 1.4355 LearningRate 0.0098 Epoch: 13 Global Step: 229340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:39:03,770-Speed 5175.71 samples/sec Loss 1.4632 LearningRate 0.0098 Epoch: 13 Global Step: 229350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:39:05,735-Speed 5214.56 samples/sec Loss 1.4063 LearningRate 0.0098 Epoch: 13 Global Step: 229360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:39:07,703-Speed 5204.20 
samples/sec Loss 1.3636 LearningRate 0.0098 Epoch: 13 Global Step: 229370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:39:09,675-Speed 5194.69 samples/sec Loss 1.3982 LearningRate 0.0098 Epoch: 13 Global Step: 229380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:39:11,657-Speed 5167.77 samples/sec Loss 1.3952 LearningRate 0.0098 Epoch: 13 Global Step: 229390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:39:13,625-Speed 5204.75 samples/sec Loss 1.4343 LearningRate 0.0098 Epoch: 13 Global Step: 229400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:39:15,596-Speed 5198.95 samples/sec Loss 1.3927 LearningRate 0.0098 Epoch: 13 Global Step: 229410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:39:17,583-Speed 5155.40 samples/sec Loss 1.3712 LearningRate 0.0098 Epoch: 13 Global Step: 229420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:39:19,547-Speed 5213.39 samples/sec Loss 1.3933 LearningRate 0.0098 Epoch: 13 Global Step: 229430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:39:21,525-Speed 5180.13 samples/sec Loss 1.3909 LearningRate 0.0098 Epoch: 13 Global Step: 229440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:23,516-Speed 5142.93 samples/sec Loss 1.3902 LearningRate 0.0098 Epoch: 13 Global Step: 229450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:25,487-Speed 5197.68 samples/sec Loss 1.3697 LearningRate 0.0098 Epoch: 13 Global Step: 229460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:27,483-Speed 5133.27 samples/sec Loss 1.4213 LearningRate 0.0098 Epoch: 13 Global Step: 229470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:29,466-Speed 5166.38 samples/sec Loss 1.4007 LearningRate 0.0098 Epoch: 13 Global Step: 229480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:31,430-Speed 5216.42 samples/sec Loss 1.3799 LearningRate 0.0098 
Epoch: 13 Global Step: 229490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:33,406-Speed 5181.97 samples/sec Loss 1.4189 LearningRate 0.0098 Epoch: 13 Global Step: 229500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:35,395-Speed 5151.64 samples/sec Loss 1.4648 LearningRate 0.0098 Epoch: 13 Global Step: 229510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:37,376-Speed 5170.51 samples/sec Loss 1.3780 LearningRate 0.0098 Epoch: 13 Global Step: 229520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:39,353-Speed 5180.10 samples/sec Loss 1.4398 LearningRate 0.0098 Epoch: 13 Global Step: 229530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:41,346-Speed 5140.13 samples/sec Loss 1.3934 LearningRate 0.0098 Epoch: 13 Global Step: 229540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:39:43,309-Speed 5219.61 samples/sec Loss 1.4270 LearningRate 0.0098 Epoch: 13 Global Step: 229550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:45,290-Speed 5169.00 samples/sec Loss 1.3612 LearningRate 0.0098 Epoch: 13 Global Step: 229560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:47,296-Speed 5106.54 samples/sec Loss 1.3902 LearningRate 0.0098 Epoch: 13 Global Step: 229570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:49,282-Speed 5158.80 samples/sec Loss 1.4091 LearningRate 0.0098 Epoch: 13 Global Step: 229580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:51,276-Speed 5137.25 samples/sec Loss 1.3893 LearningRate 0.0097 Epoch: 13 Global Step: 229590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:53,245-Speed 5201.62 samples/sec Loss 1.3942 LearningRate 0.0097 Epoch: 13 Global Step: 229600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:55,214-Speed 5202.68 samples/sec Loss 1.4054 LearningRate 0.0097 Epoch: 13 Global Step: 229610 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:57,196-Speed 5170.00 samples/sec Loss 1.3932 LearningRate 0.0097 Epoch: 13 Global Step: 229620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:39:59,177-Speed 5170.46 samples/sec Loss 1.3682 LearningRate 0.0097 Epoch: 13 Global Step: 229630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:40:01,148-Speed 5195.44 samples/sec Loss 1.3645 LearningRate 0.0097 Epoch: 13 Global Step: 229640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:40:03,186-Speed 5025.80 samples/sec Loss 1.4057 LearningRate 0.0097 Epoch: 13 Global Step: 229650 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:40:05,187-Speed 5118.88 samples/sec Loss 1.3944 LearningRate 0.0097 Epoch: 13 Global Step: 229660 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:40:07,157-Speed 5199.63 samples/sec Loss 1.4356 LearningRate 0.0097 Epoch: 13 Global Step: 229670 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:40:09,120-Speed 5219.46 samples/sec Loss 1.3803 LearningRate 0.0097 Epoch: 13 Global Step: 229680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:40:11,108-Speed 5152.34 samples/sec Loss 1.4384 LearningRate 0.0097 Epoch: 13 Global Step: 229690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:40:13,086-Speed 5179.22 samples/sec Loss 1.3574 LearningRate 0.0097 Epoch: 13 Global Step: 229700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:40:15,054-Speed 5204.90 samples/sec Loss 1.3684 LearningRate 0.0097 Epoch: 13 Global Step: 229710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:40:17,031-Speed 5182.19 samples/sec Loss 1.4194 LearningRate 0.0097 Epoch: 13 Global Step: 229720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:40:19,000-Speed 5200.96 samples/sec Loss 1.4359 LearningRate 0.0097 Epoch: 13 Global Step: 229730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 
2022-04-11 14:40:20,991-Speed 5145.64 samples/sec Loss 1.4564 LearningRate 0.0097 Epoch: 13 Global Step: 229740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:40:22,957-Speed 5210.34 samples/sec Loss 1.3499 LearningRate 0.0097 Epoch: 13 Global Step: 229750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:40:24,931-Speed 5190.79 samples/sec Loss 1.3993 LearningRate 0.0097 Epoch: 13 Global Step: 229760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:40:26,907-Speed 5181.79 samples/sec Loss 1.4270 LearningRate 0.0097 Epoch: 13 Global Step: 229770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:40:28,886-Speed 5175.36 samples/sec Loss 1.4222 LearningRate 0.0097 Epoch: 13 Global Step: 229780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:40:30,858-Speed 5195.31 samples/sec Loss 1.4182 LearningRate 0.0097 Epoch: 13 Global Step: 229790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:40:32,825-Speed 5207.53 samples/sec Loss 1.3918 LearningRate 0.0097 Epoch: 13 Global Step: 229800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:40:34,806-Speed 5172.30 samples/sec Loss 1.3737 LearningRate 0.0097 Epoch: 13 Global Step: 229810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:40:36,798-Speed 5142.29 samples/sec Loss 1.3553 LearningRate 0.0097 Epoch: 13 Global Step: 229820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:40:38,797-Speed 5123.42 samples/sec Loss 1.3877 LearningRate 0.0097 Epoch: 13 Global Step: 229830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:40:40,774-Speed 5182.99 samples/sec Loss 1.3108 LearningRate 0.0097 Epoch: 13 Global Step: 229840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:40:42,749-Speed 5184.01 samples/sec Loss 1.3850 LearningRate 0.0097 Epoch: 13 Global Step: 229850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:40:44,741-Speed 5143.06 
samples/sec Loss 1.3565 LearningRate 0.0097 Epoch: 13 Global Step: 229860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:40:46,714-Speed 5191.75 samples/sec Loss 1.3804 LearningRate 0.0097 Epoch: 13 Global Step: 229870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:40:48,703-Speed 5150.59 samples/sec Loss 1.4260 LearningRate 0.0097 Epoch: 13 Global Step: 229880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:40:50,680-Speed 5180.03 samples/sec Loss 1.4048 LearningRate 0.0097 Epoch: 13 Global Step: 229890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:40:52,657-Speed 5182.79 samples/sec Loss 1.3867 LearningRate 0.0097 Epoch: 13 Global Step: 229900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:40:54,643-Speed 5156.78 samples/sec Loss 1.3705 LearningRate 0.0097 Epoch: 13 Global Step: 229910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:40:56,623-Speed 5174.22 samples/sec Loss 1.3988 LearningRate 0.0097 Epoch: 13 Global Step: 229920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:40:58,605-Speed 5168.88 samples/sec Loss 1.4130 LearningRate 0.0097 Epoch: 13 Global Step: 229930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:41:00,574-Speed 5203.83 samples/sec Loss 1.3832 LearningRate 0.0097 Epoch: 13 Global Step: 229940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:41:02,575-Speed 5118.56 samples/sec Loss 1.3560 LearningRate 0.0097 Epoch: 13 Global Step: 229950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:41:04,553-Speed 5176.39 samples/sec Loss 1.3976 LearningRate 0.0097 Epoch: 13 Global Step: 229960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:41:06,537-Speed 5163.60 samples/sec Loss 1.3792 LearningRate 0.0097 Epoch: 13 Global Step: 229970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:41:08,504-Speed 5207.51 samples/sec Loss 1.3938 LearningRate 
0.0097 Epoch: 13 Global Step: 229980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:41:10,477-Speed 5193.52 samples/sec Loss 1.3996 LearningRate 0.0097 Epoch: 13 Global Step: 229990 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:41:12,448-Speed 5196.02 samples/sec Loss 1.4120 LearningRate 0.0097 Epoch: 13 Global Step: 230000 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:41:39,431-[lfw][230000]XNorm: 21.734793 Training: 2022-04-11 14:41:39,431-[lfw][230000]Accuracy-Flip: 0.99783+-0.00308 Training: 2022-04-11 14:41:39,432-[lfw][230000]Accuracy-Highest: 0.99833 Training: 2022-04-11 14:42:10,533-[cfp_fp][230000]XNorm: 21.136654 Training: 2022-04-11 14:42:10,534-[cfp_fp][230000]Accuracy-Flip: 0.98843+-0.00445 Training: 2022-04-11 14:42:10,534-[cfp_fp][230000]Accuracy-Highest: 0.98843 Training: 2022-04-11 14:42:37,327-[agedb_30][230000]XNorm: 22.225574 Training: 2022-04-11 14:42:37,327-[agedb_30][230000]Accuracy-Flip: 0.98133+-0.00849 Training: 2022-04-11 14:42:37,328-[agedb_30][230000]Accuracy-Highest: 0.98250 Training: 2022-04-11 14:42:39,324-Speed 117.87 samples/sec Loss 1.3677 LearningRate 0.0097 Epoch: 13 Global Step: 230010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:42:41,290-Speed 5209.75 samples/sec Loss 1.3718 LearningRate 0.0097 Epoch: 13 Global Step: 230020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:42:43,261-Speed 5196.38 samples/sec Loss 1.4093 LearningRate 0.0097 Epoch: 13 Global Step: 230030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:42:45,248-Speed 5155.65 samples/sec Loss 1.3687 LearningRate 0.0097 Epoch: 13 Global Step: 230040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:42:47,217-Speed 5204.88 samples/sec Loss 1.3696 LearningRate 0.0097 Epoch: 13 Global Step: 230050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:42:49,213-Speed 5130.81 samples/sec Loss 1.3960 LearningRate 0.0097 Epoch: 13 
Global Step: 230060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:42:51,188-Speed 5185.94 samples/sec Loss 1.3499 LearningRate 0.0097 Epoch: 13 Global Step: 230070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:42:53,155-Speed 5207.93 samples/sec Loss 1.3638 LearningRate 0.0097 Epoch: 13 Global Step: 230080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:42:55,141-Speed 5159.48 samples/sec Loss 1.4204 LearningRate 0.0097 Epoch: 13 Global Step: 230090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:42:57,127-Speed 5156.72 samples/sec Loss 1.3370 LearningRate 0.0097 Epoch: 13 Global Step: 230100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:42:59,097-Speed 5198.39 samples/sec Loss 1.3565 LearningRate 0.0097 Epoch: 13 Global Step: 230110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:01,073-Speed 5184.40 samples/sec Loss 1.3867 LearningRate 0.0097 Epoch: 13 Global Step: 230120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:03,042-Speed 5203.91 samples/sec Loss 1.3634 LearningRate 0.0096 Epoch: 13 Global Step: 230130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:05,023-Speed 5169.33 samples/sec Loss 1.4267 LearningRate 0.0096 Epoch: 13 Global Step: 230140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:07,000-Speed 5181.58 samples/sec Loss 1.3750 LearningRate 0.0096 Epoch: 13 Global Step: 230150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:08,985-Speed 5162.05 samples/sec Loss 1.3992 LearningRate 0.0096 Epoch: 13 Global Step: 230160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:10,964-Speed 5176.22 samples/sec Loss 1.4111 LearningRate 0.0096 Epoch: 13 Global Step: 230170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:12,939-Speed 5185.30 samples/sec Loss 1.4218 LearningRate 0.0096 Epoch: 13 Global Step: 230180 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-11 14:43:14,915-Speed 5183.71 samples/sec Loss 1.3914 LearningRate 0.0096 Epoch: 13 Global Step: 230190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:16,892-Speed 5181.16 samples/sec Loss 1.3800 LearningRate 0.0096 Epoch: 13 Global Step: 230200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:18,869-Speed 5182.16 samples/sec Loss 1.3648 LearningRate 0.0096 Epoch: 13 Global Step: 230210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:43:20,836-Speed 5208.92 samples/sec Loss 1.4023 LearningRate 0.0096 Epoch: 13 Global Step: 230220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:22,809-Speed 5190.47 samples/sec Loss 1.3504 LearningRate 0.0096 Epoch: 13 Global Step: 230230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:24,788-Speed 5177.12 samples/sec Loss 1.4266 LearningRate 0.0096 Epoch: 13 Global Step: 230240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:26,776-Speed 5152.02 samples/sec Loss 1.4271 LearningRate 0.0096 Epoch: 13 Global Step: 230250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:28,802-Speed 5053.59 samples/sec Loss 1.4788 LearningRate 0.0096 Epoch: 13 Global Step: 230260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:30,801-Speed 5124.98 samples/sec Loss 1.3909 LearningRate 0.0096 Epoch: 13 Global Step: 230270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:32,784-Speed 5165.88 samples/sec Loss 1.4336 LearningRate 0.0096 Epoch: 13 Global Step: 230280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:34,764-Speed 5175.84 samples/sec Loss 1.3889 LearningRate 0.0096 Epoch: 13 Global Step: 230290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:36,750-Speed 5157.31 samples/sec Loss 1.3930 LearningRate 0.0096 Epoch: 13 Global Step: 230300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
14:43:38,725-Speed 5184.73 samples/sec Loss 1.3551 LearningRate 0.0096 Epoch: 13 Global Step: 230310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:40,704-Speed 5176.26 samples/sec Loss 1.3527 LearningRate 0.0096 Epoch: 13 Global Step: 230320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:43:42,696-Speed 5143.62 samples/sec Loss 1.4329 LearningRate 0.0096 Epoch: 13 Global Step: 230330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:44,683-Speed 5153.84 samples/sec Loss 1.4156 LearningRate 0.0096 Epoch: 13 Global Step: 230340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:46,666-Speed 5167.01 samples/sec Loss 1.3810 LearningRate 0.0096 Epoch: 13 Global Step: 230350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:48,654-Speed 5150.55 samples/sec Loss 1.4010 LearningRate 0.0096 Epoch: 13 Global Step: 230360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:50,648-Speed 5139.04 samples/sec Loss 1.3639 LearningRate 0.0096 Epoch: 13 Global Step: 230370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:52,644-Speed 5132.26 samples/sec Loss 1.4144 LearningRate 0.0096 Epoch: 13 Global Step: 230380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:54,637-Speed 5139.68 samples/sec Loss 1.4078 LearningRate 0.0096 Epoch: 13 Global Step: 230390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:56,629-Speed 5143.54 samples/sec Loss 1.3767 LearningRate 0.0096 Epoch: 13 Global Step: 230400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:43:58,603-Speed 5187.35 samples/sec Loss 1.3535 LearningRate 0.0096 Epoch: 13 Global Step: 230410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:00,584-Speed 5172.31 samples/sec Loss 1.3613 LearningRate 0.0096 Epoch: 13 Global Step: 230420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:02,566-Speed 5166.63 samples/sec 
Loss 1.3779 LearningRate 0.0096 Epoch: 13 Global Step: 230430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:44:04,559-Speed 5140.50 samples/sec Loss 1.3926 LearningRate 0.0096 Epoch: 13 Global Step: 230440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:06,549-Speed 5148.44 samples/sec Loss 1.4588 LearningRate 0.0096 Epoch: 13 Global Step: 230450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:08,554-Speed 5107.41 samples/sec Loss 1.3942 LearningRate 0.0096 Epoch: 13 Global Step: 230460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:10,546-Speed 5142.36 samples/sec Loss 1.4225 LearningRate 0.0096 Epoch: 13 Global Step: 230470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:12,536-Speed 5146.56 samples/sec Loss 1.3976 LearningRate 0.0096 Epoch: 13 Global Step: 230480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:14,525-Speed 5151.18 samples/sec Loss 1.3831 LearningRate 0.0096 Epoch: 13 Global Step: 230490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:16,494-Speed 5202.54 samples/sec Loss 1.3789 LearningRate 0.0096 Epoch: 13 Global Step: 230500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:18,469-Speed 5186.92 samples/sec Loss 1.4149 LearningRate 0.0096 Epoch: 13 Global Step: 230510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:20,440-Speed 5196.72 samples/sec Loss 1.4374 LearningRate 0.0096 Epoch: 13 Global Step: 230520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:22,409-Speed 5203.94 samples/sec Loss 1.4057 LearningRate 0.0096 Epoch: 13 Global Step: 230530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:24,381-Speed 5192.68 samples/sec Loss 1.3735 LearningRate 0.0096 Epoch: 13 Global Step: 230540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:44:26,367-Speed 5159.93 samples/sec Loss 1.3839 LearningRate 0.0096 Epoch: 13 
Global Step: 230550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:44:28,340-Speed 5191.34 samples/sec Loss 1.4019 LearningRate 0.0096 Epoch: 13 Global Step: 230560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:44:30,306-Speed 5208.54 samples/sec Loss 1.3946 LearningRate 0.0096 Epoch: 13 Global Step: 230570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:44:32,279-Speed 5191.43 samples/sec Loss 1.4332 LearningRate 0.0096 Epoch: 13 Global Step: 230580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:44:34,240-Speed 5225.09 samples/sec Loss 1.4233 LearningRate 0.0096 Epoch: 13 Global Step: 230590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:36,220-Speed 5174.37 samples/sec Loss 1.4060 LearningRate 0.0096 Epoch: 13 Global Step: 230600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:38,188-Speed 5205.98 samples/sec Loss 1.4290 LearningRate 0.0096 Epoch: 13 Global Step: 230610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:40,157-Speed 5202.04 samples/sec Loss 1.3536 LearningRate 0.0096 Epoch: 13 Global Step: 230620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:42,127-Speed 5199.25 samples/sec Loss 1.3854 LearningRate 0.0096 Epoch: 13 Global Step: 230630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:44,104-Speed 5181.98 samples/sec Loss 1.4214 LearningRate 0.0096 Epoch: 13 Global Step: 230640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:46,075-Speed 5195.59 samples/sec Loss 1.3728 LearningRate 0.0096 Epoch: 13 Global Step: 230650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:48,058-Speed 5166.67 samples/sec Loss 1.3791 LearningRate 0.0095 Epoch: 13 Global Step: 230660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:50,031-Speed 5191.74 samples/sec Loss 1.4432 LearningRate 0.0095 Epoch: 13 Global Step: 230670 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-11 14:44:52,003-Speed 5194.90 samples/sec Loss 1.3921 LearningRate 0.0095 Epoch: 13 Global Step: 230680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:53,969-Speed 5210.03 samples/sec Loss 1.4438 LearningRate 0.0095 Epoch: 13 Global Step: 230690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:55,955-Speed 5158.31 samples/sec Loss 1.3971 LearningRate 0.0095 Epoch: 13 Global Step: 230700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:57,919-Speed 5216.31 samples/sec Loss 1.4058 LearningRate 0.0095 Epoch: 13 Global Step: 230710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:44:59,889-Speed 5199.01 samples/sec Loss 1.4157 LearningRate 0.0095 Epoch: 13 Global Step: 230720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:01,864-Speed 5187.42 samples/sec Loss 1.3699 LearningRate 0.0095 Epoch: 13 Global Step: 230730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:03,832-Speed 5202.88 samples/sec Loss 1.3978 LearningRate 0.0095 Epoch: 13 Global Step: 230740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:05,805-Speed 5194.47 samples/sec Loss 1.3777 LearningRate 0.0095 Epoch: 13 Global Step: 230750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:07,772-Speed 5207.19 samples/sec Loss 1.3727 LearningRate 0.0095 Epoch: 13 Global Step: 230760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:09,749-Speed 5179.50 samples/sec Loss 1.4037 LearningRate 0.0095 Epoch: 13 Global Step: 230770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:11,736-Speed 5156.97 samples/sec Loss 1.4152 LearningRate 0.0095 Epoch: 13 Global Step: 230780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:13,706-Speed 5200.15 samples/sec Loss 1.4452 LearningRate 0.0095 Epoch: 13 Global Step: 230790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 
2022-04-11 14:45:15,672-Speed 5209.11 samples/sec Loss 1.3985 LearningRate 0.0095 Epoch: 13 Global Step: 230800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:17,655-Speed 5164.47 samples/sec Loss 1.4388 LearningRate 0.0095 Epoch: 13 Global Step: 230810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:19,634-Speed 5177.44 samples/sec Loss 1.4006 LearningRate 0.0095 Epoch: 13 Global Step: 230820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:21,608-Speed 5190.60 samples/sec Loss 1.4157 LearningRate 0.0095 Epoch: 13 Global Step: 230830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:23,586-Speed 5178.24 samples/sec Loss 1.4177 LearningRate 0.0095 Epoch: 13 Global Step: 230840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:25,572-Speed 5158.20 samples/sec Loss 1.3575 LearningRate 0.0095 Epoch: 13 Global Step: 230850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:27,545-Speed 5190.01 samples/sec Loss 1.3763 LearningRate 0.0095 Epoch: 13 Global Step: 230860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:29,514-Speed 5203.02 samples/sec Loss 1.4541 LearningRate 0.0095 Epoch: 13 Global Step: 230870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:31,485-Speed 5195.47 samples/sec Loss 1.4332 LearningRate 0.0095 Epoch: 13 Global Step: 230880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:33,459-Speed 5190.36 samples/sec Loss 1.3906 LearningRate 0.0095 Epoch: 13 Global Step: 230890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:35,420-Speed 5222.53 samples/sec Loss 1.3805 LearningRate 0.0095 Epoch: 13 Global Step: 230900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:37,388-Speed 5205.33 samples/sec Loss 1.4264 LearningRate 0.0095 Epoch: 13 Global Step: 230910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:45:39,357-Speed 5202.42 
samples/sec Loss 1.3759 LearningRate 0.0095 Epoch: 13 Global Step: 230920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:45:41,324-Speed 5207.60 samples/sec Loss 1.4292 LearningRate 0.0095 Epoch: 13 Global Step: 230930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:45:43,297-Speed 5192.77 samples/sec Loss 1.4314 LearningRate 0.0095 Epoch: 13 Global Step: 230940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:45:45,267-Speed 5199.22 samples/sec Loss 1.3527 LearningRate 0.0095 Epoch: 13 Global Step: 230950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:45:47,237-Speed 5199.17 samples/sec Loss 1.4404 LearningRate 0.0095 Epoch: 13 Global Step: 230960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:45:49,208-Speed 5197.30 samples/sec Loss 1.3895 LearningRate 0.0095 Epoch: 13 Global Step: 230970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:45:51,201-Speed 5141.44 samples/sec Loss 1.3934 LearningRate 0.0095 Epoch: 13 Global Step: 230980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:45:53,176-Speed 5186.40 samples/sec Loss 1.4065 LearningRate 0.0095 Epoch: 13 Global Step: 230990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:45:55,150-Speed 5189.35 samples/sec Loss 1.4112 LearningRate 0.0095 Epoch: 13 Global Step: 231000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:45:57,122-Speed 5194.22 samples/sec Loss 1.4068 LearningRate 0.0095 Epoch: 13 Global Step: 231010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:45:59,107-Speed 5159.75 samples/sec Loss 1.4352 LearningRate 0.0095 Epoch: 13 Global Step: 231020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:01,090-Speed 5165.70 samples/sec Loss 1.3699 LearningRate 0.0095 Epoch: 13 Global Step: 231030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:03,074-Speed 5162.60 samples/sec Loss 1.4030 LearningRate 0.0095 
Epoch: 13 Global Step: 231040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:05,064-Speed 5147.40 samples/sec Loss 1.3542 LearningRate 0.0095 Epoch: 13 Global Step: 231050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:07,033-Speed 5202.58 samples/sec Loss 1.3967 LearningRate 0.0095 Epoch: 13 Global Step: 231060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:09,005-Speed 5194.22 samples/sec Loss 1.3829 LearningRate 0.0095 Epoch: 13 Global Step: 231070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:10,988-Speed 5166.56 samples/sec Loss 1.4072 LearningRate 0.0095 Epoch: 13 Global Step: 231080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:12,976-Speed 5153.50 samples/sec Loss 1.3976 LearningRate 0.0095 Epoch: 13 Global Step: 231090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:14,964-Speed 5150.50 samples/sec Loss 1.3763 LearningRate 0.0095 Epoch: 13 Global Step: 231100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:16,928-Speed 5215.22 samples/sec Loss 1.4239 LearningRate 0.0095 Epoch: 13 Global Step: 231110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:18,904-Speed 5184.22 samples/sec Loss 1.4723 LearningRate 0.0095 Epoch: 13 Global Step: 231120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:20,888-Speed 5164.13 samples/sec Loss 1.3808 LearningRate 0.0095 Epoch: 13 Global Step: 231130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:22,884-Speed 5132.00 samples/sec Loss 1.3532 LearningRate 0.0095 Epoch: 13 Global Step: 231140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:24,871-Speed 5154.81 samples/sec Loss 1.4269 LearningRate 0.0095 Epoch: 13 Global Step: 231150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:26,845-Speed 5190.94 samples/sec Loss 1.3647 LearningRate 0.0095 Epoch: 13 Global Step: 231160 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:28,841-Speed 5129.76 samples/sec Loss 1.4130 LearningRate 0.0095 Epoch: 13 Global Step: 231170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:30,825-Speed 5162.66 samples/sec Loss 1.3767 LearningRate 0.0095 Epoch: 13 Global Step: 231180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:32,794-Speed 5203.64 samples/sec Loss 1.3738 LearningRate 0.0095 Epoch: 13 Global Step: 231190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:34,766-Speed 5194.23 samples/sec Loss 1.3591 LearningRate 0.0095 Epoch: 13 Global Step: 231200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:36,745-Speed 5176.55 samples/sec Loss 1.3702 LearningRate 0.0094 Epoch: 13 Global Step: 231210 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:46:38,707-Speed 5220.83 samples/sec Loss 1.3442 LearningRate 0.0094 Epoch: 13 Global Step: 231220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:46:40,682-Speed 5187.34 samples/sec Loss 1.3943 LearningRate 0.0094 Epoch: 13 Global Step: 231230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:46:42,650-Speed 5204.50 samples/sec Loss 1.3882 LearningRate 0.0094 Epoch: 13 Global Step: 231240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:46:44,624-Speed 5189.97 samples/sec Loss 1.3839 LearningRate 0.0094 Epoch: 13 Global Step: 231250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:46:46,600-Speed 5183.30 samples/sec Loss 1.3989 LearningRate 0.0094 Epoch: 13 Global Step: 231260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:46:48,568-Speed 5205.37 samples/sec Loss 1.3755 LearningRate 0.0094 Epoch: 13 Global Step: 231270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:46:50,537-Speed 5203.41 samples/sec Loss 1.4201 LearningRate 0.0094 Epoch: 13 Global Step: 231280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 
2022-04-11 14:46:52,512-Speed 5186.36 samples/sec Loss 1.4203 LearningRate 0.0094 Epoch: 13 Global Step: 231290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:46:54,483-Speed 5197.59 samples/sec Loss 1.4394 LearningRate 0.0094 Epoch: 13 Global Step: 231300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:46:56,467-Speed 5162.48 samples/sec Loss 1.3772 LearningRate 0.0094 Epoch: 13 Global Step: 231310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:46:58,441-Speed 5188.88 samples/sec Loss 1.3784 LearningRate 0.0094 Epoch: 13 Global Step: 231320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:47:00,415-Speed 5188.79 samples/sec Loss 1.3550 LearningRate 0.0094 Epoch: 13 Global Step: 231330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:02,405-Speed 5146.19 samples/sec Loss 1.3704 LearningRate 0.0094 Epoch: 13 Global Step: 231340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:04,382-Speed 5181.67 samples/sec Loss 1.3622 LearningRate 0.0094 Epoch: 13 Global Step: 231350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:06,352-Speed 5199.50 samples/sec Loss 1.3593 LearningRate 0.0094 Epoch: 13 Global Step: 231360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:08,320-Speed 5206.69 samples/sec Loss 1.3788 LearningRate 0.0094 Epoch: 13 Global Step: 231370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:10,308-Speed 5152.31 samples/sec Loss 1.3839 LearningRate 0.0094 Epoch: 13 Global Step: 231380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:12,277-Speed 5203.41 samples/sec Loss 1.3822 LearningRate 0.0094 Epoch: 13 Global Step: 231390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:14,249-Speed 5192.93 samples/sec Loss 1.3452 LearningRate 0.0094 Epoch: 13 Global Step: 231400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:16,236-Speed 5157.22 
samples/sec Loss 1.4050 LearningRate 0.0094 Epoch: 13 Global Step: 231410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:18,209-Speed 5190.08 samples/sec Loss 1.3980 LearningRate 0.0094 Epoch: 13 Global Step: 231420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:20,173-Speed 5216.25 samples/sec Loss 1.3946 LearningRate 0.0094 Epoch: 13 Global Step: 231430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:22,154-Speed 5171.64 samples/sec Loss 1.3703 LearningRate 0.0094 Epoch: 13 Global Step: 231440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:24,142-Speed 5152.66 samples/sec Loss 1.3445 LearningRate 0.0094 Epoch: 13 Global Step: 231450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:26,120-Speed 5177.22 samples/sec Loss 1.4560 LearningRate 0.0094 Epoch: 13 Global Step: 231460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:28,146-Speed 5057.70 samples/sec Loss 1.4346 LearningRate 0.0094 Epoch: 13 Global Step: 231470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:30,115-Speed 5200.79 samples/sec Loss 1.3536 LearningRate 0.0094 Epoch: 13 Global Step: 231480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:32,084-Speed 5204.43 samples/sec Loss 1.4280 LearningRate 0.0094 Epoch: 13 Global Step: 231490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:34,051-Speed 5205.42 samples/sec Loss 1.3888 LearningRate 0.0094 Epoch: 13 Global Step: 231500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:36,032-Speed 5172.69 samples/sec Loss 1.3722 LearningRate 0.0094 Epoch: 13 Global Step: 231510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:38,016-Speed 5161.43 samples/sec Loss 1.4053 LearningRate 0.0094 Epoch: 13 Global Step: 231520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:39,993-Speed 5181.52 samples/sec Loss 1.4071 LearningRate 0.0094 
Epoch: 13 Global Step: 231530 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:47:41,965-Speed 5193.39 samples/sec Loss 1.3864 LearningRate 0.0094 Epoch: 13 Global Step: 231540 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:47:43,938-Speed 5194.58 samples/sec Loss 1.3818 LearningRate 0.0094 Epoch: 13 Global Step: 231550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:45,919-Speed 5169.38 samples/sec Loss 1.3859 LearningRate 0.0094 Epoch: 13 Global Step: 231560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:47,920-Speed 5147.86 samples/sec Loss 1.4224 LearningRate 0.0094 Epoch: 13 Global Step: 231570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:49,901-Speed 5170.62 samples/sec Loss 1.3673 LearningRate 0.0094 Epoch: 13 Global Step: 231580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:51,881-Speed 5173.15 samples/sec Loss 1.3943 LearningRate 0.0094 Epoch: 13 Global Step: 231590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:53,853-Speed 5193.63 samples/sec Loss 1.3494 LearningRate 0.0094 Epoch: 13 Global Step: 231600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:55,837-Speed 5164.01 samples/sec Loss 1.4533 LearningRate 0.0094 Epoch: 13 Global Step: 231610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:57,809-Speed 5194.39 samples/sec Loss 1.3364 LearningRate 0.0094 Epoch: 13 Global Step: 231620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:47:59,789-Speed 5173.13 samples/sec Loss 1.3766 LearningRate 0.0094 Epoch: 13 Global Step: 231630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:01,790-Speed 5118.04 samples/sec Loss 1.3710 LearningRate 0.0094 Epoch: 13 Global Step: 231640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:03,755-Speed 5212.73 samples/sec Loss 1.4651 LearningRate 0.0094 Epoch: 13 Global Step: 231650 Fp16 Grad 
Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:05,722-Speed 5209.66 samples/sec Loss 1.3768 LearningRate 0.0094 Epoch: 13 Global Step: 231660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:48:07,693-Speed 5196.92 samples/sec Loss 1.3832 LearningRate 0.0094 Epoch: 13 Global Step: 231670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:48:09,672-Speed 5175.41 samples/sec Loss 1.4051 LearningRate 0.0094 Epoch: 13 Global Step: 231680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:48:11,656-Speed 5163.44 samples/sec Loss 1.4404 LearningRate 0.0094 Epoch: 13 Global Step: 231690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:48:13,635-Speed 5177.51 samples/sec Loss 1.3902 LearningRate 0.0094 Epoch: 13 Global Step: 231700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:48:15,602-Speed 5206.66 samples/sec Loss 1.3607 LearningRate 0.0094 Epoch: 13 Global Step: 231710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:48:17,575-Speed 5192.49 samples/sec Loss 1.3983 LearningRate 0.0094 Epoch: 13 Global Step: 231720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:48:19,541-Speed 5210.30 samples/sec Loss 1.4054 LearningRate 0.0094 Epoch: 13 Global Step: 231730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:48:21,511-Speed 5199.32 samples/sec Loss 1.3330 LearningRate 0.0094 Epoch: 13 Global Step: 231740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:48:23,484-Speed 5190.85 samples/sec Loss 1.4145 LearningRate 0.0093 Epoch: 13 Global Step: 231750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:48:25,457-Speed 5192.79 samples/sec Loss 1.4132 LearningRate 0.0093 Epoch: 13 Global Step: 231760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:27,445-Speed 5151.25 samples/sec Loss 1.3874 LearningRate 0.0093 Epoch: 13 Global Step: 231770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-11 14:48:29,440-Speed 5134.95 samples/sec Loss 1.3738 LearningRate 0.0093 Epoch: 13 Global Step: 231780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:31,409-Speed 5202.01 samples/sec Loss 1.3932 LearningRate 0.0093 Epoch: 13 Global Step: 231790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:33,384-Speed 5186.61 samples/sec Loss 1.3910 LearningRate 0.0093 Epoch: 13 Global Step: 231800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:35,362-Speed 5179.56 samples/sec Loss 1.3866 LearningRate 0.0093 Epoch: 13 Global Step: 231810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:37,350-Speed 5154.41 samples/sec Loss 1.4355 LearningRate 0.0093 Epoch: 13 Global Step: 231820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:39,340-Speed 5145.08 samples/sec Loss 1.4013 LearningRate 0.0093 Epoch: 13 Global Step: 231830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:41,321-Speed 5172.98 samples/sec Loss 1.3919 LearningRate 0.0093 Epoch: 13 Global Step: 231840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:43,288-Speed 5206.36 samples/sec Loss 1.3403 LearningRate 0.0093 Epoch: 13 Global Step: 231850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:45,274-Speed 5157.73 samples/sec Loss 1.4036 LearningRate 0.0093 Epoch: 13 Global Step: 231860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:47,250-Speed 5182.98 samples/sec Loss 1.3534 LearningRate 0.0093 Epoch: 13 Global Step: 231870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:49,243-Speed 5139.48 samples/sec Loss 1.4642 LearningRate 0.0093 Epoch: 13 Global Step: 231880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:51,219-Speed 5186.79 samples/sec Loss 1.4306 LearningRate 0.0093 Epoch: 13 Global Step: 231890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:53,191-Speed 5192.72 
samples/sec Loss 1.3158 LearningRate 0.0093 Epoch: 13 Global Step: 231900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:55,173-Speed 5167.77 samples/sec Loss 1.4165 LearningRate 0.0093 Epoch: 13 Global Step: 231910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:57,143-Speed 5200.26 samples/sec Loss 1.3534 LearningRate 0.0093 Epoch: 13 Global Step: 231920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:48:59,140-Speed 5129.54 samples/sec Loss 1.3730 LearningRate 0.0093 Epoch: 13 Global Step: 231930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:49:01,122-Speed 5168.92 samples/sec Loss 1.3583 LearningRate 0.0093 Epoch: 13 Global Step: 231940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:49:03,091-Speed 5203.01 samples/sec Loss 1.3985 LearningRate 0.0093 Epoch: 13 Global Step: 231950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:49:05,048-Speed 5234.00 samples/sec Loss 1.3740 LearningRate 0.0093 Epoch: 13 Global Step: 231960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:49:07,017-Speed 5201.22 samples/sec Loss 1.3578 LearningRate 0.0093 Epoch: 13 Global Step: 231970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:49:08,996-Speed 5176.57 samples/sec Loss 1.4212 LearningRate 0.0093 Epoch: 13 Global Step: 231980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:49:10,969-Speed 5192.34 samples/sec Loss 1.4231 LearningRate 0.0093 Epoch: 13 Global Step: 231990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:49:12,954-Speed 5159.62 samples/sec Loss 1.4329 LearningRate 0.0093 Epoch: 13 Global Step: 232000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:49:39,395-[lfw][232000]XNorm: 21.801200 Training: 2022-04-11 14:49:39,395-[lfw][232000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-04-11 14:49:39,396-[lfw][232000]Accuracy-Highest: 0.99833 Training: 2022-04-11 
14:50:10,023-[cfp_fp][232000]XNorm: 21.254916 Training: 2022-04-11 14:50:10,023-[cfp_fp][232000]Accuracy-Flip: 0.98800+-0.00400 Training: 2022-04-11 14:50:10,024-[cfp_fp][232000]Accuracy-Highest: 0.98843 Training: 2022-04-11 14:50:36,406-[agedb_30][232000]XNorm: 22.303147 Training: 2022-04-11 14:50:36,406-[agedb_30][232000]Accuracy-Flip: 0.98183+-0.00713 Training: 2022-04-11 14:50:36,407-[agedb_30][232000]Accuracy-Highest: 0.98250 Training: 2022-04-11 14:50:38,403-Speed 119.84 samples/sec Loss 1.3442 LearningRate 0.0093 Epoch: 13 Global Step: 232010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:50:40,399-Speed 5132.45 samples/sec Loss 1.3958 LearningRate 0.0093 Epoch: 13 Global Step: 232020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:50:42,376-Speed 5182.09 samples/sec Loss 1.4468 LearningRate 0.0093 Epoch: 13 Global Step: 232030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:50:44,379-Speed 5115.21 samples/sec Loss 1.4286 LearningRate 0.0093 Epoch: 13 Global Step: 232040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:50:46,350-Speed 5197.44 samples/sec Loss 1.4050 LearningRate 0.0093 Epoch: 13 Global Step: 232050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:50:48,350-Speed 5121.10 samples/sec Loss 1.3629 LearningRate 0.0093 Epoch: 13 Global Step: 232060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:50:50,411-Speed 4970.31 samples/sec Loss 1.3954 LearningRate 0.0093 Epoch: 13 Global Step: 232070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:50:52,410-Speed 5125.15 samples/sec Loss 1.3350 LearningRate 0.0093 Epoch: 13 Global Step: 232080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:50:54,376-Speed 5208.82 samples/sec Loss 1.3657 LearningRate 0.0093 Epoch: 13 Global Step: 232090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:50:56,358-Speed 5169.42 samples/sec Loss 1.3586 LearningRate 0.0093 Epoch: 
13 Global Step: 232100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:50:58,326-Speed 5203.51 samples/sec Loss 1.4228 LearningRate 0.0093 Epoch: 13 Global Step: 232110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:51:00,304-Speed 5180.32 samples/sec Loss 1.3875 LearningRate 0.0093 Epoch: 13 Global Step: 232120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:51:02,290-Speed 5155.74 samples/sec Loss 1.4647 LearningRate 0.0093 Epoch: 13 Global Step: 232130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:51:04,257-Speed 5208.83 samples/sec Loss 1.3743 LearningRate 0.0093 Epoch: 13 Global Step: 232140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:51:06,260-Speed 5114.54 samples/sec Loss 1.4284 LearningRate 0.0093 Epoch: 13 Global Step: 232150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:51:08,229-Speed 5202.62 samples/sec Loss 1.4000 LearningRate 0.0093 Epoch: 13 Global Step: 232160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:51:10,198-Speed 5202.98 samples/sec Loss 1.4045 LearningRate 0.0093 Epoch: 13 Global Step: 232170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:51:12,181-Speed 5163.98 samples/sec Loss 1.4046 LearningRate 0.0093 Epoch: 13 Global Step: 232180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:51:14,151-Speed 5200.64 samples/sec Loss 1.4486 LearningRate 0.0093 Epoch: 13 Global Step: 232190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:51:16,120-Speed 5203.67 samples/sec Loss 1.3715 LearningRate 0.0093 Epoch: 13 Global Step: 232200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:51:18,087-Speed 5205.65 samples/sec Loss 1.3386 LearningRate 0.0093 Epoch: 13 Global Step: 232210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:51:20,054-Speed 5207.36 samples/sec Loss 1.3611 LearningRate 0.0093 Epoch: 13 Global Step: 232220 Fp16 Grad Scale: 
32768 Required: 7 hours Training: 2022-04-11 14:51:22,049-Speed 5134.96 samples/sec Loss 1.4076 LearningRate 0.0093 Epoch: 13 Global Step: 232230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:51:24,019-Speed 5199.42 samples/sec Loss 1.3943 LearningRate 0.0093 Epoch: 13 Global Step: 232240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:51:26,000-Speed 5172.15 samples/sec Loss 1.3481 LearningRate 0.0093 Epoch: 13 Global Step: 232250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:51:27,973-Speed 5192.03 samples/sec Loss 1.3956 LearningRate 0.0093 Epoch: 13 Global Step: 232260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:51:29,949-Speed 5184.04 samples/sec Loss 1.4055 LearningRate 0.0093 Epoch: 13 Global Step: 232270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:51:31,923-Speed 5189.79 samples/sec Loss 1.3596 LearningRate 0.0093 Epoch: 13 Global Step: 232280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:51:33,899-Speed 5184.10 samples/sec Loss 1.4092 LearningRate 0.0093 Epoch: 13 Global Step: 232290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:51:35,874-Speed 5187.03 samples/sec Loss 1.4273 LearningRate 0.0092 Epoch: 13 Global Step: 232300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:51:37,857-Speed 5164.76 samples/sec Loss 1.4442 LearningRate 0.0092 Epoch: 13 Global Step: 232310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:51:39,834-Speed 5180.77 samples/sec Loss 1.3749 LearningRate 0.0092 Epoch: 13 Global Step: 232320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:51:41,803-Speed 5200.97 samples/sec Loss 1.3609 LearningRate 0.0092 Epoch: 13 Global Step: 232330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:51:43,781-Speed 5179.25 samples/sec Loss 1.3573 LearningRate 0.0092 Epoch: 13 Global Step: 232340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 
2022-04-11 14:51:45,750-Speed 5202.24 samples/sec Loss 1.3498 LearningRate 0.0092 Epoch: 13 Global Step: 232350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:51:47,734-Speed 5164.71 samples/sec Loss 1.3910 LearningRate 0.0092 Epoch: 13 Global Step: 232360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:51:49,714-Speed 5171.78 samples/sec Loss 1.3273 LearningRate 0.0092 Epoch: 13 Global Step: 232370 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:51:51,702-Speed 5154.53 samples/sec Loss 1.4082 LearningRate 0.0092 Epoch: 13 Global Step: 232380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:51:53,685-Speed 5164.45 samples/sec Loss 1.3988 LearningRate 0.0092 Epoch: 13 Global Step: 232390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:51:55,660-Speed 5189.08 samples/sec Loss 1.4056 LearningRate 0.0092 Epoch: 13 Global Step: 232400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:51:57,654-Speed 5136.80 samples/sec Loss 1.4204 LearningRate 0.0092 Epoch: 13 Global Step: 232410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:51:59,641-Speed 5155.05 samples/sec Loss 1.4759 LearningRate 0.0092 Epoch: 13 Global Step: 232420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:01,620-Speed 5173.78 samples/sec Loss 1.3973 LearningRate 0.0092 Epoch: 13 Global Step: 232430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:03,601-Speed 5172.45 samples/sec Loss 1.3888 LearningRate 0.0092 Epoch: 13 Global Step: 232440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:05,589-Speed 5151.28 samples/sec Loss 1.3872 LearningRate 0.0092 Epoch: 13 Global Step: 232450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:07,571-Speed 5168.26 samples/sec Loss 1.3929 LearningRate 0.0092 Epoch: 13 Global Step: 232460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:09,544-Speed 5193.09 
samples/sec Loss 1.3932 LearningRate 0.0092 Epoch: 13 Global Step: 232470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:11,530-Speed 5158.03 samples/sec Loss 1.4456 LearningRate 0.0092 Epoch: 13 Global Step: 232480 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:52:13,515-Speed 5158.42 samples/sec Loss 1.4182 LearningRate 0.0092 Epoch: 13 Global Step: 232490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:15,508-Speed 5142.40 samples/sec Loss 1.3895 LearningRate 0.0092 Epoch: 13 Global Step: 232500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:17,478-Speed 5198.59 samples/sec Loss 1.3743 LearningRate 0.0092 Epoch: 13 Global Step: 232510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:19,452-Speed 5188.66 samples/sec Loss 1.3204 LearningRate 0.0092 Epoch: 13 Global Step: 232520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:21,425-Speed 5194.11 samples/sec Loss 1.3900 LearningRate 0.0092 Epoch: 13 Global Step: 232530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:23,408-Speed 5163.91 samples/sec Loss 1.3579 LearningRate 0.0092 Epoch: 13 Global Step: 232540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:25,399-Speed 5144.50 samples/sec Loss 1.4464 LearningRate 0.0092 Epoch: 13 Global Step: 232550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:27,388-Speed 5150.22 samples/sec Loss 1.4258 LearningRate 0.0092 Epoch: 13 Global Step: 232560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:29,357-Speed 5202.88 samples/sec Loss 1.3537 LearningRate 0.0092 Epoch: 13 Global Step: 232570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:31,327-Speed 5199.77 samples/sec Loss 1.3649 LearningRate 0.0092 Epoch: 13 Global Step: 232580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:33,297-Speed 5199.51 samples/sec Loss 1.3521 LearningRate 
0.0092 Epoch: 13 Global Step: 232590 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:52:35,285-Speed 5152.56 samples/sec Loss 1.3785 LearningRate 0.0092 Epoch: 13 Global Step: 232600 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:52:37,253-Speed 5205.49 samples/sec Loss 1.3809 LearningRate 0.0092 Epoch: 13 Global Step: 232610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:39,225-Speed 5194.90 samples/sec Loss 1.3814 LearningRate 0.0092 Epoch: 13 Global Step: 232620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:41,198-Speed 5190.13 samples/sec Loss 1.3623 LearningRate 0.0092 Epoch: 13 Global Step: 232630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:43,177-Speed 5175.97 samples/sec Loss 1.4133 LearningRate 0.0092 Epoch: 13 Global Step: 232640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:45,170-Speed 5141.66 samples/sec Loss 1.4052 LearningRate 0.0092 Epoch: 13 Global Step: 232650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:47,145-Speed 5186.00 samples/sec Loss 1.4356 LearningRate 0.0092 Epoch: 13 Global Step: 232660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:49,124-Speed 5176.10 samples/sec Loss 1.3845 LearningRate 0.0092 Epoch: 13 Global Step: 232670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:51,118-Speed 5137.05 samples/sec Loss 1.3796 LearningRate 0.0092 Epoch: 13 Global Step: 232680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:53,101-Speed 5164.92 samples/sec Loss 1.3864 LearningRate 0.0092 Epoch: 13 Global Step: 232690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:55,070-Speed 5201.96 samples/sec Loss 1.3773 LearningRate 0.0092 Epoch: 13 Global Step: 232700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:57,046-Speed 5185.03 samples/sec Loss 1.4049 LearningRate 0.0092 Epoch: 13 Global Step: 232710 Fp16 
Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:52:59,034-Speed 5152.69 samples/sec Loss 1.3914 LearningRate 0.0092 Epoch: 13 Global Step: 232720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:01,020-Speed 5157.17 samples/sec Loss 1.4079 LearningRate 0.0092 Epoch: 13 Global Step: 232730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:03,012-Speed 5142.78 samples/sec Loss 1.3837 LearningRate 0.0092 Epoch: 13 Global Step: 232740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:04,986-Speed 5190.42 samples/sec Loss 1.4313 LearningRate 0.0092 Epoch: 13 Global Step: 232750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:06,955-Speed 5201.72 samples/sec Loss 1.3935 LearningRate 0.0092 Epoch: 13 Global Step: 232760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:08,931-Speed 5185.04 samples/sec Loss 1.3268 LearningRate 0.0092 Epoch: 13 Global Step: 232770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:10,929-Speed 5125.50 samples/sec Loss 1.4193 LearningRate 0.0092 Epoch: 13 Global Step: 232780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:12,913-Speed 5162.08 samples/sec Loss 1.3839 LearningRate 0.0092 Epoch: 13 Global Step: 232790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:14,886-Speed 5193.30 samples/sec Loss 1.3989 LearningRate 0.0092 Epoch: 13 Global Step: 232800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:16,870-Speed 5161.32 samples/sec Loss 1.4313 LearningRate 0.0092 Epoch: 13 Global Step: 232810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:18,858-Speed 5153.98 samples/sec Loss 1.3833 LearningRate 0.0092 Epoch: 13 Global Step: 232820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:20,844-Speed 5159.04 samples/sec Loss 1.3584 LearningRate 0.0092 Epoch: 13 Global Step: 232830 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-04-11 14:53:22,848-Speed 5110.79 samples/sec Loss 1.3967 LearningRate 0.0092 Epoch: 13 Global Step: 232840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:24,824-Speed 5185.34 samples/sec Loss 1.4213 LearningRate 0.0091 Epoch: 13 Global Step: 232850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:53:26,802-Speed 5178.91 samples/sec Loss 1.3745 LearningRate 0.0091 Epoch: 13 Global Step: 232860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:53:28,772-Speed 5197.95 samples/sec Loss 1.3491 LearningRate 0.0091 Epoch: 13 Global Step: 232870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:53:30,747-Speed 5187.51 samples/sec Loss 1.3677 LearningRate 0.0091 Epoch: 13 Global Step: 232880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:53:32,722-Speed 5185.21 samples/sec Loss 1.3920 LearningRate 0.0091 Epoch: 13 Global Step: 232890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:53:34,702-Speed 5173.58 samples/sec Loss 1.4092 LearningRate 0.0091 Epoch: 13 Global Step: 232900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:53:36,674-Speed 5195.88 samples/sec Loss 1.3912 LearningRate 0.0091 Epoch: 13 Global Step: 232910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:53:38,649-Speed 5184.13 samples/sec Loss 1.3898 LearningRate 0.0091 Epoch: 13 Global Step: 232920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:53:40,640-Speed 5146.01 samples/sec Loss 1.3570 LearningRate 0.0091 Epoch: 13 Global Step: 232930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:53:42,625-Speed 5161.88 samples/sec Loss 1.4312 LearningRate 0.0091 Epoch: 13 Global Step: 232940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:53:44,605-Speed 5172.36 samples/sec Loss 1.3556 LearningRate 0.0091 Epoch: 13 Global Step: 232950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:46,582-Speed 
5181.91 samples/sec Loss 1.3889 LearningRate 0.0091 Epoch: 13 Global Step: 232960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:48,590-Speed 5102.34 samples/sec Loss 1.3891 LearningRate 0.0091 Epoch: 13 Global Step: 232970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:50,560-Speed 5197.22 samples/sec Loss 1.3751 LearningRate 0.0091 Epoch: 13 Global Step: 232980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:52,536-Speed 5185.00 samples/sec Loss 1.4029 LearningRate 0.0091 Epoch: 13 Global Step: 232990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:54,526-Speed 5147.65 samples/sec Loss 1.3815 LearningRate 0.0091 Epoch: 13 Global Step: 233000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:56,520-Speed 5136.90 samples/sec Loss 1.3571 LearningRate 0.0091 Epoch: 13 Global Step: 233010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:53:58,498-Speed 5178.11 samples/sec Loss 1.3499 LearningRate 0.0091 Epoch: 13 Global Step: 233020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:00,490-Speed 5141.21 samples/sec Loss 1.4388 LearningRate 0.0091 Epoch: 13 Global Step: 233030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:02,480-Speed 5148.90 samples/sec Loss 1.3801 LearningRate 0.0091 Epoch: 13 Global Step: 233040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:04,469-Speed 5151.10 samples/sec Loss 1.4011 LearningRate 0.0091 Epoch: 13 Global Step: 233050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:54:06,439-Speed 5198.87 samples/sec Loss 1.4030 LearningRate 0.0091 Epoch: 13 Global Step: 233060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:08,417-Speed 5179.27 samples/sec Loss 1.3779 LearningRate 0.0091 Epoch: 13 Global Step: 233070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:10,396-Speed 5176.59 samples/sec Loss 1.3862 
LearningRate 0.0091 Epoch: 13 Global Step: 233080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:12,370-Speed 5187.70 samples/sec Loss 1.3864 LearningRate 0.0091 Epoch: 13 Global Step: 233090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:14,351-Speed 5172.37 samples/sec Loss 1.4264 LearningRate 0.0091 Epoch: 13 Global Step: 233100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:16,340-Speed 5150.14 samples/sec Loss 1.3572 LearningRate 0.0091 Epoch: 13 Global Step: 233110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:18,342-Speed 5114.40 samples/sec Loss 1.3862 LearningRate 0.0091 Epoch: 13 Global Step: 233120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:20,315-Speed 5193.45 samples/sec Loss 1.3908 LearningRate 0.0091 Epoch: 13 Global Step: 233130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:22,299-Speed 5163.08 samples/sec Loss 1.3809 LearningRate 0.0091 Epoch: 13 Global Step: 233140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:24,277-Speed 5178.52 samples/sec Loss 1.3316 LearningRate 0.0091 Epoch: 13 Global Step: 233150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:26,264-Speed 5154.70 samples/sec Loss 1.3797 LearningRate 0.0091 Epoch: 13 Global Step: 233160 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:54:28,258-Speed 5138.79 samples/sec Loss 1.3966 LearningRate 0.0091 Epoch: 13 Global Step: 233170 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:54:30,226-Speed 5203.96 samples/sec Loss 1.4251 LearningRate 0.0091 Epoch: 13 Global Step: 233180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:32,204-Speed 5177.97 samples/sec Loss 1.3512 LearningRate 0.0091 Epoch: 13 Global Step: 233190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:34,177-Speed 5192.39 samples/sec Loss 1.3840 LearningRate 0.0091 Epoch: 13 Global 
Step: 233200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:36,165-Speed 5151.79 samples/sec Loss 1.4057 LearningRate 0.0091 Epoch: 13 Global Step: 233210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:38,183-Speed 5076.12 samples/sec Loss 1.3741 LearningRate 0.0091 Epoch: 13 Global Step: 233220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:40,157-Speed 5188.18 samples/sec Loss 1.3783 LearningRate 0.0091 Epoch: 13 Global Step: 233230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:42,133-Speed 5185.23 samples/sec Loss 1.4260 LearningRate 0.0091 Epoch: 13 Global Step: 233240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:44,108-Speed 5187.40 samples/sec Loss 1.4074 LearningRate 0.0091 Epoch: 13 Global Step: 233250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:54:46,097-Speed 5148.13 samples/sec Loss 1.3572 LearningRate 0.0091 Epoch: 13 Global Step: 233260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:54:48,089-Speed 5143.71 samples/sec Loss 1.3206 LearningRate 0.0091 Epoch: 13 Global Step: 233270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:54:50,076-Speed 5155.35 samples/sec Loss 1.4134 LearningRate 0.0091 Epoch: 13 Global Step: 233280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:54:52,048-Speed 5195.47 samples/sec Loss 1.3385 LearningRate 0.0091 Epoch: 13 Global Step: 233290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:54:54,029-Speed 5169.04 samples/sec Loss 1.4096 LearningRate 0.0091 Epoch: 13 Global Step: 233300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:54:56,003-Speed 5190.85 samples/sec Loss 1.3781 LearningRate 0.0091 Epoch: 13 Global Step: 233310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:54:57,983-Speed 5172.82 samples/sec Loss 1.4228 LearningRate 0.0091 Epoch: 13 Global Step: 233320 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-11 14:54:59,973-Speed 5148.46 samples/sec Loss 1.3995 LearningRate 0.0091 Epoch: 13 Global Step: 233330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:55:01,966-Speed 5138.84 samples/sec Loss 1.3487 LearningRate 0.0091 Epoch: 13 Global Step: 233340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:55:03,957-Speed 5145.70 samples/sec Loss 1.4091 LearningRate 0.0091 Epoch: 13 Global Step: 233350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:55:05,937-Speed 5173.07 samples/sec Loss 1.3928 LearningRate 0.0091 Epoch: 13 Global Step: 233360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:55:07,922-Speed 5158.89 samples/sec Loss 1.3680 LearningRate 0.0091 Epoch: 13 Global Step: 233370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:55:09,898-Speed 5184.68 samples/sec Loss 1.3406 LearningRate 0.0091 Epoch: 13 Global Step: 233380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:55:11,873-Speed 5188.21 samples/sec Loss 1.3826 LearningRate 0.0091 Epoch: 13 Global Step: 233390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:55:13,843-Speed 5198.69 samples/sec Loss 1.3978 LearningRate 0.0090 Epoch: 13 Global Step: 233400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:55:15,815-Speed 5194.77 samples/sec Loss 1.4287 LearningRate 0.0090 Epoch: 13 Global Step: 233410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:55:17,798-Speed 5164.49 samples/sec Loss 1.3735 LearningRate 0.0090 Epoch: 13 Global Step: 233420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:55:19,771-Speed 5191.60 samples/sec Loss 1.3969 LearningRate 0.0090 Epoch: 13 Global Step: 233430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:55:21,750-Speed 5175.82 samples/sec Loss 1.3944 LearningRate 0.0090 Epoch: 13 Global Step: 233440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 
14:55:23,725-Speed 5187.82 samples/sec Loss 1.4070 LearningRate 0.0090 Epoch: 13 Global Step: 233450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:55:25,734-Speed 5098.26 samples/sec Loss 1.3525 LearningRate 0.0090 Epoch: 13 Global Step: 233460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:55:27,714-Speed 5173.06 samples/sec Loss 1.3791 LearningRate 0.0090 Epoch: 13 Global Step: 233470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:55:29,704-Speed 5149.16 samples/sec Loss 1.4143 LearningRate 0.0090 Epoch: 13 Global Step: 233480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:55:31,678-Speed 5189.99 samples/sec Loss 1.3371 LearningRate 0.0090 Epoch: 13 Global Step: 233490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:55:33,664-Speed 5156.78 samples/sec Loss 1.4195 LearningRate 0.0090 Epoch: 13 Global Step: 233500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:55:35,651-Speed 5153.86 samples/sec Loss 1.3954 LearningRate 0.0090 Epoch: 13 Global Step: 233510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:55:37,664-Speed 5088.62 samples/sec Loss 1.3869 LearningRate 0.0090 Epoch: 13 Global Step: 233520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:55:39,652-Speed 5154.39 samples/sec Loss 1.3859 LearningRate 0.0090 Epoch: 13 Global Step: 233530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 14:55:41,621-Speed 5200.83 samples/sec Loss 1.3670 LearningRate 0.0090 Epoch: 13 Global Step: 233540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:55:43,596-Speed 5186.84 samples/sec Loss 1.3929 LearningRate 0.0090 Epoch: 13 Global Step: 233550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:55:45,577-Speed 5170.92 samples/sec Loss 1.4244 LearningRate 0.0090 Epoch: 13 Global Step: 233560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:55:47,563-Speed 5157.54 samples/sec Loss 
1.3605 LearningRate 0.0090 Epoch: 13 Global Step: 233570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:55:49,574-Speed 5093.10 samples/sec Loss 1.4004 LearningRate 0.0090 Epoch: 13 Global Step: 233580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:55:51,553-Speed 5177.04 samples/sec Loss 1.3972 LearningRate 0.0090 Epoch: 13 Global Step: 233590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:55:53,532-Speed 5176.21 samples/sec Loss 1.4024 LearningRate 0.0090 Epoch: 13 Global Step: 233600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:55:55,514-Speed 5168.50 samples/sec Loss 1.4193 LearningRate 0.0090 Epoch: 13 Global Step: 233610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:55:57,493-Speed 5177.42 samples/sec Loss 1.3755 LearningRate 0.0090 Epoch: 13 Global Step: 233620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:55:59,467-Speed 5188.56 samples/sec Loss 1.3757 LearningRate 0.0090 Epoch: 13 Global Step: 233630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:56:01,447-Speed 5173.59 samples/sec Loss 1.3584 LearningRate 0.0090 Epoch: 13 Global Step: 233640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:56:03,446-Speed 5124.72 samples/sec Loss 1.3625 LearningRate 0.0090 Epoch: 13 Global Step: 233650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:56:05,429-Speed 5165.18 samples/sec Loss 1.3715 LearningRate 0.0090 Epoch: 13 Global Step: 233660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:56:07,982-Speed 4011.30 samples/sec Loss 1.4364 LearningRate 0.0090 Epoch: 13 Global Step: 233670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:56:38,460-Speed 336.00 samples/sec Loss 1.1587 LearningRate 0.0090 Epoch: 14 Global Step: 233680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:56:40,433-Speed 5194.78 samples/sec Loss 0.9823 LearningRate 0.0090 Epoch: 14 Global 
Step: 233690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:56:42,416-Speed 5164.64 samples/sec Loss 0.9277 LearningRate 0.0090 Epoch: 14 Global Step: 233700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:56:44,415-Speed 5125.13 samples/sec Loss 0.9740 LearningRate 0.0090 Epoch: 14 Global Step: 233710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:56:46,414-Speed 5124.24 samples/sec Loss 0.9497 LearningRate 0.0090 Epoch: 14 Global Step: 233720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:56:48,547-Speed 4802.11 samples/sec Loss 0.9295 LearningRate 0.0090 Epoch: 14 Global Step: 233730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:56:50,552-Speed 5111.12 samples/sec Loss 0.9323 LearningRate 0.0090 Epoch: 14 Global Step: 233740 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:56:52,524-Speed 5197.48 samples/sec Loss 0.9764 LearningRate 0.0090 Epoch: 14 Global Step: 233750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:56:54,514-Speed 5147.61 samples/sec Loss 0.9264 LearningRate 0.0090 Epoch: 14 Global Step: 233760 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:56:56,499-Speed 5159.92 samples/sec Loss 1.0133 LearningRate 0.0090 Epoch: 14 Global Step: 233770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:56:58,535-Speed 5033.19 samples/sec Loss 0.9611 LearningRate 0.0090 Epoch: 14 Global Step: 233780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:57:00,533-Speed 5127.41 samples/sec Loss 0.9911 LearningRate 0.0090 Epoch: 14 Global Step: 233790 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:57:02,553-Speed 5072.61 samples/sec Loss 0.9419 LearningRate 0.0090 Epoch: 14 Global Step: 233800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:57:04,521-Speed 5205.10 samples/sec Loss 0.9419 LearningRate 0.0090 Epoch: 14 Global Step: 233810 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-11 14:57:06,508-Speed 5156.21 samples/sec Loss 0.9613 LearningRate 0.0090 Epoch: 14 Global Step: 233820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:57:08,490-Speed 5166.65 samples/sec Loss 0.9655 LearningRate 0.0090 Epoch: 14 Global Step: 233830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:57:10,463-Speed 5194.28 samples/sec Loss 0.9712 LearningRate 0.0090 Epoch: 14 Global Step: 233840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:57:12,453-Speed 5148.52 samples/sec Loss 1.0017 LearningRate 0.0090 Epoch: 14 Global Step: 233850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:57:14,441-Speed 5151.89 samples/sec Loss 0.9780 LearningRate 0.0090 Epoch: 14 Global Step: 233860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:57:16,435-Speed 5137.54 samples/sec Loss 0.9167 LearningRate 0.0090 Epoch: 14 Global Step: 233870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:57:18,431-Speed 5132.38 samples/sec Loss 0.9811 LearningRate 0.0090 Epoch: 14 Global Step: 233880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:57:20,427-Speed 5130.93 samples/sec Loss 0.9577 LearningRate 0.0090 Epoch: 14 Global Step: 233890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:57:22,451-Speed 5062.86 samples/sec Loss 0.9621 LearningRate 0.0090 Epoch: 14 Global Step: 233900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:57:24,893-Speed 4194.42 samples/sec Loss 0.9967 LearningRate 0.0090 Epoch: 14 Global Step: 233910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:57:26,899-Speed 5107.86 samples/sec Loss 0.9518 LearningRate 0.0090 Epoch: 14 Global Step: 233920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:57:28,920-Speed 5067.58 samples/sec Loss 0.9580 LearningRate 0.0090 Epoch: 14 Global Step: 233930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
14:57:30,909-Speed 5151.04 samples/sec Loss 0.9383 LearningRate 0.0090 Epoch: 14 Global Step: 233940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:57:32,902-Speed 5141.11 samples/sec Loss 1.0107 LearningRate 0.0090 Epoch: 14 Global Step: 233950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:57:34,887-Speed 5160.90 samples/sec Loss 0.9937 LearningRate 0.0089 Epoch: 14 Global Step: 233960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:57:36,909-Speed 5066.08 samples/sec Loss 0.9779 LearningRate 0.0089 Epoch: 14 Global Step: 233970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:57:38,913-Speed 5110.58 samples/sec Loss 1.0240 LearningRate 0.0089 Epoch: 14 Global Step: 233980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:57:40,925-Speed 5092.13 samples/sec Loss 0.9605 LearningRate 0.0089 Epoch: 14 Global Step: 233990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:57:42,909-Speed 5164.36 samples/sec Loss 0.9868 LearningRate 0.0089 Epoch: 14 Global Step: 234000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:58:09,454-[lfw][234000]XNorm: 21.405007 Training: 2022-04-11 14:58:09,454-[lfw][234000]Accuracy-Flip: 0.99767+-0.00260 Training: 2022-04-11 14:58:09,455-[lfw][234000]Accuracy-Highest: 0.99833 Training: 2022-04-11 14:58:40,472-[cfp_fp][234000]XNorm: 20.884846 Training: 2022-04-11 14:58:40,473-[cfp_fp][234000]Accuracy-Flip: 0.98686+-0.00408 Training: 2022-04-11 14:58:40,473-[cfp_fp][234000]Accuracy-Highest: 0.98843 Training: 2022-04-11 14:59:07,118-[agedb_30][234000]XNorm: 21.748847 Training: 2022-04-11 14:59:07,119-[agedb_30][234000]Accuracy-Flip: 0.98283+-0.00675 Training: 2022-04-11 14:59:07,120-[agedb_30][234000]Accuracy-Highest: 0.98283 Training: 2022-04-11 14:59:09,103-Speed 118.80 samples/sec Loss 0.9905 LearningRate 0.0089 Epoch: 14 Global Step: 234010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:59:11,051-Speed 
5258.73 samples/sec Loss 0.9594 LearningRate 0.0089 Epoch: 14 Global Step: 234020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:13,009-Speed 5230.21 samples/sec Loss 0.9628 LearningRate 0.0089 Epoch: 14 Global Step: 234030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:14,987-Speed 5181.02 samples/sec Loss 0.9538 LearningRate 0.0089 Epoch: 14 Global Step: 234040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:16,948-Speed 5223.81 samples/sec Loss 0.9379 LearningRate 0.0089 Epoch: 14 Global Step: 234050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:18,914-Speed 5210.02 samples/sec Loss 0.9836 LearningRate 0.0089 Epoch: 14 Global Step: 234060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:20,878-Speed 5213.41 samples/sec Loss 0.9374 LearningRate 0.0089 Epoch: 14 Global Step: 234070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:22,839-Speed 5225.72 samples/sec Loss 0.9536 LearningRate 0.0089 Epoch: 14 Global Step: 234080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:24,804-Speed 5213.66 samples/sec Loss 1.0035 LearningRate 0.0089 Epoch: 14 Global Step: 234090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:26,772-Speed 5205.93 samples/sec Loss 0.9796 LearningRate 0.0089 Epoch: 14 Global Step: 234100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:28,740-Speed 5204.28 samples/sec Loss 0.9848 LearningRate 0.0089 Epoch: 14 Global Step: 234110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:30,706-Speed 5210.80 samples/sec Loss 0.9638 LearningRate 0.0089 Epoch: 14 Global Step: 234120 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:59:32,686-Speed 5174.49 samples/sec Loss 0.9541 LearningRate 0.0089 Epoch: 14 Global Step: 234130 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:59:34,650-Speed 5214.86 samples/sec Loss 0.9640 
LearningRate 0.0089 Epoch: 14 Global Step: 234140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:36,613-Speed 5219.55 samples/sec Loss 0.9758 LearningRate 0.0089 Epoch: 14 Global Step: 234150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:38,577-Speed 5216.21 samples/sec Loss 0.9595 LearningRate 0.0089 Epoch: 14 Global Step: 234160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:40,539-Speed 5220.75 samples/sec Loss 0.9542 LearningRate 0.0089 Epoch: 14 Global Step: 234170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:42,498-Speed 5228.47 samples/sec Loss 0.9810 LearningRate 0.0089 Epoch: 14 Global Step: 234180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:44,458-Speed 5225.40 samples/sec Loss 0.9819 LearningRate 0.0089 Epoch: 14 Global Step: 234190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:46,437-Speed 5178.30 samples/sec Loss 0.9479 LearningRate 0.0089 Epoch: 14 Global Step: 234200 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:48,428-Speed 5144.62 samples/sec Loss 0.9332 LearningRate 0.0089 Epoch: 14 Global Step: 234210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:50,395-Speed 5208.00 samples/sec Loss 0.9838 LearningRate 0.0089 Epoch: 14 Global Step: 234220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:52,381-Speed 5157.02 samples/sec Loss 0.9680 LearningRate 0.0089 Epoch: 14 Global Step: 234230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 14:59:54,348-Speed 5209.40 samples/sec Loss 0.9848 LearningRate 0.0089 Epoch: 14 Global Step: 234240 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:59:56,311-Speed 5217.48 samples/sec Loss 0.9838 LearningRate 0.0089 Epoch: 14 Global Step: 234250 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 14:59:58,283-Speed 5194.15 samples/sec Loss 0.9814 LearningRate 0.0089 Epoch: 14 Global 
Step: 234260 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:00:00,254-Speed 5196.47 samples/sec Loss 0.9518 LearningRate 0.0089 Epoch: 14 Global Step: 234270 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:00:02,242-Speed 5153.26 samples/sec Loss 1.0378 LearningRate 0.0089 Epoch: 14 Global Step: 234280 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:00:04,223-Speed 5171.25 samples/sec Loss 1.0123 LearningRate 0.0089 Epoch: 14 Global Step: 234290 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:00:06,328-Speed 4866.50 samples/sec Loss 1.0359 LearningRate 0.0089 Epoch: 14 Global Step: 234300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:00:08,287-Speed 5231.05 samples/sec Loss 0.9631 LearningRate 0.0089 Epoch: 14 Global Step: 234310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:00:10,258-Speed 5196.93 samples/sec Loss 1.0144 LearningRate 0.0089 Epoch: 14 Global Step: 234320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:00:12,235-Speed 5179.32 samples/sec Loss 0.9596 LearningRate 0.0089 Epoch: 14 Global Step: 234330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:00:14,211-Speed 5184.24 samples/sec Loss 0.9646 LearningRate 0.0089 Epoch: 14 Global Step: 234340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:00:16,176-Speed 5212.58 samples/sec Loss 0.9667 LearningRate 0.0089 Epoch: 14 Global Step: 234350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:00:18,153-Speed 5181.06 samples/sec Loss 0.9962 LearningRate 0.0089 Epoch: 14 Global Step: 234360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:00:20,120-Speed 5207.15 samples/sec Loss 0.9759 LearningRate 0.0089 Epoch: 14 Global Step: 234370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:00:22,082-Speed 5222.44 samples/sec Loss 0.9925 LearningRate 0.0089 Epoch: 14 Global Step: 234380 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-11 15:00:24,058-Speed 5183.53 samples/sec Loss 1.0048 LearningRate 0.0089 Epoch: 14 Global Step: 234390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:00:26,031-Speed 5194.43 samples/sec Loss 0.9831 LearningRate 0.0089 Epoch: 14 Global Step: 234400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:00:28,026-Speed 5133.01 samples/sec Loss 0.9661 LearningRate 0.0089 Epoch: 14 Global Step: 234410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:00:29,996-Speed 5199.16 samples/sec Loss 0.9738 LearningRate 0.0089 Epoch: 14 Global Step: 234420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:00:31,975-Speed 5178.03 samples/sec Loss 0.9527 LearningRate 0.0089 Epoch: 14 Global Step: 234430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:00:33,954-Speed 5174.91 samples/sec Loss 1.0288 LearningRate 0.0089 Epoch: 14 Global Step: 234440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:00:35,940-Speed 5157.87 samples/sec Loss 0.9636 LearningRate 0.0089 Epoch: 14 Global Step: 234450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:00:37,914-Speed 5190.71 samples/sec Loss 0.9920 LearningRate 0.0089 Epoch: 14 Global Step: 234460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:00:39,878-Speed 5214.89 samples/sec Loss 0.9905 LearningRate 0.0089 Epoch: 14 Global Step: 234470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:00:41,857-Speed 5175.30 samples/sec Loss 1.0259 LearningRate 0.0089 Epoch: 14 Global Step: 234480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:00:43,816-Speed 5232.09 samples/sec Loss 0.9872 LearningRate 0.0089 Epoch: 14 Global Step: 234490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:00:45,783-Speed 5208.17 samples/sec Loss 0.9570 LearningRate 0.0089 Epoch: 14 Global Step: 234500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
15:00:47,756-Speed 5189.46 samples/sec Loss 0.9838 LearningRate 0.0089 Epoch: 14 Global Step: 234510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:00:49,713-Speed 5236.30 samples/sec Loss 0.9702 LearningRate 0.0088 Epoch: 14 Global Step: 234520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:00:51,674-Speed 5221.61 samples/sec Loss 1.0031 LearningRate 0.0088 Epoch: 14 Global Step: 234530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:00:53,642-Speed 5206.14 samples/sec Loss 1.0082 LearningRate 0.0088 Epoch: 14 Global Step: 234540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:00:55,614-Speed 5194.85 samples/sec Loss 0.9833 LearningRate 0.0088 Epoch: 14 Global Step: 234550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:00:57,582-Speed 5205.13 samples/sec Loss 0.9487 LearningRate 0.0088 Epoch: 14 Global Step: 234560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:00:59,556-Speed 5188.73 samples/sec Loss 0.9556 LearningRate 0.0088 Epoch: 14 Global Step: 234570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:01:01,529-Speed 5191.20 samples/sec Loss 0.9402 LearningRate 0.0088 Epoch: 14 Global Step: 234580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:01:03,497-Speed 5206.14 samples/sec Loss 0.9604 LearningRate 0.0088 Epoch: 14 Global Step: 234590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:01:05,458-Speed 5223.81 samples/sec Loss 1.0068 LearningRate 0.0088 Epoch: 14 Global Step: 234600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:01:07,422-Speed 5216.09 samples/sec Loss 0.9593 LearningRate 0.0088 Epoch: 14 Global Step: 234610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:01:09,417-Speed 5132.71 samples/sec Loss 0.9257 LearningRate 0.0088 Epoch: 14 Global Step: 234620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:01:11,395-Speed 5178.92 samples/sec Loss 
0.9961 LearningRate 0.0088 Epoch: 14 Global Step: 234630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:01:13,356-Speed 5222.84 samples/sec Loss 0.9861 LearningRate 0.0088 Epoch: 14 Global Step: 234640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:01:15,324-Speed 5208.26 samples/sec Loss 0.9939 LearningRate 0.0088 Epoch: 14 Global Step: 234650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:01:17,335-Speed 5094.68 samples/sec Loss 0.9844 LearningRate 0.0088 Epoch: 14 Global Step: 234660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:01:19,292-Speed 5233.05 samples/sec Loss 0.9637 LearningRate 0.0088 Epoch: 14 Global Step: 234670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:01:21,252-Speed 5227.23 samples/sec Loss 0.9703 LearningRate 0.0088 Epoch: 14 Global Step: 234680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:01:23,224-Speed 5193.74 samples/sec Loss 0.9812 LearningRate 0.0088 Epoch: 14 Global Step: 234690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:01:25,190-Speed 5213.13 samples/sec Loss 1.0284 LearningRate 0.0088 Epoch: 14 Global Step: 234700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:01:27,158-Speed 5205.48 samples/sec Loss 0.9893 LearningRate 0.0088 Epoch: 14 Global Step: 234710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:01:29,121-Speed 5216.38 samples/sec Loss 1.0033 LearningRate 0.0088 Epoch: 14 Global Step: 234720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:01:31,088-Speed 5207.40 samples/sec Loss 0.9601 LearningRate 0.0088 Epoch: 14 Global Step: 234730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:01:33,069-Speed 5171.27 samples/sec Loss 1.0025 LearningRate 0.0088 Epoch: 14 Global Step: 234740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:01:35,074-Speed 5108.44 samples/sec Loss 0.9473 LearningRate 0.0088 Epoch: 14 Global 
Step: 234750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:01:37,033-Speed 5231.26 samples/sec Loss 1.0265 LearningRate 0.0088 Epoch: 14 Global Step: 234760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:01:38,994-Speed 5224.27 samples/sec Loss 0.9914 LearningRate 0.0088 Epoch: 14 Global Step: 234770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:01:40,960-Speed 5209.41 samples/sec Loss 0.9925 LearningRate 0.0088 Epoch: 14 Global Step: 234780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:01:42,919-Speed 5227.80 samples/sec Loss 0.9694 LearningRate 0.0088 Epoch: 14 Global Step: 234790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:01:44,880-Speed 5226.03 samples/sec Loss 1.0253 LearningRate 0.0088 Epoch: 14 Global Step: 234800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:01:46,839-Speed 5226.67 samples/sec Loss 1.0285 LearningRate 0.0088 Epoch: 14 Global Step: 234810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:01:48,818-Speed 5177.83 samples/sec Loss 1.0153 LearningRate 0.0088 Epoch: 14 Global Step: 234820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:01:50,780-Speed 5218.79 samples/sec Loss 0.9690 LearningRate 0.0088 Epoch: 14 Global Step: 234830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:01:52,746-Speed 5215.10 samples/sec Loss 0.9972 LearningRate 0.0088 Epoch: 14 Global Step: 234840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:01:54,708-Speed 5220.83 samples/sec Loss 1.0228 LearningRate 0.0088 Epoch: 14 Global Step: 234850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:01:56,673-Speed 5213.96 samples/sec Loss 0.9879 LearningRate 0.0088 Epoch: 14 Global Step: 234860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:01:58,656-Speed 5166.60 samples/sec Loss 0.9700 LearningRate 0.0088 Epoch: 14 Global Step: 234870 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-11 15:02:00,631-Speed 5186.46 samples/sec Loss 0.9884 LearningRate 0.0088 Epoch: 14 Global Step: 234880 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:02:02,606-Speed 5184.40 samples/sec Loss 0.9777 LearningRate 0.0088 Epoch: 14 Global Step: 234890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:02:04,569-Speed 5219.69 samples/sec Loss 1.0018 LearningRate 0.0088 Epoch: 14 Global Step: 234900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:02:06,546-Speed 5181.82 samples/sec Loss 0.9873 LearningRate 0.0088 Epoch: 14 Global Step: 234910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:08,522-Speed 5183.82 samples/sec Loss 0.9709 LearningRate 0.0088 Epoch: 14 Global Step: 234920 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:10,509-Speed 5154.24 samples/sec Loss 0.9405 LearningRate 0.0088 Epoch: 14 Global Step: 234930 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:12,523-Speed 5085.36 samples/sec Loss 0.9657 LearningRate 0.0088 Epoch: 14 Global Step: 234940 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:14,510-Speed 5156.50 samples/sec Loss 0.9789 LearningRate 0.0088 Epoch: 14 Global Step: 234950 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:16,482-Speed 5195.14 samples/sec Loss 1.0309 LearningRate 0.0088 Epoch: 14 Global Step: 234960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:18,443-Speed 5223.15 samples/sec Loss 1.0175 LearningRate 0.0088 Epoch: 14 Global Step: 234970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:20,415-Speed 5194.45 samples/sec Loss 0.9648 LearningRate 0.0088 Epoch: 14 Global Step: 234980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:22,390-Speed 5187.65 samples/sec Loss 0.9998 LearningRate 0.0088 Epoch: 14 Global Step: 234990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 
15:02:24,355-Speed 5213.24 samples/sec Loss 0.9919 LearningRate 0.0088 Epoch: 14 Global Step: 235000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:26,320-Speed 5213.95 samples/sec Loss 1.0039 LearningRate 0.0088 Epoch: 14 Global Step: 235010 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:02:28,286-Speed 5212.44 samples/sec Loss 1.0023 LearningRate 0.0088 Epoch: 14 Global Step: 235020 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:02:30,250-Speed 5213.64 samples/sec Loss 1.0191 LearningRate 0.0088 Epoch: 14 Global Step: 235030 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:02:32,240-Speed 5148.28 samples/sec Loss 0.9894 LearningRate 0.0088 Epoch: 14 Global Step: 235040 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:02:34,222-Speed 5166.36 samples/sec Loss 1.0070 LearningRate 0.0088 Epoch: 14 Global Step: 235050 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:02:36,198-Speed 5185.55 samples/sec Loss 0.9353 LearningRate 0.0088 Epoch: 14 Global Step: 235060 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:02:38,167-Speed 5201.88 samples/sec Loss 1.0091 LearningRate 0.0088 Epoch: 14 Global Step: 235070 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:02:40,129-Speed 5222.56 samples/sec Loss 0.9654 LearningRate 0.0087 Epoch: 14 Global Step: 235080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:42,096-Speed 5207.69 samples/sec Loss 1.0077 LearningRate 0.0087 Epoch: 14 Global Step: 235090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:44,058-Speed 5219.23 samples/sec Loss 0.9701 LearningRate 0.0087 Epoch: 14 Global Step: 235100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:46,043-Speed 5162.56 samples/sec Loss 1.0075 LearningRate 0.0087 Epoch: 14 Global Step: 235110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:48,009-Speed 5209.88 
samples/sec Loss 1.0063 LearningRate 0.0087 Epoch: 14 Global Step: 235120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:49,983-Speed 5188.79 samples/sec Loss 1.0326 LearningRate 0.0087 Epoch: 14 Global Step: 235130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:51,946-Speed 5216.97 samples/sec Loss 1.0197 LearningRate 0.0087 Epoch: 14 Global Step: 235140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:53,907-Speed 5223.29 samples/sec Loss 1.0042 LearningRate 0.0087 Epoch: 14 Global Step: 235150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:55,875-Speed 5206.61 samples/sec Loss 1.0134 LearningRate 0.0087 Epoch: 14 Global Step: 235160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:57,853-Speed 5180.04 samples/sec Loss 0.9717 LearningRate 0.0087 Epoch: 14 Global Step: 235170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:02:59,829-Speed 5183.79 samples/sec Loss 0.9687 LearningRate 0.0087 Epoch: 14 Global Step: 235180 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:03:01,801-Speed 5195.42 samples/sec Loss 0.9737 LearningRate 0.0087 Epoch: 14 Global Step: 235190 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:03:03,769-Speed 5204.05 samples/sec Loss 0.9776 LearningRate 0.0087 Epoch: 14 Global Step: 235200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:03:05,724-Speed 5238.75 samples/sec Loss 0.9861 LearningRate 0.0087 Epoch: 14 Global Step: 235210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:07,686-Speed 5221.84 samples/sec Loss 0.9772 LearningRate 0.0087 Epoch: 14 Global Step: 235220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:09,649-Speed 5217.74 samples/sec Loss 1.0049 LearningRate 0.0087 Epoch: 14 Global Step: 235230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:11,649-Speed 5120.44 samples/sec Loss 0.9864 LearningRate 
0.0087 Epoch: 14 Global Step: 235240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:13,648-Speed 5124.84 samples/sec Loss 1.0694 LearningRate 0.0087 Epoch: 14 Global Step: 235250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:15,628-Speed 5173.68 samples/sec Loss 0.9995 LearningRate 0.0087 Epoch: 14 Global Step: 235260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:17,609-Speed 5169.63 samples/sec Loss 0.9907 LearningRate 0.0087 Epoch: 14 Global Step: 235270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:19,575-Speed 5212.21 samples/sec Loss 1.0614 LearningRate 0.0087 Epoch: 14 Global Step: 235280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:21,539-Speed 5215.26 samples/sec Loss 0.9819 LearningRate 0.0087 Epoch: 14 Global Step: 235290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:23,500-Speed 5225.23 samples/sec Loss 1.0009 LearningRate 0.0087 Epoch: 14 Global Step: 235300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:25,469-Speed 5200.55 samples/sec Loss 1.0007 LearningRate 0.0087 Epoch: 14 Global Step: 235310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:03:27,438-Speed 5201.49 samples/sec Loss 0.9950 LearningRate 0.0087 Epoch: 14 Global Step: 235320 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:03:29,416-Speed 5181.55 samples/sec Loss 1.0295 LearningRate 0.0087 Epoch: 14 Global Step: 235330 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:03:31,376-Speed 5223.56 samples/sec Loss 1.0098 LearningRate 0.0087 Epoch: 14 Global Step: 235340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:33,339-Speed 5219.37 samples/sec Loss 1.0191 LearningRate 0.0087 Epoch: 14 Global Step: 235350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:35,320-Speed 5172.00 samples/sec Loss 1.0181 LearningRate 0.0087 Epoch: 14 Global Step: 235360 
Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:37,303-Speed 5164.56 samples/sec Loss 1.0300 LearningRate 0.0087 Epoch: 14 Global Step: 235370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:39,275-Speed 5193.64 samples/sec Loss 0.9983 LearningRate 0.0087 Epoch: 14 Global Step: 235380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:41,254-Speed 5178.01 samples/sec Loss 0.9938 LearningRate 0.0087 Epoch: 14 Global Step: 235390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:43,218-Speed 5214.30 samples/sec Loss 0.9926 LearningRate 0.0087 Epoch: 14 Global Step: 235400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:45,186-Speed 5206.15 samples/sec Loss 1.0474 LearningRate 0.0087 Epoch: 14 Global Step: 235410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:47,148-Speed 5220.35 samples/sec Loss 0.9949 LearningRate 0.0087 Epoch: 14 Global Step: 235420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:49,120-Speed 5193.19 samples/sec Loss 1.0340 LearningRate 0.0087 Epoch: 14 Global Step: 235430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:03:51,092-Speed 5194.98 samples/sec Loss 1.0242 LearningRate 0.0087 Epoch: 14 Global Step: 235440 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:03:53,066-Speed 5189.03 samples/sec Loss 1.0241 LearningRate 0.0087 Epoch: 14 Global Step: 235450 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:03:55,045-Speed 5177.08 samples/sec Loss 0.9880 LearningRate 0.0087 Epoch: 14 Global Step: 235460 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:03:57,002-Speed 5232.86 samples/sec Loss 1.0082 LearningRate 0.0087 Epoch: 14 Global Step: 235470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:03:58,982-Speed 5175.24 samples/sec Loss 1.0390 LearningRate 0.0087 Epoch: 14 Global Step: 235480 Fp16 Grad Scale: 32768 Required: 7 hours 
Training: 2022-04-11 15:04:00,954-Speed 5194.35 samples/sec Loss 1.0009 LearningRate 0.0087 Epoch: 14 Global Step: 235490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:04:02,921-Speed 5207.85 samples/sec Loss 1.0001 LearningRate 0.0087 Epoch: 14 Global Step: 235500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:04:04,885-Speed 5216.93 samples/sec Loss 1.0177 LearningRate 0.0087 Epoch: 14 Global Step: 235510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:04:06,851-Speed 5209.67 samples/sec Loss 1.0350 LearningRate 0.0087 Epoch: 14 Global Step: 235520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:04:08,814-Speed 5218.13 samples/sec Loss 1.0279 LearningRate 0.0087 Epoch: 14 Global Step: 235530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:04:10,792-Speed 5177.59 samples/sec Loss 1.0331 LearningRate 0.0087 Epoch: 14 Global Step: 235540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:04:12,763-Speed 5199.49 samples/sec Loss 0.9975 LearningRate 0.0087 Epoch: 14 Global Step: 235550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:04:14,735-Speed 5194.20 samples/sec Loss 0.9971 LearningRate 0.0087 Epoch: 14 Global Step: 235560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:04:16,701-Speed 5209.01 samples/sec Loss 1.0039 LearningRate 0.0087 Epoch: 14 Global Step: 235570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:18,687-Speed 5157.21 samples/sec Loss 1.0306 LearningRate 0.0087 Epoch: 14 Global Step: 235580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:20,664-Speed 5184.77 samples/sec Loss 1.0132 LearningRate 0.0087 Epoch: 14 Global Step: 235590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:22,627-Speed 5218.73 samples/sec Loss 1.0456 LearningRate 0.0087 Epoch: 14 Global Step: 235600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:24,596-Speed 
5201.02 samples/sec Loss 1.0433 LearningRate 0.0087 Epoch: 14 Global Step: 235610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:26,580-Speed 5162.57 samples/sec Loss 1.0531 LearningRate 0.0087 Epoch: 14 Global Step: 235620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:28,566-Speed 5158.52 samples/sec Loss 1.0179 LearningRate 0.0087 Epoch: 14 Global Step: 235630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:30,537-Speed 5195.87 samples/sec Loss 1.0252 LearningRate 0.0087 Epoch: 14 Global Step: 235640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:32,501-Speed 5215.86 samples/sec Loss 1.0182 LearningRate 0.0086 Epoch: 14 Global Step: 235650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:34,481-Speed 5173.04 samples/sec Loss 1.0412 LearningRate 0.0086 Epoch: 14 Global Step: 235660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:36,446-Speed 5212.64 samples/sec Loss 1.0491 LearningRate 0.0086 Epoch: 14 Global Step: 235670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:38,448-Speed 5117.23 samples/sec Loss 1.0272 LearningRate 0.0086 Epoch: 14 Global Step: 235680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:40,414-Speed 5209.75 samples/sec Loss 1.0269 LearningRate 0.0086 Epoch: 14 Global Step: 235690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:42,388-Speed 5207.02 samples/sec Loss 1.0576 LearningRate 0.0086 Epoch: 14 Global Step: 235700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:44,354-Speed 5211.19 samples/sec Loss 1.0256 LearningRate 0.0086 Epoch: 14 Global Step: 235710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:46,329-Speed 5186.54 samples/sec Loss 1.0281 LearningRate 0.0086 Epoch: 14 Global Step: 235720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:48,331-Speed 5115.15 samples/sec Loss 1.0303 
LearningRate 0.0086 Epoch: 14 Global Step: 235730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:50,337-Speed 5107.89 samples/sec Loss 1.0418 LearningRate 0.0086 Epoch: 14 Global Step: 235740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:52,313-Speed 5182.32 samples/sec Loss 1.0349 LearningRate 0.0086 Epoch: 14 Global Step: 235750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:54,284-Speed 5197.30 samples/sec Loss 1.0583 LearningRate 0.0086 Epoch: 14 Global Step: 235760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:04:56,270-Speed 5157.78 samples/sec Loss 1.0274 LearningRate 0.0086 Epoch: 14 Global Step: 235770 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:04:58,253-Speed 5165.00 samples/sec Loss 1.0281 LearningRate 0.0086 Epoch: 14 Global Step: 235780 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:05:00,222-Speed 5202.58 samples/sec Loss 1.0707 LearningRate 0.0086 Epoch: 14 Global Step: 235790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:05:02,201-Speed 5176.64 samples/sec Loss 1.0244 LearningRate 0.0086 Epoch: 14 Global Step: 235800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:05:04,176-Speed 5187.32 samples/sec Loss 1.0247 LearningRate 0.0086 Epoch: 14 Global Step: 235810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:05:06,150-Speed 5189.67 samples/sec Loss 1.0359 LearningRate 0.0086 Epoch: 14 Global Step: 235820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:05:08,137-Speed 5153.80 samples/sec Loss 1.0196 LearningRate 0.0086 Epoch: 14 Global Step: 235830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:05:10,104-Speed 5207.66 samples/sec Loss 0.9986 LearningRate 0.0086 Epoch: 14 Global Step: 235840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:05:12,071-Speed 5208.41 samples/sec Loss 0.9833 LearningRate 0.0086 Epoch: 14 Global 
Step: 235850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:05:14,048-Speed 5180.58 samples/sec Loss 1.0427 LearningRate 0.0086 Epoch: 14 Global Step: 235860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:05:16,010-Speed 5220.44 samples/sec Loss 1.0116 LearningRate 0.0086 Epoch: 14 Global Step: 235870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:05:17,996-Speed 5157.92 samples/sec Loss 0.9869 LearningRate 0.0086 Epoch: 14 Global Step: 235880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:05:19,971-Speed 5188.58 samples/sec Loss 1.0599 LearningRate 0.0086 Epoch: 14 Global Step: 235890 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:05:21,955-Speed 5161.58 samples/sec Loss 0.9973 LearningRate 0.0086 Epoch: 14 Global Step: 235900 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:05:23,934-Speed 5177.18 samples/sec Loss 1.0714 LearningRate 0.0086 Epoch: 14 Global Step: 235910 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:05:25,914-Speed 5173.50 samples/sec Loss 1.0064 LearningRate 0.0086 Epoch: 14 Global Step: 235920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:05:27,899-Speed 5160.76 samples/sec Loss 0.9844 LearningRate 0.0086 Epoch: 14 Global Step: 235930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:05:29,868-Speed 5203.53 samples/sec Loss 1.0143 LearningRate 0.0086 Epoch: 14 Global Step: 235940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:05:31,846-Speed 5179.20 samples/sec Loss 1.0056 LearningRate 0.0086 Epoch: 14 Global Step: 235950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:05:33,818-Speed 5192.19 samples/sec Loss 1.0089 LearningRate 0.0086 Epoch: 14 Global Step: 235960 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:05:35,797-Speed 5177.81 samples/sec Loss 0.9914 LearningRate 0.0086 Epoch: 14 Global Step: 235970 Fp16 Grad Scale: 
131072 Required: 7 hours Training: 2022-04-11 15:05:37,796-Speed 5122.11 samples/sec Loss 1.0422 LearningRate 0.0086 Epoch: 14 Global Step: 235980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:05:39,787-Speed 5144.98 samples/sec Loss 1.0231 LearningRate 0.0086 Epoch: 14 Global Step: 235990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:05:41,764-Speed 5181.78 samples/sec Loss 1.0224 LearningRate 0.0086 Epoch: 14 Global Step: 236000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:06:08,347-[lfw][236000]XNorm: 22.080613 Training: 2022-04-11 15:06:08,348-[lfw][236000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-04-11 15:06:08,348-[lfw][236000]Accuracy-Highest: 0.99833 Training: 2022-04-11 15:06:39,050-[cfp_fp][236000]XNorm: 21.585234 Training: 2022-04-11 15:06:39,051-[cfp_fp][236000]Accuracy-Flip: 0.98857+-0.00579 Training: 2022-04-11 15:06:39,051-[cfp_fp][236000]Accuracy-Highest: 0.98857 Training: 2022-04-11 15:07:05,534-[agedb_30][236000]XNorm: 22.659416 Training: 2022-04-11 15:07:05,535-[agedb_30][236000]Accuracy-Flip: 0.98133+-0.00718 Training: 2022-04-11 15:07:05,535-[agedb_30][236000]Accuracy-Highest: 0.98283 Training: 2022-04-11 15:07:07,515-Speed 119.42 samples/sec Loss 1.0288 LearningRate 0.0086 Epoch: 14 Global Step: 236010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:09,495-Speed 5173.48 samples/sec Loss 1.0699 LearningRate 0.0086 Epoch: 14 Global Step: 236020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:11,456-Speed 5224.13 samples/sec Loss 1.0561 LearningRate 0.0086 Epoch: 14 Global Step: 236030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:13,452-Speed 5132.06 samples/sec Loss 0.9949 LearningRate 0.0086 Epoch: 14 Global Step: 236040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:15,443-Speed 5144.25 samples/sec Loss 1.0370 LearningRate 0.0086 Epoch: 14 Global Step: 236050 Fp16 Grad Scale: 65536 Required: 7 hours 
Training: 2022-04-11 15:07:17,466-Speed 5066.05 samples/sec Loss 0.9897 LearningRate 0.0086 Epoch: 14 Global Step: 236060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:19,435-Speed 5200.25 samples/sec Loss 1.0508 LearningRate 0.0086 Epoch: 14 Global Step: 236070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:21,399-Speed 5218.60 samples/sec Loss 1.0393 LearningRate 0.0086 Epoch: 14 Global Step: 236080 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:07:23,387-Speed 5152.39 samples/sec Loss 1.0615 LearningRate 0.0086 Epoch: 14 Global Step: 236090 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:07:25,358-Speed 5196.62 samples/sec Loss 0.9690 LearningRate 0.0086 Epoch: 14 Global Step: 236100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:27,337-Speed 5175.47 samples/sec Loss 1.0943 LearningRate 0.0086 Epoch: 14 Global Step: 236110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:29,314-Speed 5182.78 samples/sec Loss 1.0667 LearningRate 0.0086 Epoch: 14 Global Step: 236120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:31,287-Speed 5190.64 samples/sec Loss 1.0326 LearningRate 0.0086 Epoch: 14 Global Step: 236130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:33,252-Speed 5212.62 samples/sec Loss 1.0189 LearningRate 0.0086 Epoch: 14 Global Step: 236140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:35,237-Speed 5162.84 samples/sec Loss 1.0513 LearningRate 0.0086 Epoch: 14 Global Step: 236150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:37,225-Speed 5151.80 samples/sec Loss 1.0674 LearningRate 0.0086 Epoch: 14 Global Step: 236160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:39,218-Speed 5140.02 samples/sec Loss 1.0281 LearningRate 0.0086 Epoch: 14 Global Step: 236170 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:41,206-Speed 
5151.97 samples/sec Loss 1.0156 LearningRate 0.0086 Epoch: 14 Global Step: 236180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:43,194-Speed 5153.53 samples/sec Loss 1.0530 LearningRate 0.0086 Epoch: 14 Global Step: 236190 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:45,168-Speed 5188.81 samples/sec Loss 1.0278 LearningRate 0.0086 Epoch: 14 Global Step: 236200 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:07:47,180-Speed 5091.04 samples/sec Loss 1.0641 LearningRate 0.0085 Epoch: 14 Global Step: 236210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:49,182-Speed 5114.81 samples/sec Loss 1.0400 LearningRate 0.0085 Epoch: 14 Global Step: 236220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:51,158-Speed 5184.92 samples/sec Loss 1.0405 LearningRate 0.0085 Epoch: 14 Global Step: 236230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:53,133-Speed 5187.54 samples/sec Loss 1.0788 LearningRate 0.0085 Epoch: 14 Global Step: 236240 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:55,107-Speed 5189.18 samples/sec Loss 1.0497 LearningRate 0.0085 Epoch: 14 Global Step: 236250 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:57,082-Speed 5187.44 samples/sec Loss 1.0338 LearningRate 0.0085 Epoch: 14 Global Step: 236260 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:07:59,064-Speed 5168.84 samples/sec Loss 0.9957 LearningRate 0.0085 Epoch: 14 Global Step: 236270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:01,049-Speed 5160.44 samples/sec Loss 1.0109 LearningRate 0.0085 Epoch: 14 Global Step: 236280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:03,037-Speed 5151.33 samples/sec Loss 1.0244 LearningRate 0.0085 Epoch: 14 Global Step: 236290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:05,014-Speed 5181.75 samples/sec Loss 1.0432 
LearningRate 0.0085 Epoch: 14 Global Step: 236300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:06,990-Speed 5182.84 samples/sec Loss 1.0520 LearningRate 0.0085 Epoch: 14 Global Step: 236310 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:08:08,978-Speed 5152.69 samples/sec Loss 1.0078 LearningRate 0.0085 Epoch: 14 Global Step: 236320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:10,975-Speed 5129.89 samples/sec Loss 1.0465 LearningRate 0.0085 Epoch: 14 Global Step: 236330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:12,974-Speed 5124.70 samples/sec Loss 1.0192 LearningRate 0.0085 Epoch: 14 Global Step: 236340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:14,953-Speed 5176.12 samples/sec Loss 1.0623 LearningRate 0.0085 Epoch: 14 Global Step: 236350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:16,944-Speed 5145.85 samples/sec Loss 1.0250 LearningRate 0.0085 Epoch: 14 Global Step: 236360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:18,926-Speed 5168.58 samples/sec Loss 1.0344 LearningRate 0.0085 Epoch: 14 Global Step: 236370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:20,901-Speed 5184.82 samples/sec Loss 1.0383 LearningRate 0.0085 Epoch: 14 Global Step: 236380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:22,872-Speed 5198.19 samples/sec Loss 1.0724 LearningRate 0.0085 Epoch: 14 Global Step: 236390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:24,852-Speed 5174.14 samples/sec Loss 1.0286 LearningRate 0.0085 Epoch: 14 Global Step: 236400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:26,856-Speed 5109.64 samples/sec Loss 1.0644 LearningRate 0.0085 Epoch: 14 Global Step: 236410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:28,839-Speed 5167.41 samples/sec Loss 1.0248 LearningRate 0.0085 Epoch: 14 Global Step: 
236420 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:08:30,808-Speed 5201.19 samples/sec Loss 1.0186 LearningRate 0.0085 Epoch: 14 Global Step: 236430 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:08:32,787-Speed 5176.03 samples/sec Loss 0.9878 LearningRate 0.0085 Epoch: 14 Global Step: 236440 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:34,760-Speed 5192.52 samples/sec Loss 1.0562 LearningRate 0.0085 Epoch: 14 Global Step: 236450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:36,733-Speed 5192.91 samples/sec Loss 1.0299 LearningRate 0.0085 Epoch: 14 Global Step: 236460 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:38,705-Speed 5194.71 samples/sec Loss 1.0293 LearningRate 0.0085 Epoch: 14 Global Step: 236470 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:40,688-Speed 5164.81 samples/sec Loss 1.0021 LearningRate 0.0085 Epoch: 14 Global Step: 236480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:42,670-Speed 5167.30 samples/sec Loss 1.0270 LearningRate 0.0085 Epoch: 14 Global Step: 236490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:44,646-Speed 5185.48 samples/sec Loss 1.0309 LearningRate 0.0085 Epoch: 14 Global Step: 236500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:46,629-Speed 5165.56 samples/sec Loss 1.0252 LearningRate 0.0085 Epoch: 14 Global Step: 236510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:48,616-Speed 5154.46 samples/sec Loss 1.0392 LearningRate 0.0085 Epoch: 14 Global Step: 236520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:50,586-Speed 5200.78 samples/sec Loss 1.0361 LearningRate 0.0085 Epoch: 14 Global Step: 236530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:08:52,562-Speed 5182.96 samples/sec Loss 1.0087 LearningRate 0.0085 Epoch: 14 Global Step: 236540 Fp16 Grad Scale: 131072 Required: 
7 hours Training: 2022-04-11 15:08:54,530-Speed 5207.66 samples/sec Loss 1.0513 LearningRate 0.0085 Epoch: 14 Global Step: 236550 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:08:56,498-Speed 5203.44 samples/sec Loss 1.0515 LearningRate 0.0085 Epoch: 14 Global Step: 236560 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:08:58,469-Speed 5199.87 samples/sec Loss 0.9949 LearningRate 0.0085 Epoch: 14 Global Step: 236570 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:09:00,436-Speed 5206.61 samples/sec Loss 1.0054 LearningRate 0.0085 Epoch: 14 Global Step: 236580 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:09:02,416-Speed 5174.13 samples/sec Loss 1.0251 LearningRate 0.0085 Epoch: 14 Global Step: 236590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:09:04,390-Speed 5186.81 samples/sec Loss 1.0717 LearningRate 0.0085 Epoch: 14 Global Step: 236600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:09:06,381-Speed 5145.56 samples/sec Loss 1.0586 LearningRate 0.0085 Epoch: 14 Global Step: 236610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:09:08,364-Speed 5166.30 samples/sec Loss 1.0409 LearningRate 0.0085 Epoch: 14 Global Step: 236620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:09:10,341-Speed 5179.28 samples/sec Loss 1.0406 LearningRate 0.0085 Epoch: 14 Global Step: 236630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:09:12,323-Speed 5170.78 samples/sec Loss 1.0335 LearningRate 0.0085 Epoch: 14 Global Step: 236640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:09:14,307-Speed 5163.06 samples/sec Loss 0.9893 LearningRate 0.0085 Epoch: 14 Global Step: 236650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:09:16,278-Speed 5197.33 samples/sec Loss 1.0550 LearningRate 0.0085 Epoch: 14 Global Step: 236660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 
15:09:18,249-Speed 5197.52 samples/sec Loss 1.0866 LearningRate 0.0085 Epoch: 14 Global Step: 236670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:09:20,215-Speed 5208.80 samples/sec Loss 1.0527 LearningRate 0.0085 Epoch: 14 Global Step: 236680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:09:22,180-Speed 5213.48 samples/sec Loss 1.0118 LearningRate 0.0085 Epoch: 14 Global Step: 236690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:09:24,153-Speed 5191.68 samples/sec Loss 1.0333 LearningRate 0.0085 Epoch: 14 Global Step: 236700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-11 15:09:26,149-Speed 5133.46 samples/sec Loss 1.0583 LearningRate 0.0085 Epoch: 14 Global Step: 236710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:09:28,118-Speed 5199.98 samples/sec Loss 1.0605 LearningRate 0.0085 Epoch: 14 Global Step: 236720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:09:30,126-Speed 5105.01 samples/sec Loss 1.0579 LearningRate 0.0085 Epoch: 14 Global Step: 236730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:09:32,094-Speed 5203.04 samples/sec Loss 1.0765 LearningRate 0.0085 Epoch: 14 Global Step: 236740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:09:34,062-Speed 5206.66 samples/sec Loss 1.0304 LearningRate 0.0085 Epoch: 14 Global Step: 236750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:09:36,058-Speed 5132.78 samples/sec Loss 1.0337 LearningRate 0.0085 Epoch: 14 Global Step: 236760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:09:38,031-Speed 5190.54 samples/sec Loss 1.0195 LearningRate 0.0085 Epoch: 14 Global Step: 236770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:09:40,007-Speed 5183.95 samples/sec Loss 1.0527 LearningRate 0.0085 Epoch: 14 Global Step: 236780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:09:41,997-Speed 5148.65 samples/sec Loss 
1.0152 LearningRate 0.0084 Epoch: 14 Global Step: 236790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:09:43,969-Speed 5194.26 samples/sec Loss 1.0549 LearningRate 0.0084 Epoch: 14 Global Step: 236800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:09:45,942-Speed 5192.89 samples/sec Loss 1.0344 LearningRate 0.0084 Epoch: 14 Global Step: 236810 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:09:47,934-Speed 5143.47 samples/sec Loss 1.0587 LearningRate 0.0084 Epoch: 14 Global Step: 236820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:09:49,907-Speed 5191.09 samples/sec Loss 1.0503 LearningRate 0.0084 Epoch: 14 Global Step: 236830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:09:51,878-Speed 5196.92 samples/sec Loss 1.0254 LearningRate 0.0084 Epoch: 14 Global Step: 236840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:09:53,850-Speed 5195.08 samples/sec Loss 1.0964 LearningRate 0.0084 Epoch: 14 Global Step: 236850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:09:55,827-Speed 5179.68 samples/sec Loss 1.0766 LearningRate 0.0084 Epoch: 14 Global Step: 236860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:09:57,813-Speed 5158.35 samples/sec Loss 1.0658 LearningRate 0.0084 Epoch: 14 Global Step: 236870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:09:59,800-Speed 5157.78 samples/sec Loss 1.0307 LearningRate 0.0084 Epoch: 14 Global Step: 236880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:10:01,773-Speed 5192.05 samples/sec Loss 1.0353 LearningRate 0.0084 Epoch: 14 Global Step: 236890 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:10:03,756-Speed 5164.99 samples/sec Loss 1.0047 LearningRate 0.0084 Epoch: 14 Global Step: 236900 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:10:05,737-Speed 5170.05 samples/sec Loss 1.0747 LearningRate 0.0084 Epoch: 14 
Global Step: 236910 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:10:07,709-Speed 5195.15 samples/sec Loss 1.0797 LearningRate 0.0084 Epoch: 14 Global Step: 236920 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:10:09,687-Speed 5177.64 samples/sec Loss 1.0425 LearningRate 0.0084 Epoch: 14 Global Step: 236930 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:10:11,656-Speed 5202.56 samples/sec Loss 1.0597 LearningRate 0.0084 Epoch: 14 Global Step: 236940 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:10:13,629-Speed 5192.39 samples/sec Loss 1.0419 LearningRate 0.0084 Epoch: 14 Global Step: 236950 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-11 15:10:15,608-Speed 5175.63 samples/sec Loss 1.0329 LearningRate 0.0084 Epoch: 14 Global Step: 236960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:10:17,588-Speed 5172.94 samples/sec Loss 1.0131 LearningRate 0.0084 Epoch: 14 Global Step: 236970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:10:19,566-Speed 5178.83 samples/sec Loss 1.0276 LearningRate 0.0084 Epoch: 14 Global Step: 236980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:10:21,544-Speed 5179.33 samples/sec Loss 1.0487 LearningRate 0.0084 Epoch: 14 Global Step: 236990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:10:23,536-Speed 5141.50 samples/sec Loss 1.0633 LearningRate 0.0084 Epoch: 14 Global Step: 237000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:10:25,518-Speed 5170.40 samples/sec Loss 1.0249 LearningRate 0.0084 Epoch: 14 Global Step: 237010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:10:27,490-Speed 5192.49 samples/sec Loss 1.0423 LearningRate 0.0084 Epoch: 14 Global Step: 237020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:10:29,471-Speed 5172.03 samples/sec Loss 1.0132 LearningRate 0.0084 Epoch: 14 Global Step: 237030 Fp16 Grad Scale: 
65536 Required: 7 hours Training: 2022-04-11 15:10:31,468-Speed 5129.40 samples/sec Loss 1.0496 LearningRate 0.0084 Epoch: 14 Global Step: 237040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:10:33,457-Speed 5151.10 samples/sec Loss 1.0443 LearningRate 0.0084 Epoch: 14 Global Step: 237050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-11 15:10:35,454-Speed 5128.76 samples/sec Loss 1.0570 LearningRate 0.0084 Epoch: 14 Global Step: 237060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:10:37,433-Speed 5175.44 samples/sec Loss 1.0013 LearningRate 0.0084 Epoch: 14 Global Step: 237070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:10:39,429-Speed 5131.91 samples/sec Loss 1.0567 LearningRate 0.0084 Epoch: 14 Global Step: 237080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:10:41,413-Speed 5164.68 samples/sec Loss 1.0720 LearningRate 0.0084 Epoch: 14 Global Step: 237090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:10:43,394-Speed 5170.49 samples/sec Loss 1.0636 LearningRate 0.0084 Epoch: 14 Global Step: 237100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:10:45,389-Speed 5133.16 samples/sec Loss 1.0939 LearningRate 0.0084 Epoch: 14 Global Step: 237110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:10:47,388-Speed 5124.52 samples/sec Loss 1.0387 LearningRate 0.0084 Epoch: 14 Global Step: 237120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:10:49,364-Speed 5184.56 samples/sec Loss 1.0677 LearningRate 0.0084 Epoch: 14 Global Step: 237130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:10:51,328-Speed 5215.47 samples/sec Loss 1.0877 LearningRate 0.0084 Epoch: 14 Global Step: 237140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:10:53,296-Speed 5206.60 samples/sec Loss 1.0597 LearningRate 0.0084 Epoch: 14 Global Step: 237150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 
2022-04-11 15:10:55,287-Speed 5143.09 samples/sec Loss 1.0481 LearningRate 0.0084 Epoch: 14 Global Step: 237160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:10:57,266-Speed 5176.60 samples/sec Loss 1.0631 LearningRate 0.0084 Epoch: 14 Global Step: 237170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:10:59,241-Speed 5188.82 samples/sec Loss 1.0713 LearningRate 0.0084 Epoch: 14 Global Step: 237180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:11:01,222-Speed 5171.21 samples/sec Loss 1.0482 LearningRate 0.0084 Epoch: 14 Global Step: 237190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:11:03,196-Speed 5189.62 samples/sec Loss 1.1027 LearningRate 0.0084 Epoch: 14 Global Step: 237200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:11:05,175-Speed 5175.55 samples/sec Loss 1.0378 LearningRate 0.0084 Epoch: 14 Global Step: 237210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:11:07,145-Speed 5198.59 samples/sec Loss 1.0229 LearningRate 0.0084 Epoch: 14 Global Step: 237220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:11:09,125-Speed 5172.74 samples/sec Loss 1.0462 LearningRate 0.0084 Epoch: 14 Global Step: 237230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:11:11,098-Speed 5193.07 samples/sec Loss 1.0459 LearningRate 0.0084 Epoch: 14 Global Step: 237240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:11:13,072-Speed 5188.76 samples/sec Loss 1.0614 LearningRate 0.0084 Epoch: 14 Global Step: 237250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:15,063-Speed 5144.53 samples/sec Loss 1.0713 LearningRate 0.0084 Epoch: 14 Global Step: 237260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:17,044-Speed 5171.93 samples/sec Loss 1.0197 LearningRate 0.0084 Epoch: 14 Global Step: 237270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:19,026-Speed 5168.00 
samples/sec Loss 1.0407 LearningRate 0.0084 Epoch: 14 Global Step: 237280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:21,006-Speed 5173.72 samples/sec Loss 1.0013 LearningRate 0.0084 Epoch: 14 Global Step: 237290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:22,993-Speed 5155.53 samples/sec Loss 1.0405 LearningRate 0.0084 Epoch: 14 Global Step: 237300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:24,996-Speed 5114.63 samples/sec Loss 1.0511 LearningRate 0.0084 Epoch: 14 Global Step: 237310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:26,978-Speed 5166.90 samples/sec Loss 1.0452 LearningRate 0.0084 Epoch: 14 Global Step: 237320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:28,952-Speed 5190.01 samples/sec Loss 1.0484 LearningRate 0.0084 Epoch: 14 Global Step: 237330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:30,921-Speed 5202.18 samples/sec Loss 1.0438 LearningRate 0.0084 Epoch: 14 Global Step: 237340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:32,894-Speed 5191.06 samples/sec Loss 1.1169 LearningRate 0.0084 Epoch: 14 Global Step: 237350 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:11:34,888-Speed 5137.77 samples/sec Loss 1.0291 LearningRate 0.0083 Epoch: 14 Global Step: 237360 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:11:36,868-Speed 5172.44 samples/sec Loss 1.0697 LearningRate 0.0083 Epoch: 14 Global Step: 237370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:11:38,860-Speed 5142.22 samples/sec Loss 1.0414 LearningRate 0.0083 Epoch: 14 Global Step: 237380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:40,849-Speed 5151.10 samples/sec Loss 1.0371 LearningRate 0.0083 Epoch: 14 Global Step: 237390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:42,821-Speed 5196.06 samples/sec Loss 1.0405 LearningRate 
0.0083 Epoch: 14 Global Step: 237400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:44,796-Speed 5184.53 samples/sec Loss 1.1099 LearningRate 0.0083 Epoch: 14 Global Step: 237410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:46,768-Speed 5195.20 samples/sec Loss 1.0898 LearningRate 0.0083 Epoch: 14 Global Step: 237420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:48,742-Speed 5190.70 samples/sec Loss 1.0531 LearningRate 0.0083 Epoch: 14 Global Step: 237430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:50,737-Speed 5132.89 samples/sec Loss 1.0891 LearningRate 0.0083 Epoch: 14 Global Step: 237440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:52,736-Speed 5125.10 samples/sec Loss 1.0957 LearningRate 0.0083 Epoch: 14 Global Step: 237450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:54,725-Speed 5149.80 samples/sec Loss 1.0339 LearningRate 0.0083 Epoch: 14 Global Step: 237460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:56,722-Speed 5128.81 samples/sec Loss 1.0332 LearningRate 0.0083 Epoch: 14 Global Step: 237470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:11:58,723-Speed 5120.65 samples/sec Loss 1.0971 LearningRate 0.0083 Epoch: 14 Global Step: 237480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:00,692-Speed 5200.68 samples/sec Loss 1.1040 LearningRate 0.0083 Epoch: 14 Global Step: 237490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:02,661-Speed 5203.43 samples/sec Loss 1.0492 LearningRate 0.0083 Epoch: 14 Global Step: 237500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:04,636-Speed 5187.19 samples/sec Loss 1.0197 LearningRate 0.0083 Epoch: 14 Global Step: 237510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:06,613-Speed 5180.55 samples/sec Loss 1.0794 LearningRate 0.0083 Epoch: 14 Global Step: 237520 Fp16 
Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:08,587-Speed 5189.69 samples/sec Loss 1.0893 LearningRate 0.0083 Epoch: 14 Global Step: 237530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:10,558-Speed 5195.98 samples/sec Loss 1.0826 LearningRate 0.0083 Epoch: 14 Global Step: 237540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:12,541-Speed 5166.54 samples/sec Loss 1.0290 LearningRate 0.0083 Epoch: 14 Global Step: 237550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:14,548-Speed 5102.65 samples/sec Loss 1.0634 LearningRate 0.0083 Epoch: 14 Global Step: 237560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:16,537-Speed 5150.34 samples/sec Loss 1.0490 LearningRate 0.0083 Epoch: 14 Global Step: 237570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:18,513-Speed 5184.04 samples/sec Loss 1.0660 LearningRate 0.0083 Epoch: 14 Global Step: 237580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:12:20,493-Speed 5174.80 samples/sec Loss 1.0355 LearningRate 0.0083 Epoch: 14 Global Step: 237590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:12:22,468-Speed 5186.65 samples/sec Loss 1.0161 LearningRate 0.0083 Epoch: 14 Global Step: 237600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:12:24,461-Speed 5138.82 samples/sec Loss 1.0626 LearningRate 0.0083 Epoch: 14 Global Step: 237610 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:12:26,430-Speed 5202.00 samples/sec Loss 1.0754 LearningRate 0.0083 Epoch: 14 Global Step: 237620 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:12:28,397-Speed 5207.48 samples/sec Loss 1.0607 LearningRate 0.0083 Epoch: 14 Global Step: 237630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:30,368-Speed 5197.59 samples/sec Loss 1.0136 LearningRate 0.0083 Epoch: 14 Global Step: 237640 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 15:12:32,343-Speed 5185.55 samples/sec Loss 1.0886 LearningRate 0.0083 Epoch: 14 Global Step: 237650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:34,310-Speed 5208.70 samples/sec Loss 1.0695 LearningRate 0.0083 Epoch: 14 Global Step: 237660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:36,301-Speed 5146.02 samples/sec Loss 1.0459 LearningRate 0.0083 Epoch: 14 Global Step: 237670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:38,281-Speed 5172.16 samples/sec Loss 1.0774 LearningRate 0.0083 Epoch: 14 Global Step: 237680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:40,258-Speed 5181.43 samples/sec Loss 1.0348 LearningRate 0.0083 Epoch: 14 Global Step: 237690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:42,234-Speed 5185.23 samples/sec Loss 1.0303 LearningRate 0.0083 Epoch: 14 Global Step: 237700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:44,223-Speed 5150.73 samples/sec Loss 1.0603 LearningRate 0.0083 Epoch: 14 Global Step: 237710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:46,215-Speed 5142.11 samples/sec Loss 1.0419 LearningRate 0.0083 Epoch: 14 Global Step: 237720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:48,204-Speed 5150.08 samples/sec Loss 1.0753 LearningRate 0.0083 Epoch: 14 Global Step: 237730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:50,200-Speed 5131.42 samples/sec Loss 1.0547 LearningRate 0.0083 Epoch: 14 Global Step: 237740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:52,182-Speed 5167.83 samples/sec Loss 1.0485 LearningRate 0.0083 Epoch: 14 Global Step: 237750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:54,162-Speed 5172.13 samples/sec Loss 1.0745 LearningRate 0.0083 Epoch: 14 Global Step: 237760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:56,133-Speed 
5200.03 samples/sec Loss 1.0427 LearningRate 0.0083 Epoch: 14 Global Step: 237770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:12:58,104-Speed 5196.40 samples/sec Loss 1.0550 LearningRate 0.0083 Epoch: 14 Global Step: 237780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:13:00,074-Speed 5199.79 samples/sec Loss 1.0957 LearningRate 0.0083 Epoch: 14 Global Step: 237790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:13:02,061-Speed 5155.40 samples/sec Loss 1.0755 LearningRate 0.0083 Epoch: 14 Global Step: 237800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:13:04,035-Speed 5188.14 samples/sec Loss 1.0745 LearningRate 0.0083 Epoch: 14 Global Step: 237810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:13:06,033-Speed 5127.18 samples/sec Loss 1.0267 LearningRate 0.0083 Epoch: 14 Global Step: 237820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:13:08,009-Speed 5184.14 samples/sec Loss 1.0679 LearningRate 0.0083 Epoch: 14 Global Step: 237830 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:13:10,011-Speed 5117.35 samples/sec Loss 1.0566 LearningRate 0.0083 Epoch: 14 Global Step: 237840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:13:11,991-Speed 5172.32 samples/sec Loss 1.0825 LearningRate 0.0083 Epoch: 14 Global Step: 237850 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:13:13,963-Speed 5195.73 samples/sec Loss 1.0435 LearningRate 0.0083 Epoch: 14 Global Step: 237860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:13:15,930-Speed 5209.11 samples/sec Loss 1.0265 LearningRate 0.0083 Epoch: 14 Global Step: 237870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:13:17,901-Speed 5196.13 samples/sec Loss 1.0799 LearningRate 0.0083 Epoch: 14 Global Step: 237880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:13:19,880-Speed 5174.62 samples/sec Loss 1.0494 
LearningRate 0.0083 Epoch: 14 Global Step: 237890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:13:21,875-Speed 5135.33 samples/sec Loss 1.0738 LearningRate 0.0083 Epoch: 14 Global Step: 237900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:13:23,865-Speed 5148.62 samples/sec Loss 1.0843 LearningRate 0.0083 Epoch: 14 Global Step: 237910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:13:25,833-Speed 5204.22 samples/sec Loss 1.0597 LearningRate 0.0083 Epoch: 14 Global Step: 237920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:13:27,819-Speed 5158.31 samples/sec Loss 1.0836 LearningRate 0.0083 Epoch: 14 Global Step: 237930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:13:29,795-Speed 5182.02 samples/sec Loss 1.0407 LearningRate 0.0082 Epoch: 14 Global Step: 237940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:13:31,763-Speed 5205.43 samples/sec Loss 1.0655 LearningRate 0.0082 Epoch: 14 Global Step: 237950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:13:33,733-Speed 5201.41 samples/sec Loss 1.0764 LearningRate 0.0082 Epoch: 14 Global Step: 237960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:13:35,713-Speed 5173.54 samples/sec Loss 1.0635 LearningRate 0.0082 Epoch: 14 Global Step: 237970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:13:37,689-Speed 5185.72 samples/sec Loss 1.1321 LearningRate 0.0082 Epoch: 14 Global Step: 237980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:13:39,660-Speed 5196.31 samples/sec Loss 1.0367 LearningRate 0.0082 Epoch: 14 Global Step: 237990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:13:41,634-Speed 5187.83 samples/sec Loss 1.0923 LearningRate 0.0082 Epoch: 14 Global Step: 238000 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:14:08,140-[lfw][238000]XNorm: 21.634084 Training: 2022-04-11 
15:14:08,141-[lfw][238000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-04-11 15:14:08,141-[lfw][238000]Accuracy-Highest: 0.99833 Training: 2022-04-11 15:14:38,990-[cfp_fp][238000]XNorm: 21.319689 Training: 2022-04-11 15:14:38,991-[cfp_fp][238000]Accuracy-Flip: 0.98586+-0.00497 Training: 2022-04-11 15:14:38,991-[cfp_fp][238000]Accuracy-Highest: 0.98857 Training: 2022-04-11 15:15:05,431-[agedb_30][238000]XNorm: 22.326256 Training: 2022-04-11 15:15:05,432-[agedb_30][238000]Accuracy-Flip: 0.98300+-0.00752 Training: 2022-04-11 15:15:05,432-[agedb_30][238000]Accuracy-Highest: 0.98300 Training: 2022-04-11 15:15:07,416-Speed 119.37 samples/sec Loss 1.0528 LearningRate 0.0082 Epoch: 14 Global Step: 238010 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:15:09,388-Speed 5194.08 samples/sec Loss 1.0519 LearningRate 0.0082 Epoch: 14 Global Step: 238020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:15:11,371-Speed 5164.34 samples/sec Loss 1.1095 LearningRate 0.0082 Epoch: 14 Global Step: 238030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:15:13,341-Speed 5199.88 samples/sec Loss 1.0470 LearningRate 0.0082 Epoch: 14 Global Step: 238040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:15:15,307-Speed 5211.37 samples/sec Loss 1.0702 LearningRate 0.0082 Epoch: 14 Global Step: 238050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:15:17,287-Speed 5173.59 samples/sec Loss 1.0907 LearningRate 0.0082 Epoch: 14 Global Step: 238060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:15:19,266-Speed 5176.16 samples/sec Loss 1.0514 LearningRate 0.0082 Epoch: 14 Global Step: 238070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:15:21,234-Speed 5204.74 samples/sec Loss 1.0743 LearningRate 0.0082 Epoch: 14 Global Step: 238080 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:15:23,233-Speed 5125.53 samples/sec Loss 1.0599 LearningRate 0.0082 Epoch: 14 
Global Step: 238090 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:15:25,241-Speed 5101.61 samples/sec Loss 1.0571 LearningRate 0.0082 Epoch: 14 Global Step: 238100 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:15:27,215-Speed 5189.48 samples/sec Loss 1.0813 LearningRate 0.0082 Epoch: 14 Global Step: 238110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:15:29,186-Speed 5196.76 samples/sec Loss 1.0562 LearningRate 0.0082 Epoch: 14 Global Step: 238120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:15:31,159-Speed 5189.98 samples/sec Loss 1.0714 LearningRate 0.0082 Epoch: 14 Global Step: 238130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:15:33,124-Speed 5212.69 samples/sec Loss 1.0841 LearningRate 0.0082 Epoch: 14 Global Step: 238140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:15:35,109-Speed 5161.46 samples/sec Loss 1.0911 LearningRate 0.0082 Epoch: 14 Global Step: 238150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:15:37,093-Speed 5163.36 samples/sec Loss 1.0855 LearningRate 0.0082 Epoch: 14 Global Step: 238160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:15:39,074-Speed 5171.70 samples/sec Loss 1.0981 LearningRate 0.0082 Epoch: 14 Global Step: 238170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:15:41,061-Speed 5153.03 samples/sec Loss 1.1005 LearningRate 0.0082 Epoch: 14 Global Step: 238180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:15:43,041-Speed 5176.33 samples/sec Loss 1.0744 LearningRate 0.0082 Epoch: 14 Global Step: 238190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:15:45,016-Speed 5185.50 samples/sec Loss 1.0651 LearningRate 0.0082 Epoch: 14 Global Step: 238200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:15:46,996-Speed 5175.13 samples/sec Loss 1.0585 LearningRate 0.0082 Epoch: 14 Global Step: 238210 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 15:15:48,980-Speed 5162.27 samples/sec Loss 1.1401 LearningRate 0.0082 Epoch: 14 Global Step: 238220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:15:50,959-Speed 5175.47 samples/sec Loss 1.0911 LearningRate 0.0082 Epoch: 14 Global Step: 238230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:15:52,990-Speed 5044.80 samples/sec Loss 1.0919 LearningRate 0.0082 Epoch: 14 Global Step: 238240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:15:54,966-Speed 5182.65 samples/sec Loss 1.0347 LearningRate 0.0082 Epoch: 14 Global Step: 238250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:15:56,950-Speed 5163.68 samples/sec Loss 1.0613 LearningRate 0.0082 Epoch: 14 Global Step: 238260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:15:58,949-Speed 5124.28 samples/sec Loss 1.0798 LearningRate 0.0082 Epoch: 14 Global Step: 238270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:00,958-Speed 5099.96 samples/sec Loss 1.0748 LearningRate 0.0082 Epoch: 14 Global Step: 238280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:02,968-Speed 5097.15 samples/sec Loss 1.0529 LearningRate 0.0082 Epoch: 14 Global Step: 238290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:04,966-Speed 5127.19 samples/sec Loss 1.0646 LearningRate 0.0082 Epoch: 14 Global Step: 238300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:06,950-Speed 5162.27 samples/sec Loss 1.0669 LearningRate 0.0082 Epoch: 14 Global Step: 238310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:08,938-Speed 5151.71 samples/sec Loss 1.0857 LearningRate 0.0082 Epoch: 14 Global Step: 238320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:10,942-Speed 5112.70 samples/sec Loss 1.1174 LearningRate 0.0082 Epoch: 14 Global Step: 238330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-11 15:16:12,928-Speed 5158.29 samples/sec Loss 1.0835 LearningRate 0.0082 Epoch: 14 Global Step: 238340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:14,901-Speed 5191.25 samples/sec Loss 1.0685 LearningRate 0.0082 Epoch: 14 Global Step: 238350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:16,913-Speed 5091.07 samples/sec Loss 1.0741 LearningRate 0.0082 Epoch: 14 Global Step: 238360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:18,895-Speed 5167.96 samples/sec Loss 1.0570 LearningRate 0.0082 Epoch: 14 Global Step: 238370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:20,890-Speed 5133.83 samples/sec Loss 1.0607 LearningRate 0.0082 Epoch: 14 Global Step: 238380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:22,878-Speed 5155.31 samples/sec Loss 1.0536 LearningRate 0.0082 Epoch: 14 Global Step: 238390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:24,870-Speed 5141.24 samples/sec Loss 1.0736 LearningRate 0.0082 Epoch: 14 Global Step: 238400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:26,866-Speed 5132.63 samples/sec Loss 1.0344 LearningRate 0.0082 Epoch: 14 Global Step: 238410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:28,862-Speed 5130.82 samples/sec Loss 1.0835 LearningRate 0.0082 Epoch: 14 Global Step: 238420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:30,848-Speed 5158.53 samples/sec Loss 1.0370 LearningRate 0.0082 Epoch: 14 Global Step: 238430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:32,826-Speed 5179.07 samples/sec Loss 1.0771 LearningRate 0.0082 Epoch: 14 Global Step: 238440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:34,813-Speed 5154.01 samples/sec Loss 1.1095 LearningRate 0.0082 Epoch: 14 Global Step: 238450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:16:36,795-Speed 5170.11 
samples/sec Loss 1.0588 LearningRate 0.0082 Epoch: 14 Global Step: 238460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:38,797-Speed 5115.04 samples/sec Loss 1.0356 LearningRate 0.0082 Epoch: 14 Global Step: 238470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:40,785-Speed 5153.43 samples/sec Loss 1.1049 LearningRate 0.0082 Epoch: 14 Global Step: 238480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:42,767-Speed 5167.05 samples/sec Loss 1.0868 LearningRate 0.0082 Epoch: 14 Global Step: 238490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:44,747-Speed 5175.49 samples/sec Loss 1.0897 LearningRate 0.0082 Epoch: 14 Global Step: 238500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:46,745-Speed 5126.29 samples/sec Loss 1.0804 LearningRate 0.0082 Epoch: 14 Global Step: 238510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:48,758-Speed 5090.17 samples/sec Loss 1.1091 LearningRate 0.0082 Epoch: 14 Global Step: 238520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:50,744-Speed 5155.84 samples/sec Loss 1.0792 LearningRate 0.0081 Epoch: 14 Global Step: 238530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:52,719-Speed 5187.56 samples/sec Loss 1.1304 LearningRate 0.0081 Epoch: 14 Global Step: 238540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:54,716-Speed 5128.02 samples/sec Loss 1.0809 LearningRate 0.0081 Epoch: 14 Global Step: 238550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:16:56,687-Speed 5198.26 samples/sec Loss 1.0736 LearningRate 0.0081 Epoch: 14 Global Step: 238560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:16:58,672-Speed 5159.21 samples/sec Loss 1.0876 LearningRate 0.0081 Epoch: 14 Global Step: 238570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:17:00,656-Speed 5162.60 samples/sec Loss 1.1248 LearningRate 
0.0081 Epoch: 14 Global Step: 238580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:17:02,630-Speed 5190.04 samples/sec Loss 1.0572 LearningRate 0.0081 Epoch: 14 Global Step: 238590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:17:04,644-Speed 5085.33 samples/sec Loss 1.0569 LearningRate 0.0081 Epoch: 14 Global Step: 238600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:17:06,634-Speed 5150.17 samples/sec Loss 1.0512 LearningRate 0.0081 Epoch: 14 Global Step: 238610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:08,605-Speed 5196.39 samples/sec Loss 1.0814 LearningRate 0.0081 Epoch: 14 Global Step: 238620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:10,585-Speed 5171.78 samples/sec Loss 1.0568 LearningRate 0.0081 Epoch: 14 Global Step: 238630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:12,570-Speed 5161.85 samples/sec Loss 1.0612 LearningRate 0.0081 Epoch: 14 Global Step: 238640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:14,541-Speed 5198.09 samples/sec Loss 1.0935 LearningRate 0.0081 Epoch: 14 Global Step: 238650 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:16,525-Speed 5161.21 samples/sec Loss 1.0994 LearningRate 0.0081 Epoch: 14 Global Step: 238660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:18,501-Speed 5185.02 samples/sec Loss 1.0487 LearningRate 0.0081 Epoch: 14 Global Step: 238670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:20,468-Speed 5205.71 samples/sec Loss 1.0967 LearningRate 0.0081 Epoch: 14 Global Step: 238680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:22,441-Speed 5192.57 samples/sec Loss 1.1027 LearningRate 0.0081 Epoch: 14 Global Step: 238690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:24,420-Speed 5175.66 samples/sec Loss 1.0886 LearningRate 0.0081 Epoch: 14 Global Step: 238700 Fp16 
Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:26,410-Speed 5148.15 samples/sec Loss 1.0978 LearningRate 0.0081 Epoch: 14 Global Step: 238710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:17:28,390-Speed 5174.25 samples/sec Loss 1.1147 LearningRate 0.0081 Epoch: 14 Global Step: 238720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:17:30,372-Speed 5168.17 samples/sec Loss 1.1024 LearningRate 0.0081 Epoch: 14 Global Step: 238730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:17:32,344-Speed 5193.47 samples/sec Loss 1.0690 LearningRate 0.0081 Epoch: 14 Global Step: 238740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:17:34,309-Speed 5212.79 samples/sec Loss 1.0708 LearningRate 0.0081 Epoch: 14 Global Step: 238750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:36,287-Speed 5178.76 samples/sec Loss 1.0690 LearningRate 0.0081 Epoch: 14 Global Step: 238760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:38,269-Speed 5171.43 samples/sec Loss 1.0937 LearningRate 0.0081 Epoch: 14 Global Step: 238770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:40,260-Speed 5144.42 samples/sec Loss 1.1354 LearningRate 0.0081 Epoch: 14 Global Step: 238780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:42,233-Speed 5192.89 samples/sec Loss 1.0570 LearningRate 0.0081 Epoch: 14 Global Step: 238790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:44,207-Speed 5188.74 samples/sec Loss 1.0958 LearningRate 0.0081 Epoch: 14 Global Step: 238800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:46,182-Speed 5188.13 samples/sec Loss 1.0888 LearningRate 0.0081 Epoch: 14 Global Step: 238810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:48,183-Speed 5119.16 samples/sec Loss 1.1069 LearningRate 0.0081 Epoch: 14 Global Step: 238820 Fp16 Grad Scale: 32768 Required: 6 hours 
Training: 2022-04-11 15:17:50,183-Speed 5121.43 samples/sec Loss 1.0673 LearningRate 0.0081 Epoch: 14 Global Step: 238830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:52,157-Speed 5189.57 samples/sec Loss 1.0425 LearningRate 0.0081 Epoch: 14 Global Step: 238840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:17:54,142-Speed 5159.65 samples/sec Loss 1.0967 LearningRate 0.0081 Epoch: 14 Global Step: 238850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:17:56,116-Speed 5189.10 samples/sec Loss 1.0767 LearningRate 0.0081 Epoch: 14 Global Step: 238860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:17:58,091-Speed 5187.76 samples/sec Loss 1.0878 LearningRate 0.0081 Epoch: 14 Global Step: 238870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:00,072-Speed 5170.82 samples/sec Loss 1.0772 LearningRate 0.0081 Epoch: 14 Global Step: 238880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:02,056-Speed 5162.61 samples/sec Loss 1.0815 LearningRate 0.0081 Epoch: 14 Global Step: 238890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:04,037-Speed 5170.69 samples/sec Loss 1.1010 LearningRate 0.0081 Epoch: 14 Global Step: 238900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:06,010-Speed 5190.55 samples/sec Loss 1.0937 LearningRate 0.0081 Epoch: 14 Global Step: 238910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:07,997-Speed 5158.37 samples/sec Loss 1.1025 LearningRate 0.0081 Epoch: 14 Global Step: 238920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:09,995-Speed 5126.43 samples/sec Loss 1.0458 LearningRate 0.0081 Epoch: 14 Global Step: 238930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:11,995-Speed 5120.84 samples/sec Loss 1.0911 LearningRate 0.0081 Epoch: 14 Global Step: 238940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:13,980-Speed 
5159.91 samples/sec Loss 1.1292 LearningRate 0.0081 Epoch: 14 Global Step: 238950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:15,981-Speed 5121.43 samples/sec Loss 1.0839 LearningRate 0.0081 Epoch: 14 Global Step: 238960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:17,984-Speed 5114.91 samples/sec Loss 1.1394 LearningRate 0.0081 Epoch: 14 Global Step: 238970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:19,960-Speed 5183.95 samples/sec Loss 1.1078 LearningRate 0.0081 Epoch: 14 Global Step: 238980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:21,944-Speed 5162.22 samples/sec Loss 1.0537 LearningRate 0.0081 Epoch: 14 Global Step: 238990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:23,926-Speed 5169.85 samples/sec Loss 1.0545 LearningRate 0.0081 Epoch: 14 Global Step: 239000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:25,898-Speed 5192.29 samples/sec Loss 1.1515 LearningRate 0.0081 Epoch: 14 Global Step: 239010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:27,873-Speed 5186.41 samples/sec Loss 1.0703 LearningRate 0.0081 Epoch: 14 Global Step: 239020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:29,844-Speed 5197.58 samples/sec Loss 1.0838 LearningRate 0.0081 Epoch: 14 Global Step: 239030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:31,817-Speed 5192.91 samples/sec Loss 1.0905 LearningRate 0.0081 Epoch: 14 Global Step: 239040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:33,795-Speed 5180.57 samples/sec Loss 1.0980 LearningRate 0.0081 Epoch: 14 Global Step: 239050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:18:35,765-Speed 5199.55 samples/sec Loss 1.0770 LearningRate 0.0081 Epoch: 14 Global Step: 239060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:37,758-Speed 5140.17 samples/sec Loss 1.1073 
LearningRate 0.0081 Epoch: 14 Global Step: 239070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:39,737-Speed 5176.03 samples/sec Loss 1.1331 LearningRate 0.0081 Epoch: 14 Global Step: 239080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:41,764-Speed 5054.40 samples/sec Loss 1.0872 LearningRate 0.0081 Epoch: 14 Global Step: 239090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:43,733-Speed 5200.10 samples/sec Loss 1.0928 LearningRate 0.0081 Epoch: 14 Global Step: 239100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:45,723-Speed 5148.84 samples/sec Loss 1.0959 LearningRate 0.0080 Epoch: 14 Global Step: 239110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:47,695-Speed 5194.42 samples/sec Loss 1.1137 LearningRate 0.0080 Epoch: 14 Global Step: 239120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:49,699-Speed 5111.18 samples/sec Loss 1.0943 LearningRate 0.0080 Epoch: 14 Global Step: 239130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:51,711-Speed 5090.69 samples/sec Loss 1.0610 LearningRate 0.0080 Epoch: 14 Global Step: 239140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:53,699-Speed 5152.48 samples/sec Loss 1.0970 LearningRate 0.0080 Epoch: 14 Global Step: 239150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:18:55,678-Speed 5177.52 samples/sec Loss 1.1062 LearningRate 0.0080 Epoch: 14 Global Step: 239160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:18:57,663-Speed 5162.05 samples/sec Loss 1.0786 LearningRate 0.0080 Epoch: 14 Global Step: 239170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:18:59,650-Speed 5155.14 samples/sec Loss 1.0548 LearningRate 0.0080 Epoch: 14 Global Step: 239180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:19:01,663-Speed 5089.03 samples/sec Loss 1.1170 LearningRate 0.0080 Epoch: 14 Global 
Step: 239190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:19:03,650-Speed 5155.13 samples/sec Loss 1.1342 LearningRate 0.0080 Epoch: 14 Global Step: 239200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:19:05,635-Speed 5159.61 samples/sec Loss 1.1298 LearningRate 0.0080 Epoch: 14 Global Step: 239210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:19:07,610-Speed 5185.40 samples/sec Loss 1.0698 LearningRate 0.0080 Epoch: 14 Global Step: 239220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:19:09,604-Speed 5138.64 samples/sec Loss 1.0474 LearningRate 0.0080 Epoch: 14 Global Step: 239230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:19:11,589-Speed 5160.10 samples/sec Loss 1.0913 LearningRate 0.0080 Epoch: 14 Global Step: 239240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:19:13,572-Speed 5165.66 samples/sec Loss 1.0978 LearningRate 0.0080 Epoch: 14 Global Step: 239250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:19:15,562-Speed 5148.44 samples/sec Loss 1.1082 LearningRate 0.0080 Epoch: 14 Global Step: 239260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:19:17,544-Speed 5168.32 samples/sec Loss 1.0997 LearningRate 0.0080 Epoch: 14 Global Step: 239270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:19:19,516-Speed 5193.10 samples/sec Loss 1.0688 LearningRate 0.0080 Epoch: 14 Global Step: 239280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:19:21,523-Speed 5105.92 samples/sec Loss 1.1293 LearningRate 0.0080 Epoch: 14 Global Step: 239290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:19:23,520-Speed 5128.40 samples/sec Loss 1.0928 LearningRate 0.0080 Epoch: 14 Global Step: 239300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:19:25,491-Speed 5196.02 samples/sec Loss 1.1151 LearningRate 0.0080 Epoch: 14 Global Step: 239310 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 15:19:27,484-Speed 5139.66 samples/sec Loss 1.1164 LearningRate 0.0080 Epoch: 14 Global Step: 239320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:19:29,477-Speed 5139.92 samples/sec Loss 1.0604 LearningRate 0.0080 Epoch: 14 Global Step: 239330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:19:31,457-Speed 5172.98 samples/sec Loss 1.0848 LearningRate 0.0080 Epoch: 14 Global Step: 239340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:19:33,441-Speed 5162.83 samples/sec Loss 1.0446 LearningRate 0.0080 Epoch: 14 Global Step: 239350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:19:35,428-Speed 5156.86 samples/sec Loss 1.1128 LearningRate 0.0080 Epoch: 14 Global Step: 239360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:19:37,447-Speed 5072.91 samples/sec Loss 1.1181 LearningRate 0.0080 Epoch: 14 Global Step: 239370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:19:39,428-Speed 5172.16 samples/sec Loss 1.0983 LearningRate 0.0080 Epoch: 14 Global Step: 239380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:19:41,397-Speed 5202.35 samples/sec Loss 1.1074 LearningRate 0.0080 Epoch: 14 Global Step: 239390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:19:43,367-Speed 5199.01 samples/sec Loss 1.0797 LearningRate 0.0080 Epoch: 14 Global Step: 239400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:19:45,339-Speed 5195.22 samples/sec Loss 1.1198 LearningRate 0.0080 Epoch: 14 Global Step: 239410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:19:47,337-Speed 5126.75 samples/sec Loss 1.1093 LearningRate 0.0080 Epoch: 14 Global Step: 239420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:19:49,327-Speed 5145.46 samples/sec Loss 1.0862 LearningRate 0.0080 Epoch: 14 Global Step: 239430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 
15:19:51,299-Speed 5196.04 samples/sec Loss 1.0560 LearningRate 0.0080 Epoch: 14 Global Step: 239440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:19:53,283-Speed 5161.52 samples/sec Loss 1.1155 LearningRate 0.0080 Epoch: 14 Global Step: 239450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:19:55,261-Speed 5178.62 samples/sec Loss 1.0422 LearningRate 0.0080 Epoch: 14 Global Step: 239460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:19:57,245-Speed 5163.04 samples/sec Loss 1.0887 LearningRate 0.0080 Epoch: 14 Global Step: 239470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:19:59,242-Speed 5132.58 samples/sec Loss 1.1005 LearningRate 0.0080 Epoch: 14 Global Step: 239480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:01,214-Speed 5194.68 samples/sec Loss 1.0936 LearningRate 0.0080 Epoch: 14 Global Step: 239490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:03,243-Speed 5047.78 samples/sec Loss 1.0883 LearningRate 0.0080 Epoch: 14 Global Step: 239500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:05,250-Speed 5105.20 samples/sec Loss 1.0936 LearningRate 0.0080 Epoch: 14 Global Step: 239510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:07,245-Speed 5132.60 samples/sec Loss 1.1036 LearningRate 0.0080 Epoch: 14 Global Step: 239520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:09,237-Speed 5143.75 samples/sec Loss 1.0996 LearningRate 0.0080 Epoch: 14 Global Step: 239530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:11,219-Speed 5168.55 samples/sec Loss 1.0946 LearningRate 0.0080 Epoch: 14 Global Step: 239540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:13,200-Speed 5171.25 samples/sec Loss 1.1209 LearningRate 0.0080 Epoch: 14 Global Step: 239550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:15,178-Speed 5179.76 samples/sec Loss 
1.0912 LearningRate 0.0080 Epoch: 14 Global Step: 239560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:20:17,170-Speed 5142.12 samples/sec Loss 1.1196 LearningRate 0.0080 Epoch: 14 Global Step: 239570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:19,158-Speed 5151.52 samples/sec Loss 1.1005 LearningRate 0.0080 Epoch: 14 Global Step: 239580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:21,129-Speed 5198.71 samples/sec Loss 1.1426 LearningRate 0.0080 Epoch: 14 Global Step: 239590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:23,110-Speed 5171.10 samples/sec Loss 1.1195 LearningRate 0.0080 Epoch: 14 Global Step: 239600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:25,084-Speed 5188.07 samples/sec Loss 1.0966 LearningRate 0.0080 Epoch: 14 Global Step: 239610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:27,059-Speed 5186.22 samples/sec Loss 1.1028 LearningRate 0.0080 Epoch: 14 Global Step: 239620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:29,062-Speed 5115.77 samples/sec Loss 1.0696 LearningRate 0.0080 Epoch: 14 Global Step: 239630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:31,040-Speed 5177.12 samples/sec Loss 1.1050 LearningRate 0.0080 Epoch: 14 Global Step: 239640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:33,029-Speed 5149.79 samples/sec Loss 1.0723 LearningRate 0.0080 Epoch: 14 Global Step: 239650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:35,002-Speed 5192.74 samples/sec Loss 1.0853 LearningRate 0.0080 Epoch: 14 Global Step: 239660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:36,977-Speed 5186.95 samples/sec Loss 1.0626 LearningRate 0.0080 Epoch: 14 Global Step: 239670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:20:38,953-Speed 5186.56 samples/sec Loss 1.0820 LearningRate 0.0080 Epoch: 14 
Global Step: 239680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:40,929-Speed 5182.97 samples/sec Loss 1.0723 LearningRate 0.0080 Epoch: 14 Global Step: 239690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:42,913-Speed 5164.89 samples/sec Loss 1.0822 LearningRate 0.0079 Epoch: 14 Global Step: 239700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:44,921-Speed 5099.47 samples/sec Loss 1.1080 LearningRate 0.0079 Epoch: 14 Global Step: 239710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:46,922-Speed 5118.85 samples/sec Loss 1.0653 LearningRate 0.0079 Epoch: 14 Global Step: 239720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:48,916-Speed 5138.00 samples/sec Loss 1.1240 LearningRate 0.0079 Epoch: 14 Global Step: 239730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:50,915-Speed 5127.42 samples/sec Loss 1.0709 LearningRate 0.0079 Epoch: 14 Global Step: 239740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:52,888-Speed 5191.60 samples/sec Loss 1.0944 LearningRate 0.0079 Epoch: 14 Global Step: 239750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:54,892-Speed 5109.56 samples/sec Loss 1.1113 LearningRate 0.0079 Epoch: 14 Global Step: 239760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:56,876-Speed 5165.41 samples/sec Loss 1.1010 LearningRate 0.0079 Epoch: 14 Global Step: 239770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:20:58,842-Speed 5209.62 samples/sec Loss 1.0530 LearningRate 0.0079 Epoch: 14 Global Step: 239780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:21:00,821-Speed 5176.81 samples/sec Loss 1.1089 LearningRate 0.0079 Epoch: 14 Global Step: 239790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:21:02,801-Speed 5171.77 samples/sec Loss 1.0947 LearningRate 0.0079 Epoch: 14 Global Step: 239800 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 15:21:04,776-Speed 5187.21 samples/sec Loss 1.0853 LearningRate 0.0079 Epoch: 14 Global Step: 239810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:21:06,752-Speed 5183.83 samples/sec Loss 1.0726 LearningRate 0.0079 Epoch: 14 Global Step: 239820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:21:08,734-Speed 5167.35 samples/sec Loss 1.1299 LearningRate 0.0079 Epoch: 14 Global Step: 239830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:21:10,740-Speed 5107.15 samples/sec Loss 1.0506 LearningRate 0.0079 Epoch: 14 Global Step: 239840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:21:12,739-Speed 5124.71 samples/sec Loss 1.0910 LearningRate 0.0079 Epoch: 14 Global Step: 239850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:21:14,710-Speed 5196.70 samples/sec Loss 1.1140 LearningRate 0.0079 Epoch: 14 Global Step: 239860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:21:16,701-Speed 5145.31 samples/sec Loss 1.1468 LearningRate 0.0079 Epoch: 14 Global Step: 239870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:21:18,713-Speed 5092.04 samples/sec Loss 1.0883 LearningRate 0.0079 Epoch: 14 Global Step: 239880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:21:20,685-Speed 5197.61 samples/sec Loss 1.1059 LearningRate 0.0079 Epoch: 14 Global Step: 239890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:21:22,657-Speed 5192.60 samples/sec Loss 1.0679 LearningRate 0.0079 Epoch: 14 Global Step: 239900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:21:24,629-Speed 5195.33 samples/sec Loss 1.1117 LearningRate 0.0079 Epoch: 14 Global Step: 239910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:21:26,613-Speed 5162.36 samples/sec Loss 1.0978 LearningRate 0.0079 Epoch: 14 Global Step: 239920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 
15:21:28,583-Speed 5198.92 samples/sec Loss 1.1257 LearningRate 0.0079 Epoch: 14 Global Step: 239930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:21:30,561-Speed 5179.71 samples/sec Loss 1.1024 LearningRate 0.0079 Epoch: 14 Global Step: 239940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:21:32,530-Speed 5203.09 samples/sec Loss 1.0827 LearningRate 0.0079 Epoch: 14 Global Step: 239950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:21:34,531-Speed 5117.83 samples/sec Loss 1.1059 LearningRate 0.0079 Epoch: 14 Global Step: 239960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:21:36,508-Speed 5181.43 samples/sec Loss 1.0910 LearningRate 0.0079 Epoch: 14 Global Step: 239970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:21:38,482-Speed 5190.93 samples/sec Loss 1.0974 LearningRate 0.0079 Epoch: 14 Global Step: 239980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:21:40,455-Speed 5191.03 samples/sec Loss 1.1002 LearningRate 0.0079 Epoch: 14 Global Step: 239990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:21:42,426-Speed 5197.32 samples/sec Loss 1.0962 LearningRate 0.0079 Epoch: 14 Global Step: 240000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:22:09,125-[lfw][240000]XNorm: 21.146837 Training: 2022-04-11 15:22:09,126-[lfw][240000]Accuracy-Flip: 0.99783+-0.00289 Training: 2022-04-11 15:22:09,126-[lfw][240000]Accuracy-Highest: 0.99833 Training: 2022-04-11 15:22:40,003-[cfp_fp][240000]XNorm: 20.802980 Training: 2022-04-11 15:22:40,004-[cfp_fp][240000]Accuracy-Flip: 0.98657+-0.00462 Training: 2022-04-11 15:22:40,004-[cfp_fp][240000]Accuracy-Highest: 0.98857 Training: 2022-04-11 15:23:06,543-[agedb_30][240000]XNorm: 21.758702 Training: 2022-04-11 15:23:06,544-[agedb_30][240000]Accuracy-Flip: 0.98233+-0.00739 Training: 2022-04-11 15:23:06,544-[agedb_30][240000]Accuracy-Highest: 0.98300 Training: 2022-04-11 15:23:08,525-Speed 
118.93 samples/sec Loss 1.1009 LearningRate 0.0079 Epoch: 14 Global Step: 240010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:10,494-Speed 5202.32 samples/sec Loss 1.1078 LearningRate 0.0079 Epoch: 14 Global Step: 240020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:12,467-Speed 5190.90 samples/sec Loss 1.0952 LearningRate 0.0079 Epoch: 14 Global Step: 240030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:14,435-Speed 5207.36 samples/sec Loss 1.1413 LearningRate 0.0079 Epoch: 14 Global Step: 240040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:16,401-Speed 5210.56 samples/sec Loss 1.0710 LearningRate 0.0079 Epoch: 14 Global Step: 240050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:18,368-Speed 5205.69 samples/sec Loss 1.1093 LearningRate 0.0079 Epoch: 14 Global Step: 240060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:20,337-Speed 5204.80 samples/sec Loss 1.1262 LearningRate 0.0079 Epoch: 14 Global Step: 240070 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:23:22,305-Speed 5205.30 samples/sec Loss 1.1180 LearningRate 0.0079 Epoch: 14 Global Step: 240080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:24,288-Speed 5164.81 samples/sec Loss 1.0626 LearningRate 0.0079 Epoch: 14 Global Step: 240090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:26,267-Speed 5174.94 samples/sec Loss 1.0848 LearningRate 0.0079 Epoch: 14 Global Step: 240100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:28,260-Speed 5139.79 samples/sec Loss 1.1423 LearningRate 0.0079 Epoch: 14 Global Step: 240110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:30,237-Speed 5183.00 samples/sec Loss 1.0741 LearningRate 0.0079 Epoch: 14 Global Step: 240120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:32,207-Speed 5200.00 samples/sec Loss 1.1012 
LearningRate 0.0079 Epoch: 14 Global Step: 240130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:34,177-Speed 5198.62 samples/sec Loss 1.1104 LearningRate 0.0079 Epoch: 14 Global Step: 240140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:36,162-Speed 5160.01 samples/sec Loss 1.0838 LearningRate 0.0079 Epoch: 14 Global Step: 240150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:38,144-Speed 5169.70 samples/sec Loss 1.1058 LearningRate 0.0079 Epoch: 14 Global Step: 240160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:40,130-Speed 5156.10 samples/sec Loss 1.0678 LearningRate 0.0079 Epoch: 14 Global Step: 240170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:42,103-Speed 5192.05 samples/sec Loss 1.0724 LearningRate 0.0079 Epoch: 14 Global Step: 240180 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:23:44,079-Speed 5184.23 samples/sec Loss 1.1024 LearningRate 0.0079 Epoch: 14 Global Step: 240190 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:23:46,048-Speed 5202.82 samples/sec Loss 1.1223 LearningRate 0.0079 Epoch: 14 Global Step: 240200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:48,029-Speed 5171.31 samples/sec Loss 1.1346 LearningRate 0.0079 Epoch: 14 Global Step: 240210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:50,030-Speed 5120.39 samples/sec Loss 1.1231 LearningRate 0.0079 Epoch: 14 Global Step: 240220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:52,020-Speed 5147.86 samples/sec Loss 1.1527 LearningRate 0.0079 Epoch: 14 Global Step: 240230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:53,996-Speed 5183.96 samples/sec Loss 1.1307 LearningRate 0.0079 Epoch: 14 Global Step: 240240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:55,970-Speed 5187.63 samples/sec Loss 1.0727 LearningRate 0.0079 Epoch: 14 Global 
Step: 240250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:57,944-Speed 5188.87 samples/sec Loss 1.1120 LearningRate 0.0079 Epoch: 14 Global Step: 240260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:23:59,919-Speed 5186.33 samples/sec Loss 1.1066 LearningRate 0.0079 Epoch: 14 Global Step: 240270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:01,907-Speed 5153.36 samples/sec Loss 1.0635 LearningRate 0.0079 Epoch: 14 Global Step: 240280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:03,896-Speed 5149.74 samples/sec Loss 1.0622 LearningRate 0.0079 Epoch: 14 Global Step: 240290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:05,873-Speed 5182.26 samples/sec Loss 1.0979 LearningRate 0.0078 Epoch: 14 Global Step: 240300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:07,848-Speed 5187.07 samples/sec Loss 1.0986 LearningRate 0.0078 Epoch: 14 Global Step: 240310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:09,825-Speed 5179.55 samples/sec Loss 1.1043 LearningRate 0.0078 Epoch: 14 Global Step: 240320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:11,806-Speed 5172.23 samples/sec Loss 1.1157 LearningRate 0.0078 Epoch: 14 Global Step: 240330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:13,780-Speed 5188.21 samples/sec Loss 1.0908 LearningRate 0.0078 Epoch: 14 Global Step: 240340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:15,777-Speed 5129.06 samples/sec Loss 1.0641 LearningRate 0.0078 Epoch: 14 Global Step: 240350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:17,758-Speed 5170.64 samples/sec Loss 1.0983 LearningRate 0.0078 Epoch: 14 Global Step: 240360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:19,735-Speed 5180.76 samples/sec Loss 1.0724 LearningRate 0.0078 Epoch: 14 Global Step: 240370 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 15:24:21,730-Speed 5138.14 samples/sec Loss 1.1046 LearningRate 0.0078 Epoch: 14 Global Step: 240380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:23,715-Speed 5159.14 samples/sec Loss 1.1155 LearningRate 0.0078 Epoch: 14 Global Step: 240390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:25,713-Speed 5127.88 samples/sec Loss 1.0928 LearningRate 0.0078 Epoch: 14 Global Step: 240400 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:24:27,713-Speed 5121.02 samples/sec Loss 1.0660 LearningRate 0.0078 Epoch: 14 Global Step: 240410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:24:29,688-Speed 5186.89 samples/sec Loss 1.1028 LearningRate 0.0078 Epoch: 14 Global Step: 240420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:31,670-Speed 5169.85 samples/sec Loss 1.0762 LearningRate 0.0078 Epoch: 14 Global Step: 240430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:33,651-Speed 5170.69 samples/sec Loss 1.1360 LearningRate 0.0078 Epoch: 14 Global Step: 240440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:35,654-Speed 5115.20 samples/sec Loss 1.1077 LearningRate 0.0078 Epoch: 14 Global Step: 240450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:37,631-Speed 5181.76 samples/sec Loss 1.0827 LearningRate 0.0078 Epoch: 14 Global Step: 240460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:39,612-Speed 5169.31 samples/sec Loss 1.1198 LearningRate 0.0078 Epoch: 14 Global Step: 240470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:41,617-Speed 5110.36 samples/sec Loss 1.1280 LearningRate 0.0078 Epoch: 14 Global Step: 240480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:43,599-Speed 5169.22 samples/sec Loss 1.1207 LearningRate 0.0078 Epoch: 14 Global Step: 240490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
15:24:45,581-Speed 5168.77 samples/sec Loss 1.0968 LearningRate 0.0078 Epoch: 14 Global Step: 240500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:47,561-Speed 5172.27 samples/sec Loss 1.1187 LearningRate 0.0078 Epoch: 14 Global Step: 240510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:49,540-Speed 5175.22 samples/sec Loss 1.1056 LearningRate 0.0078 Epoch: 14 Global Step: 240520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:24:51,531-Speed 5145.04 samples/sec Loss 1.0751 LearningRate 0.0078 Epoch: 14 Global Step: 240530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:24:53,520-Speed 5151.96 samples/sec Loss 1.1472 LearningRate 0.0078 Epoch: 14 Global Step: 240540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:24:55,488-Speed 5203.36 samples/sec Loss 1.0743 LearningRate 0.0078 Epoch: 14 Global Step: 240550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:57,466-Speed 5179.54 samples/sec Loss 1.1272 LearningRate 0.0078 Epoch: 14 Global Step: 240560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:24:59,481-Speed 5083.58 samples/sec Loss 1.1165 LearningRate 0.0078 Epoch: 14 Global Step: 240570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:25:01,460-Speed 5175.24 samples/sec Loss 1.0809 LearningRate 0.0078 Epoch: 14 Global Step: 240580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:25:03,462-Speed 5117.35 samples/sec Loss 1.1370 LearningRate 0.0078 Epoch: 14 Global Step: 240590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:25:05,454-Speed 5141.50 samples/sec Loss 1.1202 LearningRate 0.0078 Epoch: 14 Global Step: 240600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:25:07,432-Speed 5180.17 samples/sec Loss 1.1005 LearningRate 0.0078 Epoch: 14 Global Step: 240610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:25:09,405-Speed 5190.18 samples/sec 
Loss 1.1530 LearningRate 0.0078 Epoch: 14 Global Step: 240620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:25:11,390-Speed 5161.52 samples/sec Loss 1.1400 LearningRate 0.0078 Epoch: 14 Global Step: 240630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:25:13,362-Speed 5194.53 samples/sec Loss 1.1124 LearningRate 0.0078 Epoch: 14 Global Step: 240640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:25:15,344-Speed 5167.78 samples/sec Loss 1.0924 LearningRate 0.0078 Epoch: 14 Global Step: 240650 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:25:17,318-Speed 5188.77 samples/sec Loss 1.0872 LearningRate 0.0078 Epoch: 14 Global Step: 240660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:25:19,293-Speed 5188.25 samples/sec Loss 1.1085 LearningRate 0.0078 Epoch: 14 Global Step: 240670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:25:21,264-Speed 5195.91 samples/sec Loss 1.1177 LearningRate 0.0078 Epoch: 14 Global Step: 240680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:25:23,263-Speed 5123.94 samples/sec Loss 1.1062 LearningRate 0.0078 Epoch: 14 Global Step: 240690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:25:25,242-Speed 5177.37 samples/sec Loss 1.0727 LearningRate 0.0078 Epoch: 14 Global Step: 240700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:25:27,223-Speed 5170.33 samples/sec Loss 1.0789 LearningRate 0.0078 Epoch: 14 Global Step: 240710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:25:29,222-Speed 5124.99 samples/sec Loss 1.1053 LearningRate 0.0078 Epoch: 14 Global Step: 240720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:25:31,192-Speed 5198.82 samples/sec Loss 1.0751 LearningRate 0.0078 Epoch: 14 Global Step: 240730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:25:33,176-Speed 5163.71 samples/sec Loss 1.0909 LearningRate 0.0078 Epoch: 14 
Global Step: 240740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:25:35,163-Speed 5155.39 samples/sec Loss 1.1130 LearningRate 0.0078 Epoch: 14 Global Step: 240750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:25:37,143-Speed 5174.78 samples/sec Loss 1.1101 LearningRate 0.0078 Epoch: 14 Global Step: 240760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:25:39,121-Speed 5178.54 samples/sec Loss 1.0842 LearningRate 0.0078 Epoch: 14 Global Step: 240770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:25:41,098-Speed 5179.52 samples/sec Loss 1.1081 LearningRate 0.0078 Epoch: 14 Global Step: 240780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:25:43,079-Speed 5171.75 samples/sec Loss 1.1109 LearningRate 0.0078 Epoch: 14 Global Step: 240790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:25:45,074-Speed 5133.87 samples/sec Loss 1.0894 LearningRate 0.0078 Epoch: 14 Global Step: 240800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:25:47,065-Speed 5144.71 samples/sec Loss 1.1305 LearningRate 0.0078 Epoch: 14 Global Step: 240810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:25:49,050-Speed 5162.43 samples/sec Loss 1.0797 LearningRate 0.0078 Epoch: 14 Global Step: 240820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:25:51,074-Speed 5060.04 samples/sec Loss 1.0972 LearningRate 0.0078 Epoch: 14 Global Step: 240830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:25:53,055-Speed 5171.69 samples/sec Loss 1.1073 LearningRate 0.0078 Epoch: 14 Global Step: 240840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:25:55,033-Speed 5179.44 samples/sec Loss 1.1465 LearningRate 0.0078 Epoch: 14 Global Step: 240850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:25:57,023-Speed 5146.84 samples/sec Loss 1.1114 LearningRate 0.0078 Epoch: 14 Global Step: 240860 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 15:25:59,000-Speed 5181.48 samples/sec Loss 1.0772 LearningRate 0.0078 Epoch: 14 Global Step: 240870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:00,994-Speed 5137.97 samples/sec Loss 1.1158 LearningRate 0.0078 Epoch: 14 Global Step: 240880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:02,975-Speed 5169.55 samples/sec Loss 1.1241 LearningRate 0.0077 Epoch: 14 Global Step: 240890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:04,960-Speed 5159.21 samples/sec Loss 1.1063 LearningRate 0.0077 Epoch: 14 Global Step: 240900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:06,950-Speed 5147.83 samples/sec Loss 1.1037 LearningRate 0.0077 Epoch: 14 Global Step: 240910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:08,925-Speed 5185.36 samples/sec Loss 1.1275 LearningRate 0.0077 Epoch: 14 Global Step: 240920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:10,903-Speed 5180.61 samples/sec Loss 1.0745 LearningRate 0.0077 Epoch: 14 Global Step: 240930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:12,894-Speed 5145.55 samples/sec Loss 1.1164 LearningRate 0.0077 Epoch: 14 Global Step: 240940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:14,899-Speed 5107.29 samples/sec Loss 1.0998 LearningRate 0.0077 Epoch: 14 Global Step: 240950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:16,873-Speed 5189.89 samples/sec Loss 1.0905 LearningRate 0.0077 Epoch: 14 Global Step: 240960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:18,862-Speed 5151.48 samples/sec Loss 1.0973 LearningRate 0.0077 Epoch: 14 Global Step: 240970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:20,845-Speed 5164.05 samples/sec Loss 1.1537 LearningRate 0.0077 Epoch: 14 Global Step: 240980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
15:26:22,846-Speed 5119.71 samples/sec Loss 1.1063 LearningRate 0.0077 Epoch: 14 Global Step: 240990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:24,829-Speed 5164.87 samples/sec Loss 1.0634 LearningRate 0.0077 Epoch: 14 Global Step: 241000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:26,823-Speed 5138.88 samples/sec Loss 1.1014 LearningRate 0.0077 Epoch: 14 Global Step: 241010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:28,797-Speed 5187.84 samples/sec Loss 1.1429 LearningRate 0.0077 Epoch: 14 Global Step: 241020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:30,782-Speed 5160.01 samples/sec Loss 1.1120 LearningRate 0.0077 Epoch: 14 Global Step: 241030 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:26:32,768-Speed 5157.38 samples/sec Loss 1.1092 LearningRate 0.0077 Epoch: 14 Global Step: 241040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:26:34,756-Speed 5155.52 samples/sec Loss 1.1202 LearningRate 0.0077 Epoch: 14 Global Step: 241050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:26:36,725-Speed 5201.22 samples/sec Loss 1.0560 LearningRate 0.0077 Epoch: 14 Global Step: 241060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:38,750-Speed 5059.30 samples/sec Loss 1.1237 LearningRate 0.0077 Epoch: 14 Global Step: 241070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:40,732-Speed 5165.65 samples/sec Loss 1.1802 LearningRate 0.0077 Epoch: 14 Global Step: 241080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:42,713-Speed 5172.51 samples/sec Loss 1.0783 LearningRate 0.0077 Epoch: 14 Global Step: 241090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:44,688-Speed 5187.21 samples/sec Loss 1.1097 LearningRate 0.0077 Epoch: 14 Global Step: 241100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:46,665-Speed 5179.44 samples/sec 
Loss 1.0939 LearningRate 0.0077 Epoch: 14 Global Step: 241110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:48,659-Speed 5138.92 samples/sec Loss 1.1318 LearningRate 0.0077 Epoch: 14 Global Step: 241120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:50,642-Speed 5164.72 samples/sec Loss 1.0620 LearningRate 0.0077 Epoch: 14 Global Step: 241130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:52,625-Speed 5164.72 samples/sec Loss 1.0615 LearningRate 0.0077 Epoch: 14 Global Step: 241140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:54,610-Speed 5162.52 samples/sec Loss 1.1178 LearningRate 0.0077 Epoch: 14 Global Step: 241150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:26:56,602-Speed 5141.15 samples/sec Loss 1.0785 LearningRate 0.0077 Epoch: 14 Global Step: 241160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:26:58,577-Speed 5188.01 samples/sec Loss 1.1511 LearningRate 0.0077 Epoch: 14 Global Step: 241170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:00,574-Speed 5129.17 samples/sec Loss 1.1372 LearningRate 0.0077 Epoch: 14 Global Step: 241180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:02,556-Speed 5169.18 samples/sec Loss 1.1188 LearningRate 0.0077 Epoch: 14 Global Step: 241190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:04,534-Speed 5177.60 samples/sec Loss 1.1346 LearningRate 0.0077 Epoch: 14 Global Step: 241200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:06,516-Speed 5167.42 samples/sec Loss 1.1102 LearningRate 0.0077 Epoch: 14 Global Step: 241210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:08,491-Speed 5187.48 samples/sec Loss 1.1674 LearningRate 0.0077 Epoch: 14 Global Step: 241220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:10,477-Speed 5157.42 samples/sec Loss 1.0944 LearningRate 0.0077 Epoch: 14 
Global Step: 241230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:12,463-Speed 5158.50 samples/sec Loss 1.1343 LearningRate 0.0077 Epoch: 14 Global Step: 241240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:14,449-Speed 5156.57 samples/sec Loss 1.0821 LearningRate 0.0077 Epoch: 14 Global Step: 241250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:16,439-Speed 5148.34 samples/sec Loss 1.0914 LearningRate 0.0077 Epoch: 14 Global Step: 241260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:18,434-Speed 5133.91 samples/sec Loss 1.0767 LearningRate 0.0077 Epoch: 14 Global Step: 241270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:27:20,437-Speed 5117.87 samples/sec Loss 1.1389 LearningRate 0.0077 Epoch: 14 Global Step: 241280 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:27:22,427-Speed 5146.08 samples/sec Loss 1.1235 LearningRate 0.0077 Epoch: 14 Global Step: 241290 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:27:24,444-Speed 5079.21 samples/sec Loss 1.1400 LearningRate 0.0077 Epoch: 14 Global Step: 241300 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:27:26,426-Speed 5168.68 samples/sec Loss 1.0752 LearningRate 0.0077 Epoch: 14 Global Step: 241310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:28,415-Speed 5150.96 samples/sec Loss 1.1562 LearningRate 0.0077 Epoch: 14 Global Step: 241320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:30,406-Speed 5147.31 samples/sec Loss 1.1320 LearningRate 0.0077 Epoch: 14 Global Step: 241330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:32,383-Speed 5179.95 samples/sec Loss 1.0918 LearningRate 0.0077 Epoch: 14 Global Step: 241340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:34,361-Speed 5179.66 samples/sec Loss 1.1287 LearningRate 0.0077 Epoch: 14 Global Step: 241350 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 15:27:36,354-Speed 5138.53 samples/sec Loss 1.1280 LearningRate 0.0077 Epoch: 14 Global Step: 241360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:38,360-Speed 5105.85 samples/sec Loss 1.0733 LearningRate 0.0077 Epoch: 14 Global Step: 241370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:40,343-Speed 5167.79 samples/sec Loss 1.0797 LearningRate 0.0077 Epoch: 14 Global Step: 241380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:42,320-Speed 5182.08 samples/sec Loss 1.1497 LearningRate 0.0077 Epoch: 14 Global Step: 241390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:44,302-Speed 5166.16 samples/sec Loss 1.1781 LearningRate 0.0077 Epoch: 14 Global Step: 241400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:46,291-Speed 5151.20 samples/sec Loss 1.1635 LearningRate 0.0077 Epoch: 14 Global Step: 241410 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:27:48,283-Speed 5142.26 samples/sec Loss 1.1664 LearningRate 0.0077 Epoch: 14 Global Step: 241420 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:27:50,255-Speed 5193.80 samples/sec Loss 1.1550 LearningRate 0.0077 Epoch: 14 Global Step: 241430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:52,236-Speed 5172.61 samples/sec Loss 1.1527 LearningRate 0.0077 Epoch: 14 Global Step: 241440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:54,220-Speed 5161.22 samples/sec Loss 1.1244 LearningRate 0.0077 Epoch: 14 Global Step: 241450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:56,208-Speed 5153.63 samples/sec Loss 1.0762 LearningRate 0.0077 Epoch: 14 Global Step: 241460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:27:58,195-Speed 5156.20 samples/sec Loss 1.1244 LearningRate 0.0077 Epoch: 14 Global Step: 241470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-11 15:28:00,177-Speed 5167.57 samples/sec Loss 1.0814 LearningRate 0.0077 Epoch: 14 Global Step: 241480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:02,160-Speed 5166.42 samples/sec Loss 1.1187 LearningRate 0.0076 Epoch: 14 Global Step: 241490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:04,153-Speed 5138.74 samples/sec Loss 1.1299 LearningRate 0.0076 Epoch: 14 Global Step: 241500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:06,128-Speed 5187.28 samples/sec Loss 1.0914 LearningRate 0.0076 Epoch: 14 Global Step: 241510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:08,102-Speed 5189.03 samples/sec Loss 1.0808 LearningRate 0.0076 Epoch: 14 Global Step: 241520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:10,081-Speed 5174.67 samples/sec Loss 1.1498 LearningRate 0.0076 Epoch: 14 Global Step: 241530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:28:12,073-Speed 5144.24 samples/sec Loss 1.0833 LearningRate 0.0076 Epoch: 14 Global Step: 241540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:28:14,045-Speed 5192.56 samples/sec Loss 1.1586 LearningRate 0.0076 Epoch: 14 Global Step: 241550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:16,032-Speed 5154.33 samples/sec Loss 1.1110 LearningRate 0.0076 Epoch: 14 Global Step: 241560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:18,014-Speed 5170.34 samples/sec Loss 1.0624 LearningRate 0.0076 Epoch: 14 Global Step: 241570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:19,990-Speed 5184.25 samples/sec Loss 1.0820 LearningRate 0.0076 Epoch: 14 Global Step: 241580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:21,966-Speed 5183.04 samples/sec Loss 1.1087 LearningRate 0.0076 Epoch: 14 Global Step: 241590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:23,954-Speed 5153.76 
samples/sec Loss 1.1328 LearningRate 0.0076 Epoch: 14 Global Step: 241600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:25,945-Speed 5144.65 samples/sec Loss 1.1341 LearningRate 0.0076 Epoch: 14 Global Step: 241610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:27,932-Speed 5154.39 samples/sec Loss 1.1284 LearningRate 0.0076 Epoch: 14 Global Step: 241620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:29,921-Speed 5150.06 samples/sec Loss 1.1589 LearningRate 0.0076 Epoch: 14 Global Step: 241630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:31,895-Speed 5188.72 samples/sec Loss 1.1414 LearningRate 0.0076 Epoch: 14 Global Step: 241640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:33,873-Speed 5179.41 samples/sec Loss 1.1290 LearningRate 0.0076 Epoch: 14 Global Step: 241650 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:28:35,869-Speed 5132.31 samples/sec Loss 1.1278 LearningRate 0.0076 Epoch: 14 Global Step: 241660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:28:37,851-Speed 5167.24 samples/sec Loss 1.0811 LearningRate 0.0076 Epoch: 14 Global Step: 241670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:39,840-Speed 5151.37 samples/sec Loss 1.1277 LearningRate 0.0076 Epoch: 14 Global Step: 241680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:41,834-Speed 5136.50 samples/sec Loss 1.1655 LearningRate 0.0076 Epoch: 14 Global Step: 241690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:43,825-Speed 5147.38 samples/sec Loss 1.0838 LearningRate 0.0076 Epoch: 14 Global Step: 241700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:45,804-Speed 5176.06 samples/sec Loss 1.1309 LearningRate 0.0076 Epoch: 14 Global Step: 241710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:47,809-Speed 5109.82 samples/sec Loss 1.1590 LearningRate 
0.0076 Epoch: 14 Global Step: 241720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:49,806-Speed 5132.51 samples/sec Loss 1.1067 LearningRate 0.0076 Epoch: 14 Global Step: 241730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:51,786-Speed 5172.55 samples/sec Loss 1.1261 LearningRate 0.0076 Epoch: 14 Global Step: 241740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:53,766-Speed 5173.68 samples/sec Loss 1.1270 LearningRate 0.0076 Epoch: 14 Global Step: 241750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:55,743-Speed 5181.33 samples/sec Loss 1.1506 LearningRate 0.0076 Epoch: 14 Global Step: 241760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:28:57,735-Speed 5144.02 samples/sec Loss 1.1274 LearningRate 0.0076 Epoch: 14 Global Step: 241770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:28:59,717-Speed 5166.62 samples/sec Loss 1.1188 LearningRate 0.0076 Epoch: 14 Global Step: 241780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:29:01,698-Speed 5172.86 samples/sec Loss 1.0928 LearningRate 0.0076 Epoch: 14 Global Step: 241790 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:29:03,694-Speed 5131.45 samples/sec Loss 1.1237 LearningRate 0.0076 Epoch: 14 Global Step: 241800 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:29:05,685-Speed 5144.26 samples/sec Loss 1.1636 LearningRate 0.0076 Epoch: 14 Global Step: 241810 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:29:07,663-Speed 5179.40 samples/sec Loss 1.1092 LearningRate 0.0076 Epoch: 14 Global Step: 241820 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:29:09,639-Speed 5182.95 samples/sec Loss 1.1261 LearningRate 0.0076 Epoch: 14 Global Step: 241830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:29:11,607-Speed 5205.47 samples/sec Loss 1.1626 LearningRate 0.0076 Epoch: 14 Global Step: 241840 
Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:29:13,582-Speed 5186.14 samples/sec Loss 1.1176 LearningRate 0.0076 Epoch: 14 Global Step: 241850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:29:15,596-Speed 5085.86 samples/sec Loss 1.1307 LearningRate 0.0076 Epoch: 14 Global Step: 241860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:29:17,575-Speed 5178.07 samples/sec Loss 1.1530 LearningRate 0.0076 Epoch: 14 Global Step: 241870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:29:19,553-Speed 5178.35 samples/sec Loss 1.1429 LearningRate 0.0076 Epoch: 14 Global Step: 241880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:29:21,555-Speed 5118.42 samples/sec Loss 1.1191 LearningRate 0.0076 Epoch: 14 Global Step: 241890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:29:23,532-Speed 5178.98 samples/sec Loss 1.1287 LearningRate 0.0076 Epoch: 14 Global Step: 241900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:29:25,534-Speed 5119.09 samples/sec Loss 1.1178 LearningRate 0.0076 Epoch: 14 Global Step: 241910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:29:27,521-Speed 5155.51 samples/sec Loss 1.1525 LearningRate 0.0076 Epoch: 14 Global Step: 241920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:29:29,506-Speed 5159.84 samples/sec Loss 1.1076 LearningRate 0.0076 Epoch: 14 Global Step: 241930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:29:31,487-Speed 5170.84 samples/sec Loss 1.1436 LearningRate 0.0076 Epoch: 14 Global Step: 241940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:29:33,487-Speed 5122.28 samples/sec Loss 1.0857 LearningRate 0.0076 Epoch: 14 Global Step: 241950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:29:35,486-Speed 5124.26 samples/sec Loss 1.1042 LearningRate 0.0076 Epoch: 14 Global Step: 241960 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 15:29:37,485-Speed 5125.64 samples/sec Loss 1.1067 LearningRate 0.0076 Epoch: 14 Global Step: 241970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:29:39,471-Speed 5156.25 samples/sec Loss 1.1481 LearningRate 0.0076 Epoch: 14 Global Step: 241980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:29:41,456-Speed 5160.78 samples/sec Loss 1.1045 LearningRate 0.0076 Epoch: 14 Global Step: 241990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:29:43,433-Speed 5181.29 samples/sec Loss 1.1126 LearningRate 0.0076 Epoch: 14 Global Step: 242000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:30:10,059-[lfw][242000]XNorm: 22.496307 Training: 2022-04-11 15:30:10,060-[lfw][242000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-04-11 15:30:10,060-[lfw][242000]Accuracy-Highest: 0.99833 Training: 2022-04-11 15:30:40,834-[cfp_fp][242000]XNorm: 21.783606 Training: 2022-04-11 15:30:40,835-[cfp_fp][242000]Accuracy-Flip: 0.98800+-0.00448 Training: 2022-04-11 15:30:40,835-[cfp_fp][242000]Accuracy-Highest: 0.98857 Training: 2022-04-11 15:31:07,321-[agedb_30][242000]XNorm: 23.068935 Training: 2022-04-11 15:31:07,325-[agedb_30][242000]Accuracy-Flip: 0.98183+-0.00758 Training: 2022-04-11 15:31:07,325-[agedb_30][242000]Accuracy-Highest: 0.98300 Training: 2022-04-11 15:31:09,309-Speed 119.24 samples/sec Loss 1.1188 LearningRate 0.0076 Epoch: 14 Global Step: 242010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:11,306-Speed 5129.70 samples/sec Loss 1.1151 LearningRate 0.0076 Epoch: 14 Global Step: 242020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:13,275-Speed 5202.67 samples/sec Loss 1.1190 LearningRate 0.0076 Epoch: 14 Global Step: 242030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:15,331-Speed 4983.01 samples/sec Loss 1.1395 LearningRate 0.0076 Epoch: 14 Global Step: 242040 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
15:31:17,334-Speed 5113.98 samples/sec Loss 1.1272 LearningRate 0.0076 Epoch: 14 Global Step: 242050 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:31:19,322-Speed 5152.65 samples/sec Loss 1.1186 LearningRate 0.0076 Epoch: 14 Global Step: 242060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:21,296-Speed 5198.69 samples/sec Loss 1.1483 LearningRate 0.0076 Epoch: 14 Global Step: 242070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:23,272-Speed 5183.36 samples/sec Loss 1.0901 LearningRate 0.0076 Epoch: 14 Global Step: 242080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:25,272-Speed 5120.45 samples/sec Loss 1.1199 LearningRate 0.0076 Epoch: 14 Global Step: 242090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:27,252-Speed 5173.29 samples/sec Loss 1.1076 LearningRate 0.0075 Epoch: 14 Global Step: 242100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:29,247-Speed 5136.35 samples/sec Loss 1.1180 LearningRate 0.0075 Epoch: 14 Global Step: 242110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:31,229-Speed 5169.06 samples/sec Loss 1.0971 LearningRate 0.0075 Epoch: 14 Global Step: 242120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:33,199-Speed 5198.71 samples/sec Loss 1.1340 LearningRate 0.0075 Epoch: 14 Global Step: 242130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:35,181-Speed 5167.93 samples/sec Loss 1.1206 LearningRate 0.0075 Epoch: 14 Global Step: 242140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:37,165-Speed 5164.59 samples/sec Loss 1.1441 LearningRate 0.0075 Epoch: 14 Global Step: 242150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:39,139-Speed 5188.62 samples/sec Loss 1.1795 LearningRate 0.0075 Epoch: 14 Global Step: 242160 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:31:41,130-Speed 5142.95 samples/sec 
Loss 1.1288 LearningRate 0.0075 Epoch: 14 Global Step: 242170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:43,116-Speed 5157.85 samples/sec Loss 1.1365 LearningRate 0.0075 Epoch: 14 Global Step: 242180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:45,108-Speed 5142.54 samples/sec Loss 1.0992 LearningRate 0.0075 Epoch: 14 Global Step: 242190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:47,084-Speed 5185.21 samples/sec Loss 1.1617 LearningRate 0.0075 Epoch: 14 Global Step: 242200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:49,095-Speed 5092.76 samples/sec Loss 1.0926 LearningRate 0.0075 Epoch: 14 Global Step: 242210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:51,079-Speed 5163.69 samples/sec Loss 1.0929 LearningRate 0.0075 Epoch: 14 Global Step: 242220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:53,048-Speed 5203.56 samples/sec Loss 1.1097 LearningRate 0.0075 Epoch: 14 Global Step: 242230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:55,031-Speed 5166.11 samples/sec Loss 1.1267 LearningRate 0.0075 Epoch: 14 Global Step: 242240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:57,009-Speed 5176.88 samples/sec Loss 1.1165 LearningRate 0.0075 Epoch: 14 Global Step: 242250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:31:59,000-Speed 5144.94 samples/sec Loss 1.1484 LearningRate 0.0075 Epoch: 14 Global Step: 242260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:32:01,047-Speed 5005.36 samples/sec Loss 1.1510 LearningRate 0.0075 Epoch: 14 Global Step: 242270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:32:03,028-Speed 5170.36 samples/sec Loss 1.1196 LearningRate 0.0075 Epoch: 14 Global Step: 242280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:32:05,029-Speed 5121.02 samples/sec Loss 1.0889 LearningRate 0.0075 Epoch: 14 
Global Step: 242290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:32:07,016-Speed 5156.15 samples/sec Loss 1.1630 LearningRate 0.0075 Epoch: 14 Global Step: 242300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:32:08,989-Speed 5191.84 samples/sec Loss 1.1306 LearningRate 0.0075 Epoch: 14 Global Step: 242310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:32:10,988-Speed 5124.82 samples/sec Loss 1.1148 LearningRate 0.0075 Epoch: 14 Global Step: 242320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:32:12,986-Speed 5127.53 samples/sec Loss 1.1418 LearningRate 0.0075 Epoch: 14 Global Step: 242330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:32:14,965-Speed 5175.07 samples/sec Loss 1.1226 LearningRate 0.0075 Epoch: 14 Global Step: 242340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:32:16,957-Speed 5142.14 samples/sec Loss 1.1817 LearningRate 0.0075 Epoch: 14 Global Step: 242350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:32:18,937-Speed 5173.86 samples/sec Loss 1.1258 LearningRate 0.0075 Epoch: 14 Global Step: 242360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:32:20,914-Speed 5182.76 samples/sec Loss 1.1421 LearningRate 0.0075 Epoch: 14 Global Step: 242370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:32:22,907-Speed 5139.53 samples/sec Loss 1.1201 LearningRate 0.0075 Epoch: 14 Global Step: 242380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:32:24,890-Speed 5165.91 samples/sec Loss 1.1399 LearningRate 0.0075 Epoch: 14 Global Step: 242390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:32:26,873-Speed 5164.78 samples/sec Loss 1.1212 LearningRate 0.0075 Epoch: 14 Global Step: 242400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:32:28,851-Speed 5179.80 samples/sec Loss 1.1332 LearningRate 0.0075 Epoch: 14 Global Step: 242410 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-04-11 15:32:30,834-Speed 5164.20 samples/sec Loss 1.1188 LearningRate 0.0075 Epoch: 14 Global Step: 242420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:32:32,818-Speed 5163.96 samples/sec Loss 1.1161 LearningRate 0.0075 Epoch: 14 Global Step: 242430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:32:34,799-Speed 5171.48 samples/sec Loss 1.1313 LearningRate 0.0075 Epoch: 14 Global Step: 242440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:32:36,775-Speed 5182.76 samples/sec Loss 1.1486 LearningRate 0.0075 Epoch: 14 Global Step: 242450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:32:38,748-Speed 5191.64 samples/sec Loss 1.1255 LearningRate 0.0075 Epoch: 14 Global Step: 242460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:32:40,726-Speed 5177.78 samples/sec Loss 1.1636 LearningRate 0.0075 Epoch: 14 Global Step: 242470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:32:42,701-Speed 5189.58 samples/sec Loss 1.0982 LearningRate 0.0075 Epoch: 14 Global Step: 242480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:32:44,676-Speed 5185.18 samples/sec Loss 1.1339 LearningRate 0.0075 Epoch: 14 Global Step: 242490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:32:46,655-Speed 5176.03 samples/sec Loss 1.1444 LearningRate 0.0075 Epoch: 14 Global Step: 242500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:32:48,638-Speed 5165.52 samples/sec Loss 1.1066 LearningRate 0.0075 Epoch: 14 Global Step: 242510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:32:50,617-Speed 5176.98 samples/sec Loss 1.1493 LearningRate 0.0075 Epoch: 14 Global Step: 242520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:32:52,600-Speed 5164.49 samples/sec Loss 1.1042 LearningRate 0.0075 Epoch: 14 Global Step: 242530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
15:32:54,589-Speed 5150.35 samples/sec Loss 1.1281 LearningRate 0.0075 Epoch: 14 Global Step: 242540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:32:56,570-Speed 5170.63 samples/sec Loss 1.1459 LearningRate 0.0075 Epoch: 14 Global Step: 242550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:32:58,591-Speed 5068.58 samples/sec Loss 1.0998 LearningRate 0.0075 Epoch: 14 Global Step: 242560 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:33:00,583-Speed 5144.65 samples/sec Loss 1.1627 LearningRate 0.0075 Epoch: 14 Global Step: 242570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:02,570-Speed 5155.96 samples/sec Loss 1.1423 LearningRate 0.0075 Epoch: 14 Global Step: 242580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:04,584-Speed 5084.85 samples/sec Loss 1.1395 LearningRate 0.0075 Epoch: 14 Global Step: 242590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:06,574-Speed 5149.95 samples/sec Loss 1.1315 LearningRate 0.0075 Epoch: 14 Global Step: 242600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:08,560-Speed 5157.49 samples/sec Loss 1.1795 LearningRate 0.0075 Epoch: 14 Global Step: 242610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:10,549-Speed 5149.87 samples/sec Loss 1.1209 LearningRate 0.0075 Epoch: 14 Global Step: 242620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:12,529-Speed 5173.71 samples/sec Loss 1.1618 LearningRate 0.0075 Epoch: 14 Global Step: 242630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:14,504-Speed 5187.69 samples/sec Loss 1.1603 LearningRate 0.0075 Epoch: 14 Global Step: 242640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:16,489-Speed 5159.92 samples/sec Loss 1.0822 LearningRate 0.0075 Epoch: 14 Global Step: 242650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:18,472-Speed 5166.25 samples/sec 
Loss 1.1257 LearningRate 0.0075 Epoch: 14 Global Step: 242660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:20,457-Speed 5160.56 samples/sec Loss 1.1350 LearningRate 0.0075 Epoch: 14 Global Step: 242670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:22,449-Speed 5143.49 samples/sec Loss 1.1588 LearningRate 0.0075 Epoch: 14 Global Step: 242680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:24,458-Speed 5099.95 samples/sec Loss 1.1295 LearningRate 0.0075 Epoch: 14 Global Step: 242690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:26,437-Speed 5175.63 samples/sec Loss 1.1334 LearningRate 0.0075 Epoch: 14 Global Step: 242700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:28,454-Speed 5079.30 samples/sec Loss 1.0926 LearningRate 0.0074 Epoch: 14 Global Step: 242710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:30,437-Speed 5165.27 samples/sec Loss 1.1193 LearningRate 0.0074 Epoch: 14 Global Step: 242720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:32,415-Speed 5179.02 samples/sec Loss 1.1055 LearningRate 0.0074 Epoch: 14 Global Step: 242730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:34,403-Speed 5154.42 samples/sec Loss 1.0877 LearningRate 0.0074 Epoch: 14 Global Step: 242740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:36,388-Speed 5159.58 samples/sec Loss 1.0919 LearningRate 0.0074 Epoch: 14 Global Step: 242750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:38,365-Speed 5182.83 samples/sec Loss 1.0662 LearningRate 0.0074 Epoch: 14 Global Step: 242760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:40,338-Speed 5191.42 samples/sec Loss 1.0961 LearningRate 0.0074 Epoch: 14 Global Step: 242770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:33:42,312-Speed 5189.50 samples/sec Loss 1.1212 LearningRate 0.0074 Epoch: 14 
Global Step: 242780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:33:44,290-Speed 5177.20 samples/sec Loss 1.1332 LearningRate 0.0074 Epoch: 14 Global Step: 242790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:46,274-Speed 5165.76 samples/sec Loss 1.1384 LearningRate 0.0074 Epoch: 14 Global Step: 242800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:48,273-Speed 5122.55 samples/sec Loss 1.1116 LearningRate 0.0074 Epoch: 14 Global Step: 242810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:50,259-Speed 5158.16 samples/sec Loss 1.1500 LearningRate 0.0074 Epoch: 14 Global Step: 242820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:52,238-Speed 5177.79 samples/sec Loss 1.1270 LearningRate 0.0074 Epoch: 14 Global Step: 242830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:54,218-Speed 5173.09 samples/sec Loss 1.1347 LearningRate 0.0074 Epoch: 14 Global Step: 242840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:56,190-Speed 5193.44 samples/sec Loss 1.1108 LearningRate 0.0074 Epoch: 14 Global Step: 242850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:33:58,182-Speed 5141.82 samples/sec Loss 1.1048 LearningRate 0.0074 Epoch: 14 Global Step: 242860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:00,165-Speed 5165.90 samples/sec Loss 1.1591 LearningRate 0.0074 Epoch: 14 Global Step: 242870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:02,150-Speed 5159.72 samples/sec Loss 1.0893 LearningRate 0.0074 Epoch: 14 Global Step: 242880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:04,133-Speed 5167.53 samples/sec Loss 1.1401 LearningRate 0.0074 Epoch: 14 Global Step: 242890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:34:06,108-Speed 5187.23 samples/sec Loss 1.1146 LearningRate 0.0074 Epoch: 14 Global Step: 242900 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-11 15:34:08,089-Speed 5170.22 samples/sec Loss 1.1538 LearningRate 0.0074 Epoch: 14 Global Step: 242910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:34:10,076-Speed 5154.51 samples/sec Loss 1.1353 LearningRate 0.0074 Epoch: 14 Global Step: 242920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:12,057-Speed 5171.34 samples/sec Loss 1.0700 LearningRate 0.0074 Epoch: 14 Global Step: 242930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:14,101-Speed 5012.50 samples/sec Loss 1.1677 LearningRate 0.0074 Epoch: 14 Global Step: 242940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:16,089-Speed 5151.08 samples/sec Loss 1.1637 LearningRate 0.0074 Epoch: 14 Global Step: 242950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:18,086-Speed 5130.31 samples/sec Loss 1.1260 LearningRate 0.0074 Epoch: 14 Global Step: 242960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:20,060-Speed 5189.60 samples/sec Loss 1.1079 LearningRate 0.0074 Epoch: 14 Global Step: 242970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:22,048-Speed 5153.50 samples/sec Loss 1.1167 LearningRate 0.0074 Epoch: 14 Global Step: 242980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:24,023-Speed 5187.23 samples/sec Loss 1.1075 LearningRate 0.0074 Epoch: 14 Global Step: 242990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:26,015-Speed 5140.55 samples/sec Loss 1.0702 LearningRate 0.0074 Epoch: 14 Global Step: 243000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:27,998-Speed 5166.23 samples/sec Loss 1.1110 LearningRate 0.0074 Epoch: 14 Global Step: 243010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:29,973-Speed 5187.79 samples/sec Loss 1.1261 LearningRate 0.0074 Epoch: 14 Global Step: 243020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 
2022-04-11 15:34:31,942-Speed 5203.07 samples/sec Loss 1.1012 LearningRate 0.0074 Epoch: 14 Global Step: 243030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:33,917-Speed 5185.91 samples/sec Loss 1.1459 LearningRate 0.0074 Epoch: 14 Global Step: 243040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:35,902-Speed 5160.89 samples/sec Loss 1.1148 LearningRate 0.0074 Epoch: 14 Global Step: 243050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:37,889-Speed 5155.25 samples/sec Loss 1.1329 LearningRate 0.0074 Epoch: 14 Global Step: 243060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:39,911-Speed 5064.42 samples/sec Loss 1.1240 LearningRate 0.0074 Epoch: 14 Global Step: 243070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:41,897-Speed 5158.71 samples/sec Loss 1.1388 LearningRate 0.0074 Epoch: 14 Global Step: 243080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:43,895-Speed 5129.00 samples/sec Loss 1.1118 LearningRate 0.0074 Epoch: 14 Global Step: 243090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:45,883-Speed 5154.50 samples/sec Loss 1.1636 LearningRate 0.0074 Epoch: 14 Global Step: 243100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:47,871-Speed 5153.60 samples/sec Loss 1.1806 LearningRate 0.0074 Epoch: 14 Global Step: 243110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:49,845-Speed 5187.02 samples/sec Loss 1.1508 LearningRate 0.0074 Epoch: 14 Global Step: 243120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:51,857-Speed 5093.24 samples/sec Loss 1.1769 LearningRate 0.0074 Epoch: 14 Global Step: 243130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:34:53,830-Speed 5191.51 samples/sec Loss 1.1085 LearningRate 0.0074 Epoch: 14 Global Step: 243140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:55,804-Speed 5190.85 
samples/sec Loss 1.1329 LearningRate 0.0074 Epoch: 14 Global Step: 243150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:57,778-Speed 5189.07 samples/sec Loss 1.1133 LearningRate 0.0074 Epoch: 14 Global Step: 243160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:34:59,776-Speed 5125.92 samples/sec Loss 1.1425 LearningRate 0.0074 Epoch: 14 Global Step: 243170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:01,775-Speed 5125.43 samples/sec Loss 1.1125 LearningRate 0.0074 Epoch: 14 Global Step: 243180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:03,762-Speed 5156.26 samples/sec Loss 1.1344 LearningRate 0.0074 Epoch: 14 Global Step: 243190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:05,743-Speed 5170.20 samples/sec Loss 1.1146 LearningRate 0.0074 Epoch: 14 Global Step: 243200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:07,721-Speed 5178.07 samples/sec Loss 1.1543 LearningRate 0.0074 Epoch: 14 Global Step: 243210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:09,707-Speed 5159.01 samples/sec Loss 1.1951 LearningRate 0.0074 Epoch: 14 Global Step: 243220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:11,689-Speed 5165.96 samples/sec Loss 1.1660 LearningRate 0.0074 Epoch: 14 Global Step: 243230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:13,665-Speed 5185.30 samples/sec Loss 1.1136 LearningRate 0.0074 Epoch: 14 Global Step: 243240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:15,658-Speed 5138.48 samples/sec Loss 1.1326 LearningRate 0.0074 Epoch: 14 Global Step: 243250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:17,665-Speed 5104.73 samples/sec Loss 1.2152 LearningRate 0.0074 Epoch: 14 Global Step: 243260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:19,652-Speed 5155.82 samples/sec Loss 1.1421 LearningRate 0.0074 
Epoch: 14 Global Step: 243270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:21,634-Speed 5168.05 samples/sec Loss 1.1719 LearningRate 0.0074 Epoch: 14 Global Step: 243280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:23,628-Speed 5137.68 samples/sec Loss 1.0916 LearningRate 0.0074 Epoch: 14 Global Step: 243290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:25,619-Speed 5144.53 samples/sec Loss 1.1299 LearningRate 0.0074 Epoch: 14 Global Step: 243300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:27,602-Speed 5164.78 samples/sec Loss 1.1302 LearningRate 0.0074 Epoch: 14 Global Step: 243310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:29,617-Speed 5084.02 samples/sec Loss 1.1379 LearningRate 0.0073 Epoch: 14 Global Step: 243320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:31,603-Speed 5158.95 samples/sec Loss 1.1186 LearningRate 0.0073 Epoch: 14 Global Step: 243330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:33,578-Speed 5187.06 samples/sec Loss 1.1556 LearningRate 0.0073 Epoch: 14 Global Step: 243340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:35,575-Speed 5129.84 samples/sec Loss 1.1133 LearningRate 0.0073 Epoch: 14 Global Step: 243350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:37,557-Speed 5167.15 samples/sec Loss 1.0891 LearningRate 0.0073 Epoch: 14 Global Step: 243360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:39,558-Speed 5119.51 samples/sec Loss 1.1146 LearningRate 0.0073 Epoch: 14 Global Step: 243370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:41,545-Speed 5155.38 samples/sec Loss 1.1228 LearningRate 0.0073 Epoch: 14 Global Step: 243380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:43,526-Speed 5169.99 samples/sec Loss 1.0987 LearningRate 0.0073 Epoch: 14 Global Step: 243390 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:45,499-Speed 5193.77 samples/sec Loss 1.1147 LearningRate 0.0073 Epoch: 14 Global Step: 243400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:47,479-Speed 5172.84 samples/sec Loss 1.1328 LearningRate 0.0073 Epoch: 14 Global Step: 243410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:49,456-Speed 5181.12 samples/sec Loss 1.1211 LearningRate 0.0073 Epoch: 14 Global Step: 243420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:51,457-Speed 5117.70 samples/sec Loss 1.1908 LearningRate 0.0073 Epoch: 14 Global Step: 243430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:53,438-Speed 5172.23 samples/sec Loss 1.1358 LearningRate 0.0073 Epoch: 14 Global Step: 243440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:35:55,416-Speed 5178.53 samples/sec Loss 1.1117 LearningRate 0.0073 Epoch: 14 Global Step: 243450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:57,393-Speed 5186.34 samples/sec Loss 1.0976 LearningRate 0.0073 Epoch: 14 Global Step: 243460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:35:59,384-Speed 5142.57 samples/sec Loss 1.1144 LearningRate 0.0073 Epoch: 14 Global Step: 243470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:36:01,379-Speed 5135.33 samples/sec Loss 1.1336 LearningRate 0.0073 Epoch: 14 Global Step: 243480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:36:03,368-Speed 5151.41 samples/sec Loss 1.1473 LearningRate 0.0073 Epoch: 14 Global Step: 243490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:36:05,345-Speed 5181.89 samples/sec Loss 1.1592 LearningRate 0.0073 Epoch: 14 Global Step: 243500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:36:07,332-Speed 5154.26 samples/sec Loss 1.1389 LearningRate 0.0073 Epoch: 14 Global Step: 243510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-11 15:36:09,322-Speed 5148.10 samples/sec Loss 1.1141 LearningRate 0.0073 Epoch: 14 Global Step: 243520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:36:11,323-Speed 5118.63 samples/sec Loss 1.1550 LearningRate 0.0073 Epoch: 14 Global Step: 243530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:36:13,312-Speed 5147.82 samples/sec Loss 1.1462 LearningRate 0.0073 Epoch: 14 Global Step: 243540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:36:15,301-Speed 5152.29 samples/sec Loss 1.1438 LearningRate 0.0073 Epoch: 14 Global Step: 243550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:36:17,275-Speed 5187.99 samples/sec Loss 1.1341 LearningRate 0.0073 Epoch: 14 Global Step: 243560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:36:19,248-Speed 5193.67 samples/sec Loss 1.1615 LearningRate 0.0073 Epoch: 14 Global Step: 243570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:36:21,225-Speed 5180.71 samples/sec Loss 1.1447 LearningRate 0.0073 Epoch: 14 Global Step: 243580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:36:23,203-Speed 5179.61 samples/sec Loss 1.1359 LearningRate 0.0073 Epoch: 14 Global Step: 243590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:36:25,214-Speed 5093.52 samples/sec Loss 1.1327 LearningRate 0.0073 Epoch: 14 Global Step: 243600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:36:27,207-Speed 5140.29 samples/sec Loss 1.1582 LearningRate 0.0073 Epoch: 14 Global Step: 243610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:36:29,215-Speed 5101.12 samples/sec Loss 1.1337 LearningRate 0.0073 Epoch: 14 Global Step: 243620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:36:31,199-Speed 5161.09 samples/sec Loss 1.0986 LearningRate 0.0073 Epoch: 14 Global Step: 243630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:36:33,175-Speed 5185.34 
samples/sec Loss 1.1657 LearningRate 0.0073 Epoch: 14 Global Step: 243640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:36:35,157-Speed 5169.11 samples/sec Loss 1.1090 LearningRate 0.0073 Epoch: 14 Global Step: 243650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:36:37,150-Speed 5139.24 samples/sec Loss 1.1117 LearningRate 0.0073 Epoch: 14 Global Step: 243660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:36:39,149-Speed 5126.01 samples/sec Loss 1.1159 LearningRate 0.0073 Epoch: 14 Global Step: 243670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:36:41,147-Speed 5127.13 samples/sec Loss 1.1219 LearningRate 0.0073 Epoch: 14 Global Step: 243680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:36:43,149-Speed 5115.69 samples/sec Loss 1.0888 LearningRate 0.0073 Epoch: 14 Global Step: 243690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:36:45,147-Speed 5127.79 samples/sec Loss 1.1292 LearningRate 0.0073 Epoch: 14 Global Step: 243700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:36:47,133-Speed 5158.94 samples/sec Loss 1.1511 LearningRate 0.0073 Epoch: 14 Global Step: 243710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:36:49,111-Speed 5179.15 samples/sec Loss 1.1323 LearningRate 0.0073 Epoch: 14 Global Step: 243720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:36:51,099-Speed 5151.53 samples/sec Loss 1.1306 LearningRate 0.0073 Epoch: 14 Global Step: 243730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:36:53,114-Speed 5083.63 samples/sec Loss 1.1380 LearningRate 0.0073 Epoch: 14 Global Step: 243740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:36:55,101-Speed 5155.75 samples/sec Loss 1.1402 LearningRate 0.0073 Epoch: 14 Global Step: 243750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:36:57,091-Speed 5148.81 samples/sec Loss 1.1139 LearningRate 0.0073 
Epoch: 14 Global Step: 243760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:36:59,065-Speed 5189.70 samples/sec Loss 1.1359 LearningRate 0.0073 Epoch: 14 Global Step: 243770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:01,065-Speed 5122.26 samples/sec Loss 1.1132 LearningRate 0.0073 Epoch: 14 Global Step: 243780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:03,056-Speed 5144.55 samples/sec Loss 1.0830 LearningRate 0.0073 Epoch: 14 Global Step: 243790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:05,039-Speed 5164.28 samples/sec Loss 1.2136 LearningRate 0.0073 Epoch: 14 Global Step: 243800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:07,031-Speed 5142.51 samples/sec Loss 1.0981 LearningRate 0.0073 Epoch: 14 Global Step: 243810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:09,012-Speed 5170.96 samples/sec Loss 1.1285 LearningRate 0.0073 Epoch: 14 Global Step: 243820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:11,006-Speed 5137.34 samples/sec Loss 1.1473 LearningRate 0.0073 Epoch: 14 Global Step: 243830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:12,995-Speed 5151.07 samples/sec Loss 1.1419 LearningRate 0.0073 Epoch: 14 Global Step: 243840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:14,981-Speed 5157.16 samples/sec Loss 1.1329 LearningRate 0.0073 Epoch: 14 Global Step: 243850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:16,955-Speed 5189.51 samples/sec Loss 1.1828 LearningRate 0.0073 Epoch: 14 Global Step: 243860 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:37:18,931-Speed 5186.36 samples/sec Loss 1.1611 LearningRate 0.0073 Epoch: 14 Global Step: 243870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:20,914-Speed 5165.11 samples/sec Loss 1.0856 LearningRate 0.0073 Epoch: 14 Global Step: 243880 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:22,893-Speed 5175.75 samples/sec Loss 1.0775 LearningRate 0.0073 Epoch: 14 Global Step: 243890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:24,874-Speed 5171.22 samples/sec Loss 1.1775 LearningRate 0.0073 Epoch: 14 Global Step: 243900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:26,872-Speed 5126.17 samples/sec Loss 1.1236 LearningRate 0.0073 Epoch: 14 Global Step: 243910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:28,861-Speed 5151.29 samples/sec Loss 1.2110 LearningRate 0.0073 Epoch: 14 Global Step: 243920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:30,843-Speed 5168.02 samples/sec Loss 1.1012 LearningRate 0.0073 Epoch: 14 Global Step: 243930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:32,827-Speed 5162.35 samples/sec Loss 1.1058 LearningRate 0.0072 Epoch: 14 Global Step: 243940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:34,802-Speed 5186.50 samples/sec Loss 1.0743 LearningRate 0.0072 Epoch: 14 Global Step: 243950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:36,786-Speed 5164.66 samples/sec Loss 1.1417 LearningRate 0.0072 Epoch: 14 Global Step: 243960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:37:38,810-Speed 5061.14 samples/sec Loss 1.1560 LearningRate 0.0072 Epoch: 14 Global Step: 243970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:37:40,786-Speed 5184.80 samples/sec Loss 1.1360 LearningRate 0.0072 Epoch: 14 Global Step: 243980 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:37:42,764-Speed 5178.12 samples/sec Loss 1.1689 LearningRate 0.0072 Epoch: 14 Global Step: 243990 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:37:44,739-Speed 5185.55 samples/sec Loss 1.1331 LearningRate 0.0072 Epoch: 14 Global Step: 244000 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 15:38:12,053-[lfw][244000]XNorm: 22.342027 Training: 2022-04-11 15:38:12,053-[lfw][244000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 15:38:12,054-[lfw][244000]Accuracy-Highest: 0.99833 Training: 2022-04-11 15:38:42,775-[cfp_fp][244000]XNorm: 21.904956 Training: 2022-04-11 15:38:42,776-[cfp_fp][244000]Accuracy-Flip: 0.98900+-0.00447 Training: 2022-04-11 15:38:42,777-[cfp_fp][244000]Accuracy-Highest: 0.98900 Training: 2022-04-11 15:39:09,375-[agedb_30][244000]XNorm: 23.069361 Training: 2022-04-11 15:39:09,375-[agedb_30][244000]Accuracy-Flip: 0.98283+-0.00817 Training: 2022-04-11 15:39:09,376-[agedb_30][244000]Accuracy-Highest: 0.98300 Training: 2022-04-11 15:39:11,361-Speed 118.22 samples/sec Loss 1.1762 LearningRate 0.0072 Epoch: 14 Global Step: 244010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:13,374-Speed 5087.89 samples/sec Loss 1.1682 LearningRate 0.0072 Epoch: 14 Global Step: 244020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:15,370-Speed 5133.57 samples/sec Loss 1.1577 LearningRate 0.0072 Epoch: 14 Global Step: 244030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:17,351-Speed 5171.24 samples/sec Loss 1.0821 LearningRate 0.0072 Epoch: 14 Global Step: 244040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:19,360-Speed 5098.28 samples/sec Loss 1.0951 LearningRate 0.0072 Epoch: 14 Global Step: 244050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:21,342-Speed 5169.46 samples/sec Loss 1.1201 LearningRate 0.0072 Epoch: 14 Global Step: 244060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:23,324-Speed 5167.08 samples/sec Loss 1.1417 LearningRate 0.0072 Epoch: 14 Global Step: 244070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:25,325-Speed 5121.26 samples/sec Loss 1.1758 LearningRate 0.0072 Epoch: 14 Global Step: 244080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
15:39:27,297-Speed 5195.11 samples/sec Loss 1.1259 LearningRate 0.0072 Epoch: 14 Global Step: 244090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:29,277-Speed 5172.58 samples/sec Loss 1.1860 LearningRate 0.0072 Epoch: 14 Global Step: 244100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:31,252-Speed 5186.04 samples/sec Loss 1.1524 LearningRate 0.0072 Epoch: 14 Global Step: 244110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:39:33,235-Speed 5166.70 samples/sec Loss 1.1120 LearningRate 0.0072 Epoch: 14 Global Step: 244120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:39:35,223-Speed 5152.28 samples/sec Loss 1.1319 LearningRate 0.0072 Epoch: 14 Global Step: 244130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:37,217-Speed 5138.94 samples/sec Loss 1.1984 LearningRate 0.0072 Epoch: 14 Global Step: 244140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:39,200-Speed 5163.76 samples/sec Loss 1.1674 LearningRate 0.0072 Epoch: 14 Global Step: 244150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:41,188-Speed 5154.70 samples/sec Loss 1.1057 LearningRate 0.0072 Epoch: 14 Global Step: 244160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:43,160-Speed 5192.01 samples/sec Loss 1.0868 LearningRate 0.0072 Epoch: 14 Global Step: 244170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:45,131-Speed 5197.60 samples/sec Loss 1.1642 LearningRate 0.0072 Epoch: 14 Global Step: 244180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:47,105-Speed 5190.65 samples/sec Loss 1.1303 LearningRate 0.0072 Epoch: 14 Global Step: 244190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:49,107-Speed 5116.48 samples/sec Loss 1.1669 LearningRate 0.0072 Epoch: 14 Global Step: 244200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:51,088-Speed 5170.46 samples/sec 
Loss 1.1767 LearningRate 0.0072 Epoch: 14 Global Step: 244210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:53,061-Speed 5191.30 samples/sec Loss 1.1143 LearningRate 0.0072 Epoch: 14 Global Step: 244220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:55,031-Speed 5199.51 samples/sec Loss 1.1559 LearningRate 0.0072 Epoch: 14 Global Step: 244230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:57,004-Speed 5193.49 samples/sec Loss 1.1504 LearningRate 0.0072 Epoch: 14 Global Step: 244240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:39:58,994-Speed 5148.16 samples/sec Loss 1.1171 LearningRate 0.0072 Epoch: 14 Global Step: 244250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:00,977-Speed 5164.74 samples/sec Loss 1.1352 LearningRate 0.0072 Epoch: 14 Global Step: 244260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:02,997-Speed 5070.62 samples/sec Loss 1.1294 LearningRate 0.0072 Epoch: 14 Global Step: 244270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:05,016-Speed 5076.45 samples/sec Loss 1.1391 LearningRate 0.0072 Epoch: 14 Global Step: 244280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:06,991-Speed 5189.17 samples/sec Loss 1.1449 LearningRate 0.0072 Epoch: 14 Global Step: 244290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:08,965-Speed 5188.55 samples/sec Loss 1.1362 LearningRate 0.0072 Epoch: 14 Global Step: 244300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:10,960-Speed 5134.62 samples/sec Loss 1.1560 LearningRate 0.0072 Epoch: 14 Global Step: 244310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:12,950-Speed 5145.96 samples/sec Loss 1.1540 LearningRate 0.0072 Epoch: 14 Global Step: 244320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:14,936-Speed 5159.90 samples/sec Loss 1.1565 LearningRate 0.0072 Epoch: 14 
Global Step: 244330 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:40:16,928-Speed 5140.31 samples/sec Loss 1.1356 LearningRate 0.0072 Epoch: 14 Global Step: 244340 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:40:18,922-Speed 5137.40 samples/sec Loss 1.1830 LearningRate 0.0072 Epoch: 14 Global Step: 244350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:20,930-Speed 5104.13 samples/sec Loss 1.1214 LearningRate 0.0072 Epoch: 14 Global Step: 244360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:22,925-Speed 5133.11 samples/sec Loss 1.1389 LearningRate 0.0072 Epoch: 14 Global Step: 244370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:24,961-Speed 5032.42 samples/sec Loss 1.1757 LearningRate 0.0072 Epoch: 14 Global Step: 244380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:26,965-Speed 5111.31 samples/sec Loss 1.1513 LearningRate 0.0072 Epoch: 14 Global Step: 244390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:28,967-Speed 5117.32 samples/sec Loss 1.1691 LearningRate 0.0072 Epoch: 14 Global Step: 244400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:30,941-Speed 5189.02 samples/sec Loss 1.1021 LearningRate 0.0072 Epoch: 14 Global Step: 244410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:32,942-Speed 5120.25 samples/sec Loss 1.1739 LearningRate 0.0072 Epoch: 14 Global Step: 244420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:34,926-Speed 5164.10 samples/sec Loss 1.1578 LearningRate 0.0072 Epoch: 14 Global Step: 244430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:36,919-Speed 5138.62 samples/sec Loss 1.1533 LearningRate 0.0072 Epoch: 14 Global Step: 244440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:38,903-Speed 5164.42 samples/sec Loss 1.1318 LearningRate 0.0072 Epoch: 14 Global Step: 244450 Fp16 Grad Scale: 
131072 Required: 6 hours Training: 2022-04-11 15:40:40,937-Speed 5036.38 samples/sec Loss 1.1570 LearningRate 0.0072 Epoch: 14 Global Step: 244460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:40:42,952-Speed 5083.07 samples/sec Loss 1.1650 LearningRate 0.0072 Epoch: 14 Global Step: 244470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:44,933-Speed 5171.99 samples/sec Loss 1.1637 LearningRate 0.0072 Epoch: 14 Global Step: 244480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:46,907-Speed 5187.23 samples/sec Loss 1.1001 LearningRate 0.0072 Epoch: 14 Global Step: 244490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:48,907-Speed 5122.83 samples/sec Loss 1.1251 LearningRate 0.0072 Epoch: 14 Global Step: 244500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:50,882-Speed 5189.24 samples/sec Loss 1.1230 LearningRate 0.0072 Epoch: 14 Global Step: 244510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:52,857-Speed 5185.83 samples/sec Loss 1.1414 LearningRate 0.0072 Epoch: 14 Global Step: 244520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:54,830-Speed 5190.29 samples/sec Loss 1.1255 LearningRate 0.0072 Epoch: 14 Global Step: 244530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:56,818-Speed 5153.05 samples/sec Loss 1.1748 LearningRate 0.0072 Epoch: 14 Global Step: 244540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:40:58,827-Speed 5099.96 samples/sec Loss 1.1552 LearningRate 0.0072 Epoch: 14 Global Step: 244550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:41:00,801-Speed 5190.29 samples/sec Loss 1.1147 LearningRate 0.0071 Epoch: 14 Global Step: 244560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:41:02,781-Speed 5172.05 samples/sec Loss 1.1454 LearningRate 0.0071 Epoch: 14 Global Step: 244570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-11 15:41:04,758-Speed 5183.12 samples/sec Loss 1.1516 LearningRate 0.0071 Epoch: 14 Global Step: 244580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:41:06,755-Speed 5127.64 samples/sec Loss 1.1190 LearningRate 0.0071 Epoch: 14 Global Step: 244590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:41:08,748-Speed 5141.06 samples/sec Loss 1.1392 LearningRate 0.0071 Epoch: 14 Global Step: 244600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:41:10,725-Speed 5180.29 samples/sec Loss 1.0932 LearningRate 0.0071 Epoch: 14 Global Step: 244610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:41:12,714-Speed 5149.77 samples/sec Loss 1.1183 LearningRate 0.0071 Epoch: 14 Global Step: 244620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:41:14,722-Speed 5102.48 samples/sec Loss 1.1452 LearningRate 0.0071 Epoch: 14 Global Step: 244630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:41:16,701-Speed 5175.32 samples/sec Loss 1.1693 LearningRate 0.0071 Epoch: 14 Global Step: 244640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:41:18,693-Speed 5143.98 samples/sec Loss 1.0668 LearningRate 0.0071 Epoch: 14 Global Step: 244650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:41:20,673-Speed 5171.92 samples/sec Loss 1.1258 LearningRate 0.0071 Epoch: 14 Global Step: 244660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:41:22,656-Speed 5165.76 samples/sec Loss 1.0992 LearningRate 0.0071 Epoch: 14 Global Step: 244670 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:41:24,640-Speed 5162.91 samples/sec Loss 1.0918 LearningRate 0.0071 Epoch: 14 Global Step: 244680 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:41:26,609-Speed 5203.50 samples/sec Loss 1.1545 LearningRate 0.0071 Epoch: 14 Global Step: 244690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:41:28,589-Speed 5174.92 
samples/sec Loss 1.1534 LearningRate 0.0071 Epoch: 14 Global Step: 244700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:41:30,565-Speed 5182.60 samples/sec Loss 1.1531 LearningRate 0.0071 Epoch: 14 Global Step: 244710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:41:32,560-Speed 5133.33 samples/sec Loss 1.1266 LearningRate 0.0071 Epoch: 14 Global Step: 244720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:41:34,565-Speed 5111.58 samples/sec Loss 1.1570 LearningRate 0.0071 Epoch: 14 Global Step: 244730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:41:36,548-Speed 5165.52 samples/sec Loss 1.1044 LearningRate 0.0071 Epoch: 14 Global Step: 244740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:41:38,536-Speed 5152.15 samples/sec Loss 1.1360 LearningRate 0.0071 Epoch: 14 Global Step: 244750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:41:40,514-Speed 5178.34 samples/sec Loss 1.1191 LearningRate 0.0071 Epoch: 14 Global Step: 244760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:41:42,490-Speed 5185.01 samples/sec Loss 1.1305 LearningRate 0.0071 Epoch: 14 Global Step: 244770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:41:44,466-Speed 5184.31 samples/sec Loss 1.1435 LearningRate 0.0071 Epoch: 14 Global Step: 244780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:41:46,450-Speed 5162.05 samples/sec Loss 1.1358 LearningRate 0.0071 Epoch: 14 Global Step: 244790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:41:48,434-Speed 5163.51 samples/sec Loss 1.1573 LearningRate 0.0071 Epoch: 14 Global Step: 244800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:41:50,425-Speed 5143.79 samples/sec Loss 1.1617 LearningRate 0.0071 Epoch: 14 Global Step: 244810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:41:52,432-Speed 5104.74 samples/sec Loss 1.1020 LearningRate 0.0071 
Epoch: 14 Global Step: 244820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:41:54,409-Speed 5180.19 samples/sec Loss 1.1433 LearningRate 0.0071 Epoch: 14 Global Step: 244830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:41:56,385-Speed 5184.29 samples/sec Loss 1.1906 LearningRate 0.0071 Epoch: 14 Global Step: 244840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:41:58,385-Speed 5123.79 samples/sec Loss 1.1622 LearningRate 0.0071 Epoch: 14 Global Step: 244850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:00,409-Speed 5060.58 samples/sec Loss 1.1414 LearningRate 0.0071 Epoch: 14 Global Step: 244860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:02,396-Speed 5157.10 samples/sec Loss 1.1209 LearningRate 0.0071 Epoch: 14 Global Step: 244870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:04,376-Speed 5171.64 samples/sec Loss 1.1317 LearningRate 0.0071 Epoch: 14 Global Step: 244880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:06,354-Speed 5178.54 samples/sec Loss 1.1748 LearningRate 0.0071 Epoch: 14 Global Step: 244890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:08,336-Speed 5168.64 samples/sec Loss 1.1672 LearningRate 0.0071 Epoch: 14 Global Step: 244900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:10,331-Speed 5134.69 samples/sec Loss 1.1816 LearningRate 0.0071 Epoch: 14 Global Step: 244910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:42:12,312-Speed 5170.66 samples/sec Loss 1.1446 LearningRate 0.0071 Epoch: 14 Global Step: 244920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:14,301-Speed 5149.84 samples/sec Loss 1.1350 LearningRate 0.0071 Epoch: 14 Global Step: 244930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:16,283-Speed 5170.61 samples/sec Loss 1.1089 LearningRate 0.0071 Epoch: 14 Global Step: 244940 Fp16 Grad 
Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:18,269-Speed 5157.34 samples/sec Loss 1.1615 LearningRate 0.0071 Epoch: 14 Global Step: 244950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:20,250-Speed 5172.77 samples/sec Loss 1.1548 LearningRate 0.0071 Epoch: 14 Global Step: 244960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:22,224-Speed 5188.09 samples/sec Loss 1.0964 LearningRate 0.0071 Epoch: 14 Global Step: 244970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:24,203-Speed 5175.41 samples/sec Loss 1.1295 LearningRate 0.0071 Epoch: 14 Global Step: 244980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:26,200-Speed 5130.26 samples/sec Loss 1.1442 LearningRate 0.0071 Epoch: 14 Global Step: 244990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:28,200-Speed 5122.19 samples/sec Loss 1.1592 LearningRate 0.0071 Epoch: 14 Global Step: 245000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:30,213-Speed 5086.74 samples/sec Loss 1.1738 LearningRate 0.0071 Epoch: 14 Global Step: 245010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:32,237-Speed 5062.54 samples/sec Loss 1.1651 LearningRate 0.0071 Epoch: 14 Global Step: 245020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:42:34,206-Speed 5204.31 samples/sec Loss 1.1470 LearningRate 0.0071 Epoch: 14 Global Step: 245030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:36,194-Speed 5152.76 samples/sec Loss 1.1769 LearningRate 0.0071 Epoch: 14 Global Step: 245040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:38,178-Speed 5162.56 samples/sec Loss 1.1532 LearningRate 0.0071 Epoch: 14 Global Step: 245050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:40,169-Speed 5146.04 samples/sec Loss 1.1657 LearningRate 0.0071 Epoch: 14 Global Step: 245060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-11 15:42:42,152-Speed 5164.99 samples/sec Loss 1.1166 LearningRate 0.0071 Epoch: 14 Global Step: 245070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:44,126-Speed 5189.32 samples/sec Loss 1.1179 LearningRate 0.0071 Epoch: 14 Global Step: 245080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:46,114-Speed 5153.31 samples/sec Loss 1.1344 LearningRate 0.0071 Epoch: 14 Global Step: 245090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:48,104-Speed 5146.97 samples/sec Loss 1.1297 LearningRate 0.0071 Epoch: 14 Global Step: 245100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:50,094-Speed 5147.98 samples/sec Loss 1.1560 LearningRate 0.0071 Epoch: 14 Global Step: 245110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:52,079-Speed 5160.83 samples/sec Loss 1.1349 LearningRate 0.0071 Epoch: 14 Global Step: 245120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:42:54,061-Speed 5170.20 samples/sec Loss 1.1253 LearningRate 0.0071 Epoch: 14 Global Step: 245130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:42:56,050-Speed 5148.27 samples/sec Loss 1.1502 LearningRate 0.0071 Epoch: 14 Global Step: 245140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:42:58,034-Speed 5164.79 samples/sec Loss 1.1708 LearningRate 0.0071 Epoch: 14 Global Step: 245150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:43:00,014-Speed 5172.66 samples/sec Loss 1.1894 LearningRate 0.0071 Epoch: 14 Global Step: 245160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:01,995-Speed 5170.52 samples/sec Loss 1.1106 LearningRate 0.0071 Epoch: 14 Global Step: 245170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:03,994-Speed 5125.06 samples/sec Loss 1.1308 LearningRate 0.0071 Epoch: 14 Global Step: 245180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:05,982-Speed 5153.79 
samples/sec Loss 1.1272 LearningRate 0.0070 Epoch: 14 Global Step: 245190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:07,960-Speed 5178.86 samples/sec Loss 1.1853 LearningRate 0.0070 Epoch: 14 Global Step: 245200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:09,945-Speed 5159.07 samples/sec Loss 1.0978 LearningRate 0.0070 Epoch: 14 Global Step: 245210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:11,932-Speed 5154.43 samples/sec Loss 1.1155 LearningRate 0.0070 Epoch: 14 Global Step: 245220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:13,908-Speed 5184.49 samples/sec Loss 1.1891 LearningRate 0.0070 Epoch: 14 Global Step: 245230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:15,884-Speed 5184.45 samples/sec Loss 1.0973 LearningRate 0.0070 Epoch: 14 Global Step: 245240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:17,869-Speed 5161.18 samples/sec Loss 1.1778 LearningRate 0.0070 Epoch: 14 Global Step: 245250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:19,852-Speed 5166.46 samples/sec Loss 1.1488 LearningRate 0.0070 Epoch: 14 Global Step: 245260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:43:21,840-Speed 5151.45 samples/sec Loss 1.1150 LearningRate 0.0070 Epoch: 14 Global Step: 245270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:23,825-Speed 5161.98 samples/sec Loss 1.1226 LearningRate 0.0070 Epoch: 14 Global Step: 245280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:25,822-Speed 5128.96 samples/sec Loss 1.1336 LearningRate 0.0070 Epoch: 14 Global Step: 245290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:27,803-Speed 5170.56 samples/sec Loss 1.1261 LearningRate 0.0070 Epoch: 14 Global Step: 245300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:29,779-Speed 5183.09 samples/sec Loss 1.1350 LearningRate 
0.0070 Epoch: 14 Global Step: 245310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:31,753-Speed 5191.40 samples/sec Loss 1.1103 LearningRate 0.0070 Epoch: 14 Global Step: 245320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:33,731-Speed 5177.35 samples/sec Loss 1.1330 LearningRate 0.0070 Epoch: 14 Global Step: 245330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:35,719-Speed 5153.44 samples/sec Loss 1.1418 LearningRate 0.0070 Epoch: 14 Global Step: 245340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:37,711-Speed 5140.73 samples/sec Loss 1.1639 LearningRate 0.0070 Epoch: 14 Global Step: 245350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:39,690-Speed 5178.15 samples/sec Loss 1.1366 LearningRate 0.0070 Epoch: 14 Global Step: 245360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:41,662-Speed 5194.88 samples/sec Loss 1.1244 LearningRate 0.0070 Epoch: 14 Global Step: 245370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:43,640-Speed 5177.19 samples/sec Loss 1.1468 LearningRate 0.0070 Epoch: 14 Global Step: 245380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:45,616-Speed 5185.55 samples/sec Loss 1.1706 LearningRate 0.0070 Epoch: 14 Global Step: 245390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:47,608-Speed 5141.53 samples/sec Loss 1.1512 LearningRate 0.0070 Epoch: 14 Global Step: 245400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:49,589-Speed 5171.69 samples/sec Loss 1.1404 LearningRate 0.0070 Epoch: 14 Global Step: 245410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:51,562-Speed 5190.05 samples/sec Loss 1.1619 LearningRate 0.0070 Epoch: 14 Global Step: 245420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:53,570-Speed 5101.94 samples/sec Loss 1.1734 LearningRate 0.0070 Epoch: 14 Global Step: 245430 Fp16 
Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:55,560-Speed 5147.44 samples/sec Loss 1.1298 LearningRate 0.0070 Epoch: 14 Global Step: 245440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:57,566-Speed 5106.42 samples/sec Loss 1.1300 LearningRate 0.0070 Epoch: 14 Global Step: 245450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:43:59,549-Speed 5166.30 samples/sec Loss 1.1088 LearningRate 0.0070 Epoch: 14 Global Step: 245460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:44:01,536-Speed 5154.97 samples/sec Loss 1.1317 LearningRate 0.0070 Epoch: 14 Global Step: 245470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:44:03,508-Speed 5195.50 samples/sec Loss 1.2089 LearningRate 0.0070 Epoch: 14 Global Step: 245480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:44:05,495-Speed 5154.27 samples/sec Loss 1.1611 LearningRate 0.0070 Epoch: 14 Global Step: 245490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:44:07,471-Speed 5184.22 samples/sec Loss 1.1382 LearningRate 0.0070 Epoch: 14 Global Step: 245500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:44:09,451-Speed 5174.20 samples/sec Loss 1.1245 LearningRate 0.0070 Epoch: 14 Global Step: 245510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:44:11,429-Speed 5178.42 samples/sec Loss 1.1211 LearningRate 0.0070 Epoch: 14 Global Step: 245520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:44:13,444-Speed 5082.55 samples/sec Loss 1.1608 LearningRate 0.0070 Epoch: 14 Global Step: 245530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:44:15,433-Speed 5150.11 samples/sec Loss 1.1334 LearningRate 0.0070 Epoch: 14 Global Step: 245540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:44:17,408-Speed 5186.14 samples/sec Loss 1.1584 LearningRate 0.0070 Epoch: 14 Global Step: 245550 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 15:44:19,389-Speed 5173.68 samples/sec Loss 1.1198 LearningRate 0.0070 Epoch: 14 Global Step: 245560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:44:21,371-Speed 5168.00 samples/sec Loss 1.1874 LearningRate 0.0070 Epoch: 14 Global Step: 245570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:44:23,403-Speed 5042.11 samples/sec Loss 1.1394 LearningRate 0.0070 Epoch: 14 Global Step: 245580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:44:25,375-Speed 5195.27 samples/sec Loss 1.0935 LearningRate 0.0070 Epoch: 14 Global Step: 245590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:44:27,376-Speed 5119.78 samples/sec Loss 1.1136 LearningRate 0.0070 Epoch: 14 Global Step: 245600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:44:29,370-Speed 5135.28 samples/sec Loss 1.1859 LearningRate 0.0070 Epoch: 14 Global Step: 245610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:44:31,347-Speed 5181.42 samples/sec Loss 1.1645 LearningRate 0.0070 Epoch: 14 Global Step: 245620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:44:33,342-Speed 5137.26 samples/sec Loss 1.1285 LearningRate 0.0070 Epoch: 14 Global Step: 245630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:44:35,354-Speed 5091.30 samples/sec Loss 1.1088 LearningRate 0.0070 Epoch: 14 Global Step: 245640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:44:37,333-Speed 5174.23 samples/sec Loss 1.1671 LearningRate 0.0070 Epoch: 14 Global Step: 245650 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:44:39,315-Speed 5168.11 samples/sec Loss 1.1222 LearningRate 0.0070 Epoch: 14 Global Step: 245660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:44:41,304-Speed 5152.68 samples/sec Loss 1.1205 LearningRate 0.0070 Epoch: 14 Global Step: 245670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:44:43,283-Speed 
5176.21 samples/sec Loss 1.0733 LearningRate 0.0070 Epoch: 14 Global Step: 245680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:44:45,256-Speed 5192.20 samples/sec Loss 1.1228 LearningRate 0.0070 Epoch: 14 Global Step: 245690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:44:47,233-Speed 5181.91 samples/sec Loss 1.1507 LearningRate 0.0070 Epoch: 14 Global Step: 245700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:44:49,230-Speed 5129.08 samples/sec Loss 1.1351 LearningRate 0.0070 Epoch: 14 Global Step: 245710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:44:51,205-Speed 5185.91 samples/sec Loss 1.1215 LearningRate 0.0070 Epoch: 14 Global Step: 245720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:44:53,178-Speed 5189.98 samples/sec Loss 1.1431 LearningRate 0.0070 Epoch: 14 Global Step: 245730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:44:55,151-Speed 5193.79 samples/sec Loss 1.1148 LearningRate 0.0070 Epoch: 14 Global Step: 245740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:44:57,128-Speed 5182.39 samples/sec Loss 1.1241 LearningRate 0.0070 Epoch: 14 Global Step: 245750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:44:59,112-Speed 5163.73 samples/sec Loss 1.1353 LearningRate 0.0070 Epoch: 14 Global Step: 245760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:45:01,085-Speed 5190.98 samples/sec Loss 1.1351 LearningRate 0.0070 Epoch: 14 Global Step: 245770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:45:03,064-Speed 5177.13 samples/sec Loss 1.1377 LearningRate 0.0070 Epoch: 14 Global Step: 245780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:45:05,042-Speed 5176.97 samples/sec Loss 1.0779 LearningRate 0.0070 Epoch: 14 Global Step: 245790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:45:07,043-Speed 5121.38 samples/sec Loss 1.1088 
LearningRate 0.0070 Epoch: 14 Global Step: 245800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:45:09,022-Speed 5176.69 samples/sec Loss 1.0948 LearningRate 0.0070 Epoch: 14 Global Step: 245810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:45:10,996-Speed 5190.89 samples/sec Loss 1.1203 LearningRate 0.0069 Epoch: 14 Global Step: 245820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:45:12,979-Speed 5165.57 samples/sec Loss 1.1688 LearningRate 0.0069 Epoch: 14 Global Step: 245830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:45:14,952-Speed 5191.32 samples/sec Loss 1.1642 LearningRate 0.0069 Epoch: 14 Global Step: 245840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:45:16,926-Speed 5189.12 samples/sec Loss 1.1346 LearningRate 0.0069 Epoch: 14 Global Step: 245850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:45:18,901-Speed 5187.05 samples/sec Loss 1.1449 LearningRate 0.0069 Epoch: 14 Global Step: 245860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:45:20,885-Speed 5164.56 samples/sec Loss 1.1198 LearningRate 0.0069 Epoch: 14 Global Step: 245870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:45:22,868-Speed 5163.81 samples/sec Loss 1.1264 LearningRate 0.0069 Epoch: 14 Global Step: 245880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:45:24,860-Speed 5143.88 samples/sec Loss 1.1318 LearningRate 0.0069 Epoch: 14 Global Step: 245890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:45:26,842-Speed 5169.69 samples/sec Loss 1.1667 LearningRate 0.0069 Epoch: 14 Global Step: 245900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:45:28,833-Speed 5144.91 samples/sec Loss 1.1860 LearningRate 0.0069 Epoch: 14 Global Step: 245910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:45:30,821-Speed 5151.04 samples/sec Loss 1.1610 LearningRate 0.0069 Epoch: 14 Global Step: 
245920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:45:32,797-Speed 5184.54 samples/sec Loss 1.1313 LearningRate 0.0069 Epoch: 14 Global Step: 245930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:45:34,771-Speed 5189.32 samples/sec Loss 1.1672 LearningRate 0.0069 Epoch: 14 Global Step: 245940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:45:36,754-Speed 5166.06 samples/sec Loss 1.1548 LearningRate 0.0069 Epoch: 14 Global Step: 245950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:45:38,737-Speed 5165.04 samples/sec Loss 1.1745 LearningRate 0.0069 Epoch: 14 Global Step: 245960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:45:40,725-Speed 5152.65 samples/sec Loss 1.1812 LearningRate 0.0069 Epoch: 14 Global Step: 245970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:45:42,705-Speed 5173.16 samples/sec Loss 1.1024 LearningRate 0.0069 Epoch: 14 Global Step: 245980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:45:44,685-Speed 5172.49 samples/sec Loss 1.1281 LearningRate 0.0069 Epoch: 14 Global Step: 245990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:45:46,691-Speed 5107.41 samples/sec Loss 1.1558 LearningRate 0.0069 Epoch: 14 Global Step: 246000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:46:13,297-[lfw][246000]XNorm: 21.998836 Training: 2022-04-11 15:46:13,298-[lfw][246000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 15:46:13,298-[lfw][246000]Accuracy-Highest: 0.99833 Training: 2022-04-11 15:46:44,332-[cfp_fp][246000]XNorm: 21.857475 Training: 2022-04-11 15:46:44,333-[cfp_fp][246000]Accuracy-Flip: 0.98914+-0.00415 Training: 2022-04-11 15:46:44,333-[cfp_fp][246000]Accuracy-Highest: 0.98914 Training: 2022-04-11 15:47:10,973-[agedb_30][246000]XNorm: 22.710397 Training: 2022-04-11 15:47:10,974-[agedb_30][246000]Accuracy-Flip: 0.98183+-0.00673 Training: 2022-04-11 
15:47:10,974-[agedb_30][246000]Accuracy-Highest: 0.98300 Training: 2022-04-11 15:47:12,968-Speed 118.69 samples/sec Loss 1.1745 LearningRate 0.0069 Epoch: 14 Global Step: 246010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:14,938-Speed 5201.00 samples/sec Loss 1.1057 LearningRate 0.0069 Epoch: 14 Global Step: 246020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:16,912-Speed 5189.16 samples/sec Loss 1.1204 LearningRate 0.0069 Epoch: 14 Global Step: 246030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:18,871-Speed 5226.99 samples/sec Loss 1.1233 LearningRate 0.0069 Epoch: 14 Global Step: 246040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:20,844-Speed 5196.39 samples/sec Loss 1.1373 LearningRate 0.0069 Epoch: 14 Global Step: 246050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:22,817-Speed 5192.31 samples/sec Loss 1.1633 LearningRate 0.0069 Epoch: 14 Global Step: 246060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:24,788-Speed 5197.23 samples/sec Loss 1.1433 LearningRate 0.0069 Epoch: 14 Global Step: 246070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:26,765-Speed 5178.52 samples/sec Loss 1.0972 LearningRate 0.0069 Epoch: 14 Global Step: 246080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:28,750-Speed 5162.09 samples/sec Loss 1.1844 LearningRate 0.0069 Epoch: 14 Global Step: 246090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:30,731-Speed 5171.71 samples/sec Loss 1.0762 LearningRate 0.0069 Epoch: 14 Global Step: 246100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:32,730-Speed 5124.22 samples/sec Loss 1.1616 LearningRate 0.0069 Epoch: 14 Global Step: 246110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:34,705-Speed 5184.96 samples/sec Loss 1.0924 LearningRate 0.0069 Epoch: 14 Global Step: 246120 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 15:47:36,685-Speed 5175.69 samples/sec Loss 1.1212 LearningRate 0.0069 Epoch: 14 Global Step: 246130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:38,696-Speed 5092.88 samples/sec Loss 1.1476 LearningRate 0.0069 Epoch: 14 Global Step: 246140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:47:40,688-Speed 5142.01 samples/sec Loss 1.1455 LearningRate 0.0069 Epoch: 14 Global Step: 246150 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:47:42,657-Speed 5202.82 samples/sec Loss 1.1890 LearningRate 0.0069 Epoch: 14 Global Step: 246160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:44,656-Speed 5125.67 samples/sec Loss 1.1138 LearningRate 0.0069 Epoch: 14 Global Step: 246170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:46,661-Speed 5109.07 samples/sec Loss 1.1701 LearningRate 0.0069 Epoch: 14 Global Step: 246180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:48,688-Speed 5052.52 samples/sec Loss 1.1503 LearningRate 0.0069 Epoch: 14 Global Step: 246190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:50,673-Speed 5162.58 samples/sec Loss 1.1142 LearningRate 0.0069 Epoch: 14 Global Step: 246200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:52,689-Speed 5081.24 samples/sec Loss 1.0622 LearningRate 0.0069 Epoch: 14 Global Step: 246210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:54,668-Speed 5177.41 samples/sec Loss 1.1125 LearningRate 0.0069 Epoch: 14 Global Step: 246220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:56,681-Speed 5087.47 samples/sec Loss 1.2028 LearningRate 0.0069 Epoch: 14 Global Step: 246230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:47:58,653-Speed 5196.88 samples/sec Loss 1.1191 LearningRate 0.0069 Epoch: 14 Global Step: 246240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
15:48:00,672-Speed 5072.02 samples/sec Loss 1.1395 LearningRate 0.0069 Epoch: 14 Global Step: 246250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:02,657-Speed 5160.58 samples/sec Loss 1.1439 LearningRate 0.0069 Epoch: 14 Global Step: 246260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:48:04,644-Speed 5155.81 samples/sec Loss 1.1686 LearningRate 0.0069 Epoch: 14 Global Step: 246270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:06,616-Speed 5195.14 samples/sec Loss 1.1579 LearningRate 0.0069 Epoch: 14 Global Step: 246280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:08,635-Speed 5073.39 samples/sec Loss 1.1766 LearningRate 0.0069 Epoch: 14 Global Step: 246290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:10,640-Speed 5109.50 samples/sec Loss 1.1330 LearningRate 0.0069 Epoch: 14 Global Step: 246300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:12,624-Speed 5164.97 samples/sec Loss 1.1794 LearningRate 0.0069 Epoch: 14 Global Step: 246310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:14,606-Speed 5165.94 samples/sec Loss 1.1037 LearningRate 0.0069 Epoch: 14 Global Step: 246320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:16,601-Speed 5137.32 samples/sec Loss 1.1287 LearningRate 0.0069 Epoch: 14 Global Step: 246330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:18,585-Speed 5161.76 samples/sec Loss 1.1724 LearningRate 0.0069 Epoch: 14 Global Step: 246340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:20,561-Speed 5185.02 samples/sec Loss 1.1202 LearningRate 0.0069 Epoch: 14 Global Step: 246350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:22,552-Speed 5144.35 samples/sec Loss 1.1131 LearningRate 0.0069 Epoch: 14 Global Step: 246360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:24,532-Speed 5172.41 samples/sec 
Loss 1.0972 LearningRate 0.0069 Epoch: 14 Global Step: 246370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:48:26,505-Speed 5192.06 samples/sec Loss 1.1477 LearningRate 0.0069 Epoch: 14 Global Step: 246380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:28,481-Speed 5185.80 samples/sec Loss 1.1601 LearningRate 0.0069 Epoch: 14 Global Step: 246390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:30,454-Speed 5192.78 samples/sec Loss 1.1716 LearningRate 0.0069 Epoch: 14 Global Step: 246400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:32,424-Speed 5200.16 samples/sec Loss 1.1137 LearningRate 0.0069 Epoch: 14 Global Step: 246410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:34,403-Speed 5174.06 samples/sec Loss 1.1321 LearningRate 0.0069 Epoch: 14 Global Step: 246420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:36,399-Speed 5132.53 samples/sec Loss 1.1321 LearningRate 0.0069 Epoch: 14 Global Step: 246430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:38,440-Speed 5019.34 samples/sec Loss 1.1609 LearningRate 0.0069 Epoch: 14 Global Step: 246440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:40,429-Speed 5149.07 samples/sec Loss 1.1294 LearningRate 0.0069 Epoch: 14 Global Step: 246450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:42,412-Speed 5167.34 samples/sec Loss 1.1529 LearningRate 0.0068 Epoch: 14 Global Step: 246460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:44,388-Speed 5183.20 samples/sec Loss 1.1220 LearningRate 0.0068 Epoch: 14 Global Step: 246470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:48:46,393-Speed 5110.22 samples/sec Loss 1.1613 LearningRate 0.0068 Epoch: 14 Global Step: 246480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:48:48,385-Speed 5142.00 samples/sec Loss 1.1176 LearningRate 0.0068 Epoch: 14 
Global Step: 246490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:48:50,366-Speed 5172.53 samples/sec Loss 1.1089 LearningRate 0.0068 Epoch: 14 Global Step: 246500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:48:52,346-Speed 5174.16 samples/sec Loss 1.1124 LearningRate 0.0068 Epoch: 14 Global Step: 246510 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:48:54,316-Speed 5199.53 samples/sec Loss 1.1129 LearningRate 0.0068 Epoch: 14 Global Step: 246520 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:48:56,286-Speed 5199.09 samples/sec Loss 1.1467 LearningRate 0.0068 Epoch: 14 Global Step: 246530 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:48:58,265-Speed 5176.52 samples/sec Loss 1.1236 LearningRate 0.0068 Epoch: 14 Global Step: 246540 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:49:00,237-Speed 5196.07 samples/sec Loss 1.1415 LearningRate 0.0068 Epoch: 14 Global Step: 246550 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:49:02,242-Speed 5106.83 samples/sec Loss 1.1142 LearningRate 0.0068 Epoch: 14 Global Step: 246560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:49:04,245-Speed 5113.57 samples/sec Loss 1.1263 LearningRate 0.0068 Epoch: 14 Global Step: 246570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:49:06,225-Speed 5176.14 samples/sec Loss 1.1166 LearningRate 0.0068 Epoch: 14 Global Step: 246580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:49:08,202-Speed 5180.84 samples/sec Loss 1.0921 LearningRate 0.0068 Epoch: 14 Global Step: 246590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:49:10,189-Speed 5154.02 samples/sec Loss 1.1779 LearningRate 0.0068 Epoch: 14 Global Step: 246600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:49:12,177-Speed 5153.05 samples/sec Loss 1.1555 LearningRate 0.0068 Epoch: 14 Global Step: 246610 Fp16 Grad Scale: 
32768 Required: 6 hours Training: 2022-04-11 15:49:14,155-Speed 5178.93 samples/sec Loss 1.1511 LearningRate 0.0068 Epoch: 14 Global Step: 246620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:49:16,132-Speed 5180.86 samples/sec Loss 1.0795 LearningRate 0.0068 Epoch: 14 Global Step: 246630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:49:18,109-Speed 5183.66 samples/sec Loss 1.1207 LearningRate 0.0068 Epoch: 14 Global Step: 246640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:49:20,088-Speed 5175.37 samples/sec Loss 1.1399 LearningRate 0.0068 Epoch: 14 Global Step: 246650 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:49:22,083-Speed 5134.83 samples/sec Loss 1.0971 LearningRate 0.0068 Epoch: 14 Global Step: 246660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:49:24,059-Speed 5183.90 samples/sec Loss 1.1332 LearningRate 0.0068 Epoch: 14 Global Step: 246670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:49:26,042-Speed 5165.37 samples/sec Loss 1.1441 LearningRate 0.0068 Epoch: 14 Global Step: 246680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:49:28,024-Speed 5169.17 samples/sec Loss 1.1132 LearningRate 0.0068 Epoch: 14 Global Step: 246690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:49:30,009-Speed 5159.41 samples/sec Loss 1.1112 LearningRate 0.0068 Epoch: 14 Global Step: 246700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:49:31,982-Speed 5192.23 samples/sec Loss 1.1401 LearningRate 0.0068 Epoch: 14 Global Step: 246710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:49:33,952-Speed 5199.26 samples/sec Loss 1.1549 LearningRate 0.0068 Epoch: 14 Global Step: 246720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:49:35,929-Speed 5180.91 samples/sec Loss 1.1256 LearningRate 0.0068 Epoch: 14 Global Step: 246730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 
2022-04-11 15:49:37,907-Speed 5179.70 samples/sec Loss 1.1124 LearningRate 0.0068 Epoch: 14 Global Step: 246740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:49:39,892-Speed 5159.94 samples/sec Loss 1.1520 LearningRate 0.0068 Epoch: 14 Global Step: 246750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:49:41,886-Speed 5138.52 samples/sec Loss 1.1730 LearningRate 0.0068 Epoch: 14 Global Step: 246760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:49:43,857-Speed 5196.81 samples/sec Loss 1.1173 LearningRate 0.0068 Epoch: 14 Global Step: 246770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:49:45,845-Speed 5151.66 samples/sec Loss 1.1727 LearningRate 0.0068 Epoch: 14 Global Step: 246780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:49:47,828-Speed 5165.63 samples/sec Loss 1.1366 LearningRate 0.0068 Epoch: 14 Global Step: 246790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:49:49,801-Speed 5192.56 samples/sec Loss 1.1378 LearningRate 0.0068 Epoch: 14 Global Step: 246800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:49:51,785-Speed 5163.75 samples/sec Loss 1.1795 LearningRate 0.0068 Epoch: 14 Global Step: 246810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:49:53,765-Speed 5174.53 samples/sec Loss 1.1627 LearningRate 0.0068 Epoch: 14 Global Step: 246820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:49:55,752-Speed 5154.56 samples/sec Loss 1.1883 LearningRate 0.0068 Epoch: 14 Global Step: 246830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:49:57,739-Speed 5156.18 samples/sec Loss 1.1438 LearningRate 0.0068 Epoch: 14 Global Step: 246840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:49:59,726-Speed 5154.22 samples/sec Loss 1.1431 LearningRate 0.0068 Epoch: 14 Global Step: 246850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:01,707-Speed 5169.76 
samples/sec Loss 1.1819 LearningRate 0.0068 Epoch: 14 Global Step: 246860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:03,685-Speed 5179.92 samples/sec Loss 1.1695 LearningRate 0.0068 Epoch: 14 Global Step: 246870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:05,671-Speed 5156.14 samples/sec Loss 1.1465 LearningRate 0.0068 Epoch: 14 Global Step: 246880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:07,649-Speed 5178.67 samples/sec Loss 1.1162 LearningRate 0.0068 Epoch: 14 Global Step: 246890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:09,627-Speed 5181.69 samples/sec Loss 1.0912 LearningRate 0.0068 Epoch: 14 Global Step: 246900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:50:11,606-Speed 5177.16 samples/sec Loss 1.1802 LearningRate 0.0068 Epoch: 14 Global Step: 246910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:13,613-Speed 5104.72 samples/sec Loss 1.1347 LearningRate 0.0068 Epoch: 14 Global Step: 246920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:15,637-Speed 5061.50 samples/sec Loss 1.1498 LearningRate 0.0068 Epoch: 14 Global Step: 246930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:17,612-Speed 5183.97 samples/sec Loss 1.1551 LearningRate 0.0068 Epoch: 14 Global Step: 246940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:19,605-Speed 5141.88 samples/sec Loss 1.1877 LearningRate 0.0068 Epoch: 14 Global Step: 246950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:21,594-Speed 5150.10 samples/sec Loss 1.1584 LearningRate 0.0068 Epoch: 14 Global Step: 246960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:23,588-Speed 5137.28 samples/sec Loss 1.1715 LearningRate 0.0068 Epoch: 14 Global Step: 246970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:25,620-Speed 5042.08 samples/sec Loss 1.1610 LearningRate 
0.0068 Epoch: 14 Global Step: 246980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:27,607-Speed 5154.66 samples/sec Loss 1.1357 LearningRate 0.0068 Epoch: 14 Global Step: 246990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:29,583-Speed 5185.54 samples/sec Loss 1.1134 LearningRate 0.0068 Epoch: 14 Global Step: 247000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:31,551-Speed 5204.54 samples/sec Loss 1.1221 LearningRate 0.0068 Epoch: 14 Global Step: 247010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:33,531-Speed 5174.64 samples/sec Loss 1.1588 LearningRate 0.0068 Epoch: 14 Global Step: 247020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:35,531-Speed 5121.00 samples/sec Loss 1.1574 LearningRate 0.0068 Epoch: 14 Global Step: 247030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:37,512-Speed 5170.31 samples/sec Loss 1.1654 LearningRate 0.0068 Epoch: 14 Global Step: 247040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:39,507-Speed 5134.85 samples/sec Loss 1.1075 LearningRate 0.0068 Epoch: 14 Global Step: 247050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:41,525-Speed 5077.88 samples/sec Loss 1.1509 LearningRate 0.0068 Epoch: 14 Global Step: 247060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:43,502-Speed 5181.04 samples/sec Loss 1.1430 LearningRate 0.0068 Epoch: 14 Global Step: 247070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:45,481-Speed 5174.88 samples/sec Loss 1.0940 LearningRate 0.0068 Epoch: 14 Global Step: 247080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:47,460-Speed 5176.78 samples/sec Loss 1.1418 LearningRate 0.0068 Epoch: 14 Global Step: 247090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:49,460-Speed 5122.17 samples/sec Loss 1.1461 LearningRate 0.0067 Epoch: 14 Global Step: 247100 Fp16 
Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:50:51,447-Speed 5155.21 samples/sec Loss 1.1415 LearningRate 0.0067 Epoch: 14 Global Step: 247110 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:50:53,438-Speed 5145.77 samples/sec Loss 1.1811 LearningRate 0.0067 Epoch: 14 Global Step: 247120 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:50:55,413-Speed 5186.37 samples/sec Loss 1.0823 LearningRate 0.0067 Epoch: 14 Global Step: 247130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:50:57,397-Speed 5163.71 samples/sec Loss 1.1654 LearningRate 0.0067 Epoch: 14 Global Step: 247140 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:50:59,419-Speed 5064.67 samples/sec Loss 1.1208 LearningRate 0.0067 Epoch: 14 Global Step: 247150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:01,415-Speed 5134.33 samples/sec Loss 1.1995 LearningRate 0.0067 Epoch: 14 Global Step: 247160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:03,419-Speed 5110.59 samples/sec Loss 1.1772 LearningRate 0.0067 Epoch: 14 Global Step: 247170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:05,416-Speed 5130.41 samples/sec Loss 1.1530 LearningRate 0.0067 Epoch: 14 Global Step: 247180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:07,408-Speed 5142.67 samples/sec Loss 1.1493 LearningRate 0.0067 Epoch: 14 Global Step: 247190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:09,387-Speed 5175.08 samples/sec Loss 1.1467 LearningRate 0.0067 Epoch: 14 Global Step: 247200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:11,366-Speed 5176.54 samples/sec Loss 1.1575 LearningRate 0.0067 Epoch: 14 Global Step: 247210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:13,351-Speed 5160.83 samples/sec Loss 1.1235 LearningRate 0.0067 Epoch: 14 Global Step: 247220 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 15:51:15,354-Speed 5114.88 samples/sec Loss 1.1476 LearningRate 0.0067 Epoch: 14 Global Step: 247230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:17,327-Speed 5190.05 samples/sec Loss 1.1283 LearningRate 0.0067 Epoch: 14 Global Step: 247240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:19,304-Speed 5181.83 samples/sec Loss 1.1247 LearningRate 0.0067 Epoch: 14 Global Step: 247250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:51:21,297-Speed 5139.86 samples/sec Loss 1.1600 LearningRate 0.0067 Epoch: 14 Global Step: 247260 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:51:23,292-Speed 5136.56 samples/sec Loss 1.1140 LearningRate 0.0067 Epoch: 14 Global Step: 247270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:25,272-Speed 5176.30 samples/sec Loss 1.1561 LearningRate 0.0067 Epoch: 14 Global Step: 247280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:27,254-Speed 5170.64 samples/sec Loss 1.1569 LearningRate 0.0067 Epoch: 14 Global Step: 247290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:29,255-Speed 5118.82 samples/sec Loss 1.1317 LearningRate 0.0067 Epoch: 14 Global Step: 247300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:31,245-Speed 5146.51 samples/sec Loss 1.1459 LearningRate 0.0067 Epoch: 14 Global Step: 247310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:33,230-Speed 5161.63 samples/sec Loss 1.1496 LearningRate 0.0067 Epoch: 14 Global Step: 247320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:35,213-Speed 5164.78 samples/sec Loss 1.1471 LearningRate 0.0067 Epoch: 14 Global Step: 247330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:37,209-Speed 5133.73 samples/sec Loss 1.1057 LearningRate 0.0067 Epoch: 14 Global Step: 247340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:39,197-Speed 
5151.28 samples/sec Loss 1.1672 LearningRate 0.0067 Epoch: 14 Global Step: 247350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:41,210-Speed 5088.59 samples/sec Loss 1.2037 LearningRate 0.0067 Epoch: 14 Global Step: 247360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:43,186-Speed 5185.44 samples/sec Loss 1.0901 LearningRate 0.0067 Epoch: 14 Global Step: 247370 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:51:45,165-Speed 5176.53 samples/sec Loss 1.1398 LearningRate 0.0067 Epoch: 14 Global Step: 247380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:47,153-Speed 5153.96 samples/sec Loss 1.1437 LearningRate 0.0067 Epoch: 14 Global Step: 247390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:49,134-Speed 5171.44 samples/sec Loss 1.1303 LearningRate 0.0067 Epoch: 14 Global Step: 247400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:51,121-Speed 5154.56 samples/sec Loss 1.1419 LearningRate 0.0067 Epoch: 14 Global Step: 247410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:53,131-Speed 5094.28 samples/sec Loss 1.1448 LearningRate 0.0067 Epoch: 14 Global Step: 247420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:55,124-Speed 5142.50 samples/sec Loss 1.1090 LearningRate 0.0067 Epoch: 14 Global Step: 247430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:57,106-Speed 5170.50 samples/sec Loss 1.1104 LearningRate 0.0067 Epoch: 14 Global Step: 247440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:51:59,112-Speed 5105.35 samples/sec Loss 1.1479 LearningRate 0.0067 Epoch: 14 Global Step: 247450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:01,121-Speed 5099.25 samples/sec Loss 1.1343 LearningRate 0.0067 Epoch: 14 Global Step: 247460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:03,114-Speed 5141.01 samples/sec Loss 1.1661 
LearningRate 0.0067 Epoch: 14 Global Step: 247470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:05,126-Speed 5092.24 samples/sec Loss 1.1625 LearningRate 0.0067 Epoch: 14 Global Step: 247480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:52:07,110-Speed 5162.54 samples/sec Loss 1.1694 LearningRate 0.0067 Epoch: 14 Global Step: 247490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:52:09,081-Speed 5195.25 samples/sec Loss 1.1404 LearningRate 0.0067 Epoch: 14 Global Step: 247500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:11,074-Speed 5142.20 samples/sec Loss 1.1430 LearningRate 0.0067 Epoch: 14 Global Step: 247510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:13,074-Speed 5119.42 samples/sec Loss 1.1196 LearningRate 0.0067 Epoch: 14 Global Step: 247520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:15,074-Speed 5123.81 samples/sec Loss 1.0946 LearningRate 0.0067 Epoch: 14 Global Step: 247530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:17,079-Speed 5109.91 samples/sec Loss 1.1764 LearningRate 0.0067 Epoch: 14 Global Step: 247540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:19,076-Speed 5128.93 samples/sec Loss 1.1343 LearningRate 0.0067 Epoch: 14 Global Step: 247550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:21,050-Speed 5189.22 samples/sec Loss 1.0757 LearningRate 0.0067 Epoch: 14 Global Step: 247560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:23,065-Speed 5084.60 samples/sec Loss 1.1259 LearningRate 0.0067 Epoch: 14 Global Step: 247570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:25,077-Speed 5090.97 samples/sec Loss 1.1912 LearningRate 0.0067 Epoch: 14 Global Step: 247580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:27,101-Speed 5062.75 samples/sec Loss 1.1825 LearningRate 0.0067 Epoch: 14 Global 
Step: 247590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:29,095-Speed 5137.12 samples/sec Loss 1.1133 LearningRate 0.0067 Epoch: 14 Global Step: 247600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:52:31,064-Speed 5203.74 samples/sec Loss 1.1322 LearningRate 0.0067 Epoch: 14 Global Step: 247610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:33,040-Speed 5183.50 samples/sec Loss 1.1416 LearningRate 0.0067 Epoch: 14 Global Step: 247620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:35,019-Speed 5176.26 samples/sec Loss 1.1325 LearningRate 0.0067 Epoch: 14 Global Step: 247630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:37,018-Speed 5125.92 samples/sec Loss 1.1495 LearningRate 0.0067 Epoch: 14 Global Step: 247640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:38,999-Speed 5170.81 samples/sec Loss 1.1918 LearningRate 0.0067 Epoch: 14 Global Step: 247650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:40,967-Speed 5204.97 samples/sec Loss 1.1500 LearningRate 0.0067 Epoch: 14 Global Step: 247660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:42,959-Speed 5140.92 samples/sec Loss 1.1322 LearningRate 0.0067 Epoch: 14 Global Step: 247670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:44,940-Speed 5171.78 samples/sec Loss 1.1735 LearningRate 0.0067 Epoch: 14 Global Step: 247680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:46,918-Speed 5179.37 samples/sec Loss 1.1895 LearningRate 0.0067 Epoch: 14 Global Step: 247690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:48,919-Speed 5117.95 samples/sec Loss 1.1579 LearningRate 0.0067 Epoch: 14 Global Step: 247700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:50,896-Speed 5182.62 samples/sec Loss 1.1132 LearningRate 0.0067 Epoch: 14 Global Step: 247710 Fp16 Grad Scale: 131072 
Required: 6 hours Training: 2022-04-11 15:52:52,867-Speed 5196.66 samples/sec Loss 1.1274 LearningRate 0.0067 Epoch: 14 Global Step: 247720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:54,844-Speed 5180.38 samples/sec Loss 1.1386 LearningRate 0.0067 Epoch: 14 Global Step: 247730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:56,854-Speed 5098.47 samples/sec Loss 1.1136 LearningRate 0.0066 Epoch: 14 Global Step: 247740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:52:58,849-Speed 5134.40 samples/sec Loss 1.1339 LearningRate 0.0066 Epoch: 14 Global Step: 247750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:00,880-Speed 5042.77 samples/sec Loss 1.1605 LearningRate 0.0066 Epoch: 14 Global Step: 247760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:02,922-Speed 5019.60 samples/sec Loss 1.1212 LearningRate 0.0066 Epoch: 14 Global Step: 247770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:04,906-Speed 5161.03 samples/sec Loss 1.1798 LearningRate 0.0066 Epoch: 14 Global Step: 247780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:06,908-Speed 5117.22 samples/sec Loss 1.1885 LearningRate 0.0066 Epoch: 14 Global Step: 247790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:08,909-Speed 5119.75 samples/sec Loss 1.1590 LearningRate 0.0066 Epoch: 14 Global Step: 247800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:10,908-Speed 5126.73 samples/sec Loss 1.1396 LearningRate 0.0066 Epoch: 14 Global Step: 247810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:12,885-Speed 5180.03 samples/sec Loss 1.2044 LearningRate 0.0066 Epoch: 14 Global Step: 247820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:14,865-Speed 5173.40 samples/sec Loss 1.1390 LearningRate 0.0066 Epoch: 14 Global Step: 247830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
15:53:16,855-Speed 5147.30 samples/sec Loss 1.1421 LearningRate 0.0066 Epoch: 14 Global Step: 247840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:18,842-Speed 5156.65 samples/sec Loss 1.1530 LearningRate 0.0066 Epoch: 14 Global Step: 247850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:20,820-Speed 5180.17 samples/sec Loss 1.1680 LearningRate 0.0066 Epoch: 14 Global Step: 247860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:22,807-Speed 5153.41 samples/sec Loss 1.0866 LearningRate 0.0066 Epoch: 14 Global Step: 247870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:24,805-Speed 5128.24 samples/sec Loss 1.1167 LearningRate 0.0066 Epoch: 14 Global Step: 247880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:26,784-Speed 5176.48 samples/sec Loss 1.1106 LearningRate 0.0066 Epoch: 14 Global Step: 247890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:28,770-Speed 5156.16 samples/sec Loss 1.0955 LearningRate 0.0066 Epoch: 14 Global Step: 247900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:30,753-Speed 5167.01 samples/sec Loss 1.1244 LearningRate 0.0066 Epoch: 14 Global Step: 247910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:32,737-Speed 5162.47 samples/sec Loss 1.1447 LearningRate 0.0066 Epoch: 14 Global Step: 247920 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:53:34,725-Speed 5153.09 samples/sec Loss 1.1725 LearningRate 0.0066 Epoch: 14 Global Step: 247930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:36,720-Speed 5133.61 samples/sec Loss 1.1544 LearningRate 0.0066 Epoch: 14 Global Step: 247940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:38,721-Speed 5121.72 samples/sec Loss 1.0984 LearningRate 0.0066 Epoch: 14 Global Step: 247950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:53:40,704-Speed 5164.01 samples/sec 
Loss 1.1381 LearningRate 0.0066 Epoch: 14 Global Step: 247960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:53:42,681-Speed 5181.04 samples/sec Loss 1.1585 LearningRate 0.0066 Epoch: 14 Global Step: 247970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:53:44,665-Speed 5163.21 samples/sec Loss 1.1350 LearningRate 0.0066 Epoch: 14 Global Step: 247980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:53:46,659-Speed 5138.12 samples/sec Loss 1.0986 LearningRate 0.0066 Epoch: 14 Global Step: 247990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:53:48,657-Speed 5125.88 samples/sec Loss 1.1381 LearningRate 0.0066 Epoch: 14 Global Step: 248000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:54:15,515-[lfw][248000]XNorm: 22.116190 Training: 2022-04-11 15:54:15,516-[lfw][248000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-04-11 15:54:15,516-[lfw][248000]Accuracy-Highest: 0.99833 Training: 2022-04-11 15:54:46,303-[cfp_fp][248000]XNorm: 21.591691 Training: 2022-04-11 15:54:46,304-[cfp_fp][248000]Accuracy-Flip: 0.98829+-0.00567 Training: 2022-04-11 15:54:46,304-[cfp_fp][248000]Accuracy-Highest: 0.98914 Training: 2022-04-11 15:55:12,928-[agedb_30][248000]XNorm: 22.852472 Training: 2022-04-11 15:55:12,928-[agedb_30][248000]Accuracy-Flip: 0.98200+-0.00718 Training: 2022-04-11 15:55:12,929-[agedb_30][248000]Accuracy-Highest: 0.98300 Training: 2022-04-11 15:55:14,915-Speed 118.72 samples/sec Loss 1.1453 LearningRate 0.0066 Epoch: 14 Global Step: 248010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:55:16,894-Speed 5175.45 samples/sec Loss 1.1168 LearningRate 0.0066 Epoch: 14 Global Step: 248020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:55:18,887-Speed 5139.31 samples/sec Loss 1.1466 LearningRate 0.0066 Epoch: 14 Global Step: 248030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:55:20,848-Speed 5222.90 samples/sec Loss 1.1250 LearningRate 
0.0066 Epoch: 14 Global Step: 248040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:55:22,826-Speed 5179.33 samples/sec Loss 1.1452 LearningRate 0.0066 Epoch: 14 Global Step: 248050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:55:24,835-Speed 5098.29 samples/sec Loss 1.1433 LearningRate 0.0066 Epoch: 14 Global Step: 248060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:55:26,822-Speed 5158.13 samples/sec Loss 1.1704 LearningRate 0.0066 Epoch: 14 Global Step: 248070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:55:28,827-Speed 5108.96 samples/sec Loss 1.1217 LearningRate 0.0066 Epoch: 14 Global Step: 248080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:55:30,810-Speed 5165.00 samples/sec Loss 1.1116 LearningRate 0.0066 Epoch: 14 Global Step: 248090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:55:32,784-Speed 5188.96 samples/sec Loss 1.1357 LearningRate 0.0066 Epoch: 14 Global Step: 248100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:55:34,773-Speed 5152.16 samples/sec Loss 1.1747 LearningRate 0.0066 Epoch: 14 Global Step: 248110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:55:36,770-Speed 5128.78 samples/sec Loss 1.1040 LearningRate 0.0066 Epoch: 14 Global Step: 248120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:55:38,747-Speed 5181.51 samples/sec Loss 1.1142 LearningRate 0.0066 Epoch: 14 Global Step: 248130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:55:40,719-Speed 5194.83 samples/sec Loss 1.1530 LearningRate 0.0066 Epoch: 14 Global Step: 248140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:55:42,706-Speed 5155.77 samples/sec Loss 1.1223 LearningRate 0.0066 Epoch: 14 Global Step: 248150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:55:44,690-Speed 5163.99 samples/sec Loss 1.1751 LearningRate 0.0066 Epoch: 14 Global Step: 248160 Fp16 
Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:55:46,667-Speed 5180.94 samples/sec Loss 1.1547 LearningRate 0.0066 Epoch: 14 Global Step: 248170 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:55:48,647-Speed 5173.12 samples/sec Loss 1.1515 LearningRate 0.0066 Epoch: 14 Global Step: 248180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:55:50,626-Speed 5175.42 samples/sec Loss 1.1384 LearningRate 0.0066 Epoch: 14 Global Step: 248190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:55:52,617-Speed 5145.46 samples/sec Loss 1.1615 LearningRate 0.0066 Epoch: 14 Global Step: 248200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:55:54,594-Speed 5181.31 samples/sec Loss 1.1389 LearningRate 0.0066 Epoch: 14 Global Step: 248210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:55:56,577-Speed 5168.71 samples/sec Loss 1.1138 LearningRate 0.0066 Epoch: 14 Global Step: 248220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:55:58,567-Speed 5147.09 samples/sec Loss 1.1268 LearningRate 0.0066 Epoch: 14 Global Step: 248230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:00,543-Speed 5182.95 samples/sec Loss 1.0856 LearningRate 0.0066 Epoch: 14 Global Step: 248240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:02,557-Speed 5086.22 samples/sec Loss 1.1572 LearningRate 0.0066 Epoch: 14 Global Step: 248250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:04,542-Speed 5159.71 samples/sec Loss 1.1857 LearningRate 0.0066 Epoch: 14 Global Step: 248260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:06,518-Speed 5184.84 samples/sec Loss 1.1732 LearningRate 0.0066 Epoch: 14 Global Step: 248270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:08,506-Speed 5153.29 samples/sec Loss 1.1753 LearningRate 0.0066 Epoch: 14 Global Step: 248280 Fp16 Grad Scale: 131072 Required: 6 hours 
Training: 2022-04-11 15:56:10,475-Speed 5200.67 samples/sec Loss 1.1132 LearningRate 0.0066 Epoch: 14 Global Step: 248290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:12,452-Speed 5182.97 samples/sec Loss 1.1575 LearningRate 0.0066 Epoch: 14 Global Step: 248300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:14,441-Speed 5149.13 samples/sec Loss 1.1367 LearningRate 0.0066 Epoch: 14 Global Step: 248310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:16,427-Speed 5157.88 samples/sec Loss 1.1655 LearningRate 0.0066 Epoch: 14 Global Step: 248320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:18,415-Speed 5154.26 samples/sec Loss 1.1437 LearningRate 0.0066 Epoch: 14 Global Step: 248330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:20,391-Speed 5181.99 samples/sec Loss 1.1408 LearningRate 0.0066 Epoch: 14 Global Step: 248340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:22,367-Speed 5186.00 samples/sec Loss 1.1374 LearningRate 0.0066 Epoch: 14 Global Step: 248350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:24,335-Speed 5203.96 samples/sec Loss 1.1154 LearningRate 0.0066 Epoch: 14 Global Step: 248360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:56:26,312-Speed 5180.79 samples/sec Loss 1.1466 LearningRate 0.0066 Epoch: 14 Global Step: 248370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:56:28,292-Speed 5175.47 samples/sec Loss 1.1203 LearningRate 0.0066 Epoch: 14 Global Step: 248380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:56:30,280-Speed 5151.99 samples/sec Loss 1.0892 LearningRate 0.0065 Epoch: 14 Global Step: 248390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:56:32,263-Speed 5166.13 samples/sec Loss 1.1271 LearningRate 0.0065 Epoch: 14 Global Step: 248400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:56:34,246-Speed 
5166.98 samples/sec Loss 1.1424 LearningRate 0.0065 Epoch: 14 Global Step: 248410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:56:36,228-Speed 5166.15 samples/sec Loss 1.1516 LearningRate 0.0065 Epoch: 14 Global Step: 248420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:56:38,208-Speed 5173.61 samples/sec Loss 1.1051 LearningRate 0.0065 Epoch: 14 Global Step: 248430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:56:40,177-Speed 5201.91 samples/sec Loss 1.0897 LearningRate 0.0065 Epoch: 14 Global Step: 248440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:56:42,157-Speed 5173.22 samples/sec Loss 1.1659 LearningRate 0.0065 Epoch: 14 Global Step: 248450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:56:44,131-Speed 5188.92 samples/sec Loss 1.1481 LearningRate 0.0065 Epoch: 14 Global Step: 248460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:46,129-Speed 5127.71 samples/sec Loss 1.1477 LearningRate 0.0065 Epoch: 14 Global Step: 248470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:48,117-Speed 5155.68 samples/sec Loss 1.1455 LearningRate 0.0065 Epoch: 14 Global Step: 248480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:50,104-Speed 5155.77 samples/sec Loss 1.1287 LearningRate 0.0065 Epoch: 14 Global Step: 248490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:52,081-Speed 5182.54 samples/sec Loss 1.1443 LearningRate 0.0065 Epoch: 14 Global Step: 248500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:54,056-Speed 5185.77 samples/sec Loss 1.1203 LearningRate 0.0065 Epoch: 14 Global Step: 248510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:56,035-Speed 5179.00 samples/sec Loss 1.1913 LearningRate 0.0065 Epoch: 14 Global Step: 248520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:58,011-Speed 5183.07 samples/sec Loss 1.1439 
LearningRate 0.0065 Epoch: 14 Global Step: 248530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:56:59,997-Speed 5156.61 samples/sec Loss 1.1071 LearningRate 0.0065 Epoch: 14 Global Step: 248540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:01,988-Speed 5145.21 samples/sec Loss 1.1596 LearningRate 0.0065 Epoch: 14 Global Step: 248550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:03,974-Speed 5157.32 samples/sec Loss 1.2119 LearningRate 0.0065 Epoch: 14 Global Step: 248560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:05,958-Speed 5163.06 samples/sec Loss 1.1266 LearningRate 0.0065 Epoch: 14 Global Step: 248570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:07,934-Speed 5184.79 samples/sec Loss 1.1293 LearningRate 0.0065 Epoch: 14 Global Step: 248580 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:09,929-Speed 5133.27 samples/sec Loss 1.1231 LearningRate 0.0065 Epoch: 14 Global Step: 248590 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:11,915-Speed 5160.28 samples/sec Loss 1.1557 LearningRate 0.0065 Epoch: 14 Global Step: 248600 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:13,903-Speed 5153.73 samples/sec Loss 1.1282 LearningRate 0.0065 Epoch: 14 Global Step: 248610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:15,885-Speed 5166.81 samples/sec Loss 1.1436 LearningRate 0.0065 Epoch: 14 Global Step: 248620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:17,860-Speed 5188.37 samples/sec Loss 1.1470 LearningRate 0.0065 Epoch: 14 Global Step: 248630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:19,842-Speed 5166.18 samples/sec Loss 1.1308 LearningRate 0.0065 Epoch: 14 Global Step: 248640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:21,844-Speed 5116.17 samples/sec Loss 1.1593 LearningRate 0.0065 Epoch: 14 Global Step: 
248650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:23,842-Speed 5127.85 samples/sec Loss 1.1236 LearningRate 0.0065 Epoch: 14 Global Step: 248660 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:57:25,816-Speed 5188.53 samples/sec Loss 1.1489 LearningRate 0.0065 Epoch: 14 Global Step: 248670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:27,793-Speed 5181.64 samples/sec Loss 1.1289 LearningRate 0.0065 Epoch: 14 Global Step: 248680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:29,782-Speed 5150.67 samples/sec Loss 1.1635 LearningRate 0.0065 Epoch: 14 Global Step: 248690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:31,779-Speed 5130.37 samples/sec Loss 1.1815 LearningRate 0.0065 Epoch: 14 Global Step: 248700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:33,772-Speed 5138.97 samples/sec Loss 1.0658 LearningRate 0.0065 Epoch: 14 Global Step: 248710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:35,752-Speed 5174.86 samples/sec Loss 1.1534 LearningRate 0.0065 Epoch: 14 Global Step: 248720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:37,736-Speed 5162.24 samples/sec Loss 1.1560 LearningRate 0.0065 Epoch: 14 Global Step: 248730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:57:39,704-Speed 5205.28 samples/sec Loss 1.1403 LearningRate 0.0065 Epoch: 14 Global Step: 248740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:57:41,686-Speed 5168.51 samples/sec Loss 1.1139 LearningRate 0.0065 Epoch: 14 Global Step: 248750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:57:43,668-Speed 5169.57 samples/sec Loss 1.1584 LearningRate 0.0065 Epoch: 14 Global Step: 248760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:57:45,650-Speed 5165.52 samples/sec Loss 1.1215 LearningRate 0.0065 Epoch: 14 Global Step: 248770 Fp16 Grad Scale: 32768 Required: 6 
hours Training: 2022-04-11 15:57:47,651-Speed 5121.93 samples/sec Loss 1.1493 LearningRate 0.0065 Epoch: 14 Global Step: 248780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:57:49,638-Speed 5154.19 samples/sec Loss 1.1348 LearningRate 0.0065 Epoch: 14 Global Step: 248790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:57:51,627-Speed 5148.78 samples/sec Loss 1.1652 LearningRate 0.0065 Epoch: 14 Global Step: 248800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:57:53,617-Speed 5148.01 samples/sec Loss 1.1449 LearningRate 0.0065 Epoch: 14 Global Step: 248810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:57:55,602-Speed 5162.95 samples/sec Loss 1.1945 LearningRate 0.0065 Epoch: 14 Global Step: 248820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:57:57,592-Speed 5148.77 samples/sec Loss 1.1682 LearningRate 0.0065 Epoch: 14 Global Step: 248830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 15:57:59,573-Speed 5169.86 samples/sec Loss 1.1250 LearningRate 0.0065 Epoch: 14 Global Step: 248840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:01,555-Speed 5166.74 samples/sec Loss 1.2095 LearningRate 0.0065 Epoch: 14 Global Step: 248850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:03,528-Speed 5193.34 samples/sec Loss 1.1185 LearningRate 0.0065 Epoch: 14 Global Step: 248860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:05,512-Speed 5162.83 samples/sec Loss 1.1389 LearningRate 0.0065 Epoch: 14 Global Step: 248870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:07,490-Speed 5178.98 samples/sec Loss 1.1592 LearningRate 0.0065 Epoch: 14 Global Step: 248880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:09,463-Speed 5190.69 samples/sec Loss 1.1364 LearningRate 0.0065 Epoch: 14 Global Step: 248890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 
15:58:11,444-Speed 5172.00 samples/sec Loss 1.1289 LearningRate 0.0065 Epoch: 14 Global Step: 248900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:13,427-Speed 5166.86 samples/sec Loss 1.1631 LearningRate 0.0065 Epoch: 14 Global Step: 248910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:15,434-Speed 5129.63 samples/sec Loss 1.1769 LearningRate 0.0065 Epoch: 14 Global Step: 248920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:17,449-Speed 5085.90 samples/sec Loss 1.1230 LearningRate 0.0065 Epoch: 14 Global Step: 248930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:19,424-Speed 5185.59 samples/sec Loss 1.1322 LearningRate 0.0065 Epoch: 14 Global Step: 248940 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:58:21,398-Speed 5190.29 samples/sec Loss 1.1190 LearningRate 0.0065 Epoch: 14 Global Step: 248950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:58:23,376-Speed 5178.27 samples/sec Loss 1.1604 LearningRate 0.0065 Epoch: 14 Global Step: 248960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:25,378-Speed 5117.40 samples/sec Loss 1.1456 LearningRate 0.0065 Epoch: 14 Global Step: 248970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:27,364-Speed 5157.42 samples/sec Loss 1.1582 LearningRate 0.0065 Epoch: 14 Global Step: 248980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:29,346-Speed 5168.59 samples/sec Loss 1.1816 LearningRate 0.0065 Epoch: 14 Global Step: 248990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:31,341-Speed 5134.13 samples/sec Loss 1.1485 LearningRate 0.0065 Epoch: 14 Global Step: 249000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:33,314-Speed 5193.60 samples/sec Loss 1.0947 LearningRate 0.0065 Epoch: 14 Global Step: 249010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:35,296-Speed 5166.84 samples/sec 
Loss 1.1422 LearningRate 0.0065 Epoch: 14 Global Step: 249020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:37,277-Speed 5171.40 samples/sec Loss 1.1155 LearningRate 0.0065 Epoch: 14 Global Step: 249030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:39,273-Speed 5131.10 samples/sec Loss 1.1360 LearningRate 0.0065 Epoch: 14 Global Step: 249040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:41,256-Speed 5168.41 samples/sec Loss 1.1177 LearningRate 0.0064 Epoch: 14 Global Step: 249050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:43,250-Speed 5136.55 samples/sec Loss 1.1501 LearningRate 0.0064 Epoch: 14 Global Step: 249060 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:58:45,221-Speed 5195.89 samples/sec Loss 1.1724 LearningRate 0.0064 Epoch: 14 Global Step: 249070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:47,217-Speed 5133.64 samples/sec Loss 1.1000 LearningRate 0.0064 Epoch: 14 Global Step: 249080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:49,194-Speed 5179.64 samples/sec Loss 1.1328 LearningRate 0.0064 Epoch: 14 Global Step: 249090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:51,178-Speed 5163.74 samples/sec Loss 1.1260 LearningRate 0.0064 Epoch: 14 Global Step: 249100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:53,166-Speed 5152.43 samples/sec Loss 1.1657 LearningRate 0.0064 Epoch: 14 Global Step: 249110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:55,150-Speed 5163.34 samples/sec Loss 1.1708 LearningRate 0.0064 Epoch: 14 Global Step: 249120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:57,165-Speed 5083.79 samples/sec Loss 1.1560 LearningRate 0.0064 Epoch: 14 Global Step: 249130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:58:59,145-Speed 5174.50 samples/sec Loss 1.1392 LearningRate 0.0064 Epoch: 14 
Global Step: 249140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:01,146-Speed 5118.82 samples/sec Loss 1.1650 LearningRate 0.0064 Epoch: 14 Global Step: 249150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:03,140-Speed 5138.32 samples/sec Loss 1.1297 LearningRate 0.0064 Epoch: 14 Global Step: 249160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:05,149-Speed 5100.43 samples/sec Loss 1.1086 LearningRate 0.0064 Epoch: 14 Global Step: 249170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:07,131-Speed 5167.13 samples/sec Loss 1.0938 LearningRate 0.0064 Epoch: 14 Global Step: 249180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:09,110-Speed 5177.02 samples/sec Loss 1.1431 LearningRate 0.0064 Epoch: 14 Global Step: 249190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:11,107-Speed 5127.07 samples/sec Loss 1.1452 LearningRate 0.0064 Epoch: 14 Global Step: 249200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:13,089-Speed 5169.30 samples/sec Loss 1.1384 LearningRate 0.0064 Epoch: 14 Global Step: 249210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:15,076-Speed 5155.75 samples/sec Loss 1.1003 LearningRate 0.0064 Epoch: 14 Global Step: 249220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:17,066-Speed 5149.07 samples/sec Loss 1.1318 LearningRate 0.0064 Epoch: 14 Global Step: 249230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:19,078-Speed 5090.28 samples/sec Loss 1.1484 LearningRate 0.0064 Epoch: 14 Global Step: 249240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:21,092-Speed 5087.95 samples/sec Loss 1.1448 LearningRate 0.0064 Epoch: 14 Global Step: 249250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:23,149-Speed 4979.73 samples/sec Loss 1.1554 LearningRate 0.0064 Epoch: 14 Global Step: 249260 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 15:59:25,137-Speed 5151.91 samples/sec Loss 1.1503 LearningRate 0.0064 Epoch: 14 Global Step: 249270 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:59:27,122-Speed 5162.15 samples/sec Loss 1.1124 LearningRate 0.0064 Epoch: 14 Global Step: 249280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:29,122-Speed 5120.81 samples/sec Loss 1.1179 LearningRate 0.0064 Epoch: 14 Global Step: 249290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:31,109-Speed 5156.58 samples/sec Loss 1.1288 LearningRate 0.0064 Epoch: 14 Global Step: 249300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:33,097-Speed 5153.62 samples/sec Loss 1.1570 LearningRate 0.0064 Epoch: 14 Global Step: 249310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:35,081-Speed 5163.30 samples/sec Loss 1.1204 LearningRate 0.0064 Epoch: 14 Global Step: 249320 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:37,084-Speed 5113.28 samples/sec Loss 1.1241 LearningRate 0.0064 Epoch: 14 Global Step: 249330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:39,077-Speed 5138.44 samples/sec Loss 1.0889 LearningRate 0.0064 Epoch: 14 Global Step: 249340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:41,068-Speed 5145.98 samples/sec Loss 1.1773 LearningRate 0.0064 Epoch: 14 Global Step: 249350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:43,061-Speed 5141.08 samples/sec Loss 1.1263 LearningRate 0.0064 Epoch: 14 Global Step: 249360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:45,118-Speed 4980.00 samples/sec Loss 1.1702 LearningRate 0.0064 Epoch: 14 Global Step: 249370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:47,113-Speed 5135.64 samples/sec Loss 1.1542 LearningRate 0.0064 Epoch: 14 Global Step: 249380 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
15:59:49,134-Speed 5069.17 samples/sec Loss 1.1035 LearningRate 0.0064 Epoch: 14 Global Step: 249390 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 15:59:51,136-Speed 5118.29 samples/sec Loss 1.1336 LearningRate 0.0064 Epoch: 14 Global Step: 249400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:53,119-Speed 5164.11 samples/sec Loss 1.1744 LearningRate 0.0064 Epoch: 14 Global Step: 249410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:55,115-Speed 5131.82 samples/sec Loss 1.1612 LearningRate 0.0064 Epoch: 14 Global Step: 249420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:57,105-Speed 5149.21 samples/sec Loss 1.1414 LearningRate 0.0064 Epoch: 14 Global Step: 249430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 15:59:59,103-Speed 5125.28 samples/sec Loss 1.1201 LearningRate 0.0064 Epoch: 14 Global Step: 249440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:00:01,104-Speed 5119.66 samples/sec Loss 1.1689 LearningRate 0.0064 Epoch: 14 Global Step: 249450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:00:03,100-Speed 5133.12 samples/sec Loss 1.1072 LearningRate 0.0064 Epoch: 14 Global Step: 249460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:00:05,102-Speed 5117.86 samples/sec Loss 1.1079 LearningRate 0.0064 Epoch: 14 Global Step: 249470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:00:07,089-Speed 5154.48 samples/sec Loss 1.1393 LearningRate 0.0064 Epoch: 14 Global Step: 249480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:00:09,066-Speed 5182.90 samples/sec Loss 1.1095 LearningRate 0.0064 Epoch: 14 Global Step: 249490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:00:11,057-Speed 5145.05 samples/sec Loss 1.1414 LearningRate 0.0064 Epoch: 14 Global Step: 249500 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:00:13,036-Speed 5173.33 samples/sec 
Loss 1.1391 LearningRate 0.0064 Epoch: 14 Global Step: 249510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:00:15,029-Speed 5141.72 samples/sec Loss 1.1382 LearningRate 0.0064 Epoch: 14 Global Step: 249520 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:00:17,012-Speed 5164.33 samples/sec Loss 1.1039 LearningRate 0.0064 Epoch: 14 Global Step: 249530 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:00:19,017-Speed 5109.03 samples/sec Loss 1.1622 LearningRate 0.0064 Epoch: 14 Global Step: 249540 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:00:21,003-Speed 5158.15 samples/sec Loss 1.1502 LearningRate 0.0064 Epoch: 14 Global Step: 249550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:00:22,996-Speed 5140.28 samples/sec Loss 1.2002 LearningRate 0.0064 Epoch: 14 Global Step: 249560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:00:24,992-Speed 5131.33 samples/sec Loss 1.1553 LearningRate 0.0064 Epoch: 14 Global Step: 249570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:00:26,984-Speed 5142.89 samples/sec Loss 1.1134 LearningRate 0.0064 Epoch: 14 Global Step: 249580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:00:28,973-Speed 5151.31 samples/sec Loss 1.1285 LearningRate 0.0064 Epoch: 14 Global Step: 249590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:00:30,958-Speed 5158.39 samples/sec Loss 1.1142 LearningRate 0.0064 Epoch: 14 Global Step: 249600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:00:32,941-Speed 5166.33 samples/sec Loss 1.1478 LearningRate 0.0064 Epoch: 14 Global Step: 249610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:00:34,917-Speed 5185.16 samples/sec Loss 1.0948 LearningRate 0.0064 Epoch: 14 Global Step: 249620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:00:36,917-Speed 5119.60 samples/sec Loss 1.1463 LearningRate 0.0064 Epoch: 14 
Global Step: 249630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:00:38,908-Speed 5146.48 samples/sec Loss 1.1718 LearningRate 0.0064 Epoch: 14 Global Step: 249640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:00:40,898-Speed 5149.37 samples/sec Loss 1.1182 LearningRate 0.0064 Epoch: 14 Global Step: 249650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:00:42,888-Speed 5145.16 samples/sec Loss 1.1107 LearningRate 0.0064 Epoch: 14 Global Step: 249660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:00:44,890-Speed 5118.33 samples/sec Loss 1.1258 LearningRate 0.0064 Epoch: 14 Global Step: 249670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:00:46,877-Speed 5155.39 samples/sec Loss 1.1606 LearningRate 0.0064 Epoch: 14 Global Step: 249680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:00:48,860-Speed 5163.72 samples/sec Loss 1.0913 LearningRate 0.0064 Epoch: 14 Global Step: 249690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:00:50,845-Speed 5161.73 samples/sec Loss 1.1152 LearningRate 0.0064 Epoch: 14 Global Step: 249700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:00:52,840-Speed 5133.76 samples/sec Loss 1.1519 LearningRate 0.0063 Epoch: 14 Global Step: 249710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:00:54,832-Speed 5143.00 samples/sec Loss 1.1781 LearningRate 0.0063 Epoch: 14 Global Step: 249720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:00:56,818-Speed 5157.85 samples/sec Loss 1.1189 LearningRate 0.0063 Epoch: 14 Global Step: 249730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:00:58,800-Speed 5165.92 samples/sec Loss 1.1667 LearningRate 0.0063 Epoch: 14 Global Step: 249740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:00,790-Speed 5149.87 samples/sec Loss 1.1529 LearningRate 0.0063 Epoch: 14 Global Step: 249750 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 16:01:02,786-Speed 5132.02 samples/sec Loss 1.1231 LearningRate 0.0063 Epoch: 14 Global Step: 249760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:04,767-Speed 5171.10 samples/sec Loss 1.1527 LearningRate 0.0063 Epoch: 14 Global Step: 249770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:06,747-Speed 5173.31 samples/sec Loss 1.1611 LearningRate 0.0063 Epoch: 14 Global Step: 249780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:08,735-Speed 5152.49 samples/sec Loss 1.1144 LearningRate 0.0063 Epoch: 14 Global Step: 249790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:10,730-Speed 5134.73 samples/sec Loss 1.1450 LearningRate 0.0063 Epoch: 14 Global Step: 249800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:12,732-Speed 5115.82 samples/sec Loss 1.1854 LearningRate 0.0063 Epoch: 14 Global Step: 249810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:14,710-Speed 5179.49 samples/sec Loss 1.0960 LearningRate 0.0063 Epoch: 14 Global Step: 249820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:16,692-Speed 5168.02 samples/sec Loss 1.1435 LearningRate 0.0063 Epoch: 14 Global Step: 249830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:18,678-Speed 5156.71 samples/sec Loss 1.1872 LearningRate 0.0063 Epoch: 14 Global Step: 249840 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:01:20,652-Speed 5190.47 samples/sec Loss 1.0993 LearningRate 0.0063 Epoch: 14 Global Step: 249850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:22,639-Speed 5155.14 samples/sec Loss 1.1127 LearningRate 0.0063 Epoch: 14 Global Step: 249860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:24,633-Speed 5137.55 samples/sec Loss 1.1507 LearningRate 0.0063 Epoch: 14 Global Step: 249870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-11 16:01:26,645-Speed 5091.08 samples/sec Loss 1.1172 LearningRate 0.0063 Epoch: 14 Global Step: 249880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:28,668-Speed 5064.76 samples/sec Loss 1.1485 LearningRate 0.0063 Epoch: 14 Global Step: 249890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:30,664-Speed 5132.03 samples/sec Loss 1.1393 LearningRate 0.0063 Epoch: 14 Global Step: 249900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:32,664-Speed 5120.98 samples/sec Loss 1.1251 LearningRate 0.0063 Epoch: 14 Global Step: 249910 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:34,699-Speed 5036.28 samples/sec Loss 1.1657 LearningRate 0.0063 Epoch: 14 Global Step: 249920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:36,682-Speed 5164.01 samples/sec Loss 1.1105 LearningRate 0.0063 Epoch: 14 Global Step: 249930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:38,681-Speed 5124.14 samples/sec Loss 1.1787 LearningRate 0.0063 Epoch: 14 Global Step: 249940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:40,676-Speed 5136.82 samples/sec Loss 1.0944 LearningRate 0.0063 Epoch: 14 Global Step: 249950 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:01:42,694-Speed 5074.66 samples/sec Loss 1.1500 LearningRate 0.0063 Epoch: 14 Global Step: 249960 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:01:44,679-Speed 5162.66 samples/sec Loss 1.1479 LearningRate 0.0063 Epoch: 14 Global Step: 249970 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:01:46,663-Speed 5160.78 samples/sec Loss 1.1141 LearningRate 0.0063 Epoch: 14 Global Step: 249980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:48,658-Speed 5134.89 samples/sec Loss 1.1550 LearningRate 0.0063 Epoch: 14 Global Step: 249990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:01:50,649-Speed 5144.73 
samples/sec Loss 1.1435 LearningRate 0.0063 Epoch: 14 Global Step: 250000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:02:17,378-[lfw][250000]XNorm: 22.834659 Training: 2022-04-11 16:02:17,379-[lfw][250000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-04-11 16:02:17,379-[lfw][250000]Accuracy-Highest: 0.99833 Training: 2022-04-11 16:02:48,327-[cfp_fp][250000]XNorm: 22.407938 Training: 2022-04-11 16:02:48,327-[cfp_fp][250000]Accuracy-Flip: 0.98714+-0.00378 Training: 2022-04-11 16:02:48,328-[cfp_fp][250000]Accuracy-Highest: 0.98914 Training: 2022-04-11 16:03:14,903-[agedb_30][250000]XNorm: 23.596290 Training: 2022-04-11 16:03:14,903-[agedb_30][250000]Accuracy-Flip: 0.98250+-0.00775 Training: 2022-04-11 16:03:14,904-[agedb_30][250000]Accuracy-Highest: 0.98300 Training: 2022-04-11 16:03:16,889-Speed 118.74 samples/sec Loss 1.1122 LearningRate 0.0063 Epoch: 14 Global Step: 250010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:18,853-Speed 5214.81 samples/sec Loss 1.1111 LearningRate 0.0063 Epoch: 14 Global Step: 250020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:20,830-Speed 5181.04 samples/sec Loss 1.1279 LearningRate 0.0063 Epoch: 14 Global Step: 250030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:22,814-Speed 5163.79 samples/sec Loss 1.1634 LearningRate 0.0063 Epoch: 14 Global Step: 250040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:24,818-Speed 5112.44 samples/sec Loss 1.1244 LearningRate 0.0063 Epoch: 14 Global Step: 250050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:26,793-Speed 5184.33 samples/sec Loss 1.0674 LearningRate 0.0063 Epoch: 14 Global Step: 250060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:28,769-Speed 5185.49 samples/sec Loss 1.1316 LearningRate 0.0063 Epoch: 14 Global Step: 250070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:30,741-Speed 5194.28 samples/sec Loss 1.1174 
LearningRate 0.0063 Epoch: 14 Global Step: 250080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:32,712-Speed 5198.61 samples/sec Loss 1.1448 LearningRate 0.0063 Epoch: 14 Global Step: 250090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:34,692-Speed 5173.60 samples/sec Loss 1.1541 LearningRate 0.0063 Epoch: 14 Global Step: 250100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:36,693-Speed 5117.25 samples/sec Loss 1.1353 LearningRate 0.0063 Epoch: 14 Global Step: 250110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:38,673-Speed 5175.26 samples/sec Loss 1.1458 LearningRate 0.0063 Epoch: 14 Global Step: 250120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:40,652-Speed 5175.94 samples/sec Loss 1.1442 LearningRate 0.0063 Epoch: 14 Global Step: 250130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:42,641-Speed 5148.13 samples/sec Loss 1.1891 LearningRate 0.0063 Epoch: 14 Global Step: 250140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:44,622-Speed 5170.81 samples/sec Loss 1.1869 LearningRate 0.0063 Epoch: 14 Global Step: 250150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:46,600-Speed 5179.26 samples/sec Loss 1.1237 LearningRate 0.0063 Epoch: 14 Global Step: 250160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:48,578-Speed 5179.95 samples/sec Loss 1.1309 LearningRate 0.0063 Epoch: 14 Global Step: 250170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:50,557-Speed 5176.44 samples/sec Loss 1.1152 LearningRate 0.0063 Epoch: 14 Global Step: 250180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:52,559-Speed 5114.97 samples/sec Loss 1.1445 LearningRate 0.0063 Epoch: 14 Global Step: 250190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:54,548-Speed 5150.79 samples/sec Loss 1.1354 LearningRate 0.0063 Epoch: 14 Global Step: 
250200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:56,536-Speed 5151.75 samples/sec Loss 1.1257 LearningRate 0.0063 Epoch: 14 Global Step: 250210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:03:58,525-Speed 5151.36 samples/sec Loss 1.1549 LearningRate 0.0063 Epoch: 14 Global Step: 250220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:04:00,527-Speed 5115.21 samples/sec Loss 1.1660 LearningRate 0.0063 Epoch: 14 Global Step: 250230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:04:02,515-Speed 5153.12 samples/sec Loss 1.0860 LearningRate 0.0063 Epoch: 14 Global Step: 250240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:04:04,490-Speed 5185.98 samples/sec Loss 1.1341 LearningRate 0.0063 Epoch: 14 Global Step: 250250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:04:06,469-Speed 5178.38 samples/sec Loss 1.1262 LearningRate 0.0063 Epoch: 14 Global Step: 250260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:04:08,442-Speed 5190.81 samples/sec Loss 1.1012 LearningRate 0.0063 Epoch: 14 Global Step: 250270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:04:10,429-Speed 5155.33 samples/sec Loss 1.1265 LearningRate 0.0063 Epoch: 14 Global Step: 250280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:04:12,422-Speed 5140.23 samples/sec Loss 1.1458 LearningRate 0.0063 Epoch: 14 Global Step: 250290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:04:14,402-Speed 5173.37 samples/sec Loss 1.1257 LearningRate 0.0063 Epoch: 14 Global Step: 250300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:04:16,378-Speed 5184.91 samples/sec Loss 1.1616 LearningRate 0.0063 Epoch: 14 Global Step: 250310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:04:18,374-Speed 5131.83 samples/sec Loss 1.1153 LearningRate 0.0063 Epoch: 14 Global Step: 250320 Fp16 Grad Scale: 32768 Required: 6 
hours Training: 2022-04-11 16:04:20,349-Speed 5184.94 samples/sec Loss 1.1256 LearningRate 0.0063 Epoch: 14 Global Step: 250330 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:04:22,337-Speed 5153.30 samples/sec Loss 1.1492 LearningRate 0.0063 Epoch: 14 Global Step: 250340 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:04:24,334-Speed 5129.68 samples/sec Loss 1.1437 LearningRate 0.0063 Epoch: 14 Global Step: 250350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:04:26,564-Speed 4592.52 samples/sec Loss 1.0977 LearningRate 0.0063 Epoch: 14 Global Step: 250360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:04:57,631-Speed 329.65 samples/sec Loss 0.9686 LearningRate 0.0062 Epoch: 15 Global Step: 250370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:04:59,899-Speed 4517.62 samples/sec Loss 0.7810 LearningRate 0.0062 Epoch: 15 Global Step: 250380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:05:01,926-Speed 5054.32 samples/sec Loss 0.7449 LearningRate 0.0062 Epoch: 15 Global Step: 250390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:05:03,901-Speed 5186.88 samples/sec Loss 0.7970 LearningRate 0.0062 Epoch: 15 Global Step: 250400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:05:05,932-Speed 5046.08 samples/sec Loss 0.7964 LearningRate 0.0062 Epoch: 15 Global Step: 250410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:05:08,068-Speed 4798.19 samples/sec Loss 0.7871 LearningRate 0.0062 Epoch: 15 Global Step: 250420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:05:10,110-Speed 5015.38 samples/sec Loss 0.7965 LearningRate 0.0062 Epoch: 15 Global Step: 250430 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:05:12,082-Speed 5196.00 samples/sec Loss 0.7734 LearningRate 0.0062 Epoch: 15 Global Step: 250440 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 
16:05:14,104-Speed 5065.38 samples/sec Loss 0.7296 LearningRate 0.0062 Epoch: 15 Global Step: 250450 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:05:16,088-Speed 5164.47 samples/sec Loss 0.7912 LearningRate 0.0062 Epoch: 15 Global Step: 250460 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:05:18,096-Speed 5100.62 samples/sec Loss 0.7788 LearningRate 0.0062 Epoch: 15 Global Step: 250470 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:05:20,093-Speed 5129.75 samples/sec Loss 0.7768 LearningRate 0.0062 Epoch: 15 Global Step: 250480 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:05:22,060-Speed 5208.73 samples/sec Loss 0.7693 LearningRate 0.0062 Epoch: 15 Global Step: 250490 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:05:24,050-Speed 5146.96 samples/sec Loss 0.7489 LearningRate 0.0062 Epoch: 15 Global Step: 250500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:05:26,085-Speed 5035.58 samples/sec Loss 0.7607 LearningRate 0.0062 Epoch: 15 Global Step: 250510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:05:28,059-Speed 5187.61 samples/sec Loss 0.7736 LearningRate 0.0062 Epoch: 15 Global Step: 250520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:05:30,023-Speed 5217.29 samples/sec Loss 0.7648 LearningRate 0.0062 Epoch: 15 Global Step: 250530 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:05:32,018-Speed 5134.62 samples/sec Loss 0.7445 LearningRate 0.0062 Epoch: 15 Global Step: 250540 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:05:34,000-Speed 5169.96 samples/sec Loss 0.7880 LearningRate 0.0062 Epoch: 15 Global Step: 250550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:05:35,969-Speed 5202.10 samples/sec Loss 0.7749 LearningRate 0.0062 Epoch: 15 Global Step: 250560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:05:37,948-Speed 5177.00 samples/sec 
Loss 0.7756 LearningRate 0.0062 Epoch: 15 Global Step: 250570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:05:39,918-Speed 5200.96 samples/sec Loss 0.7733 LearningRate 0.0062 Epoch: 15 Global Step: 250580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:05:42,124-Speed 4643.33 samples/sec Loss 0.7724 LearningRate 0.0062 Epoch: 15 Global Step: 250590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:05:44,100-Speed 5183.15 samples/sec Loss 0.7669 LearningRate 0.0062 Epoch: 15 Global Step: 250600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:05:46,081-Speed 5170.66 samples/sec Loss 0.7936 LearningRate 0.0062 Epoch: 15 Global Step: 250610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:05:48,077-Speed 5132.34 samples/sec Loss 0.7713 LearningRate 0.0062 Epoch: 15 Global Step: 250620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:05:50,080-Speed 5115.22 samples/sec Loss 0.7887 LearningRate 0.0062 Epoch: 15 Global Step: 250630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:05:52,062-Speed 5170.26 samples/sec Loss 0.7657 LearningRate 0.0062 Epoch: 15 Global Step: 250640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:05:54,083-Speed 5067.78 samples/sec Loss 0.7874 LearningRate 0.0062 Epoch: 15 Global Step: 250650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:05:56,069-Speed 5158.94 samples/sec Loss 0.7777 LearningRate 0.0062 Epoch: 15 Global Step: 250660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:05:58,042-Speed 5191.75 samples/sec Loss 0.7679 LearningRate 0.0062 Epoch: 15 Global Step: 250670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:00,022-Speed 5176.00 samples/sec Loss 0.7904 LearningRate 0.0062 Epoch: 15 Global Step: 250680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:01,999-Speed 5181.02 samples/sec Loss 0.8067 LearningRate 0.0062 Epoch: 15 
Global Step: 250690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:03,978-Speed 5175.47 samples/sec Loss 0.7862 LearningRate 0.0062 Epoch: 15 Global Step: 250700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:06,008-Speed 5046.80 samples/sec Loss 0.7845 LearningRate 0.0062 Epoch: 15 Global Step: 250710 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:07,980-Speed 5193.53 samples/sec Loss 0.7874 LearningRate 0.0062 Epoch: 15 Global Step: 250720 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:09,974-Speed 5139.35 samples/sec Loss 0.7395 LearningRate 0.0062 Epoch: 15 Global Step: 250730 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:06:11,963-Speed 5149.57 samples/sec Loss 0.7971 LearningRate 0.0062 Epoch: 15 Global Step: 250740 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:06:13,960-Speed 5127.87 samples/sec Loss 0.7819 LearningRate 0.0062 Epoch: 15 Global Step: 250750 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:06:15,949-Speed 5151.62 samples/sec Loss 0.7750 LearningRate 0.0062 Epoch: 15 Global Step: 250760 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:06:17,956-Speed 5104.25 samples/sec Loss 0.7742 LearningRate 0.0062 Epoch: 15 Global Step: 250770 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:06:19,942-Speed 5156.35 samples/sec Loss 0.8127 LearningRate 0.0062 Epoch: 15 Global Step: 250780 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:06:21,912-Speed 5200.05 samples/sec Loss 0.8406 LearningRate 0.0062 Epoch: 15 Global Step: 250790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:23,886-Speed 5190.42 samples/sec Loss 0.7702 LearningRate 0.0062 Epoch: 15 Global Step: 250800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:25,935-Speed 4998.69 samples/sec Loss 0.7709 LearningRate 0.0062 Epoch: 15 Global Step: 250810 Fp16 Grad Scale: 
65536 Required: 6 hours Training: 2022-04-11 16:06:27,900-Speed 5211.01 samples/sec Loss 0.8239 LearningRate 0.0062 Epoch: 15 Global Step: 250820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:29,873-Speed 5193.00 samples/sec Loss 0.7931 LearningRate 0.0062 Epoch: 15 Global Step: 250830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:31,844-Speed 5197.05 samples/sec Loss 0.7845 LearningRate 0.0062 Epoch: 15 Global Step: 250840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:33,838-Speed 5136.12 samples/sec Loss 0.7406 LearningRate 0.0062 Epoch: 15 Global Step: 250850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:35,833-Speed 5135.96 samples/sec Loss 0.7819 LearningRate 0.0062 Epoch: 15 Global Step: 250860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:37,809-Speed 5184.11 samples/sec Loss 0.7740 LearningRate 0.0062 Epoch: 15 Global Step: 250870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:39,780-Speed 5197.41 samples/sec Loss 0.7787 LearningRate 0.0062 Epoch: 15 Global Step: 250880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:41,761-Speed 5171.54 samples/sec Loss 0.7976 LearningRate 0.0062 Epoch: 15 Global Step: 250890 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:06:43,743-Speed 5166.91 samples/sec Loss 0.7803 LearningRate 0.0062 Epoch: 15 Global Step: 250900 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:06:45,725-Speed 5169.21 samples/sec Loss 0.7812 LearningRate 0.0062 Epoch: 15 Global Step: 250910 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:06:47,698-Speed 5192.00 samples/sec Loss 0.7803 LearningRate 0.0062 Epoch: 15 Global Step: 250920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:49,698-Speed 5119.47 samples/sec Loss 0.7761 LearningRate 0.0062 Epoch: 15 Global Step: 250930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 
2022-04-11 16:06:51,673-Speed 5187.94 samples/sec Loss 0.7966 LearningRate 0.0062 Epoch: 15 Global Step: 250940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:53,660-Speed 5154.04 samples/sec Loss 0.7758 LearningRate 0.0062 Epoch: 15 Global Step: 250950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:55,638-Speed 5178.71 samples/sec Loss 0.7682 LearningRate 0.0062 Epoch: 15 Global Step: 250960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:57,605-Speed 5209.29 samples/sec Loss 0.7744 LearningRate 0.0062 Epoch: 15 Global Step: 250970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:06:59,600-Speed 5133.67 samples/sec Loss 0.7815 LearningRate 0.0062 Epoch: 15 Global Step: 250980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:01,600-Speed 5124.65 samples/sec Loss 0.7517 LearningRate 0.0062 Epoch: 15 Global Step: 250990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:04,000-Speed 4266.71 samples/sec Loss 0.7719 LearningRate 0.0062 Epoch: 15 Global Step: 251000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:06,130-Speed 4810.05 samples/sec Loss 0.7834 LearningRate 0.0062 Epoch: 15 Global Step: 251010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:08,109-Speed 5176.17 samples/sec Loss 0.8018 LearningRate 0.0062 Epoch: 15 Global Step: 251020 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:07:10,074-Speed 5213.10 samples/sec Loss 0.7931 LearningRate 0.0062 Epoch: 15 Global Step: 251030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:12,052-Speed 5176.17 samples/sec Loss 0.7582 LearningRate 0.0061 Epoch: 15 Global Step: 251040 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:14,025-Speed 5192.87 samples/sec Loss 0.7880 LearningRate 0.0061 Epoch: 15 Global Step: 251050 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:16,004-Speed 5175.52 
samples/sec Loss 0.8093 LearningRate 0.0061 Epoch: 15 Global Step: 251060 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:17,974-Speed 5202.67 samples/sec Loss 0.7920 LearningRate 0.0061 Epoch: 15 Global Step: 251070 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:19,945-Speed 5194.95 samples/sec Loss 0.7928 LearningRate 0.0061 Epoch: 15 Global Step: 251080 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:21,921-Speed 5184.02 samples/sec Loss 0.7808 LearningRate 0.0061 Epoch: 15 Global Step: 251090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:23,922-Speed 5121.32 samples/sec Loss 0.8257 LearningRate 0.0061 Epoch: 15 Global Step: 251100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:25,901-Speed 5175.15 samples/sec Loss 0.7669 LearningRate 0.0061 Epoch: 15 Global Step: 251110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:27,895-Speed 5137.04 samples/sec Loss 0.8181 LearningRate 0.0061 Epoch: 15 Global Step: 251120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:29,885-Speed 5146.25 samples/sec Loss 0.7932 LearningRate 0.0061 Epoch: 15 Global Step: 251130 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:07:31,873-Speed 5153.68 samples/sec Loss 0.7870 LearningRate 0.0061 Epoch: 15 Global Step: 251140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:33,869-Speed 5130.96 samples/sec Loss 0.7751 LearningRate 0.0061 Epoch: 15 Global Step: 251150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:35,872-Speed 5114.92 samples/sec Loss 0.7584 LearningRate 0.0061 Epoch: 15 Global Step: 251160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:37,842-Speed 5200.89 samples/sec Loss 0.8199 LearningRate 0.0061 Epoch: 15 Global Step: 251170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:39,824-Speed 5167.52 samples/sec Loss 0.8180 LearningRate 
0.0061 Epoch: 15 Global Step: 251180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:41,796-Speed 5196.06 samples/sec Loss 0.7686 LearningRate 0.0061 Epoch: 15 Global Step: 251190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:43,760-Speed 5213.85 samples/sec Loss 0.7684 LearningRate 0.0061 Epoch: 15 Global Step: 251200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:45,746-Speed 5158.71 samples/sec Loss 0.8094 LearningRate 0.0061 Epoch: 15 Global Step: 251210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:47,724-Speed 5176.90 samples/sec Loss 0.7938 LearningRate 0.0061 Epoch: 15 Global Step: 251220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:49,717-Speed 5139.64 samples/sec Loss 0.7844 LearningRate 0.0061 Epoch: 15 Global Step: 251230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:51,693-Speed 5185.88 samples/sec Loss 0.7778 LearningRate 0.0061 Epoch: 15 Global Step: 251240 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:07:53,677-Speed 5163.65 samples/sec Loss 0.7643 LearningRate 0.0061 Epoch: 15 Global Step: 251250 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:07:55,667-Speed 5145.60 samples/sec Loss 0.7969 LearningRate 0.0061 Epoch: 15 Global Step: 251260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:07:57,648-Speed 5173.11 samples/sec Loss 0.8099 LearningRate 0.0061 Epoch: 15 Global Step: 251270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:07:59,626-Speed 5177.97 samples/sec Loss 0.7783 LearningRate 0.0061 Epoch: 15 Global Step: 251280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:08:01,597-Speed 5197.28 samples/sec Loss 0.7987 LearningRate 0.0061 Epoch: 15 Global Step: 251290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:08:03,582-Speed 5161.18 samples/sec Loss 0.7755 LearningRate 0.0061 Epoch: 15 Global Step: 251300 Fp16 
Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:08:05,556-Speed 5187.63 samples/sec Loss 0.7867 LearningRate 0.0061 Epoch: 15 Global Step: 251310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:08:07,537-Speed 5172.07 samples/sec Loss 0.8147 LearningRate 0.0061 Epoch: 15 Global Step: 251320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:08:09,512-Speed 5185.56 samples/sec Loss 0.7823 LearningRate 0.0061 Epoch: 15 Global Step: 251330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:08:11,532-Speed 5071.57 samples/sec Loss 0.8042 LearningRate 0.0061 Epoch: 15 Global Step: 251340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:08:13,527-Speed 5133.77 samples/sec Loss 0.7813 LearningRate 0.0061 Epoch: 15 Global Step: 251350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:08:15,526-Speed 5123.87 samples/sec Loss 0.8173 LearningRate 0.0061 Epoch: 15 Global Step: 251360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-11 16:08:17,526-Speed 5123.13 samples/sec Loss 0.7915 LearningRate 0.0061 Epoch: 15 Global Step: 251370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:19,513-Speed 5155.63 samples/sec Loss 0.7797 LearningRate 0.0061 Epoch: 15 Global Step: 251380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:21,481-Speed 5204.61 samples/sec Loss 0.7794 LearningRate 0.0061 Epoch: 15 Global Step: 251390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:23,475-Speed 5137.40 samples/sec Loss 0.7903 LearningRate 0.0061 Epoch: 15 Global Step: 251400 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:25,464-Speed 5150.04 samples/sec Loss 0.7979 LearningRate 0.0061 Epoch: 15 Global Step: 251410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:27,452-Speed 5153.13 samples/sec Loss 0.7890 LearningRate 0.0061 Epoch: 15 Global Step: 251420 Fp16 Grad Scale: 65536 Required: 6 hours 
Training: 2022-04-11 16:08:29,447-Speed 5133.42 samples/sec Loss 0.8057 LearningRate 0.0061 Epoch: 15 Global Step: 251430 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:31,421-Speed 5190.98 samples/sec Loss 0.8325 LearningRate 0.0061 Epoch: 15 Global Step: 251440 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:33,391-Speed 5198.58 samples/sec Loss 0.7486 LearningRate 0.0061 Epoch: 15 Global Step: 251450 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:35,382-Speed 5147.28 samples/sec Loss 0.7930 LearningRate 0.0061 Epoch: 15 Global Step: 251460 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:37,375-Speed 5139.95 samples/sec Loss 0.8194 LearningRate 0.0061 Epoch: 15 Global Step: 251470 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:39,343-Speed 5204.13 samples/sec Loss 0.8150 LearningRate 0.0061 Epoch: 15 Global Step: 251480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:41,314-Speed 5197.06 samples/sec Loss 0.8031 LearningRate 0.0061 Epoch: 15 Global Step: 251490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:43,287-Speed 5192.48 samples/sec Loss 0.8113 LearningRate 0.0061 Epoch: 15 Global Step: 251500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:45,268-Speed 5169.19 samples/sec Loss 0.7947 LearningRate 0.0061 Epoch: 15 Global Step: 251510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:47,283-Speed 5084.83 samples/sec Loss 0.7992 LearningRate 0.0061 Epoch: 15 Global Step: 251520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:49,277-Speed 5135.62 samples/sec Loss 0.7888 LearningRate 0.0061 Epoch: 15 Global Step: 251530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:51,261-Speed 5165.71 samples/sec Loss 0.7806 LearningRate 0.0061 Epoch: 15 Global Step: 251540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:53,231-Speed 
5199.06 samples/sec Loss 0.7977 LearningRate 0.0061 Epoch: 15 Global Step: 251550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:55,197-Speed 5209.66 samples/sec Loss 0.8326 LearningRate 0.0061 Epoch: 15 Global Step: 251560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:08:57,165-Speed 5206.26 samples/sec Loss 0.8216 LearningRate 0.0061 Epoch: 15 Global Step: 251570 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:08:59,163-Speed 5126.52 samples/sec Loss 0.8107 LearningRate 0.0061 Epoch: 15 Global Step: 251580 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:09:01,155-Speed 5142.33 samples/sec Loss 0.8223 LearningRate 0.0061 Epoch: 15 Global Step: 251590 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:09:03,162-Speed 5105.28 samples/sec Loss 0.8033 LearningRate 0.0061 Epoch: 15 Global Step: 251600 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:09:05,151-Speed 5150.40 samples/sec Loss 0.8100 LearningRate 0.0061 Epoch: 15 Global Step: 251610 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:07,123-Speed 5192.73 samples/sec Loss 0.7501 LearningRate 0.0061 Epoch: 15 Global Step: 251620 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:09,105-Speed 5167.53 samples/sec Loss 0.7853 LearningRate 0.0061 Epoch: 15 Global Step: 251630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:11,093-Speed 5155.21 samples/sec Loss 0.7825 LearningRate 0.0061 Epoch: 15 Global Step: 251640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:13,067-Speed 5187.55 samples/sec Loss 0.8250 LearningRate 0.0061 Epoch: 15 Global Step: 251650 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:15,035-Speed 5207.13 samples/sec Loss 0.7827 LearningRate 0.0061 Epoch: 15 Global Step: 251660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:17,031-Speed 5131.52 samples/sec Loss 0.8014 
LearningRate 0.0061 Epoch: 15 Global Step: 251670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:19,000-Speed 5202.67 samples/sec Loss 0.7835 LearningRate 0.0061 Epoch: 15 Global Step: 251680 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:20,976-Speed 5182.19 samples/sec Loss 0.7741 LearningRate 0.0061 Epoch: 15 Global Step: 251690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:22,949-Speed 5193.04 samples/sec Loss 0.8129 LearningRate 0.0061 Epoch: 15 Global Step: 251700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:24,926-Speed 5180.82 samples/sec Loss 0.7915 LearningRate 0.0061 Epoch: 15 Global Step: 251710 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:09:26,925-Speed 5123.40 samples/sec Loss 0.8059 LearningRate 0.0060 Epoch: 15 Global Step: 251720 Fp16 Grad Scale: 131072 Required: 6 hours Training: 2022-04-11 16:09:28,919-Speed 5139.29 samples/sec Loss 0.8033 LearningRate 0.0060 Epoch: 15 Global Step: 251730 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:30,899-Speed 5172.06 samples/sec Loss 0.7831 LearningRate 0.0060 Epoch: 15 Global Step: 251740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:32,894-Speed 5136.71 samples/sec Loss 0.8002 LearningRate 0.0060 Epoch: 15 Global Step: 251750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:34,868-Speed 5188.92 samples/sec Loss 0.7963 LearningRate 0.0060 Epoch: 15 Global Step: 251760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:36,841-Speed 5192.25 samples/sec Loss 0.7930 LearningRate 0.0060 Epoch: 15 Global Step: 251770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:38,816-Speed 5185.85 samples/sec Loss 0.7924 LearningRate 0.0060 Epoch: 15 Global Step: 251780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:40,812-Speed 5131.13 samples/sec Loss 0.8041 LearningRate 0.0060 Epoch: 15 Global 
Step: 251790 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:42,784-Speed 5195.24 samples/sec Loss 0.7806 LearningRate 0.0060 Epoch: 15 Global Step: 251800 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:44,773-Speed 5149.38 samples/sec Loss 0.7983 LearningRate 0.0060 Epoch: 15 Global Step: 251810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:46,766-Speed 5138.78 samples/sec Loss 0.7719 LearningRate 0.0060 Epoch: 15 Global Step: 251820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:48,753-Speed 5157.11 samples/sec Loss 0.7533 LearningRate 0.0060 Epoch: 15 Global Step: 251830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:50,744-Speed 5145.33 samples/sec Loss 0.8344 LearningRate 0.0060 Epoch: 15 Global Step: 251840 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:52,719-Speed 5184.70 samples/sec Loss 0.7756 LearningRate 0.0060 Epoch: 15 Global Step: 251850 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:54,719-Speed 5123.86 samples/sec Loss 0.8297 LearningRate 0.0060 Epoch: 15 Global Step: 251860 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:56,693-Speed 5190.45 samples/sec Loss 0.7820 LearningRate 0.0060 Epoch: 15 Global Step: 251870 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:09:58,700-Speed 5101.51 samples/sec Loss 0.8066 LearningRate 0.0060 Epoch: 15 Global Step: 251880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:10:00,682-Speed 5169.89 samples/sec Loss 0.7946 LearningRate 0.0060 Epoch: 15 Global Step: 251890 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:10:02,651-Speed 5203.84 samples/sec Loss 0.8218 LearningRate 0.0060 Epoch: 15 Global Step: 251900 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:10:04,618-Speed 5207.23 samples/sec Loss 0.7983 LearningRate 0.0060 Epoch: 15 Global Step: 251910 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-11 16:10:06,618-Speed 5121.28 samples/sec Loss 0.7672 LearningRate 0.0060 Epoch: 15 Global Step: 251920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:10:08,598-Speed 5175.31 samples/sec Loss 0.8034 LearningRate 0.0060 Epoch: 15 Global Step: 251930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:10:10,569-Speed 5196.72 samples/sec Loss 0.8526 LearningRate 0.0060 Epoch: 15 Global Step: 251940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:10:12,552-Speed 5167.41 samples/sec Loss 0.8103 LearningRate 0.0060 Epoch: 15 Global Step: 251950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:10:14,538-Speed 5155.62 samples/sec Loss 0.8036 LearningRate 0.0060 Epoch: 15 Global Step: 251960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:10:16,511-Speed 5193.43 samples/sec Loss 0.8111 LearningRate 0.0060 Epoch: 15 Global Step: 251970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:10:18,516-Speed 5107.47 samples/sec Loss 0.8128 LearningRate 0.0060 Epoch: 15 Global Step: 251980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:10:20,487-Speed 5198.70 samples/sec Loss 0.7627 LearningRate 0.0060 Epoch: 15 Global Step: 251990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:10:22,460-Speed 5189.14 samples/sec Loss 0.8308 LearningRate 0.0060 Epoch: 15 Global Step: 252000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:10:49,359-[lfw][252000]XNorm: 22.812655 Training: 2022-04-11 16:10:49,360-[lfw][252000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-04-11 16:10:49,360-[lfw][252000]Accuracy-Highest: 0.99833 Training: 2022-04-11 16:11:20,195-[cfp_fp][252000]XNorm: 22.492190 Training: 2022-04-11 16:11:20,195-[cfp_fp][252000]Accuracy-Flip: 0.98743+-0.00360 Training: 2022-04-11 16:11:20,196-[cfp_fp][252000]Accuracy-Highest: 0.98914 Training: 2022-04-11 16:11:46,697-[agedb_30][252000]XNorm: 23.609123 
Training: 2022-04-11 16:11:46,698-[agedb_30][252000]Accuracy-Flip: 0.98250+-0.00739 Training: 2022-04-11 16:11:46,698-[agedb_30][252000]Accuracy-Highest: 0.98300 Training: 2022-04-11 16:11:48,682-Speed 118.77 samples/sec Loss 0.7820 LearningRate 0.0060 Epoch: 15 Global Step: 252010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:11:50,684-Speed 5116.06 samples/sec Loss 0.7837 LearningRate 0.0060 Epoch: 15 Global Step: 252020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-11 16:11:52,705-Speed 5068.25 samples/sec Loss 0.8103 LearningRate 0.0060 Epoch: 15 Global Step: 252030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:11:54,677-Speed 5193.63 samples/sec Loss 0.7785 LearningRate 0.0060 Epoch: 15 Global Step: 252040 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:11:56,644-Speed 5209.37 samples/sec Loss 0.7703 LearningRate 0.0060 Epoch: 15 Global Step: 252050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:11:58,608-Speed 5214.76 samples/sec Loss 0.8047 LearningRate 0.0060 Epoch: 15 Global Step: 252060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:00,580-Speed 5196.82 samples/sec Loss 0.8044 LearningRate 0.0060 Epoch: 15 Global Step: 252070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:02,547-Speed 5207.66 samples/sec Loss 0.8153 LearningRate 0.0060 Epoch: 15 Global Step: 252080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:04,511-Speed 5217.39 samples/sec Loss 0.8479 LearningRate 0.0060 Epoch: 15 Global Step: 252090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:06,473-Speed 5219.86 samples/sec Loss 0.8169 LearningRate 0.0060 Epoch: 15 Global Step: 252100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:08,439-Speed 5211.91 samples/sec Loss 0.8146 LearningRate 0.0060 Epoch: 15 Global Step: 252110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:10,410-Speed 
5195.86 samples/sec Loss 0.8307 LearningRate 0.0060 Epoch: 15 Global Step: 252120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:12,378-Speed 5205.43 samples/sec Loss 0.8220 LearningRate 0.0060 Epoch: 15 Global Step: 252130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:14,402-Speed 5060.60 samples/sec Loss 0.7941 LearningRate 0.0060 Epoch: 15 Global Step: 252140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:16,371-Speed 5203.02 samples/sec Loss 0.7895 LearningRate 0.0060 Epoch: 15 Global Step: 252150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:12:18,361-Speed 5148.13 samples/sec Loss 0.8164 LearningRate 0.0060 Epoch: 15 Global Step: 252160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:20,328-Speed 5206.37 samples/sec Loss 0.8370 LearningRate 0.0060 Epoch: 15 Global Step: 252170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:22,296-Speed 5205.14 samples/sec Loss 0.7958 LearningRate 0.0060 Epoch: 15 Global Step: 252180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:24,269-Speed 5193.52 samples/sec Loss 0.7685 LearningRate 0.0060 Epoch: 15 Global Step: 252190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:26,268-Speed 5123.92 samples/sec Loss 0.8058 LearningRate 0.0060 Epoch: 15 Global Step: 252200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:28,249-Speed 5170.14 samples/sec Loss 0.8053 LearningRate 0.0060 Epoch: 15 Global Step: 252210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:30,216-Speed 5207.78 samples/sec Loss 0.8061 LearningRate 0.0060 Epoch: 15 Global Step: 252220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:32,187-Speed 5197.42 samples/sec Loss 0.8394 LearningRate 0.0060 Epoch: 15 Global Step: 252230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:34,157-Speed 5198.35 samples/sec Loss 0.8420 
LearningRate 0.0060 Epoch: 15 Global Step: 252240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:36,143-Speed 5158.20 samples/sec Loss 0.8104 LearningRate 0.0060 Epoch: 15 Global Step: 252250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:38,118-Speed 5189.55 samples/sec Loss 0.7900 LearningRate 0.0060 Epoch: 15 Global Step: 252260 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:12:40,096-Speed 5177.82 samples/sec Loss 0.8010 LearningRate 0.0060 Epoch: 15 Global Step: 252270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:12:42,080-Speed 5162.49 samples/sec Loss 0.8262 LearningRate 0.0060 Epoch: 15 Global Step: 252280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:44,076-Speed 5133.49 samples/sec Loss 0.8009 LearningRate 0.0060 Epoch: 15 Global Step: 252290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:46,048-Speed 5194.06 samples/sec Loss 0.7882 LearningRate 0.0060 Epoch: 15 Global Step: 252300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:48,035-Speed 5155.76 samples/sec Loss 0.8280 LearningRate 0.0060 Epoch: 15 Global Step: 252310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:50,024-Speed 5152.06 samples/sec Loss 0.8049 LearningRate 0.0060 Epoch: 15 Global Step: 252320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:52,012-Speed 5152.14 samples/sec Loss 0.8071 LearningRate 0.0060 Epoch: 15 Global Step: 252330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:53,988-Speed 5184.87 samples/sec Loss 0.8269 LearningRate 0.0060 Epoch: 15 Global Step: 252340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:55,961-Speed 5191.18 samples/sec Loss 0.8115 LearningRate 0.0060 Epoch: 15 Global Step: 252350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:57,945-Speed 5162.21 samples/sec Loss 0.7621 LearningRate 0.0060 Epoch: 15 Global 
Step: 252360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:12:59,976-Speed 5044.94 samples/sec Loss 0.8242 LearningRate 0.0060 Epoch: 15 Global Step: 252370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:13:01,969-Speed 5139.86 samples/sec Loss 0.8006 LearningRate 0.0060 Epoch: 15 Global Step: 252380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:13:03,944-Speed 5187.52 samples/sec Loss 0.7744 LearningRate 0.0060 Epoch: 15 Global Step: 252390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:13:05,919-Speed 5186.81 samples/sec Loss 0.8020 LearningRate 0.0059 Epoch: 15 Global Step: 252400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:13:07,899-Speed 5172.38 samples/sec Loss 0.7860 LearningRate 0.0059 Epoch: 15 Global Step: 252410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:13:09,893-Speed 5137.35 samples/sec Loss 0.7526 LearningRate 0.0059 Epoch: 15 Global Step: 252420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:13:11,872-Speed 5176.12 samples/sec Loss 0.8227 LearningRate 0.0059 Epoch: 15 Global Step: 252430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:13:13,848-Speed 5182.41 samples/sec Loss 0.8391 LearningRate 0.0059 Epoch: 15 Global Step: 252440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:13:15,861-Speed 5090.56 samples/sec Loss 0.7905 LearningRate 0.0059 Epoch: 15 Global Step: 252450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:13:17,850-Speed 5149.64 samples/sec Loss 0.8331 LearningRate 0.0059 Epoch: 15 Global Step: 252460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:13:19,834-Speed 5164.60 samples/sec Loss 0.7862 LearningRate 0.0059 Epoch: 15 Global Step: 252470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:13:21,810-Speed 5182.38 samples/sec Loss 0.7721 LearningRate 0.0059 Epoch: 15 Global Step: 252480 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 16:13:23,781-Speed 5197.22 samples/sec Loss 0.8261 LearningRate 0.0059 Epoch: 15 Global Step: 252490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:13:25,769-Speed 5153.27 samples/sec Loss 0.8209 LearningRate 0.0059 Epoch: 15 Global Step: 252500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:13:27,742-Speed 5191.64 samples/sec Loss 0.8427 LearningRate 0.0059 Epoch: 15 Global Step: 252510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:13:29,736-Speed 5137.44 samples/sec Loss 0.7786 LearningRate 0.0059 Epoch: 15 Global Step: 252520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:13:31,707-Speed 5196.54 samples/sec Loss 0.8045 LearningRate 0.0059 Epoch: 15 Global Step: 252530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:13:33,688-Speed 5171.79 samples/sec Loss 0.8492 LearningRate 0.0059 Epoch: 15 Global Step: 252540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:13:35,656-Speed 5204.90 samples/sec Loss 0.8218 LearningRate 0.0059 Epoch: 15 Global Step: 252550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:13:37,648-Speed 5143.51 samples/sec Loss 0.8317 LearningRate 0.0059 Epoch: 15 Global Step: 252560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:13:39,624-Speed 5182.53 samples/sec Loss 0.7685 LearningRate 0.0059 Epoch: 15 Global Step: 252570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:13:41,604-Speed 5174.35 samples/sec Loss 0.7977 LearningRate 0.0059 Epoch: 15 Global Step: 252580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:13:43,570-Speed 5210.71 samples/sec Loss 0.8260 LearningRate 0.0059 Epoch: 15 Global Step: 252590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:13:45,556-Speed 5158.67 samples/sec Loss 0.8232 LearningRate 0.0059 Epoch: 15 Global Step: 252600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 
16:13:47,541-Speed 5158.49 samples/sec Loss 0.7930 LearningRate 0.0059 Epoch: 15 Global Step: 252610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:13:49,543-Speed 5116.10 samples/sec Loss 0.8151 LearningRate 0.0059 Epoch: 15 Global Step: 252620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:13:51,546-Speed 5116.74 samples/sec Loss 0.7876 LearningRate 0.0059 Epoch: 15 Global Step: 252630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:13:53,558-Speed 5091.33 samples/sec Loss 0.8172 LearningRate 0.0059 Epoch: 15 Global Step: 252640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:13:55,560-Speed 5114.99 samples/sec Loss 0.8885 LearningRate 0.0059 Epoch: 15 Global Step: 252650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:13:57,574-Speed 5088.17 samples/sec Loss 0.8421 LearningRate 0.0059 Epoch: 15 Global Step: 252660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:13:59,546-Speed 5194.44 samples/sec Loss 0.8202 LearningRate 0.0059 Epoch: 15 Global Step: 252670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:14:01,522-Speed 5183.91 samples/sec Loss 0.8084 LearningRate 0.0059 Epoch: 15 Global Step: 252680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:14:03,500-Speed 5179.63 samples/sec Loss 0.8287 LearningRate 0.0059 Epoch: 15 Global Step: 252690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:14:05,489-Speed 5150.12 samples/sec Loss 0.8108 LearningRate 0.0059 Epoch: 15 Global Step: 252700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:14:07,465-Speed 5184.67 samples/sec Loss 0.8226 LearningRate 0.0059 Epoch: 15 Global Step: 252710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:14:09,436-Speed 5197.94 samples/sec Loss 0.8287 LearningRate 0.0059 Epoch: 15 Global Step: 252720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:14:11,413-Speed 5179.96 samples/sec Loss 
0.7879 LearningRate 0.0059 Epoch: 15 Global Step: 252730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:14:13,410-Speed 5129.66 samples/sec Loss 0.8451 LearningRate 0.0059 Epoch: 15 Global Step: 252740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:14:15,393-Speed 5166.24 samples/sec Loss 0.7942 LearningRate 0.0059 Epoch: 15 Global Step: 252750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:14:17,379-Speed 5156.15 samples/sec Loss 0.8379 LearningRate 0.0059 Epoch: 15 Global Step: 252760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:14:19,363-Speed 5162.32 samples/sec Loss 0.8487 LearningRate 0.0059 Epoch: 15 Global Step: 252770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:14:21,356-Speed 5140.21 samples/sec Loss 0.8468 LearningRate 0.0059 Epoch: 15 Global Step: 252780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:14:23,341-Speed 5162.37 samples/sec Loss 0.8229 LearningRate 0.0059 Epoch: 15 Global Step: 252790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:14:25,314-Speed 5190.76 samples/sec Loss 0.8321 LearningRate 0.0059 Epoch: 15 Global Step: 252800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:14:27,284-Speed 5200.56 samples/sec Loss 0.7894 LearningRate 0.0059 Epoch: 15 Global Step: 252810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:14:29,263-Speed 5175.92 samples/sec Loss 0.7856 LearningRate 0.0059 Epoch: 15 Global Step: 252820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:14:31,237-Speed 5190.40 samples/sec Loss 0.8564 LearningRate 0.0059 Epoch: 15 Global Step: 252830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:14:33,213-Speed 5183.48 samples/sec Loss 0.8439 LearningRate 0.0059 Epoch: 15 Global Step: 252840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:14:35,189-Speed 5182.51 samples/sec Loss 0.8244 LearningRate 0.0059 Epoch: 15 
Global Step: 252850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:14:37,159-Speed 5200.80 samples/sec Loss 0.8167 LearningRate 0.0059 Epoch: 15 Global Step: 252860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:14:39,138-Speed 5174.92 samples/sec Loss 0.8278 LearningRate 0.0059 Epoch: 15 Global Step: 252870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:14:41,124-Speed 5157.76 samples/sec Loss 0.8543 LearningRate 0.0059 Epoch: 15 Global Step: 252880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:14:43,111-Speed 5156.54 samples/sec Loss 0.8269 LearningRate 0.0059 Epoch: 15 Global Step: 252890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:14:45,079-Speed 5206.00 samples/sec Loss 0.8250 LearningRate 0.0059 Epoch: 15 Global Step: 252900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:14:47,062-Speed 5165.51 samples/sec Loss 0.8447 LearningRate 0.0059 Epoch: 15 Global Step: 252910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:14:49,033-Speed 5197.59 samples/sec Loss 0.8207 LearningRate 0.0059 Epoch: 15 Global Step: 252920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:14:51,040-Speed 5101.69 samples/sec Loss 0.8248 LearningRate 0.0059 Epoch: 15 Global Step: 252930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:14:53,046-Speed 5108.69 samples/sec Loss 0.8171 LearningRate 0.0059 Epoch: 15 Global Step: 252940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:14:55,039-Speed 5139.59 samples/sec Loss 0.8031 LearningRate 0.0059 Epoch: 15 Global Step: 252950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:14:57,042-Speed 5116.15 samples/sec Loss 0.8541 LearningRate 0.0059 Epoch: 15 Global Step: 252960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:14:59,031-Speed 5150.84 samples/sec Loss 0.8463 LearningRate 0.0059 Epoch: 15 Global Step: 252970 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 16:15:01,054-Speed 5063.51 samples/sec Loss 0.8014 LearningRate 0.0059 Epoch: 15 Global Step: 252980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:03,037-Speed 5164.89 samples/sec Loss 0.8170 LearningRate 0.0059 Epoch: 15 Global Step: 252990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:05,027-Speed 5148.99 samples/sec Loss 0.8083 LearningRate 0.0059 Epoch: 15 Global Step: 253000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:07,041-Speed 5087.01 samples/sec Loss 0.8494 LearningRate 0.0059 Epoch: 15 Global Step: 253010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:09,041-Speed 5119.60 samples/sec Loss 0.8078 LearningRate 0.0059 Epoch: 15 Global Step: 253020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:11,033-Speed 5142.83 samples/sec Loss 0.7852 LearningRate 0.0059 Epoch: 15 Global Step: 253030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:13,025-Speed 5142.34 samples/sec Loss 0.8407 LearningRate 0.0059 Epoch: 15 Global Step: 253040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:15,010-Speed 5159.82 samples/sec Loss 0.8250 LearningRate 0.0059 Epoch: 15 Global Step: 253050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:17,009-Speed 5126.99 samples/sec Loss 0.7901 LearningRate 0.0059 Epoch: 15 Global Step: 253060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:15:18,977-Speed 5204.94 samples/sec Loss 0.8232 LearningRate 0.0059 Epoch: 15 Global Step: 253070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:20,973-Speed 5136.85 samples/sec Loss 0.8382 LearningRate 0.0058 Epoch: 15 Global Step: 253080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:22,953-Speed 5173.81 samples/sec Loss 0.8295 LearningRate 0.0058 Epoch: 15 Global Step: 253090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
16:15:24,950-Speed 5129.55 samples/sec Loss 0.7873 LearningRate 0.0058 Epoch: 15 Global Step: 253100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:26,933-Speed 5166.15 samples/sec Loss 0.8214 LearningRate 0.0058 Epoch: 15 Global Step: 253110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:28,941-Speed 5102.53 samples/sec Loss 0.8020 LearningRate 0.0058 Epoch: 15 Global Step: 253120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:30,915-Speed 5191.41 samples/sec Loss 0.8174 LearningRate 0.0058 Epoch: 15 Global Step: 253130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:32,895-Speed 5171.66 samples/sec Loss 0.8198 LearningRate 0.0058 Epoch: 15 Global Step: 253140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:34,877-Speed 5169.73 samples/sec Loss 0.7987 LearningRate 0.0058 Epoch: 15 Global Step: 253150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:36,860-Speed 5166.10 samples/sec Loss 0.8120 LearningRate 0.0058 Epoch: 15 Global Step: 253160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:38,862-Speed 5116.60 samples/sec Loss 0.8168 LearningRate 0.0058 Epoch: 15 Global Step: 253170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:15:40,832-Speed 5199.83 samples/sec Loss 0.8729 LearningRate 0.0058 Epoch: 15 Global Step: 253180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:15:42,799-Speed 5208.05 samples/sec Loss 0.8375 LearningRate 0.0058 Epoch: 15 Global Step: 253190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:44,775-Speed 5184.45 samples/sec Loss 0.8394 LearningRate 0.0058 Epoch: 15 Global Step: 253200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:46,749-Speed 5186.95 samples/sec Loss 0.8101 LearningRate 0.0058 Epoch: 15 Global Step: 253210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:48,755-Speed 5106.59 samples/sec 
Loss 0.8266 LearningRate 0.0058 Epoch: 15 Global Step: 253220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:50,749-Speed 5136.87 samples/sec Loss 0.8508 LearningRate 0.0058 Epoch: 15 Global Step: 253230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:52,735-Speed 5158.24 samples/sec Loss 0.8061 LearningRate 0.0058 Epoch: 15 Global Step: 253240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:54,735-Speed 5124.60 samples/sec Loss 0.8396 LearningRate 0.0058 Epoch: 15 Global Step: 253250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:56,717-Speed 5167.27 samples/sec Loss 0.8664 LearningRate 0.0058 Epoch: 15 Global Step: 253260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:15:58,741-Speed 5061.86 samples/sec Loss 0.8330 LearningRate 0.0058 Epoch: 15 Global Step: 253270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:00,721-Speed 5173.77 samples/sec Loss 0.8266 LearningRate 0.0058 Epoch: 15 Global Step: 253280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:02,728-Speed 5104.35 samples/sec Loss 0.8629 LearningRate 0.0058 Epoch: 15 Global Step: 253290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:16:04,703-Speed 5186.78 samples/sec Loss 0.8389 LearningRate 0.0058 Epoch: 15 Global Step: 253300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:16:06,675-Speed 5195.69 samples/sec Loss 0.7962 LearningRate 0.0058 Epoch: 15 Global Step: 253310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:08,669-Speed 5136.76 samples/sec Loss 0.8541 LearningRate 0.0058 Epoch: 15 Global Step: 253320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:10,656-Speed 5156.16 samples/sec Loss 0.8299 LearningRate 0.0058 Epoch: 15 Global Step: 253330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:12,655-Speed 5125.05 samples/sec Loss 0.7965 LearningRate 0.0058 Epoch: 15 
Global Step: 253340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:14,639-Speed 5162.70 samples/sec Loss 0.8038 LearningRate 0.0058 Epoch: 15 Global Step: 253350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:16,614-Speed 5186.62 samples/sec Loss 0.8716 LearningRate 0.0058 Epoch: 15 Global Step: 253360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:18,594-Speed 5171.41 samples/sec Loss 0.8340 LearningRate 0.0058 Epoch: 15 Global Step: 253370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:20,585-Speed 5146.80 samples/sec Loss 0.8492 LearningRate 0.0058 Epoch: 15 Global Step: 253380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:22,566-Speed 5171.27 samples/sec Loss 0.7908 LearningRate 0.0058 Epoch: 15 Global Step: 253390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:24,555-Speed 5151.11 samples/sec Loss 0.8153 LearningRate 0.0058 Epoch: 15 Global Step: 253400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:26,545-Speed 5145.76 samples/sec Loss 0.8516 LearningRate 0.0058 Epoch: 15 Global Step: 253410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:16:28,549-Speed 5111.95 samples/sec Loss 0.7794 LearningRate 0.0058 Epoch: 15 Global Step: 253420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:30,527-Speed 5180.31 samples/sec Loss 0.8071 LearningRate 0.0058 Epoch: 15 Global Step: 253430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:32,506-Speed 5175.64 samples/sec Loss 0.8222 LearningRate 0.0058 Epoch: 15 Global Step: 253440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:34,514-Speed 5101.00 samples/sec Loss 0.8117 LearningRate 0.0058 Epoch: 15 Global Step: 253450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:36,506-Speed 5142.22 samples/sec Loss 0.7954 LearningRate 0.0058 Epoch: 15 Global Step: 253460 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 16:16:38,478-Speed 5195.88 samples/sec Loss 0.8111 LearningRate 0.0058 Epoch: 15 Global Step: 253470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:40,475-Speed 5127.34 samples/sec Loss 0.8072 LearningRate 0.0058 Epoch: 15 Global Step: 253480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:42,474-Speed 5127.01 samples/sec Loss 0.8631 LearningRate 0.0058 Epoch: 15 Global Step: 253490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:44,461-Speed 5155.71 samples/sec Loss 0.8624 LearningRate 0.0058 Epoch: 15 Global Step: 253500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:46,438-Speed 5182.18 samples/sec Loss 0.8231 LearningRate 0.0058 Epoch: 15 Global Step: 253510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:48,425-Speed 5155.63 samples/sec Loss 0.8416 LearningRate 0.0058 Epoch: 15 Global Step: 253520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:16:50,446-Speed 5068.53 samples/sec Loss 0.8334 LearningRate 0.0058 Epoch: 15 Global Step: 253530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:16:52,448-Speed 5116.97 samples/sec Loss 0.8217 LearningRate 0.0058 Epoch: 15 Global Step: 253540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:16:54,413-Speed 5212.02 samples/sec Loss 0.8322 LearningRate 0.0058 Epoch: 15 Global Step: 253550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:56,385-Speed 5194.67 samples/sec Loss 0.8097 LearningRate 0.0058 Epoch: 15 Global Step: 253560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:16:58,363-Speed 5178.05 samples/sec Loss 0.8264 LearningRate 0.0058 Epoch: 15 Global Step: 253570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:00,352-Speed 5150.28 samples/sec Loss 0.8399 LearningRate 0.0058 Epoch: 15 Global Step: 253580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
16:17:02,337-Speed 5162.12 samples/sec Loss 0.8636 LearningRate 0.0058 Epoch: 15 Global Step: 253590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:04,326-Speed 5148.17 samples/sec Loss 0.8672 LearningRate 0.0058 Epoch: 15 Global Step: 253600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:06,340-Speed 5087.69 samples/sec Loss 0.8566 LearningRate 0.0058 Epoch: 15 Global Step: 253610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:08,325-Speed 5161.44 samples/sec Loss 0.8365 LearningRate 0.0058 Epoch: 15 Global Step: 253620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:10,297-Speed 5194.32 samples/sec Loss 0.8129 LearningRate 0.0058 Epoch: 15 Global Step: 253630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:12,277-Speed 5173.54 samples/sec Loss 0.8750 LearningRate 0.0058 Epoch: 15 Global Step: 253640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:14,261-Speed 5163.21 samples/sec Loss 0.8275 LearningRate 0.0058 Epoch: 15 Global Step: 253650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:16,248-Speed 5155.07 samples/sec Loss 0.8218 LearningRate 0.0058 Epoch: 15 Global Step: 253660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:18,221-Speed 5191.23 samples/sec Loss 0.8814 LearningRate 0.0058 Epoch: 15 Global Step: 253670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:20,192-Speed 5198.39 samples/sec Loss 0.8813 LearningRate 0.0058 Epoch: 15 Global Step: 253680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:22,173-Speed 5170.21 samples/sec Loss 0.8149 LearningRate 0.0058 Epoch: 15 Global Step: 253690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:24,163-Speed 5147.31 samples/sec Loss 0.8387 LearningRate 0.0058 Epoch: 15 Global Step: 253700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:26,157-Speed 5139.36 samples/sec Loss 
0.8184 LearningRate 0.0058 Epoch: 15 Global Step: 253710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:28,135-Speed 5179.70 samples/sec Loss 0.8301 LearningRate 0.0058 Epoch: 15 Global Step: 253720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:30,114-Speed 5173.64 samples/sec Loss 0.8637 LearningRate 0.0058 Epoch: 15 Global Step: 253730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:32,111-Speed 5130.57 samples/sec Loss 0.8670 LearningRate 0.0058 Epoch: 15 Global Step: 253740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:34,095-Speed 5163.86 samples/sec Loss 0.8469 LearningRate 0.0058 Epoch: 15 Global Step: 253750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:17:36,069-Speed 5188.67 samples/sec Loss 0.8232 LearningRate 0.0058 Epoch: 15 Global Step: 253760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:17:38,058-Speed 5150.16 samples/sec Loss 0.8179 LearningRate 0.0058 Epoch: 15 Global Step: 253770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:17:40,063-Speed 5107.97 samples/sec Loss 0.8016 LearningRate 0.0057 Epoch: 15 Global Step: 253780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:17:42,040-Speed 5181.10 samples/sec Loss 0.8249 LearningRate 0.0057 Epoch: 15 Global Step: 253790 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:17:44,012-Speed 5194.72 samples/sec Loss 0.8347 LearningRate 0.0057 Epoch: 15 Global Step: 253800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:17:45,988-Speed 5186.52 samples/sec Loss 0.8304 LearningRate 0.0057 Epoch: 15 Global Step: 253810 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:17:47,981-Speed 5137.62 samples/sec Loss 0.7894 LearningRate 0.0057 Epoch: 15 Global Step: 253820 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:17:49,954-Speed 5192.94 samples/sec Loss 0.8318 LearningRate 0.0057 Epoch: 
15 Global Step: 253830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:51,930-Speed 5182.37 samples/sec Loss 0.8657 LearningRate 0.0057 Epoch: 15 Global Step: 253840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:53,910-Speed 5173.64 samples/sec Loss 0.8192 LearningRate 0.0057 Epoch: 15 Global Step: 253850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:55,884-Speed 5189.17 samples/sec Loss 0.8517 LearningRate 0.0057 Epoch: 15 Global Step: 253860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:57,860-Speed 5183.77 samples/sec Loss 0.8201 LearningRate 0.0057 Epoch: 15 Global Step: 253870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:17:59,839-Speed 5176.57 samples/sec Loss 0.8490 LearningRate 0.0057 Epoch: 15 Global Step: 253880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:18:01,825-Speed 5159.75 samples/sec Loss 0.8365 LearningRate 0.0057 Epoch: 15 Global Step: 253890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:18:03,806-Speed 5169.89 samples/sec Loss 0.8381 LearningRate 0.0057 Epoch: 15 Global Step: 253900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:18:05,778-Speed 5194.44 samples/sec Loss 0.8543 LearningRate 0.0057 Epoch: 15 Global Step: 253910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:18:07,753-Speed 5187.48 samples/sec Loss 0.8022 LearningRate 0.0057 Epoch: 15 Global Step: 253920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:18:09,728-Speed 5187.13 samples/sec Loss 0.8204 LearningRate 0.0057 Epoch: 15 Global Step: 253930 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:18:11,706-Speed 5176.69 samples/sec Loss 0.8590 LearningRate 0.0057 Epoch: 15 Global Step: 253940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:18:13,698-Speed 5144.43 samples/sec Loss 0.8399 LearningRate 0.0057 Epoch: 15 Global Step: 253950 Fp16 Grad Scale: 
131072 Required: 5 hours
Training: 2022-04-11 16:18:15,692-Speed 5137.69 samples/sec Loss 0.8727 LearningRate 0.0057 Epoch: 15 Global Step: 253960 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 16:18:17,701-Speed 5097.95 samples/sec Loss 0.8170 LearningRate 0.0057 Epoch: 15 Global Step: 253970 Fp16 Grad Scale: 131072 Required: 5 hours
Training: 2022-04-11 16:18:19,679-Speed 5178.42 samples/sec Loss 0.8440 LearningRate 0.0057 Epoch: 15 Global Step: 253980 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 16:18:21,663-Speed 5162.00 samples/sec Loss 0.8269 LearningRate 0.0057 Epoch: 15 Global Step: 253990 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 16:18:23,654-Speed 5146.74 samples/sec Loss 0.8446 LearningRate 0.0057 Epoch: 15 Global Step: 254000 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 16:18:50,203-[lfw][254000]XNorm: 21.698530
Training: 2022-04-11 16:18:50,203-[lfw][254000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-04-11 16:18:50,204-[lfw][254000]Accuracy-Highest: 0.99833
Training: 2022-04-11 16:19:21,109-[cfp_fp][254000]XNorm: 21.622808
Training: 2022-04-11 16:19:21,110-[cfp_fp][254000]Accuracy-Flip: 0.98800+-0.00487
Training: 2022-04-11 16:19:21,111-[cfp_fp][254000]Accuracy-Highest: 0.98914
Training: 2022-04-11 16:19:47,783-[agedb_30][254000]XNorm: 22.554634
Training: 2022-04-11 16:19:47,784-[agedb_30][254000]Accuracy-Flip: 0.98200+-0.00710
Training: 2022-04-11 16:19:47,784-[agedb_30][254000]Accuracy-Highest: 0.98300
Training: 2022-04-11 16:19:49,808-Speed 118.86 samples/sec Loss 0.8174 LearningRate 0.0057 Epoch: 15 Global Step: 254010 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 16:19:51,777-Speed 5202.83 samples/sec Loss 0.8568 LearningRate 0.0057 Epoch: 15 Global Step: 254020 Fp16 Grad Scale: 65536 Required: 5 hours
Training: 2022-04-11 16:19:53,747-Speed 5200.67 samples/sec Loss 0.8300 LearningRate 0.0057 Epoch: 15 Global Step: 254030 Fp16 Grad Scale: 65536 Required: 5
hours Training: 2022-04-11 16:19:55,711-Speed 5214.77 samples/sec Loss 0.8590 LearningRate 0.0057 Epoch: 15 Global Step: 254040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:19:57,705-Speed 5137.05 samples/sec Loss 0.8370 LearningRate 0.0057 Epoch: 15 Global Step: 254050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:19:59,670-Speed 5212.10 samples/sec Loss 0.8593 LearningRate 0.0057 Epoch: 15 Global Step: 254060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:01,639-Speed 5201.97 samples/sec Loss 0.8647 LearningRate 0.0057 Epoch: 15 Global Step: 254070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:03,633-Speed 5138.45 samples/sec Loss 0.7947 LearningRate 0.0057 Epoch: 15 Global Step: 254080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:20:05,612-Speed 5176.89 samples/sec Loss 0.8596 LearningRate 0.0057 Epoch: 15 Global Step: 254090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:20:07,576-Speed 5214.39 samples/sec Loss 0.8661 LearningRate 0.0057 Epoch: 15 Global Step: 254100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:09,547-Speed 5197.53 samples/sec Loss 0.8585 LearningRate 0.0057 Epoch: 15 Global Step: 254110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:11,524-Speed 5180.96 samples/sec Loss 0.8233 LearningRate 0.0057 Epoch: 15 Global Step: 254120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:13,509-Speed 5159.40 samples/sec Loss 0.8159 LearningRate 0.0057 Epoch: 15 Global Step: 254130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:15,504-Speed 5135.06 samples/sec Loss 0.8530 LearningRate 0.0057 Epoch: 15 Global Step: 254140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:17,491-Speed 5156.55 samples/sec Loss 0.8206 LearningRate 0.0057 Epoch: 15 Global Step: 254150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
16:20:19,467-Speed 5184.04 samples/sec Loss 0.8747 LearningRate 0.0057 Epoch: 15 Global Step: 254160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:21,447-Speed 5173.14 samples/sec Loss 0.8656 LearningRate 0.0057 Epoch: 15 Global Step: 254170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:23,436-Speed 5151.43 samples/sec Loss 0.8292 LearningRate 0.0057 Epoch: 15 Global Step: 254180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:25,456-Speed 5069.50 samples/sec Loss 0.8540 LearningRate 0.0057 Epoch: 15 Global Step: 254190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:27,429-Speed 5193.60 samples/sec Loss 0.8893 LearningRate 0.0057 Epoch: 15 Global Step: 254200 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:20:29,448-Speed 5074.73 samples/sec Loss 0.8416 LearningRate 0.0057 Epoch: 15 Global Step: 254210 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:20:31,417-Speed 5202.78 samples/sec Loss 0.8785 LearningRate 0.0057 Epoch: 15 Global Step: 254220 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:20:33,379-Speed 5221.12 samples/sec Loss 0.8129 LearningRate 0.0057 Epoch: 15 Global Step: 254230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:35,357-Speed 5177.11 samples/sec Loss 0.8536 LearningRate 0.0057 Epoch: 15 Global Step: 254240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:37,358-Speed 5120.90 samples/sec Loss 0.8719 LearningRate 0.0057 Epoch: 15 Global Step: 254250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:39,340-Speed 5168.10 samples/sec Loss 0.8865 LearningRate 0.0057 Epoch: 15 Global Step: 254260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:41,321-Speed 5171.60 samples/sec Loss 0.8350 LearningRate 0.0057 Epoch: 15 Global Step: 254270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:43,290-Speed 5200.97 samples/sec 
Loss 0.8439 LearningRate 0.0057 Epoch: 15 Global Step: 254280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:45,311-Speed 5069.69 samples/sec Loss 0.8705 LearningRate 0.0057 Epoch: 15 Global Step: 254290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:47,290-Speed 5177.18 samples/sec Loss 0.8484 LearningRate 0.0057 Epoch: 15 Global Step: 254300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:49,267-Speed 5181.09 samples/sec Loss 0.8219 LearningRate 0.0057 Epoch: 15 Global Step: 254310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:51,258-Speed 5144.33 samples/sec Loss 0.8772 LearningRate 0.0057 Epoch: 15 Global Step: 254320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:53,234-Speed 5185.36 samples/sec Loss 0.8666 LearningRate 0.0057 Epoch: 15 Global Step: 254330 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:20:55,205-Speed 5197.69 samples/sec Loss 0.8458 LearningRate 0.0057 Epoch: 15 Global Step: 254340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:57,182-Speed 5181.02 samples/sec Loss 0.8264 LearningRate 0.0057 Epoch: 15 Global Step: 254350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:20:59,208-Speed 5056.26 samples/sec Loss 0.8253 LearningRate 0.0057 Epoch: 15 Global Step: 254360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:01,185-Speed 5182.60 samples/sec Loss 0.8836 LearningRate 0.0057 Epoch: 15 Global Step: 254370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:03,178-Speed 5140.72 samples/sec Loss 0.8389 LearningRate 0.0057 Epoch: 15 Global Step: 254380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:05,204-Speed 5055.47 samples/sec Loss 0.8303 LearningRate 0.0057 Epoch: 15 Global Step: 254390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:07,180-Speed 5185.30 samples/sec Loss 0.8570 LearningRate 0.0057 Epoch: 15 
Global Step: 254400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:09,159-Speed 5176.61 samples/sec Loss 0.8323 LearningRate 0.0057 Epoch: 15 Global Step: 254410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:11,131-Speed 5192.11 samples/sec Loss 0.8410 LearningRate 0.0057 Epoch: 15 Global Step: 254420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:13,119-Speed 5154.73 samples/sec Loss 0.8494 LearningRate 0.0057 Epoch: 15 Global Step: 254430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:15,139-Speed 5070.80 samples/sec Loss 0.8357 LearningRate 0.0057 Epoch: 15 Global Step: 254440 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:21:17,121-Speed 5168.33 samples/sec Loss 0.8615 LearningRate 0.0057 Epoch: 15 Global Step: 254450 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:21:19,094-Speed 5191.53 samples/sec Loss 0.8580 LearningRate 0.0057 Epoch: 15 Global Step: 254460 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:21:21,067-Speed 5192.21 samples/sec Loss 0.8764 LearningRate 0.0057 Epoch: 15 Global Step: 254470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:23,037-Speed 5200.58 samples/sec Loss 0.8356 LearningRate 0.0056 Epoch: 15 Global Step: 254480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:25,008-Speed 5197.02 samples/sec Loss 0.8399 LearningRate 0.0056 Epoch: 15 Global Step: 254490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:26,982-Speed 5187.76 samples/sec Loss 0.8212 LearningRate 0.0056 Epoch: 15 Global Step: 254500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:28,973-Speed 5146.37 samples/sec Loss 0.8809 LearningRate 0.0056 Epoch: 15 Global Step: 254510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:30,945-Speed 5193.81 samples/sec Loss 0.8629 LearningRate 0.0056 Epoch: 15 Global Step: 254520 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-11 16:21:32,922-Speed 5179.83 samples/sec Loss 0.9126 LearningRate 0.0056 Epoch: 15 Global Step: 254530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:34,922-Speed 5123.15 samples/sec Loss 0.8342 LearningRate 0.0056 Epoch: 15 Global Step: 254540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:36,911-Speed 5152.69 samples/sec Loss 0.8511 LearningRate 0.0056 Epoch: 15 Global Step: 254550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:38,900-Speed 5149.63 samples/sec Loss 0.8544 LearningRate 0.0056 Epoch: 15 Global Step: 254560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:40,872-Speed 5192.90 samples/sec Loss 0.8791 LearningRate 0.0056 Epoch: 15 Global Step: 254570 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:21:42,888-Speed 5082.61 samples/sec Loss 0.8324 LearningRate 0.0056 Epoch: 15 Global Step: 254580 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:21:44,864-Speed 5182.12 samples/sec Loss 0.8574 LearningRate 0.0056 Epoch: 15 Global Step: 254590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:46,871-Speed 5105.37 samples/sec Loss 0.8341 LearningRate 0.0056 Epoch: 15 Global Step: 254600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:48,868-Speed 5127.44 samples/sec Loss 0.8158 LearningRate 0.0056 Epoch: 15 Global Step: 254610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:50,854-Speed 5159.40 samples/sec Loss 0.8552 LearningRate 0.0056 Epoch: 15 Global Step: 254620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:52,850-Speed 5131.58 samples/sec Loss 0.8161 LearningRate 0.0056 Epoch: 15 Global Step: 254630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:54,826-Speed 5185.61 samples/sec Loss 0.8639 LearningRate 0.0056 Epoch: 15 Global Step: 254640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 16:21:56,825-Speed 5124.11 samples/sec Loss 0.8543 LearningRate 0.0056 Epoch: 15 Global Step: 254650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:21:58,814-Speed 5152.13 samples/sec Loss 0.8326 LearningRate 0.0056 Epoch: 15 Global Step: 254660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:00,803-Speed 5149.66 samples/sec Loss 0.8953 LearningRate 0.0056 Epoch: 15 Global Step: 254670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:02,824-Speed 5068.15 samples/sec Loss 0.8757 LearningRate 0.0056 Epoch: 15 Global Step: 254680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:04,812-Speed 5154.89 samples/sec Loss 0.8624 LearningRate 0.0056 Epoch: 15 Global Step: 254690 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:22:06,781-Speed 5200.77 samples/sec Loss 0.8688 LearningRate 0.0056 Epoch: 15 Global Step: 254700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:08,804-Speed 5064.13 samples/sec Loss 0.8740 LearningRate 0.0056 Epoch: 15 Global Step: 254710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:10,780-Speed 5185.40 samples/sec Loss 0.8720 LearningRate 0.0056 Epoch: 15 Global Step: 254720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:12,756-Speed 5183.35 samples/sec Loss 0.8471 LearningRate 0.0056 Epoch: 15 Global Step: 254730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:14,740-Speed 5161.82 samples/sec Loss 0.8531 LearningRate 0.0056 Epoch: 15 Global Step: 254740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:16,738-Speed 5126.33 samples/sec Loss 0.8288 LearningRate 0.0056 Epoch: 15 Global Step: 254750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:18,738-Speed 5124.15 samples/sec Loss 0.8018 LearningRate 0.0056 Epoch: 15 Global Step: 254760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:20,709-Speed 5196.40 
samples/sec Loss 0.8698 LearningRate 0.0056 Epoch: 15 Global Step: 254770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:22,685-Speed 5184.08 samples/sec Loss 0.8276 LearningRate 0.0056 Epoch: 15 Global Step: 254780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:24,660-Speed 5187.80 samples/sec Loss 0.8378 LearningRate 0.0056 Epoch: 15 Global Step: 254790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:26,658-Speed 5125.06 samples/sec Loss 0.9026 LearningRate 0.0056 Epoch: 15 Global Step: 254800 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:22:28,629-Speed 5198.65 samples/sec Loss 0.8692 LearningRate 0.0056 Epoch: 15 Global Step: 254810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:30,610-Speed 5171.83 samples/sec Loss 0.8839 LearningRate 0.0056 Epoch: 15 Global Step: 254820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:32,589-Speed 5175.12 samples/sec Loss 0.8272 LearningRate 0.0056 Epoch: 15 Global Step: 254830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:34,580-Speed 5143.24 samples/sec Loss 0.8200 LearningRate 0.0056 Epoch: 15 Global Step: 254840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:36,568-Speed 5154.34 samples/sec Loss 0.8527 LearningRate 0.0056 Epoch: 15 Global Step: 254850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:38,545-Speed 5182.83 samples/sec Loss 0.8396 LearningRate 0.0056 Epoch: 15 Global Step: 254860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:40,527-Speed 5168.32 samples/sec Loss 0.8482 LearningRate 0.0056 Epoch: 15 Global Step: 254870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:42,513-Speed 5156.58 samples/sec Loss 0.8415 LearningRate 0.0056 Epoch: 15 Global Step: 254880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:44,490-Speed 5181.75 samples/sec Loss 0.8714 LearningRate 
0.0056 Epoch: 15 Global Step: 254890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:46,486-Speed 5133.26 samples/sec Loss 0.8461 LearningRate 0.0056 Epoch: 15 Global Step: 254900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:48,488-Speed 5116.23 samples/sec Loss 0.8303 LearningRate 0.0056 Epoch: 15 Global Step: 254910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:50,530-Speed 5018.31 samples/sec Loss 0.8856 LearningRate 0.0056 Epoch: 15 Global Step: 254920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:52,532-Speed 5117.49 samples/sec Loss 0.8495 LearningRate 0.0056 Epoch: 15 Global Step: 254930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:54,533-Speed 5118.55 samples/sec Loss 0.8555 LearningRate 0.0056 Epoch: 15 Global Step: 254940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:56,562-Speed 5048.56 samples/sec Loss 0.8625 LearningRate 0.0056 Epoch: 15 Global Step: 254950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:22:58,543-Speed 5173.36 samples/sec Loss 0.8880 LearningRate 0.0056 Epoch: 15 Global Step: 254960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:00,518-Speed 5186.18 samples/sec Loss 0.8244 LearningRate 0.0056 Epoch: 15 Global Step: 254970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:02,511-Speed 5139.30 samples/sec Loss 0.8383 LearningRate 0.0056 Epoch: 15 Global Step: 254980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:04,517-Speed 5105.58 samples/sec Loss 0.8482 LearningRate 0.0056 Epoch: 15 Global Step: 254990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:06,504-Speed 5158.31 samples/sec Loss 0.8584 LearningRate 0.0056 Epoch: 15 Global Step: 255000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:08,483-Speed 5175.29 samples/sec Loss 0.8400 LearningRate 0.0056 Epoch: 15 Global Step: 255010 Fp16 
Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:23:10,468-Speed 5160.50 samples/sec Loss 0.8541 LearningRate 0.0056 Epoch: 15 Global Step: 255020 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:23:12,462-Speed 5138.96 samples/sec Loss 0.8639 LearningRate 0.0056 Epoch: 15 Global Step: 255030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:23:14,435-Speed 5191.10 samples/sec Loss 0.8385 LearningRate 0.0056 Epoch: 15 Global Step: 255040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:16,425-Speed 5146.42 samples/sec Loss 0.8338 LearningRate 0.0056 Epoch: 15 Global Step: 255050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:18,410-Speed 5162.34 samples/sec Loss 0.8678 LearningRate 0.0056 Epoch: 15 Global Step: 255060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:20,386-Speed 5183.43 samples/sec Loss 0.8148 LearningRate 0.0056 Epoch: 15 Global Step: 255070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:22,357-Speed 5197.16 samples/sec Loss 0.8381 LearningRate 0.0056 Epoch: 15 Global Step: 255080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:24,334-Speed 5182.54 samples/sec Loss 0.8564 LearningRate 0.0056 Epoch: 15 Global Step: 255090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:26,325-Speed 5145.16 samples/sec Loss 0.8031 LearningRate 0.0056 Epoch: 15 Global Step: 255100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:28,302-Speed 5180.76 samples/sec Loss 0.8259 LearningRate 0.0056 Epoch: 15 Global Step: 255110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:30,288-Speed 5157.90 samples/sec Loss 0.8069 LearningRate 0.0056 Epoch: 15 Global Step: 255120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:32,261-Speed 5193.07 samples/sec Loss 0.8613 LearningRate 0.0056 Epoch: 15 Global Step: 255130 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 16:23:34,237-Speed 5182.26 samples/sec Loss 0.8561 LearningRate 0.0056 Epoch: 15 Global Step: 255140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:23:36,228-Speed 5146.45 samples/sec Loss 0.8383 LearningRate 0.0056 Epoch: 15 Global Step: 255150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:23:38,211-Speed 5165.13 samples/sec Loss 0.9050 LearningRate 0.0056 Epoch: 15 Global Step: 255160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:23:40,201-Speed 5149.05 samples/sec Loss 0.8453 LearningRate 0.0056 Epoch: 15 Global Step: 255170 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:23:42,175-Speed 5189.21 samples/sec Loss 0.8527 LearningRate 0.0055 Epoch: 15 Global Step: 255180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:23:44,141-Speed 5209.15 samples/sec Loss 0.8645 LearningRate 0.0055 Epoch: 15 Global Step: 255190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:46,117-Speed 5184.64 samples/sec Loss 0.8723 LearningRate 0.0055 Epoch: 15 Global Step: 255200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:48,101-Speed 5162.68 samples/sec Loss 0.8506 LearningRate 0.0055 Epoch: 15 Global Step: 255210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:50,120-Speed 5074.48 samples/sec Loss 0.8292 LearningRate 0.0055 Epoch: 15 Global Step: 255220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:52,162-Speed 5018.65 samples/sec Loss 0.8502 LearningRate 0.0055 Epoch: 15 Global Step: 255230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:54,140-Speed 5178.11 samples/sec Loss 0.8503 LearningRate 0.0055 Epoch: 15 Global Step: 255240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:23:56,127-Speed 5154.98 samples/sec Loss 0.8574 LearningRate 0.0055 Epoch: 15 Global Step: 255250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
16:23:58,114-Speed 5157.78 samples/sec Loss 0.8505 LearningRate 0.0055 Epoch: 15 Global Step: 255260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:00,099-Speed 5159.63 samples/sec Loss 0.8369 LearningRate 0.0055 Epoch: 15 Global Step: 255270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:02,138-Speed 5024.70 samples/sec Loss 0.8541 LearningRate 0.0055 Epoch: 15 Global Step: 255280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:04,110-Speed 5196.58 samples/sec Loss 0.8310 LearningRate 0.0055 Epoch: 15 Global Step: 255290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:06,102-Speed 5142.81 samples/sec Loss 0.8497 LearningRate 0.0055 Epoch: 15 Global Step: 255300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:08,084-Speed 5168.55 samples/sec Loss 0.8577 LearningRate 0.0055 Epoch: 15 Global Step: 255310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:10,066-Speed 5167.93 samples/sec Loss 0.8164 LearningRate 0.0055 Epoch: 15 Global Step: 255320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:12,074-Speed 5102.61 samples/sec Loss 0.8597 LearningRate 0.0055 Epoch: 15 Global Step: 255330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:14,057-Speed 5165.48 samples/sec Loss 0.8648 LearningRate 0.0055 Epoch: 15 Global Step: 255340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:16,042-Speed 5160.12 samples/sec Loss 0.8846 LearningRate 0.0055 Epoch: 15 Global Step: 255350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:18,037-Speed 5134.70 samples/sec Loss 0.8654 LearningRate 0.0055 Epoch: 15 Global Step: 255360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:20,031-Speed 5137.54 samples/sec Loss 0.8615 LearningRate 0.0055 Epoch: 15 Global Step: 255370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:22,011-Speed 5173.55 samples/sec Loss 
0.8617 LearningRate 0.0055 Epoch: 15 Global Step: 255380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:23,984-Speed 5192.54 samples/sec Loss 0.8631 LearningRate 0.0055 Epoch: 15 Global Step: 255390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:24:25,975-Speed 5145.39 samples/sec Loss 0.8476 LearningRate 0.0055 Epoch: 15 Global Step: 255400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:24:27,947-Speed 5194.59 samples/sec Loss 0.8623 LearningRate 0.0055 Epoch: 15 Global Step: 255410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:29,957-Speed 5095.33 samples/sec Loss 0.8737 LearningRate 0.0055 Epoch: 15 Global Step: 255420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:31,927-Speed 5200.08 samples/sec Loss 0.8663 LearningRate 0.0055 Epoch: 15 Global Step: 255430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:33,903-Speed 5185.90 samples/sec Loss 0.8610 LearningRate 0.0055 Epoch: 15 Global Step: 255440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:35,911-Speed 5101.58 samples/sec Loss 0.9066 LearningRate 0.0055 Epoch: 15 Global Step: 255450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:37,885-Speed 5188.52 samples/sec Loss 0.8692 LearningRate 0.0055 Epoch: 15 Global Step: 255460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:39,857-Speed 5195.64 samples/sec Loss 0.8732 LearningRate 0.0055 Epoch: 15 Global Step: 255470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:41,844-Speed 5153.51 samples/sec Loss 0.8749 LearningRate 0.0055 Epoch: 15 Global Step: 255480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:43,829-Speed 5161.21 samples/sec Loss 0.8554 LearningRate 0.0055 Epoch: 15 Global Step: 255490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:45,841-Speed 5091.80 samples/sec Loss 0.8667 LearningRate 0.0055 Epoch: 15 
Global Step: 255500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:47,821-Speed 5174.24 samples/sec Loss 0.8462 LearningRate 0.0055 Epoch: 15 Global Step: 255510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:49,857-Speed 5035.53 samples/sec Loss 0.8655 LearningRate 0.0055 Epoch: 15 Global Step: 255520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:51,829-Speed 5193.54 samples/sec Loss 0.8266 LearningRate 0.0055 Epoch: 15 Global Step: 255530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:53,822-Speed 5139.58 samples/sec Loss 0.8548 LearningRate 0.0055 Epoch: 15 Global Step: 255540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:55,800-Speed 5180.04 samples/sec Loss 0.8208 LearningRate 0.0055 Epoch: 15 Global Step: 255550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:57,795-Speed 5133.03 samples/sec Loss 0.8492 LearningRate 0.0055 Epoch: 15 Global Step: 255560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:24:59,801-Speed 5109.69 samples/sec Loss 0.8449 LearningRate 0.0055 Epoch: 15 Global Step: 255570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:01,798-Speed 5129.42 samples/sec Loss 0.8779 LearningRate 0.0055 Epoch: 15 Global Step: 255580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:03,796-Speed 5125.17 samples/sec Loss 0.8854 LearningRate 0.0055 Epoch: 15 Global Step: 255590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:05,776-Speed 5175.18 samples/sec Loss 0.9003 LearningRate 0.0055 Epoch: 15 Global Step: 255600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:07,772-Speed 5131.65 samples/sec Loss 0.8168 LearningRate 0.0055 Epoch: 15 Global Step: 255610 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:25:09,744-Speed 5195.71 samples/sec Loss 0.8324 LearningRate 0.0055 Epoch: 15 Global Step: 255620 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 16:25:11,720-Speed 5184.21 samples/sec Loss 0.8805 LearningRate 0.0055 Epoch: 15 Global Step: 255630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:25:13,697-Speed 5179.54 samples/sec Loss 0.8323 LearningRate 0.0055 Epoch: 15 Global Step: 255640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:25:15,671-Speed 5190.90 samples/sec Loss 0.8878 LearningRate 0.0055 Epoch: 15 Global Step: 255650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:17,653-Speed 5167.87 samples/sec Loss 0.8557 LearningRate 0.0055 Epoch: 15 Global Step: 255660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:19,655-Speed 5117.18 samples/sec Loss 0.8935 LearningRate 0.0055 Epoch: 15 Global Step: 255670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:21,632-Speed 5183.27 samples/sec Loss 0.8872 LearningRate 0.0055 Epoch: 15 Global Step: 255680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:23,635-Speed 5114.58 samples/sec Loss 0.8405 LearningRate 0.0055 Epoch: 15 Global Step: 255690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:25,611-Speed 5183.86 samples/sec Loss 0.8484 LearningRate 0.0055 Epoch: 15 Global Step: 255700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:27,598-Speed 5153.94 samples/sec Loss 0.8929 LearningRate 0.0055 Epoch: 15 Global Step: 255710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:29,593-Speed 5137.27 samples/sec Loss 0.8529 LearningRate 0.0055 Epoch: 15 Global Step: 255720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:31,572-Speed 5176.49 samples/sec Loss 0.8631 LearningRate 0.0055 Epoch: 15 Global Step: 255730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:33,550-Speed 5177.51 samples/sec Loss 0.8784 LearningRate 0.0055 Epoch: 15 Global Step: 255740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 16:25:35,546-Speed 5132.48 samples/sec Loss 0.8314 LearningRate 0.0055 Epoch: 15 Global Step: 255750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:25:37,533-Speed 5155.48 samples/sec Loss 0.8745 LearningRate 0.0055 Epoch: 15 Global Step: 255760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:39,508-Speed 5186.53 samples/sec Loss 0.9037 LearningRate 0.0055 Epoch: 15 Global Step: 255770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:41,497-Speed 5151.86 samples/sec Loss 0.8633 LearningRate 0.0055 Epoch: 15 Global Step: 255780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:43,468-Speed 5197.23 samples/sec Loss 0.8779 LearningRate 0.0055 Epoch: 15 Global Step: 255790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:45,445-Speed 5180.61 samples/sec Loss 0.8359 LearningRate 0.0055 Epoch: 15 Global Step: 255800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:47,436-Speed 5144.94 samples/sec Loss 0.8651 LearningRate 0.0055 Epoch: 15 Global Step: 255810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:49,439-Speed 5114.23 samples/sec Loss 0.8773 LearningRate 0.0055 Epoch: 15 Global Step: 255820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:51,434-Speed 5136.93 samples/sec Loss 0.8780 LearningRate 0.0055 Epoch: 15 Global Step: 255830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:53,459-Speed 5057.68 samples/sec Loss 0.8598 LearningRate 0.0055 Epoch: 15 Global Step: 255840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:55,468-Speed 5099.96 samples/sec Loss 0.8641 LearningRate 0.0055 Epoch: 15 Global Step: 255850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:25:57,446-Speed 5179.75 samples/sec Loss 0.8762 LearningRate 0.0055 Epoch: 15 Global Step: 255860 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:25:59,422-Speed 5184.57 
samples/sec Loss 0.8861 LearningRate 0.0055 Epoch: 15 Global Step: 255870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:26:01,442-Speed 5070.51 samples/sec Loss 0.8384 LearningRate 0.0055 Epoch: 15 Global Step: 255880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:26:03,428-Speed 5158.78 samples/sec Loss 0.9060 LearningRate 0.0054 Epoch: 15 Global Step: 255890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:26:05,435-Speed 5105.15 samples/sec Loss 0.8762 LearningRate 0.0054 Epoch: 15 Global Step: 255900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:26:07,408-Speed 5190.79 samples/sec Loss 0.8640 LearningRate 0.0054 Epoch: 15 Global Step: 255910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:26:09,402-Speed 5137.13 samples/sec Loss 0.8715 LearningRate 0.0054 Epoch: 15 Global Step: 255920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:26:11,419-Speed 5080.75 samples/sec Loss 0.8540 LearningRate 0.0054 Epoch: 15 Global Step: 255930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:26:13,417-Speed 5128.44 samples/sec Loss 0.8439 LearningRate 0.0054 Epoch: 15 Global Step: 255940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:26:15,421-Speed 5110.69 samples/sec Loss 0.8537 LearningRate 0.0054 Epoch: 15 Global Step: 255950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:26:17,403-Speed 5169.67 samples/sec Loss 0.8601 LearningRate 0.0054 Epoch: 15 Global Step: 255960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:26:19,407-Speed 5110.17 samples/sec Loss 0.8796 LearningRate 0.0054 Epoch: 15 Global Step: 255970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:26:21,398-Speed 5148.55 samples/sec Loss 0.8680 LearningRate 0.0054 Epoch: 15 Global Step: 255980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:26:23,378-Speed 5172.34 samples/sec Loss 0.8717 LearningRate 
0.0054 Epoch: 15 Global Step: 255990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:26:25,358-Speed 5173.98 samples/sec Loss 0.8732 LearningRate 0.0054 Epoch: 15 Global Step: 256000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:26:52,133-[lfw][256000]XNorm: 21.466629 Training: 2022-04-11 16:26:52,133-[lfw][256000]Accuracy-Flip: 0.99783+-0.00289 Training: 2022-04-11 16:26:52,134-[lfw][256000]Accuracy-Highest: 0.99833 Training: 2022-04-11 16:27:22,909-[cfp_fp][256000]XNorm: 21.373191 Training: 2022-04-11 16:27:22,909-[cfp_fp][256000]Accuracy-Flip: 0.98843+-0.00454 Training: 2022-04-11 16:27:22,910-[cfp_fp][256000]Accuracy-Highest: 0.98914 Training: 2022-04-11 16:27:49,428-[agedb_30][256000]XNorm: 22.334925 Training: 2022-04-11 16:27:49,429-[agedb_30][256000]Accuracy-Flip: 0.98200+-0.00756 Training: 2022-04-11 16:27:49,429-[agedb_30][256000]Accuracy-Highest: 0.98300 Training: 2022-04-11 16:27:51,422-Speed 118.98 samples/sec Loss 0.9205 LearningRate 0.0054 Epoch: 15 Global Step: 256010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:27:53,412-Speed 5146.71 samples/sec Loss 0.9369 LearningRate 0.0054 Epoch: 15 Global Step: 256020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:27:55,397-Speed 5160.87 samples/sec Loss 0.8616 LearningRate 0.0054 Epoch: 15 Global Step: 256030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:27:57,366-Speed 5204.09 samples/sec Loss 0.8589 LearningRate 0.0054 Epoch: 15 Global Step: 256040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:27:59,345-Speed 5175.53 samples/sec Loss 0.8913 LearningRate 0.0054 Epoch: 15 Global Step: 256050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:01,356-Speed 5095.37 samples/sec Loss 0.8518 LearningRate 0.0054 Epoch: 15 Global Step: 256060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:03,325-Speed 5201.55 samples/sec Loss 0.8899 LearningRate 0.0054 Epoch: 15 Global 
Step: 256070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:05,292-Speed 5206.88 samples/sec Loss 0.8281 LearningRate 0.0054 Epoch: 15 Global Step: 256080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:07,256-Speed 5216.05 samples/sec Loss 0.8394 LearningRate 0.0054 Epoch: 15 Global Step: 256090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:09,240-Speed 5164.40 samples/sec Loss 0.8616 LearningRate 0.0054 Epoch: 15 Global Step: 256100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:11,233-Speed 5140.32 samples/sec Loss 0.8227 LearningRate 0.0054 Epoch: 15 Global Step: 256110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:13,207-Speed 5188.81 samples/sec Loss 0.8900 LearningRate 0.0054 Epoch: 15 Global Step: 256120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:15,178-Speed 5196.16 samples/sec Loss 0.8759 LearningRate 0.0054 Epoch: 15 Global Step: 256130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:17,148-Speed 5201.24 samples/sec Loss 0.8662 LearningRate 0.0054 Epoch: 15 Global Step: 256140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:19,119-Speed 5195.36 samples/sec Loss 0.8593 LearningRate 0.0054 Epoch: 15 Global Step: 256150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:21,099-Speed 5173.84 samples/sec Loss 0.9020 LearningRate 0.0054 Epoch: 15 Global Step: 256160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:23,096-Speed 5129.94 samples/sec Loss 0.8917 LearningRate 0.0054 Epoch: 15 Global Step: 256170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:25,095-Speed 5125.60 samples/sec Loss 0.9086 LearningRate 0.0054 Epoch: 15 Global Step: 256180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:27,070-Speed 5185.18 samples/sec Loss 0.8662 LearningRate 0.0054 Epoch: 15 Global Step: 256190 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 16:28:29,056-Speed 5159.59 samples/sec Loss 0.8671 LearningRate 0.0054 Epoch: 15 Global Step: 256200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:31,028-Speed 5193.22 samples/sec Loss 0.8520 LearningRate 0.0054 Epoch: 15 Global Step: 256210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:32,997-Speed 5202.76 samples/sec Loss 0.8568 LearningRate 0.0054 Epoch: 15 Global Step: 256220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:34,974-Speed 5183.19 samples/sec Loss 0.8511 LearningRate 0.0054 Epoch: 15 Global Step: 256230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:36,983-Speed 5097.80 samples/sec Loss 0.9452 LearningRate 0.0054 Epoch: 15 Global Step: 256240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:38,993-Speed 5096.80 samples/sec Loss 0.9051 LearningRate 0.0054 Epoch: 15 Global Step: 256250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:40,987-Speed 5138.10 samples/sec Loss 0.8998 LearningRate 0.0054 Epoch: 15 Global Step: 256260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:42,968-Speed 5172.67 samples/sec Loss 0.8944 LearningRate 0.0054 Epoch: 15 Global Step: 256270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:44,962-Speed 5138.70 samples/sec Loss 0.8747 LearningRate 0.0054 Epoch: 15 Global Step: 256280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:46,967-Speed 5109.44 samples/sec Loss 0.9083 LearningRate 0.0054 Epoch: 15 Global Step: 256290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:28:48,948-Speed 5169.76 samples/sec Loss 0.8909 LearningRate 0.0054 Epoch: 15 Global Step: 256300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:28:50,925-Speed 5181.35 samples/sec Loss 0.8260 LearningRate 0.0054 Epoch: 15 Global Step: 256310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
16:28:52,893-Speed 5204.52 samples/sec Loss 0.8391 LearningRate 0.0054 Epoch: 15 Global Step: 256320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:54,883-Speed 5148.64 samples/sec Loss 0.8579 LearningRate 0.0054 Epoch: 15 Global Step: 256330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:56,871-Speed 5154.01 samples/sec Loss 0.8215 LearningRate 0.0054 Epoch: 15 Global Step: 256340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:28:58,847-Speed 5182.48 samples/sec Loss 0.8634 LearningRate 0.0054 Epoch: 15 Global Step: 256350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:00,865-Speed 5077.42 samples/sec Loss 0.9084 LearningRate 0.0054 Epoch: 15 Global Step: 256360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:02,848-Speed 5165.54 samples/sec Loss 0.8785 LearningRate 0.0054 Epoch: 15 Global Step: 256370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:04,835-Speed 5155.57 samples/sec Loss 0.8764 LearningRate 0.0054 Epoch: 15 Global Step: 256380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:06,818-Speed 5166.23 samples/sec Loss 0.8443 LearningRate 0.0054 Epoch: 15 Global Step: 256390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:08,791-Speed 5194.38 samples/sec Loss 0.8788 LearningRate 0.0054 Epoch: 15 Global Step: 256400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:10,775-Speed 5161.34 samples/sec Loss 0.8988 LearningRate 0.0054 Epoch: 15 Global Step: 256410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:12,769-Speed 5136.32 samples/sec Loss 0.8713 LearningRate 0.0054 Epoch: 15 Global Step: 256420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:29:14,758-Speed 5152.21 samples/sec Loss 0.8726 LearningRate 0.0054 Epoch: 15 Global Step: 256430 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:29:16,736-Speed 5178.22 samples/sec 
Loss 0.8747 LearningRate 0.0054 Epoch: 15 Global Step: 256440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:18,712-Speed 5185.86 samples/sec Loss 0.8828 LearningRate 0.0054 Epoch: 15 Global Step: 256450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:20,700-Speed 5151.21 samples/sec Loss 0.8769 LearningRate 0.0054 Epoch: 15 Global Step: 256460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:22,676-Speed 5185.13 samples/sec Loss 0.8571 LearningRate 0.0054 Epoch: 15 Global Step: 256470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:24,663-Speed 5153.70 samples/sec Loss 0.8760 LearningRate 0.0054 Epoch: 15 Global Step: 256480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:26,646-Speed 5166.56 samples/sec Loss 0.8521 LearningRate 0.0054 Epoch: 15 Global Step: 256490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:28,639-Speed 5138.37 samples/sec Loss 0.8510 LearningRate 0.0054 Epoch: 15 Global Step: 256500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:30,619-Speed 5175.62 samples/sec Loss 0.8526 LearningRate 0.0054 Epoch: 15 Global Step: 256510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:32,592-Speed 5192.05 samples/sec Loss 0.8270 LearningRate 0.0054 Epoch: 15 Global Step: 256520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:34,607-Speed 5083.37 samples/sec Loss 0.8606 LearningRate 0.0054 Epoch: 15 Global Step: 256530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:36,581-Speed 5189.74 samples/sec Loss 0.8755 LearningRate 0.0054 Epoch: 15 Global Step: 256540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:38,566-Speed 5160.90 samples/sec Loss 0.8796 LearningRate 0.0054 Epoch: 15 Global Step: 256550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:40,554-Speed 5152.71 samples/sec Loss 0.8445 LearningRate 0.0054 Epoch: 15 
Global Step: 256560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:42,531-Speed 5180.37 samples/sec Loss 0.8939 LearningRate 0.0054 Epoch: 15 Global Step: 256570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:44,502-Speed 5196.30 samples/sec Loss 0.8651 LearningRate 0.0054 Epoch: 15 Global Step: 256580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:46,496-Speed 5139.29 samples/sec Loss 0.8661 LearningRate 0.0054 Epoch: 15 Global Step: 256590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:48,474-Speed 5176.39 samples/sec Loss 0.9026 LearningRate 0.0054 Epoch: 15 Global Step: 256600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:50,463-Speed 5152.61 samples/sec Loss 0.8762 LearningRate 0.0053 Epoch: 15 Global Step: 256610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:52,457-Speed 5136.04 samples/sec Loss 0.8936 LearningRate 0.0053 Epoch: 15 Global Step: 256620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:54,445-Speed 5153.36 samples/sec Loss 0.8935 LearningRate 0.0053 Epoch: 15 Global Step: 256630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:29:56,445-Speed 5121.03 samples/sec Loss 0.9080 LearningRate 0.0053 Epoch: 15 Global Step: 256640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:29:58,426-Speed 5172.25 samples/sec Loss 0.8405 LearningRate 0.0053 Epoch: 15 Global Step: 256650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:30:00,414-Speed 5153.57 samples/sec Loss 0.8617 LearningRate 0.0053 Epoch: 15 Global Step: 256660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:02,400-Speed 5158.77 samples/sec Loss 0.9070 LearningRate 0.0053 Epoch: 15 Global Step: 256670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:04,389-Speed 5150.13 samples/sec Loss 0.8772 LearningRate 0.0053 Epoch: 15 Global Step: 256680 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-11 16:30:06,373-Speed 5163.70 samples/sec Loss 0.8542 LearningRate 0.0053 Epoch: 15 Global Step: 256690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:08,352-Speed 5176.32 samples/sec Loss 0.8767 LearningRate 0.0053 Epoch: 15 Global Step: 256700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:10,347-Speed 5135.20 samples/sec Loss 0.8424 LearningRate 0.0053 Epoch: 15 Global Step: 256710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:12,368-Speed 5067.58 samples/sec Loss 0.8614 LearningRate 0.0053 Epoch: 15 Global Step: 256720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:14,343-Speed 5187.69 samples/sec Loss 0.8557 LearningRate 0.0053 Epoch: 15 Global Step: 256730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:16,359-Speed 5082.62 samples/sec Loss 0.8871 LearningRate 0.0053 Epoch: 15 Global Step: 256740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:18,348-Speed 5150.96 samples/sec Loss 0.8897 LearningRate 0.0053 Epoch: 15 Global Step: 256750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:20,353-Speed 5108.34 samples/sec Loss 0.8606 LearningRate 0.0053 Epoch: 15 Global Step: 256760 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:30:22,335-Speed 5169.51 samples/sec Loss 0.8666 LearningRate 0.0053 Epoch: 15 Global Step: 256770 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:30:24,338-Speed 5113.01 samples/sec Loss 0.8517 LearningRate 0.0053 Epoch: 15 Global Step: 256780 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:30:26,307-Speed 5204.29 samples/sec Loss 0.8945 LearningRate 0.0053 Epoch: 15 Global Step: 256790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:28,306-Speed 5123.47 samples/sec Loss 0.8551 LearningRate 0.0053 Epoch: 15 Global Step: 256800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 16:30:30,285-Speed 5177.00 samples/sec Loss 0.8582 LearningRate 0.0053 Epoch: 15 Global Step: 256810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:32,257-Speed 5194.61 samples/sec Loss 0.8512 LearningRate 0.0053 Epoch: 15 Global Step: 256820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:34,257-Speed 5124.00 samples/sec Loss 0.8932 LearningRate 0.0053 Epoch: 15 Global Step: 256830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:36,241-Speed 5163.78 samples/sec Loss 0.8677 LearningRate 0.0053 Epoch: 15 Global Step: 256840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:38,219-Speed 5178.33 samples/sec Loss 0.8582 LearningRate 0.0053 Epoch: 15 Global Step: 256850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:40,198-Speed 5175.96 samples/sec Loss 0.8779 LearningRate 0.0053 Epoch: 15 Global Step: 256860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:42,190-Speed 5141.46 samples/sec Loss 0.8203 LearningRate 0.0053 Epoch: 15 Global Step: 256870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:44,178-Speed 5154.96 samples/sec Loss 0.8808 LearningRate 0.0053 Epoch: 15 Global Step: 256880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:46,171-Speed 5138.49 samples/sec Loss 0.8518 LearningRate 0.0053 Epoch: 15 Global Step: 256890 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:30:48,147-Speed 5185.73 samples/sec Loss 0.8775 LearningRate 0.0053 Epoch: 15 Global Step: 256900 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:30:50,133-Speed 5157.61 samples/sec Loss 0.8803 LearningRate 0.0053 Epoch: 15 Global Step: 256910 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:30:52,109-Speed 5183.50 samples/sec Loss 0.8626 LearningRate 0.0053 Epoch: 15 Global Step: 256920 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:30:54,076-Speed 5207.75 
samples/sec Loss 0.8683 LearningRate 0.0053 Epoch: 15 Global Step: 256930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:56,061-Speed 5160.13 samples/sec Loss 0.8850 LearningRate 0.0053 Epoch: 15 Global Step: 256940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:30:58,034-Speed 5190.06 samples/sec Loss 0.8762 LearningRate 0.0053 Epoch: 15 Global Step: 256950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:00,008-Speed 5190.29 samples/sec Loss 0.8867 LearningRate 0.0053 Epoch: 15 Global Step: 256960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:01,985-Speed 5182.48 samples/sec Loss 0.8861 LearningRate 0.0053 Epoch: 15 Global Step: 256970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:03,958-Speed 5191.86 samples/sec Loss 0.9173 LearningRate 0.0053 Epoch: 15 Global Step: 256980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:05,945-Speed 5154.13 samples/sec Loss 0.8608 LearningRate 0.0053 Epoch: 15 Global Step: 256990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:07,925-Speed 5174.27 samples/sec Loss 0.8446 LearningRate 0.0053 Epoch: 15 Global Step: 257000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:09,923-Speed 5128.08 samples/sec Loss 0.8910 LearningRate 0.0053 Epoch: 15 Global Step: 257010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:11,911-Speed 5151.63 samples/sec Loss 0.8628 LearningRate 0.0053 Epoch: 15 Global Step: 257020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:13,915-Speed 5112.21 samples/sec Loss 0.8849 LearningRate 0.0053 Epoch: 15 Global Step: 257030 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:31:15,908-Speed 5140.29 samples/sec Loss 0.8444 LearningRate 0.0053 Epoch: 15 Global Step: 257040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:17,900-Speed 5142.18 samples/sec Loss 0.8738 LearningRate 
0.0053 Epoch: 15 Global Step: 257050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:19,872-Speed 5193.78 samples/sec Loss 0.8924 LearningRate 0.0053 Epoch: 15 Global Step: 257060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:21,851-Speed 5178.09 samples/sec Loss 0.8802 LearningRate 0.0053 Epoch: 15 Global Step: 257070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:23,831-Speed 5172.23 samples/sec Loss 0.8585 LearningRate 0.0053 Epoch: 15 Global Step: 257080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:25,809-Speed 5177.68 samples/sec Loss 0.8978 LearningRate 0.0053 Epoch: 15 Global Step: 257090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:27,832-Speed 5064.54 samples/sec Loss 0.8852 LearningRate 0.0053 Epoch: 15 Global Step: 257100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:29,810-Speed 5180.52 samples/sec Loss 0.8913 LearningRate 0.0053 Epoch: 15 Global Step: 257110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:31,782-Speed 5194.70 samples/sec Loss 0.8463 LearningRate 0.0053 Epoch: 15 Global Step: 257120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:33,783-Speed 5119.01 samples/sec Loss 0.8965 LearningRate 0.0053 Epoch: 15 Global Step: 257130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:35,759-Speed 5185.78 samples/sec Loss 0.8725 LearningRate 0.0053 Epoch: 15 Global Step: 257140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:31:37,736-Speed 5179.32 samples/sec Loss 0.8714 LearningRate 0.0053 Epoch: 15 Global Step: 257150 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:31:39,718-Speed 5169.47 samples/sec Loss 0.8666 LearningRate 0.0053 Epoch: 15 Global Step: 257160 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:31:41,686-Speed 5204.42 samples/sec Loss 0.8799 LearningRate 0.0053 Epoch: 15 Global Step: 257170 
Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:43,680-Speed 5136.74 samples/sec Loss 0.8991 LearningRate 0.0053 Epoch: 15 Global Step: 257180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:45,657-Speed 5183.31 samples/sec Loss 0.8485 LearningRate 0.0053 Epoch: 15 Global Step: 257190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:47,633-Speed 5184.26 samples/sec Loss 0.8617 LearningRate 0.0053 Epoch: 15 Global Step: 257200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:49,633-Speed 5122.34 samples/sec Loss 0.8815 LearningRate 0.0053 Epoch: 15 Global Step: 257210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:51,622-Speed 5149.48 samples/sec Loss 0.8938 LearningRate 0.0053 Epoch: 15 Global Step: 257220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:53,598-Speed 5185.42 samples/sec Loss 0.8730 LearningRate 0.0053 Epoch: 15 Global Step: 257230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:55,581-Speed 5164.93 samples/sec Loss 0.8769 LearningRate 0.0053 Epoch: 15 Global Step: 257240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:57,564-Speed 5163.61 samples/sec Loss 0.8895 LearningRate 0.0053 Epoch: 15 Global Step: 257250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:31:59,554-Speed 5149.10 samples/sec Loss 0.8983 LearningRate 0.0053 Epoch: 15 Global Step: 257260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:01,571-Speed 5079.37 samples/sec Loss 0.8459 LearningRate 0.0053 Epoch: 15 Global Step: 257270 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:32:03,575-Speed 5110.10 samples/sec Loss 0.8816 LearningRate 0.0053 Epoch: 15 Global Step: 257280 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:32:05,568-Speed 5142.00 samples/sec Loss 0.8636 LearningRate 0.0053 Epoch: 15 Global Step: 257290 Fp16 Grad Scale: 131072 Required: 5 hours 
Training: 2022-04-11 16:32:07,549-Speed 5170.56 samples/sec Loss 0.8405 LearningRate 0.0053 Epoch: 15 Global Step: 257300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:32:09,524-Speed 5187.88 samples/sec Loss 0.9013 LearningRate 0.0053 Epoch: 15 Global Step: 257310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:32:11,503-Speed 5174.93 samples/sec Loss 0.9059 LearningRate 0.0053 Epoch: 15 Global Step: 257320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:13,491-Speed 5151.61 samples/sec Loss 0.8947 LearningRate 0.0053 Epoch: 15 Global Step: 257330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:15,497-Speed 5109.36 samples/sec Loss 0.8785 LearningRate 0.0052 Epoch: 15 Global Step: 257340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:17,484-Speed 5154.76 samples/sec Loss 0.8804 LearningRate 0.0052 Epoch: 15 Global Step: 257350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:19,474-Speed 5147.97 samples/sec Loss 0.8870 LearningRate 0.0052 Epoch: 15 Global Step: 257360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:21,476-Speed 5118.38 samples/sec Loss 0.8752 LearningRate 0.0052 Epoch: 15 Global Step: 257370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:23,474-Speed 5126.93 samples/sec Loss 0.8334 LearningRate 0.0052 Epoch: 15 Global Step: 257380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:25,452-Speed 5179.17 samples/sec Loss 0.8382 LearningRate 0.0052 Epoch: 15 Global Step: 257390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:27,434-Speed 5168.19 samples/sec Loss 0.9153 LearningRate 0.0052 Epoch: 15 Global Step: 257400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:29,441-Speed 5104.98 samples/sec Loss 0.8826 LearningRate 0.0052 Epoch: 15 Global Step: 257410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:31,433-Speed 
5141.98 samples/sec Loss 0.8523 LearningRate 0.0052 Epoch: 15 Global Step: 257420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:32:33,396-Speed 5218.52 samples/sec Loss 0.8951 LearningRate 0.0052 Epoch: 15 Global Step: 257430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:35,380-Speed 5162.62 samples/sec Loss 0.8814 LearningRate 0.0052 Epoch: 15 Global Step: 257440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:37,374-Speed 5137.66 samples/sec Loss 0.8641 LearningRate 0.0052 Epoch: 15 Global Step: 257450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:39,360-Speed 5161.29 samples/sec Loss 0.8792 LearningRate 0.0052 Epoch: 15 Global Step: 257460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:41,336-Speed 5182.28 samples/sec Loss 0.8737 LearningRate 0.0052 Epoch: 15 Global Step: 257470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:43,348-Speed 5093.09 samples/sec Loss 0.8886 LearningRate 0.0052 Epoch: 15 Global Step: 257480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:45,323-Speed 5186.83 samples/sec Loss 0.8786 LearningRate 0.0052 Epoch: 15 Global Step: 257490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:47,305-Speed 5166.89 samples/sec Loss 0.9020 LearningRate 0.0052 Epoch: 15 Global Step: 257500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:49,283-Speed 5180.75 samples/sec Loss 0.9315 LearningRate 0.0052 Epoch: 15 Global Step: 257510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:51,270-Speed 5153.86 samples/sec Loss 0.8731 LearningRate 0.0052 Epoch: 15 Global Step: 257520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:53,250-Speed 5173.49 samples/sec Loss 0.8807 LearningRate 0.0052 Epoch: 15 Global Step: 257530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:32:55,216-Speed 5212.32 samples/sec Loss 0.8600 
LearningRate 0.0052 Epoch: 15 Global Step: 257540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:57,186-Speed 5199.02 samples/sec Loss 0.8470 LearningRate 0.0052 Epoch: 15 Global Step: 257550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:32:59,167-Speed 5170.69 samples/sec Loss 0.8959 LearningRate 0.0052 Epoch: 15 Global Step: 257560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:01,144-Speed 5181.80 samples/sec Loss 0.8533 LearningRate 0.0052 Epoch: 15 Global Step: 257570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:03,132-Speed 5153.65 samples/sec Loss 0.8729 LearningRate 0.0052 Epoch: 15 Global Step: 257580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:05,120-Speed 5151.53 samples/sec Loss 0.9015 LearningRate 0.0052 Epoch: 15 Global Step: 257590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:07,125-Speed 5109.10 samples/sec Loss 0.9016 LearningRate 0.0052 Epoch: 15 Global Step: 257600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:09,114-Speed 5150.72 samples/sec Loss 0.8586 LearningRate 0.0052 Epoch: 15 Global Step: 257610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:11,105-Speed 5144.88 samples/sec Loss 0.8681 LearningRate 0.0052 Epoch: 15 Global Step: 257620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:13,102-Speed 5129.47 samples/sec Loss 0.8705 LearningRate 0.0052 Epoch: 15 Global Step: 257630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:15,086-Speed 5162.89 samples/sec Loss 0.8650 LearningRate 0.0052 Epoch: 15 Global Step: 257640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:17,117-Speed 5045.89 samples/sec Loss 0.8607 LearningRate 0.0052 Epoch: 15 Global Step: 257650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:19,102-Speed 5160.57 samples/sec Loss 0.8890 LearningRate 0.0052 Epoch: 15 Global Step: 
257660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:21,077-Speed 5187.37 samples/sec Loss 0.8423 LearningRate 0.0052 Epoch: 15 Global Step: 257670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:23,080-Speed 5114.42 samples/sec Loss 0.8537 LearningRate 0.0052 Epoch: 15 Global Step: 257680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:25,060-Speed 5173.03 samples/sec Loss 0.8764 LearningRate 0.0052 Epoch: 15 Global Step: 257690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:27,036-Speed 5183.48 samples/sec Loss 0.8280 LearningRate 0.0052 Epoch: 15 Global Step: 257700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:29,031-Speed 5134.89 samples/sec Loss 0.8545 LearningRate 0.0052 Epoch: 15 Global Step: 257710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:31,056-Speed 5060.73 samples/sec Loss 0.8580 LearningRate 0.0052 Epoch: 15 Global Step: 257720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:33,025-Speed 5202.30 samples/sec Loss 0.8936 LearningRate 0.0052 Epoch: 15 Global Step: 257730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:35,024-Speed 5123.88 samples/sec Loss 0.8917 LearningRate 0.0052 Epoch: 15 Global Step: 257740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:33:37,027-Speed 5113.88 samples/sec Loss 0.9023 LearningRate 0.0052 Epoch: 15 Global Step: 257750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:39,002-Speed 5187.68 samples/sec Loss 0.9270 LearningRate 0.0052 Epoch: 15 Global Step: 257760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:40,993-Speed 5145.85 samples/sec Loss 0.8731 LearningRate 0.0052 Epoch: 15 Global Step: 257770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:42,997-Speed 5111.85 samples/sec Loss 0.9278 LearningRate 0.0052 Epoch: 15 Global Step: 257780 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-11 16:33:44,971-Speed 5190.36 samples/sec Loss 0.8712 LearningRate 0.0052 Epoch: 15 Global Step: 257790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:46,951-Speed 5172.15 samples/sec Loss 0.8407 LearningRate 0.0052 Epoch: 15 Global Step: 257800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:48,952-Speed 5120.31 samples/sec Loss 0.9009 LearningRate 0.0052 Epoch: 15 Global Step: 257810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:50,937-Speed 5161.00 samples/sec Loss 0.8900 LearningRate 0.0052 Epoch: 15 Global Step: 257820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:52,911-Speed 5188.45 samples/sec Loss 0.9042 LearningRate 0.0052 Epoch: 15 Global Step: 257830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:54,884-Speed 5191.65 samples/sec Loss 0.8623 LearningRate 0.0052 Epoch: 15 Global Step: 257840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:56,859-Speed 5187.57 samples/sec Loss 0.9158 LearningRate 0.0052 Epoch: 15 Global Step: 257850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:33:58,838-Speed 5174.70 samples/sec Loss 0.8964 LearningRate 0.0052 Epoch: 15 Global Step: 257860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:34:00,817-Speed 5178.00 samples/sec Loss 0.8682 LearningRate 0.0052 Epoch: 15 Global Step: 257870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:34:02,797-Speed 5173.86 samples/sec Loss 0.8757 LearningRate 0.0052 Epoch: 15 Global Step: 257880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:34:04,777-Speed 5173.90 samples/sec Loss 0.8854 LearningRate 0.0052 Epoch: 15 Global Step: 257890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:34:06,753-Speed 5183.51 samples/sec Loss 0.9033 LearningRate 0.0052 Epoch: 15 Global Step: 257900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
16:34:08,727-Speed 5189.30 samples/sec Loss 0.9051 LearningRate 0.0052 Epoch: 15 Global Step: 257910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:34:10,701-Speed 5190.18 samples/sec Loss 0.8928 LearningRate 0.0052 Epoch: 15 Global Step: 257920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:34:12,676-Speed 5186.44 samples/sec Loss 0.8987 LearningRate 0.0052 Epoch: 15 Global Step: 257930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:34:14,654-Speed 5178.28 samples/sec Loss 0.8432 LearningRate 0.0052 Epoch: 15 Global Step: 257940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:34:16,637-Speed 5165.16 samples/sec Loss 0.8709 LearningRate 0.0052 Epoch: 15 Global Step: 257950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:34:18,614-Speed 5183.45 samples/sec Loss 0.8652 LearningRate 0.0052 Epoch: 15 Global Step: 257960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:34:20,591-Speed 5180.04 samples/sec Loss 0.8787 LearningRate 0.0052 Epoch: 15 Global Step: 257970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:34:22,603-Speed 5091.75 samples/sec Loss 0.9232 LearningRate 0.0052 Epoch: 15 Global Step: 257980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:34:24,594-Speed 5145.80 samples/sec Loss 0.9041 LearningRate 0.0052 Epoch: 15 Global Step: 257990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:34:26,572-Speed 5177.62 samples/sec Loss 0.8847 LearningRate 0.0052 Epoch: 15 Global Step: 258000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:34:53,241-[lfw][258000]XNorm: 22.058127 Training: 2022-04-11 16:34:53,242-[lfw][258000]Accuracy-Flip: 0.99750+-0.00281 Training: 2022-04-11 16:34:53,242-[lfw][258000]Accuracy-Highest: 0.99833 Training: 2022-04-11 16:35:23,936-[cfp_fp][258000]XNorm: 21.999481 Training: 2022-04-11 16:35:23,936-[cfp_fp][258000]Accuracy-Flip: 0.98857+-0.00527 Training: 2022-04-11 
16:35:23,937-[cfp_fp][258000]Accuracy-Highest: 0.98914 Training: 2022-04-11 16:35:50,495-[agedb_30][258000]XNorm: 22.814138 Training: 2022-04-11 16:35:50,495-[agedb_30][258000]Accuracy-Flip: 0.98233+-0.00779 Training: 2022-04-11 16:35:50,496-[agedb_30][258000]Accuracy-Highest: 0.98300 Training: 2022-04-11 16:35:52,506-Speed 119.16 samples/sec Loss 0.8410 LearningRate 0.0052 Epoch: 15 Global Step: 258010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:35:54,474-Speed 5203.94 samples/sec Loss 0.8579 LearningRate 0.0052 Epoch: 15 Global Step: 258020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:35:56,458-Speed 5164.12 samples/sec Loss 0.9219 LearningRate 0.0052 Epoch: 15 Global Step: 258030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:35:58,424-Speed 5209.70 samples/sec Loss 0.8633 LearningRate 0.0052 Epoch: 15 Global Step: 258040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:36:00,403-Speed 5177.55 samples/sec Loss 0.9306 LearningRate 0.0052 Epoch: 15 Global Step: 258050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:36:02,384-Speed 5171.05 samples/sec Loss 0.9345 LearningRate 0.0052 Epoch: 15 Global Step: 258060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:36:04,366-Speed 5166.33 samples/sec Loss 0.8772 LearningRate 0.0051 Epoch: 15 Global Step: 258070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:36:06,340-Speed 5189.52 samples/sec Loss 0.8741 LearningRate 0.0051 Epoch: 15 Global Step: 258080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:36:08,310-Speed 5200.15 samples/sec Loss 0.8860 LearningRate 0.0051 Epoch: 15 Global Step: 258090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:36:10,296-Speed 5157.74 samples/sec Loss 0.8641 LearningRate 0.0051 Epoch: 15 Global Step: 258100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:36:12,303-Speed 5104.63 samples/sec Loss 0.8731 
LearningRate 0.0051 Epoch: 15 Global Step: 258110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:36:14,278-Speed 5188.52 samples/sec Loss 0.8753 LearningRate 0.0051 Epoch: 15 Global Step: 258120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:36:16,250-Speed 5193.87 samples/sec Loss 0.8560 LearningRate 0.0051 Epoch: 15 Global Step: 258130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:36:18,228-Speed 5178.51 samples/sec Loss 0.8983 LearningRate 0.0051 Epoch: 15 Global Step: 258140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:36:20,203-Speed 5187.37 samples/sec Loss 0.9060 LearningRate 0.0051 Epoch: 15 Global Step: 258150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:36:22,177-Speed 5188.15 samples/sec Loss 0.9010 LearningRate 0.0051 Epoch: 15 Global Step: 258160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:36:25,111-Speed 3491.14 samples/sec Loss 0.8783 LearningRate 0.0051 Epoch: 15 Global Step: 258170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:36:27,117-Speed 5107.06 samples/sec Loss 0.8685 LearningRate 0.0051 Epoch: 15 Global Step: 258180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:36:29,086-Speed 5200.58 samples/sec Loss 0.8496 LearningRate 0.0051 Epoch: 15 Global Step: 258190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:36:31,072-Speed 5161.18 samples/sec Loss 0.8782 LearningRate 0.0051 Epoch: 15 Global Step: 258200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:36:33,038-Speed 5210.97 samples/sec Loss 0.9011 LearningRate 0.0051 Epoch: 15 Global Step: 258210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:36:35,023-Speed 5159.96 samples/sec Loss 0.8722 LearningRate 0.0051 Epoch: 15 Global Step: 258220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:36:37,008-Speed 5160.58 samples/sec Loss 0.9007 LearningRate 0.0051 Epoch: 15 Global Step: 
258230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:36:38,991-Speed 5165.75 samples/sec Loss 0.8902 LearningRate 0.0051 Epoch: 15 Global Step: 258240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:36:41,026-Speed 5035.18 samples/sec Loss 0.9284 LearningRate 0.0051 Epoch: 15 Global Step: 258250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:36:43,002-Speed 5182.40 samples/sec Loss 0.8588 LearningRate 0.0051 Epoch: 15 Global Step: 258260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:36:45,021-Speed 5074.84 samples/sec Loss 0.8876 LearningRate 0.0051 Epoch: 15 Global Step: 258270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:36:46,996-Speed 5187.94 samples/sec Loss 0.8694 LearningRate 0.0051 Epoch: 15 Global Step: 258280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:36:48,979-Speed 5165.68 samples/sec Loss 0.8763 LearningRate 0.0051 Epoch: 15 Global Step: 258290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:36:50,975-Speed 5132.59 samples/sec Loss 0.8368 LearningRate 0.0051 Epoch: 15 Global Step: 258300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:36:52,960-Speed 5160.16 samples/sec Loss 0.8911 LearningRate 0.0051 Epoch: 15 Global Step: 258310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:36:54,946-Speed 5156.72 samples/sec Loss 0.9064 LearningRate 0.0051 Epoch: 15 Global Step: 258320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:36:56,925-Speed 5180.03 samples/sec Loss 0.8748 LearningRate 0.0051 Epoch: 15 Global Step: 258330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:36:58,905-Speed 5172.60 samples/sec Loss 0.8819 LearningRate 0.0051 Epoch: 15 Global Step: 258340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:00,902-Speed 5129.82 samples/sec Loss 0.8696 LearningRate 0.0051 Epoch: 15 Global Step: 258350 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-11 16:37:02,898-Speed 5132.42 samples/sec Loss 0.8893 LearningRate 0.0051 Epoch: 15 Global Step: 258360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:04,872-Speed 5188.80 samples/sec Loss 0.8781 LearningRate 0.0051 Epoch: 15 Global Step: 258370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:06,853-Speed 5170.99 samples/sec Loss 0.9313 LearningRate 0.0051 Epoch: 15 Global Step: 258380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:08,825-Speed 5194.73 samples/sec Loss 0.8544 LearningRate 0.0051 Epoch: 15 Global Step: 258390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:10,801-Speed 5184.15 samples/sec Loss 0.9281 LearningRate 0.0051 Epoch: 15 Global Step: 258400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:12,777-Speed 5183.94 samples/sec Loss 0.8856 LearningRate 0.0051 Epoch: 15 Global Step: 258410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:37:14,754-Speed 5181.47 samples/sec Loss 0.8873 LearningRate 0.0051 Epoch: 15 Global Step: 258420 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:37:16,749-Speed 5134.93 samples/sec Loss 0.8838 LearningRate 0.0051 Epoch: 15 Global Step: 258430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:18,737-Speed 5153.62 samples/sec Loss 0.8914 LearningRate 0.0051 Epoch: 15 Global Step: 258440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:20,729-Speed 5142.44 samples/sec Loss 0.8633 LearningRate 0.0051 Epoch: 15 Global Step: 258450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:22,728-Speed 5124.57 samples/sec Loss 0.8633 LearningRate 0.0051 Epoch: 15 Global Step: 258460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:24,711-Speed 5166.10 samples/sec Loss 0.9235 LearningRate 0.0051 Epoch: 15 Global Step: 258470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
16:37:26,702-Speed 5143.28 samples/sec Loss 0.8806 LearningRate 0.0051 Epoch: 15 Global Step: 258480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:28,683-Speed 5171.44 samples/sec Loss 0.9031 LearningRate 0.0051 Epoch: 15 Global Step: 258490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:30,660-Speed 5180.84 samples/sec Loss 0.8767 LearningRate 0.0051 Epoch: 15 Global Step: 258500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:32,641-Speed 5171.20 samples/sec Loss 0.9360 LearningRate 0.0051 Epoch: 15 Global Step: 258510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:34,615-Speed 5188.51 samples/sec Loss 0.8869 LearningRate 0.0051 Epoch: 15 Global Step: 258520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:36,590-Speed 5187.14 samples/sec Loss 0.8599 LearningRate 0.0051 Epoch: 15 Global Step: 258530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:37:38,583-Speed 5139.39 samples/sec Loss 0.9111 LearningRate 0.0051 Epoch: 15 Global Step: 258540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:37:40,586-Speed 5115.86 samples/sec Loss 0.8603 LearningRate 0.0051 Epoch: 15 Global Step: 258550 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:37:42,575-Speed 5148.70 samples/sec Loss 0.8905 LearningRate 0.0051 Epoch: 15 Global Step: 258560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:44,553-Speed 5178.44 samples/sec Loss 0.8933 LearningRate 0.0051 Epoch: 15 Global Step: 258570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:46,548-Speed 5134.54 samples/sec Loss 0.8990 LearningRate 0.0051 Epoch: 15 Global Step: 258580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:48,539-Speed 5145.61 samples/sec Loss 0.9064 LearningRate 0.0051 Epoch: 15 Global Step: 258590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:50,524-Speed 5160.00 samples/sec 
Loss 0.8713 LearningRate 0.0051 Epoch: 15 Global Step: 258600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:52,542-Speed 5076.87 samples/sec Loss 0.8637 LearningRate 0.0051 Epoch: 15 Global Step: 258610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:54,519-Speed 5183.55 samples/sec Loss 0.8860 LearningRate 0.0051 Epoch: 15 Global Step: 258620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:56,490-Speed 5196.01 samples/sec Loss 0.8951 LearningRate 0.0051 Epoch: 15 Global Step: 258630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:37:58,499-Speed 5099.83 samples/sec Loss 0.8681 LearningRate 0.0051 Epoch: 15 Global Step: 258640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:00,487-Speed 5151.97 samples/sec Loss 0.8824 LearningRate 0.0051 Epoch: 15 Global Step: 258650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:02,461-Speed 5191.02 samples/sec Loss 0.8864 LearningRate 0.0051 Epoch: 15 Global Step: 258660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:04,462-Speed 5117.08 samples/sec Loss 0.9320 LearningRate 0.0051 Epoch: 15 Global Step: 258670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:06,458-Speed 5133.18 samples/sec Loss 0.8601 LearningRate 0.0051 Epoch: 15 Global Step: 258680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:08,446-Speed 5154.25 samples/sec Loss 0.8988 LearningRate 0.0051 Epoch: 15 Global Step: 258690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:10,425-Speed 5175.86 samples/sec Loss 0.8838 LearningRate 0.0051 Epoch: 15 Global Step: 258700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:12,408-Speed 5165.06 samples/sec Loss 0.8713 LearningRate 0.0051 Epoch: 15 Global Step: 258710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:14,381-Speed 5190.89 samples/sec Loss 0.9053 LearningRate 0.0051 Epoch: 15 
Global Step: 258720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:16,359-Speed 5180.95 samples/sec Loss 0.8572 LearningRate 0.0051 Epoch: 15 Global Step: 258730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:18,330-Speed 5197.43 samples/sec Loss 0.8830 LearningRate 0.0051 Epoch: 15 Global Step: 258740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:20,303-Speed 5190.18 samples/sec Loss 0.9132 LearningRate 0.0051 Epoch: 15 Global Step: 258750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:22,274-Speed 5198.40 samples/sec Loss 0.8989 LearningRate 0.0051 Epoch: 15 Global Step: 258760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:38:24,250-Speed 5182.97 samples/sec Loss 0.8663 LearningRate 0.0051 Epoch: 15 Global Step: 258770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:38:26,248-Speed 5127.53 samples/sec Loss 0.9259 LearningRate 0.0051 Epoch: 15 Global Step: 258780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:38:28,232-Speed 5164.32 samples/sec Loss 0.8528 LearningRate 0.0051 Epoch: 15 Global Step: 258790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:38:30,206-Speed 5190.22 samples/sec Loss 0.9082 LearningRate 0.0051 Epoch: 15 Global Step: 258800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:38:32,182-Speed 5182.90 samples/sec Loss 0.8953 LearningRate 0.0050 Epoch: 15 Global Step: 258810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:38:34,153-Speed 5196.31 samples/sec Loss 0.8600 LearningRate 0.0050 Epoch: 15 Global Step: 258820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:38:36,140-Speed 5155.09 samples/sec Loss 0.8751 LearningRate 0.0050 Epoch: 15 Global Step: 258830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:38:38,184-Speed 5014.08 samples/sec Loss 0.8707 LearningRate 0.0050 Epoch: 15 Global Step: 258840 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-04-11 16:38:40,172-Speed 5154.26 samples/sec Loss 0.8702 LearningRate 0.0050 Epoch: 15 Global Step: 258850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:38:42,206-Speed 5036.25 samples/sec Loss 0.9722 LearningRate 0.0050 Epoch: 15 Global Step: 258860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:44,179-Speed 5192.56 samples/sec Loss 0.9285 LearningRate 0.0050 Epoch: 15 Global Step: 258870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:46,154-Speed 5186.48 samples/sec Loss 0.9055 LearningRate 0.0050 Epoch: 15 Global Step: 258880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:48,164-Speed 5096.74 samples/sec Loss 0.8819 LearningRate 0.0050 Epoch: 15 Global Step: 258890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:50,140-Speed 5185.23 samples/sec Loss 0.9079 LearningRate 0.0050 Epoch: 15 Global Step: 258900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:52,120-Speed 5171.90 samples/sec Loss 0.8673 LearningRate 0.0050 Epoch: 15 Global Step: 258910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:54,097-Speed 5180.60 samples/sec Loss 0.8700 LearningRate 0.0050 Epoch: 15 Global Step: 258920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:56,068-Speed 5197.28 samples/sec Loss 0.8813 LearningRate 0.0050 Epoch: 15 Global Step: 258930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:38:58,064-Speed 5134.42 samples/sec Loss 0.8719 LearningRate 0.0050 Epoch: 15 Global Step: 258940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:00,041-Speed 5181.06 samples/sec Loss 0.8991 LearningRate 0.0050 Epoch: 15 Global Step: 258950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:02,037-Speed 5132.98 samples/sec Loss 0.8448 LearningRate 0.0050 Epoch: 15 Global Step: 258960 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 
16:39:04,025-Speed 5153.93 samples/sec Loss 0.9177 LearningRate 0.0050 Epoch: 15 Global Step: 258970 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:39:06,011-Speed 5157.23 samples/sec Loss 0.8828 LearningRate 0.0050 Epoch: 15 Global Step: 258980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:07,992-Speed 5172.27 samples/sec Loss 0.8603 LearningRate 0.0050 Epoch: 15 Global Step: 258990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:09,974-Speed 5167.28 samples/sec Loss 0.8849 LearningRate 0.0050 Epoch: 15 Global Step: 259000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:11,964-Speed 5148.88 samples/sec Loss 0.8866 LearningRate 0.0050 Epoch: 15 Global Step: 259010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:13,946-Speed 5168.44 samples/sec Loss 0.8835 LearningRate 0.0050 Epoch: 15 Global Step: 259020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:15,932-Speed 5157.35 samples/sec Loss 0.8758 LearningRate 0.0050 Epoch: 15 Global Step: 259030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:17,904-Speed 5194.33 samples/sec Loss 0.8944 LearningRate 0.0050 Epoch: 15 Global Step: 259040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:19,876-Speed 5194.16 samples/sec Loss 0.8717 LearningRate 0.0050 Epoch: 15 Global Step: 259050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:21,867-Speed 5145.90 samples/sec Loss 0.8511 LearningRate 0.0050 Epoch: 15 Global Step: 259060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:23,851-Speed 5163.82 samples/sec Loss 0.9173 LearningRate 0.0050 Epoch: 15 Global Step: 259070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:25,837-Speed 5157.55 samples/sec Loss 0.8880 LearningRate 0.0050 Epoch: 15 Global Step: 259080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:39:27,853-Speed 5081.42 samples/sec 
Loss 0.8568 LearningRate 0.0050 Epoch: 15 Global Step: 259090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:29,839-Speed 5156.35 samples/sec Loss 0.9216 LearningRate 0.0050 Epoch: 15 Global Step: 259100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:31,810-Speed 5198.87 samples/sec Loss 0.9007 LearningRate 0.0050 Epoch: 15 Global Step: 259110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:33,788-Speed 5178.84 samples/sec Loss 0.8660 LearningRate 0.0050 Epoch: 15 Global Step: 259120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:35,824-Speed 5031.21 samples/sec Loss 0.9046 LearningRate 0.0050 Epoch: 15 Global Step: 259130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:37,806-Speed 5170.85 samples/sec Loss 0.8722 LearningRate 0.0050 Epoch: 15 Global Step: 259140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:39,803-Speed 5128.43 samples/sec Loss 0.9182 LearningRate 0.0050 Epoch: 15 Global Step: 259150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:41,779-Speed 5184.68 samples/sec Loss 0.8845 LearningRate 0.0050 Epoch: 15 Global Step: 259160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:43,774-Speed 5134.31 samples/sec Loss 0.8743 LearningRate 0.0050 Epoch: 15 Global Step: 259170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:45,761-Speed 5155.47 samples/sec Loss 0.8521 LearningRate 0.0050 Epoch: 15 Global Step: 259180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:47,752-Speed 5148.05 samples/sec Loss 0.9191 LearningRate 0.0050 Epoch: 15 Global Step: 259190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:49,735-Speed 5166.00 samples/sec Loss 0.8931 LearningRate 0.0050 Epoch: 15 Global Step: 259200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:51,727-Speed 5140.71 samples/sec Loss 0.8832 LearningRate 0.0050 Epoch: 15 
Global Step: 259210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:53,735-Speed 5103.83 samples/sec Loss 0.8911 LearningRate 0.0050 Epoch: 15 Global Step: 259220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:55,713-Speed 5179.02 samples/sec Loss 0.8543 LearningRate 0.0050 Epoch: 15 Global Step: 259230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:57,693-Speed 5173.16 samples/sec Loss 0.8492 LearningRate 0.0050 Epoch: 15 Global Step: 259240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:39:59,691-Speed 5126.17 samples/sec Loss 0.8925 LearningRate 0.0050 Epoch: 15 Global Step: 259250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:01,666-Speed 5187.60 samples/sec Loss 0.9325 LearningRate 0.0050 Epoch: 15 Global Step: 259260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:03,642-Speed 5183.58 samples/sec Loss 0.8965 LearningRate 0.0050 Epoch: 15 Global Step: 259270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:05,632-Speed 5148.47 samples/sec Loss 0.9017 LearningRate 0.0050 Epoch: 15 Global Step: 259280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:07,614-Speed 5166.72 samples/sec Loss 0.9070 LearningRate 0.0050 Epoch: 15 Global Step: 259290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:40:09,603-Speed 5151.82 samples/sec Loss 0.8979 LearningRate 0.0050 Epoch: 15 Global Step: 259300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:40:11,610-Speed 5102.80 samples/sec Loss 0.8622 LearningRate 0.0050 Epoch: 15 Global Step: 259310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:13,597-Speed 5157.61 samples/sec Loss 0.8432 LearningRate 0.0050 Epoch: 15 Global Step: 259320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:15,573-Speed 5182.49 samples/sec Loss 0.8682 LearningRate 0.0050 Epoch: 15 Global Step: 259330 Fp16 Grad Scale: 
65536 Required: 5 hours Training: 2022-04-11 16:40:17,547-Speed 5189.49 samples/sec Loss 0.8452 LearningRate 0.0050 Epoch: 15 Global Step: 259340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:19,524-Speed 5181.41 samples/sec Loss 0.9048 LearningRate 0.0050 Epoch: 15 Global Step: 259350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:21,501-Speed 5182.56 samples/sec Loss 0.8861 LearningRate 0.0050 Epoch: 15 Global Step: 259360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:23,479-Speed 5176.54 samples/sec Loss 0.8835 LearningRate 0.0050 Epoch: 15 Global Step: 259370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:25,456-Speed 5183.38 samples/sec Loss 0.8804 LearningRate 0.0050 Epoch: 15 Global Step: 259380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:27,454-Speed 5126.59 samples/sec Loss 0.8861 LearningRate 0.0050 Epoch: 15 Global Step: 259390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:29,450-Speed 5133.05 samples/sec Loss 0.9066 LearningRate 0.0050 Epoch: 15 Global Step: 259400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:31,434-Speed 5170.27 samples/sec Loss 0.9106 LearningRate 0.0050 Epoch: 15 Global Step: 259410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:40:33,429-Speed 5135.66 samples/sec Loss 0.8767 LearningRate 0.0050 Epoch: 15 Global Step: 259420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:35,411-Speed 5168.39 samples/sec Loss 0.8760 LearningRate 0.0050 Epoch: 15 Global Step: 259430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:37,440-Speed 5047.74 samples/sec Loss 0.8732 LearningRate 0.0050 Epoch: 15 Global Step: 259440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:39,453-Speed 5089.60 samples/sec Loss 0.8822 LearningRate 0.0050 Epoch: 15 Global Step: 259450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 16:40:41,447-Speed 5137.17 samples/sec Loss 0.8485 LearningRate 0.0050 Epoch: 15 Global Step: 259460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:43,427-Speed 5173.04 samples/sec Loss 0.8604 LearningRate 0.0050 Epoch: 15 Global Step: 259470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:45,404-Speed 5183.38 samples/sec Loss 0.9043 LearningRate 0.0050 Epoch: 15 Global Step: 259480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:47,390-Speed 5157.82 samples/sec Loss 0.8824 LearningRate 0.0050 Epoch: 15 Global Step: 259490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:49,406-Speed 5082.27 samples/sec Loss 0.8622 LearningRate 0.0050 Epoch: 15 Global Step: 259500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:51,390-Speed 5163.03 samples/sec Loss 0.8995 LearningRate 0.0050 Epoch: 15 Global Step: 259510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:40:53,374-Speed 5161.14 samples/sec Loss 0.8419 LearningRate 0.0050 Epoch: 15 Global Step: 259520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:40:55,351-Speed 5181.39 samples/sec Loss 0.8917 LearningRate 0.0050 Epoch: 15 Global Step: 259530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:40:57,321-Speed 5200.05 samples/sec Loss 0.9000 LearningRate 0.0050 Epoch: 15 Global Step: 259540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:40:59,326-Speed 5109.23 samples/sec Loss 0.8879 LearningRate 0.0049 Epoch: 15 Global Step: 259550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:41:01,313-Speed 5153.93 samples/sec Loss 0.8851 LearningRate 0.0049 Epoch: 15 Global Step: 259560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:41:03,313-Speed 5122.17 samples/sec Loss 0.8962 LearningRate 0.0049 Epoch: 15 Global Step: 259570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:41:05,303-Speed 5147.47 
samples/sec Loss 0.8716 LearningRate 0.0049 Epoch: 15 Global Step: 259580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:41:07,284-Speed 5171.22 samples/sec Loss 0.8788 LearningRate 0.0049 Epoch: 15 Global Step: 259590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:41:09,270-Speed 5159.88 samples/sec Loss 0.8839 LearningRate 0.0049 Epoch: 15 Global Step: 259600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:41:11,251-Speed 5171.69 samples/sec Loss 0.8795 LearningRate 0.0049 Epoch: 15 Global Step: 259610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:41:13,229-Speed 5178.09 samples/sec Loss 0.9060 LearningRate 0.0049 Epoch: 15 Global Step: 259620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:41:15,218-Speed 5149.23 samples/sec Loss 0.8812 LearningRate 0.0049 Epoch: 15 Global Step: 259630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:41:17,193-Speed 5186.98 samples/sec Loss 0.8597 LearningRate 0.0049 Epoch: 15 Global Step: 259640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:41:19,168-Speed 5188.14 samples/sec Loss 0.8967 LearningRate 0.0049 Epoch: 15 Global Step: 259650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:41:21,149-Speed 5169.07 samples/sec Loss 0.8861 LearningRate 0.0049 Epoch: 15 Global Step: 259660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:41:23,137-Speed 5152.09 samples/sec Loss 0.8465 LearningRate 0.0049 Epoch: 15 Global Step: 259670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:41:25,132-Speed 5136.79 samples/sec Loss 0.8816 LearningRate 0.0049 Epoch: 15 Global Step: 259680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:41:27,117-Speed 5158.96 samples/sec Loss 0.8799 LearningRate 0.0049 Epoch: 15 Global Step: 259690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:41:29,132-Speed 5083.93 samples/sec Loss 0.8799 LearningRate 0.0049 
Epoch: 15 Global Step: 259700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:41:31,128-Speed 5132.43 samples/sec Loss 0.9108 LearningRate 0.0049 Epoch: 15 Global Step: 259710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:41:33,144-Speed 5083.56 samples/sec Loss 0.9147 LearningRate 0.0049 Epoch: 15 Global Step: 259720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:41:35,157-Speed 5088.40 samples/sec Loss 0.9090 LearningRate 0.0049 Epoch: 15 Global Step: 259730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:41:37,169-Speed 5092.20 samples/sec Loss 0.9003 LearningRate 0.0049 Epoch: 15 Global Step: 259740 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:41:39,148-Speed 5175.02 samples/sec Loss 0.8614 LearningRate 0.0049 Epoch: 15 Global Step: 259750 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:41:41,129-Speed 5171.08 samples/sec Loss 0.8966 LearningRate 0.0049 Epoch: 15 Global Step: 259760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:41:43,127-Speed 5128.11 samples/sec Loss 0.9028 LearningRate 0.0049 Epoch: 15 Global Step: 259770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:41:45,123-Speed 5133.06 samples/sec Loss 0.8505 LearningRate 0.0049 Epoch: 15 Global Step: 259780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:41:47,105-Speed 5167.49 samples/sec Loss 0.8562 LearningRate 0.0049 Epoch: 15 Global Step: 259790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:41:49,096-Speed 5146.18 samples/sec Loss 0.8833 LearningRate 0.0049 Epoch: 15 Global Step: 259800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:41:51,107-Speed 5092.40 samples/sec Loss 0.8724 LearningRate 0.0049 Epoch: 15 Global Step: 259810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:41:53,084-Speed 5182.81 samples/sec Loss 0.9227 LearningRate 0.0049 Epoch: 15 Global Step: 259820 Fp16 Grad 
Scale: 32768 Required: 5 hours Training: 2022-04-11 16:41:55,086-Speed 5117.96 samples/sec Loss 0.9185 LearningRate 0.0049 Epoch: 15 Global Step: 259830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:41:57,069-Speed 5165.21 samples/sec Loss 0.9224 LearningRate 0.0049 Epoch: 15 Global Step: 259840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:41:59,059-Speed 5145.85 samples/sec Loss 0.8737 LearningRate 0.0049 Epoch: 15 Global Step: 259850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:42:01,037-Speed 5180.00 samples/sec Loss 0.8826 LearningRate 0.0049 Epoch: 15 Global Step: 259860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:42:03,026-Speed 5150.43 samples/sec Loss 0.9007 LearningRate 0.0049 Epoch: 15 Global Step: 259870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:42:05,019-Speed 5139.16 samples/sec Loss 0.8683 LearningRate 0.0049 Epoch: 15 Global Step: 259880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:42:06,997-Speed 5178.77 samples/sec Loss 0.9191 LearningRate 0.0049 Epoch: 15 Global Step: 259890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:42:08,972-Speed 5186.15 samples/sec Loss 0.8616 LearningRate 0.0049 Epoch: 15 Global Step: 259900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:42:10,950-Speed 5181.08 samples/sec Loss 0.9016 LearningRate 0.0049 Epoch: 15 Global Step: 259910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:42:12,931-Speed 5171.06 samples/sec Loss 0.9017 LearningRate 0.0049 Epoch: 15 Global Step: 259920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:42:14,913-Speed 5167.14 samples/sec Loss 0.8883 LearningRate 0.0049 Epoch: 15 Global Step: 259930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:42:16,893-Speed 5174.64 samples/sec Loss 0.9066 LearningRate 0.0049 Epoch: 15 Global Step: 259940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 16:42:18,874-Speed 5170.24 samples/sec Loss 0.9104 LearningRate 0.0049 Epoch: 15 Global Step: 259950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:42:20,857-Speed 5166.36 samples/sec Loss 0.9089 LearningRate 0.0049 Epoch: 15 Global Step: 259960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:42:22,828-Speed 5197.81 samples/sec Loss 0.8495 LearningRate 0.0049 Epoch: 15 Global Step: 259970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:42:24,843-Speed 5082.69 samples/sec Loss 0.9119 LearningRate 0.0049 Epoch: 15 Global Step: 259980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:42:26,834-Speed 5144.17 samples/sec Loss 0.8801 LearningRate 0.0049 Epoch: 15 Global Step: 259990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:42:28,829-Speed 5135.42 samples/sec Loss 0.8829 LearningRate 0.0049 Epoch: 15 Global Step: 260000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:42:55,440-[lfw][260000]XNorm: 21.598124 Training: 2022-04-11 16:42:55,441-[lfw][260000]Accuracy-Flip: 0.99817+-0.00263 Training: 2022-04-11 16:42:55,441-[lfw][260000]Accuracy-Highest: 0.99833 Training: 2022-04-11 16:43:26,292-[cfp_fp][260000]XNorm: 21.548612 Training: 2022-04-11 16:43:26,292-[cfp_fp][260000]Accuracy-Flip: 0.98886+-0.00432 Training: 2022-04-11 16:43:26,293-[cfp_fp][260000]Accuracy-Highest: 0.98914 Training: 2022-04-11 16:43:52,800-[agedb_30][260000]XNorm: 22.610195 Training: 2022-04-11 16:43:52,800-[agedb_30][260000]Accuracy-Flip: 0.98250+-0.00735 Training: 2022-04-11 16:43:52,801-[agedb_30][260000]Accuracy-Highest: 0.98300 Training: 2022-04-11 16:43:54,783-Speed 119.13 samples/sec Loss 0.8804 LearningRate 0.0049 Epoch: 15 Global Step: 260010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:43:56,744-Speed 5221.80 samples/sec Loss 0.8611 LearningRate 0.0049 Epoch: 15 Global Step: 260020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
16:43:58,716-Speed 5195.43 samples/sec Loss 0.8851 LearningRate 0.0049 Epoch: 15 Global Step: 260030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:44:00,681-Speed 5212.56 samples/sec Loss 0.9184 LearningRate 0.0049 Epoch: 15 Global Step: 260040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:44:02,657-Speed 5184.92 samples/sec Loss 0.9121 LearningRate 0.0049 Epoch: 15 Global Step: 260050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:44:04,626-Speed 5200.53 samples/sec Loss 0.8772 LearningRate 0.0049 Epoch: 15 Global Step: 260060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:44:06,606-Speed 5174.97 samples/sec Loss 0.9347 LearningRate 0.0049 Epoch: 15 Global Step: 260070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:44:08,591-Speed 5159.75 samples/sec Loss 0.9137 LearningRate 0.0049 Epoch: 15 Global Step: 260080 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:44:10,586-Speed 5133.35 samples/sec Loss 0.8973 LearningRate 0.0049 Epoch: 15 Global Step: 260090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:44:12,564-Speed 5179.10 samples/sec Loss 0.9224 LearningRate 0.0049 Epoch: 15 Global Step: 260100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:44:14,547-Speed 5165.91 samples/sec Loss 0.8573 LearningRate 0.0049 Epoch: 15 Global Step: 260110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:44:16,534-Speed 5156.52 samples/sec Loss 0.9069 LearningRate 0.0049 Epoch: 15 Global Step: 260120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:44:18,508-Speed 5187.55 samples/sec Loss 0.8779 LearningRate 0.0049 Epoch: 15 Global Step: 260130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:44:20,484-Speed 5185.19 samples/sec Loss 0.8859 LearningRate 0.0049 Epoch: 15 Global Step: 260140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:44:22,464-Speed 5173.69 samples/sec 
Loss 0.9060 LearningRate 0.0049 Epoch: 15 Global Step: 260150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:44:24,450-Speed 5158.39 samples/sec Loss 0.9085 LearningRate 0.0049 Epoch: 15 Global Step: 260160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:44:26,431-Speed 5171.04 samples/sec Loss 0.9066 LearningRate 0.0049 Epoch: 15 Global Step: 260170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:44:28,406-Speed 5186.82 samples/sec Loss 0.8694 LearningRate 0.0049 Epoch: 15 Global Step: 260180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:44:30,378-Speed 5193.95 samples/sec Loss 0.9441 LearningRate 0.0049 Epoch: 15 Global Step: 260190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:44:32,354-Speed 5184.61 samples/sec Loss 0.8724 LearningRate 0.0049 Epoch: 15 Global Step: 260200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:44:34,354-Speed 5121.91 samples/sec Loss 0.8625 LearningRate 0.0049 Epoch: 15 Global Step: 260210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:44:36,353-Speed 5125.48 samples/sec Loss 0.8603 LearningRate 0.0049 Epoch: 15 Global Step: 260220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:44:38,328-Speed 5186.45 samples/sec Loss 0.8671 LearningRate 0.0049 Epoch: 15 Global Step: 260230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:44:40,313-Speed 5160.72 samples/sec Loss 0.8912 LearningRate 0.0049 Epoch: 15 Global Step: 260240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:44:42,290-Speed 5182.06 samples/sec Loss 0.8617 LearningRate 0.0049 Epoch: 15 Global Step: 260250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:44:44,258-Speed 5203.69 samples/sec Loss 0.8940 LearningRate 0.0049 Epoch: 15 Global Step: 260260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:44:46,244-Speed 5157.82 samples/sec Loss 0.8768 LearningRate 0.0049 Epoch: 15 
Global Step: 260270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:44:48,217-Speed 5190.82 samples/sec Loss 0.9174 LearningRate 0.0049 Epoch: 15 Global Step: 260280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:44:50,203-Speed 5158.73 samples/sec Loss 0.9092 LearningRate 0.0049 Epoch: 15 Global Step: 260290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:44:52,218-Speed 5084.01 samples/sec Loss 0.9058 LearningRate 0.0049 Epoch: 15 Global Step: 260300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:44:54,211-Speed 5141.51 samples/sec Loss 0.9049 LearningRate 0.0048 Epoch: 15 Global Step: 260310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:44:56,200-Speed 5148.65 samples/sec Loss 0.9167 LearningRate 0.0048 Epoch: 15 Global Step: 260320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:44:58,172-Speed 5195.42 samples/sec Loss 0.8665 LearningRate 0.0048 Epoch: 15 Global Step: 260330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:00,196-Speed 5059.94 samples/sec Loss 0.9304 LearningRate 0.0048 Epoch: 15 Global Step: 260340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:02,195-Speed 5125.89 samples/sec Loss 0.8981 LearningRate 0.0048 Epoch: 15 Global Step: 260350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:04,179-Speed 5163.39 samples/sec Loss 0.8899 LearningRate 0.0048 Epoch: 15 Global Step: 260360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:45:06,163-Speed 5163.20 samples/sec Loss 0.8742 LearningRate 0.0048 Epoch: 15 Global Step: 260370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:08,138-Speed 5186.48 samples/sec Loss 0.8846 LearningRate 0.0048 Epoch: 15 Global Step: 260380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:10,130-Speed 5144.62 samples/sec Loss 0.8967 LearningRate 0.0048 Epoch: 15 Global Step: 260390 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 16:45:12,112-Speed 5165.99 samples/sec Loss 0.8874 LearningRate 0.0048 Epoch: 15 Global Step: 260400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:14,098-Speed 5158.42 samples/sec Loss 0.8976 LearningRate 0.0048 Epoch: 15 Global Step: 260410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:16,075-Speed 5180.48 samples/sec Loss 0.9023 LearningRate 0.0048 Epoch: 15 Global Step: 260420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:18,059-Speed 5165.19 samples/sec Loss 0.8938 LearningRate 0.0048 Epoch: 15 Global Step: 260430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:20,042-Speed 5165.00 samples/sec Loss 0.8724 LearningRate 0.0048 Epoch: 15 Global Step: 260440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:22,019-Speed 5182.82 samples/sec Loss 0.8728 LearningRate 0.0048 Epoch: 15 Global Step: 260450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:23,996-Speed 5180.13 samples/sec Loss 0.9313 LearningRate 0.0048 Epoch: 15 Global Step: 260460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:25,986-Speed 5147.95 samples/sec Loss 0.9094 LearningRate 0.0048 Epoch: 15 Global Step: 260470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:45:27,962-Speed 5184.00 samples/sec Loss 0.9159 LearningRate 0.0048 Epoch: 15 Global Step: 260480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:29,945-Speed 5166.59 samples/sec Loss 0.8535 LearningRate 0.0048 Epoch: 15 Global Step: 260490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:31,924-Speed 5174.27 samples/sec Loss 0.9105 LearningRate 0.0048 Epoch: 15 Global Step: 260500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:33,905-Speed 5171.41 samples/sec Loss 0.8549 LearningRate 0.0048 Epoch: 15 Global Step: 260510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
16:45:35,892-Speed 5154.86 samples/sec Loss 0.8527 LearningRate 0.0048 Epoch: 15 Global Step: 260520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:37,881-Speed 5152.23 samples/sec Loss 0.8917 LearningRate 0.0048 Epoch: 15 Global Step: 260530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:39,864-Speed 5164.80 samples/sec Loss 0.8973 LearningRate 0.0048 Epoch: 15 Global Step: 260540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:41,849-Speed 5161.00 samples/sec Loss 0.8732 LearningRate 0.0048 Epoch: 15 Global Step: 260550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:43,827-Speed 5179.25 samples/sec Loss 0.8989 LearningRate 0.0048 Epoch: 15 Global Step: 260560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:45,811-Speed 5162.87 samples/sec Loss 0.9017 LearningRate 0.0048 Epoch: 15 Global Step: 260570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:47,784-Speed 5190.62 samples/sec Loss 0.8840 LearningRate 0.0048 Epoch: 15 Global Step: 260580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:49,775-Speed 5147.71 samples/sec Loss 0.9412 LearningRate 0.0048 Epoch: 15 Global Step: 260590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:51,795-Speed 5070.18 samples/sec Loss 0.9012 LearningRate 0.0048 Epoch: 15 Global Step: 260600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:53,779-Speed 5165.00 samples/sec Loss 0.8787 LearningRate 0.0048 Epoch: 15 Global Step: 260610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:55,772-Speed 5138.71 samples/sec Loss 0.9113 LearningRate 0.0048 Epoch: 15 Global Step: 260620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:57,744-Speed 5195.40 samples/sec Loss 0.9078 LearningRate 0.0048 Epoch: 15 Global Step: 260630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:45:59,729-Speed 5161.00 samples/sec Loss 
0.8970 LearningRate 0.0048 Epoch: 15 Global Step: 260640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:46:01,714-Speed 5160.56 samples/sec Loss 0.8924 LearningRate 0.0048 Epoch: 15 Global Step: 260650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:46:03,692-Speed 5176.88 samples/sec Loss 0.9232 LearningRate 0.0048 Epoch: 15 Global Step: 260660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:46:05,666-Speed 5190.39 samples/sec Loss 0.8673 LearningRate 0.0048 Epoch: 15 Global Step: 260670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:46:07,643-Speed 5186.79 samples/sec Loss 0.9139 LearningRate 0.0048 Epoch: 15 Global Step: 260680 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:46:09,612-Speed 5203.68 samples/sec Loss 0.8656 LearningRate 0.0048 Epoch: 15 Global Step: 260690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:46:11,585-Speed 5191.62 samples/sec Loss 0.8982 LearningRate 0.0048 Epoch: 15 Global Step: 260700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:46:13,557-Speed 5194.95 samples/sec Loss 0.8497 LearningRate 0.0048 Epoch: 15 Global Step: 260710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:46:15,529-Speed 5191.89 samples/sec Loss 0.8955 LearningRate 0.0048 Epoch: 15 Global Step: 260720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:46:17,507-Speed 5179.41 samples/sec Loss 0.8881 LearningRate 0.0048 Epoch: 15 Global Step: 260730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:46:19,470-Speed 5219.36 samples/sec Loss 0.9301 LearningRate 0.0048 Epoch: 15 Global Step: 260740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:46:21,446-Speed 5184.30 samples/sec Loss 0.9079 LearningRate 0.0048 Epoch: 15 Global Step: 260750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:46:23,421-Speed 5184.44 samples/sec Loss 0.8850 LearningRate 0.0048 Epoch: 15 
Global Step: 260760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:46:25,407-Speed 5158.21 samples/sec Loss 0.8770 LearningRate 0.0048 Epoch: 15 Global Step: 260770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:46:27,378-Speed 5197.79 samples/sec Loss 0.8893 LearningRate 0.0048 Epoch: 15 Global Step: 260780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:46:29,360-Speed 5169.20 samples/sec Loss 0.8780 LearningRate 0.0048 Epoch: 15 Global Step: 260790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:46:31,331-Speed 5196.93 samples/sec Loss 0.8958 LearningRate 0.0048 Epoch: 15 Global Step: 260800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:46:33,327-Speed 5133.37 samples/sec Loss 0.9001 LearningRate 0.0048 Epoch: 15 Global Step: 260810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:46:35,324-Speed 5127.73 samples/sec Loss 0.9302 LearningRate 0.0048 Epoch: 15 Global Step: 260820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:46:37,303-Speed 5176.94 samples/sec Loss 0.8778 LearningRate 0.0048 Epoch: 15 Global Step: 260830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:46:39,297-Speed 5138.04 samples/sec Loss 0.8692 LearningRate 0.0048 Epoch: 15 Global Step: 260840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:46:41,296-Speed 5122.82 samples/sec Loss 0.9247 LearningRate 0.0048 Epoch: 15 Global Step: 260850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:46:43,275-Speed 5175.46 samples/sec Loss 0.9423 LearningRate 0.0048 Epoch: 15 Global Step: 260860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:46:45,265-Speed 5147.74 samples/sec Loss 0.9446 LearningRate 0.0048 Epoch: 15 Global Step: 260870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:46:47,253-Speed 5152.97 samples/sec Loss 0.9244 LearningRate 0.0048 Epoch: 15 Global Step: 260880 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 16:46:49,259-Speed 5108.53 samples/sec Loss 0.8985 LearningRate 0.0048 Epoch: 15 Global Step: 260890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:46:51,233-Speed 5190.71 samples/sec Loss 0.9202 LearningRate 0.0048 Epoch: 15 Global Step: 260900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:46:53,248-Speed 5082.51 samples/sec Loss 0.9456 LearningRate 0.0048 Epoch: 15 Global Step: 260910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:46:55,256-Speed 5103.25 samples/sec Loss 0.8967 LearningRate 0.0048 Epoch: 15 Global Step: 260920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:46:57,229-Speed 5190.44 samples/sec Loss 0.8702 LearningRate 0.0048 Epoch: 15 Global Step: 260930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:46:59,207-Speed 5181.52 samples/sec Loss 0.8831 LearningRate 0.0048 Epoch: 15 Global Step: 260940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:47:01,228-Speed 5066.63 samples/sec Loss 0.8927 LearningRate 0.0048 Epoch: 15 Global Step: 260950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:47:03,215-Speed 5155.59 samples/sec Loss 0.8848 LearningRate 0.0048 Epoch: 15 Global Step: 260960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:47:05,193-Speed 5179.30 samples/sec Loss 0.8928 LearningRate 0.0048 Epoch: 15 Global Step: 260970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:47:07,172-Speed 5176.74 samples/sec Loss 0.8464 LearningRate 0.0048 Epoch: 15 Global Step: 260980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:47:09,158-Speed 5158.84 samples/sec Loss 0.9042 LearningRate 0.0048 Epoch: 15 Global Step: 260990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:47:11,159-Speed 5118.69 samples/sec Loss 0.8678 LearningRate 0.0048 Epoch: 15 Global Step: 261000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 
16:47:13,153-Speed 5137.65 samples/sec Loss 0.9001 LearningRate 0.0048 Epoch: 15 Global Step: 261010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:15,140-Speed 5155.22 samples/sec Loss 0.8945 LearningRate 0.0048 Epoch: 15 Global Step: 261020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:17,157-Speed 5077.70 samples/sec Loss 0.9309 LearningRate 0.0048 Epoch: 15 Global Step: 261030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:19,138-Speed 5169.87 samples/sec Loss 0.9045 LearningRate 0.0048 Epoch: 15 Global Step: 261040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:21,115-Speed 5185.34 samples/sec Loss 0.8618 LearningRate 0.0048 Epoch: 15 Global Step: 261050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:23,152-Speed 5029.65 samples/sec Loss 0.8678 LearningRate 0.0048 Epoch: 15 Global Step: 261060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:25,134-Speed 5169.43 samples/sec Loss 0.9131 LearningRate 0.0047 Epoch: 15 Global Step: 261070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:27,110-Speed 5181.78 samples/sec Loss 0.9058 LearningRate 0.0047 Epoch: 15 Global Step: 261080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:29,096-Speed 5160.68 samples/sec Loss 0.9127 LearningRate 0.0047 Epoch: 15 Global Step: 261090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:31,070-Speed 5190.43 samples/sec Loss 0.8735 LearningRate 0.0047 Epoch: 15 Global Step: 261100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:33,043-Speed 5191.20 samples/sec Loss 0.9160 LearningRate 0.0047 Epoch: 15 Global Step: 261110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:47:35,019-Speed 5183.63 samples/sec Loss 0.8685 LearningRate 0.0047 Epoch: 15 Global Step: 261120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:47:37,009-Speed 5150.24 samples/sec 
Loss 0.8710 LearningRate 0.0047 Epoch: 15 Global Step: 261130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:47:38,991-Speed 5167.77 samples/sec Loss 0.8509 LearningRate 0.0047 Epoch: 15 Global Step: 261140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:47:40,961-Speed 5202.16 samples/sec Loss 0.8805 LearningRate 0.0047 Epoch: 15 Global Step: 261150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:42,940-Speed 5175.53 samples/sec Loss 0.8726 LearningRate 0.0047 Epoch: 15 Global Step: 261160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:44,971-Speed 5042.06 samples/sec Loss 0.9003 LearningRate 0.0047 Epoch: 15 Global Step: 261170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:46,963-Speed 5144.42 samples/sec Loss 0.9115 LearningRate 0.0047 Epoch: 15 Global Step: 261180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:48,951-Speed 5151.24 samples/sec Loss 0.8625 LearningRate 0.0047 Epoch: 15 Global Step: 261190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:50,936-Speed 5162.01 samples/sec Loss 0.9058 LearningRate 0.0047 Epoch: 15 Global Step: 261200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:52,930-Speed 5137.92 samples/sec Loss 0.9252 LearningRate 0.0047 Epoch: 15 Global Step: 261210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:54,908-Speed 5180.25 samples/sec Loss 0.9028 LearningRate 0.0047 Epoch: 15 Global Step: 261220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:56,900-Speed 5141.55 samples/sec Loss 0.9020 LearningRate 0.0047 Epoch: 15 Global Step: 261230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:47:58,881-Speed 5170.68 samples/sec Loss 0.8878 LearningRate 0.0047 Epoch: 15 Global Step: 261240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:00,869-Speed 5152.61 samples/sec Loss 0.8864 LearningRate 0.0047 Epoch: 15 
Global Step: 261250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:48:02,915-Speed 5008.07 samples/sec Loss 0.8820 LearningRate 0.0047 Epoch: 15 Global Step: 261260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:04,898-Speed 5167.24 samples/sec Loss 0.8673 LearningRate 0.0047 Epoch: 15 Global Step: 261270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:06,876-Speed 5175.99 samples/sec Loss 0.9186 LearningRate 0.0047 Epoch: 15 Global Step: 261280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:08,853-Speed 5181.67 samples/sec Loss 0.8966 LearningRate 0.0047 Epoch: 15 Global Step: 261290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:10,837-Speed 5164.97 samples/sec Loss 0.9275 LearningRate 0.0047 Epoch: 15 Global Step: 261300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:12,813-Speed 5183.92 samples/sec Loss 0.9144 LearningRate 0.0047 Epoch: 15 Global Step: 261310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:14,793-Speed 5171.89 samples/sec Loss 0.8891 LearningRate 0.0047 Epoch: 15 Global Step: 261320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:16,788-Speed 5134.91 samples/sec Loss 0.8783 LearningRate 0.0047 Epoch: 15 Global Step: 261330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:18,761-Speed 5191.79 samples/sec Loss 0.8737 LearningRate 0.0047 Epoch: 15 Global Step: 261340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:20,758-Speed 5130.20 samples/sec Loss 0.8815 LearningRate 0.0047 Epoch: 15 Global Step: 261350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:22,754-Speed 5134.00 samples/sec Loss 0.8898 LearningRate 0.0047 Epoch: 15 Global Step: 261360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:48:24,732-Speed 5177.73 samples/sec Loss 0.9040 LearningRate 0.0047 Epoch: 15 Global Step: 261370 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 16:48:26,708-Speed 5182.86 samples/sec Loss 0.8797 LearningRate 0.0047 Epoch: 15 Global Step: 261380 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:48:28,685-Speed 5183.21 samples/sec Loss 0.9287 LearningRate 0.0047 Epoch: 15 Global Step: 261390 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:48:30,684-Speed 5123.98 samples/sec Loss 0.9338 LearningRate 0.0047 Epoch: 15 Global Step: 261400 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:48:32,660-Speed 5184.58 samples/sec Loss 0.8828 LearningRate 0.0047 Epoch: 15 Global Step: 261410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:48:34,629-Speed 5201.49 samples/sec Loss 0.9187 LearningRate 0.0047 Epoch: 15 Global Step: 261420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:36,611-Speed 5170.33 samples/sec Loss 0.9379 LearningRate 0.0047 Epoch: 15 Global Step: 261430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:38,595-Speed 5161.89 samples/sec Loss 0.9074 LearningRate 0.0047 Epoch: 15 Global Step: 261440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:40,625-Speed 5047.17 samples/sec Loss 0.8886 LearningRate 0.0047 Epoch: 15 Global Step: 261450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:42,617-Speed 5143.87 samples/sec Loss 0.9021 LearningRate 0.0047 Epoch: 15 Global Step: 261460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:44,596-Speed 5176.23 samples/sec Loss 0.9206 LearningRate 0.0047 Epoch: 15 Global Step: 261470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:46,592-Speed 5129.87 samples/sec Loss 0.8761 LearningRate 0.0047 Epoch: 15 Global Step: 261480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:48,587-Speed 5139.82 samples/sec Loss 0.8842 LearningRate 0.0047 Epoch: 15 Global Step: 261490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 16:48:50,578-Speed 5145.38 samples/sec Loss 0.8930 LearningRate 0.0047 Epoch: 15 Global Step: 261500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:52,612-Speed 5036.65 samples/sec Loss 0.9022 LearningRate 0.0047 Epoch: 15 Global Step: 261510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:54,589-Speed 5179.26 samples/sec Loss 0.8930 LearningRate 0.0047 Epoch: 15 Global Step: 261520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:56,596-Speed 5105.10 samples/sec Loss 0.8942 LearningRate 0.0047 Epoch: 15 Global Step: 261530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:48:58,575-Speed 5176.67 samples/sec Loss 0.8899 LearningRate 0.0047 Epoch: 15 Global Step: 261540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:49:00,550-Speed 5187.40 samples/sec Loss 0.8840 LearningRate 0.0047 Epoch: 15 Global Step: 261550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:49:02,558-Speed 5102.56 samples/sec Loss 0.9568 LearningRate 0.0047 Epoch: 15 Global Step: 261560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:49:04,550-Speed 5140.90 samples/sec Loss 0.9513 LearningRate 0.0047 Epoch: 15 Global Step: 261570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:49:06,530-Speed 5174.95 samples/sec Loss 0.8915 LearningRate 0.0047 Epoch: 15 Global Step: 261580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:49:08,522-Speed 5142.74 samples/sec Loss 0.8860 LearningRate 0.0047 Epoch: 15 Global Step: 261590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:49:10,500-Speed 5180.13 samples/sec Loss 0.8956 LearningRate 0.0047 Epoch: 15 Global Step: 261600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:49:12,477-Speed 5179.96 samples/sec Loss 0.8649 LearningRate 0.0047 Epoch: 15 Global Step: 261610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:49:14,469-Speed 5143.03 
samples/sec Loss 0.8916 LearningRate 0.0047 Epoch: 15 Global Step: 261620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:49:16,459-Speed 5147.57 samples/sec Loss 0.8784 LearningRate 0.0047 Epoch: 15 Global Step: 261630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:49:18,462-Speed 5114.14 samples/sec Loss 0.9205 LearningRate 0.0047 Epoch: 15 Global Step: 261640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:49:20,457-Speed 5135.20 samples/sec Loss 0.8768 LearningRate 0.0047 Epoch: 15 Global Step: 261650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:49:22,452-Speed 5134.35 samples/sec Loss 0.9706 LearningRate 0.0047 Epoch: 15 Global Step: 261660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:49:24,441-Speed 5149.32 samples/sec Loss 0.8624 LearningRate 0.0047 Epoch: 15 Global Step: 261670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:49:26,425-Speed 5162.91 samples/sec Loss 0.9314 LearningRate 0.0047 Epoch: 15 Global Step: 261680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:49:28,401-Speed 5186.03 samples/sec Loss 0.9229 LearningRate 0.0047 Epoch: 15 Global Step: 261690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:49:30,379-Speed 5177.01 samples/sec Loss 0.8938 LearningRate 0.0047 Epoch: 15 Global Step: 261700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:49:32,356-Speed 5182.84 samples/sec Loss 0.9180 LearningRate 0.0047 Epoch: 15 Global Step: 261710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:49:34,338-Speed 5167.18 samples/sec Loss 0.8978 LearningRate 0.0047 Epoch: 15 Global Step: 261720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:49:36,329-Speed 5145.27 samples/sec Loss 0.8939 LearningRate 0.0047 Epoch: 15 Global Step: 261730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:49:38,319-Speed 5147.24 samples/sec Loss 0.9039 LearningRate 
0.0047 Epoch: 15 Global Step: 261740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:49:40,301-Speed 5170.60 samples/sec Loss 0.9138 LearningRate 0.0047 Epoch: 15 Global Step: 261750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:49:42,281-Speed 5172.50 samples/sec Loss 0.9271 LearningRate 0.0047 Epoch: 15 Global Step: 261760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:49:44,271-Speed 5146.78 samples/sec Loss 0.8413 LearningRate 0.0047 Epoch: 15 Global Step: 261770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:49:46,266-Speed 5136.18 samples/sec Loss 0.8910 LearningRate 0.0047 Epoch: 15 Global Step: 261780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:49:48,251-Speed 5159.87 samples/sec Loss 0.8720 LearningRate 0.0047 Epoch: 15 Global Step: 261790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:49:50,237-Speed 5157.25 samples/sec Loss 0.8791 LearningRate 0.0047 Epoch: 15 Global Step: 261800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:49:52,220-Speed 5167.38 samples/sec Loss 0.8674 LearningRate 0.0047 Epoch: 15 Global Step: 261810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:49:54,206-Speed 5158.46 samples/sec Loss 0.9295 LearningRate 0.0047 Epoch: 15 Global Step: 261820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:49:56,200-Speed 5136.73 samples/sec Loss 0.8886 LearningRate 0.0047 Epoch: 15 Global Step: 261830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:49:58,187-Speed 5155.76 samples/sec Loss 0.9099 LearningRate 0.0046 Epoch: 15 Global Step: 261840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:50:00,180-Speed 5140.54 samples/sec Loss 0.8933 LearningRate 0.0046 Epoch: 15 Global Step: 261850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:50:02,218-Speed 5026.58 samples/sec Loss 0.8710 LearningRate 0.0046 Epoch: 15 Global Step: 261860 Fp16 
Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:50:04,198-Speed 5174.78 samples/sec Loss 0.8963 LearningRate 0.0046 Epoch: 15 Global Step: 261870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:50:06,185-Speed 5158.51 samples/sec Loss 0.9153 LearningRate 0.0046 Epoch: 15 Global Step: 261880 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:50:08,159-Speed 5187.23 samples/sec Loss 0.9106 LearningRate 0.0046 Epoch: 15 Global Step: 261890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:50:10,142-Speed 5166.34 samples/sec Loss 0.8746 LearningRate 0.0046 Epoch: 15 Global Step: 261900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:50:12,119-Speed 5179.66 samples/sec Loss 0.8692 LearningRate 0.0046 Epoch: 15 Global Step: 261910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:50:14,103-Speed 5165.06 samples/sec Loss 0.8573 LearningRate 0.0046 Epoch: 15 Global Step: 261920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:50:16,083-Speed 5173.76 samples/sec Loss 0.9162 LearningRate 0.0046 Epoch: 15 Global Step: 261930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:50:18,061-Speed 5178.42 samples/sec Loss 0.9151 LearningRate 0.0046 Epoch: 15 Global Step: 261940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:50:20,037-Speed 5184.68 samples/sec Loss 0.8898 LearningRate 0.0046 Epoch: 15 Global Step: 261950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:50:22,054-Speed 5079.80 samples/sec Loss 0.9298 LearningRate 0.0046 Epoch: 15 Global Step: 261960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:50:24,035-Speed 5171.24 samples/sec Loss 0.8864 LearningRate 0.0046 Epoch: 15 Global Step: 261970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:50:26,014-Speed 5175.91 samples/sec Loss 0.8769 LearningRate 0.0046 Epoch: 15 Global Step: 261980 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 16:50:28,008-Speed 5134.64 samples/sec Loss 0.9179 LearningRate 0.0046 Epoch: 15 Global Step: 261990 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:50:29,980-Speed 5195.75 samples/sec Loss 0.8802 LearningRate 0.0046 Epoch: 15 Global Step: 262000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:50:56,611-[lfw][262000]XNorm: 21.844253 Training: 2022-04-11 16:50:56,612-[lfw][262000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-04-11 16:50:56,613-[lfw][262000]Accuracy-Highest: 0.99833 Training: 2022-04-11 16:51:27,402-[cfp_fp][262000]XNorm: 21.733456 Training: 2022-04-11 16:51:27,403-[cfp_fp][262000]Accuracy-Flip: 0.98857+-0.00438 Training: 2022-04-11 16:51:27,403-[cfp_fp][262000]Accuracy-Highest: 0.98914 Training: 2022-04-11 16:51:54,024-[agedb_30][262000]XNorm: 22.437829 Training: 2022-04-11 16:51:54,024-[agedb_30][262000]Accuracy-Flip: 0.98133+-0.00777 Training: 2022-04-11 16:51:54,025-[agedb_30][262000]Accuracy-Highest: 0.98300 Training: 2022-04-11 16:51:56,011-Speed 119.03 samples/sec Loss 0.8904 LearningRate 0.0046 Epoch: 15 Global Step: 262010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:51:57,984-Speed 5192.53 samples/sec Loss 0.9228 LearningRate 0.0046 Epoch: 15 Global Step: 262020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:51:59,963-Speed 5176.13 samples/sec Loss 0.9369 LearningRate 0.0046 Epoch: 15 Global Step: 262030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:01,992-Speed 5047.98 samples/sec Loss 0.8907 LearningRate 0.0046 Epoch: 15 Global Step: 262040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:03,976-Speed 5165.17 samples/sec Loss 0.9006 LearningRate 0.0046 Epoch: 15 Global Step: 262050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:05,967-Speed 5145.00 samples/sec Loss 0.8723 LearningRate 0.0046 Epoch: 15 Global Step: 262060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
16:52:07,972-Speed 5109.35 samples/sec Loss 0.8785 LearningRate 0.0046 Epoch: 15 Global Step: 262070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:09,957-Speed 5158.74 samples/sec Loss 0.9151 LearningRate 0.0046 Epoch: 15 Global Step: 262080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:11,938-Speed 5173.13 samples/sec Loss 0.9278 LearningRate 0.0046 Epoch: 15 Global Step: 262090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:13,913-Speed 5185.33 samples/sec Loss 0.8791 LearningRate 0.0046 Epoch: 15 Global Step: 262100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:52:15,903-Speed 5147.59 samples/sec Loss 0.8825 LearningRate 0.0046 Epoch: 15 Global Step: 262110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:52:17,895-Speed 5143.44 samples/sec Loss 0.8889 LearningRate 0.0046 Epoch: 15 Global Step: 262120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:52:19,878-Speed 5165.63 samples/sec Loss 0.8881 LearningRate 0.0046 Epoch: 15 Global Step: 262130 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:52:21,859-Speed 5169.36 samples/sec Loss 0.9013 LearningRate 0.0046 Epoch: 15 Global Step: 262140 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:52:23,830-Speed 5196.53 samples/sec Loss 0.8740 LearningRate 0.0046 Epoch: 15 Global Step: 262150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:25,809-Speed 5177.27 samples/sec Loss 0.9160 LearningRate 0.0046 Epoch: 15 Global Step: 262160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:27,790-Speed 5170.04 samples/sec Loss 0.8734 LearningRate 0.0046 Epoch: 15 Global Step: 262170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:29,765-Speed 5188.00 samples/sec Loss 0.8683 LearningRate 0.0046 Epoch: 15 Global Step: 262180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:31,740-Speed 5186.51 samples/sec 
Loss 0.8982 LearningRate 0.0046 Epoch: 15 Global Step: 262190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:33,731-Speed 5145.93 samples/sec Loss 0.8675 LearningRate 0.0046 Epoch: 15 Global Step: 262200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:35,743-Speed 5090.60 samples/sec Loss 0.8575 LearningRate 0.0046 Epoch: 15 Global Step: 262210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:37,714-Speed 5197.24 samples/sec Loss 0.8772 LearningRate 0.0046 Epoch: 15 Global Step: 262220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:39,692-Speed 5179.56 samples/sec Loss 0.8788 LearningRate 0.0046 Epoch: 15 Global Step: 262230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:41,668-Speed 5184.71 samples/sec Loss 0.9217 LearningRate 0.0046 Epoch: 15 Global Step: 262240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:43,637-Speed 5201.42 samples/sec Loss 0.8759 LearningRate 0.0046 Epoch: 15 Global Step: 262250 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:52:45,624-Speed 5153.90 samples/sec Loss 0.9363 LearningRate 0.0046 Epoch: 15 Global Step: 262260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:47,598-Speed 5190.56 samples/sec Loss 0.8983 LearningRate 0.0046 Epoch: 15 Global Step: 262270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:49,576-Speed 5179.11 samples/sec Loss 0.8770 LearningRate 0.0046 Epoch: 15 Global Step: 262280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:51,549-Speed 5191.20 samples/sec Loss 0.8647 LearningRate 0.0046 Epoch: 15 Global Step: 262290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:53,538-Speed 5150.16 samples/sec Loss 0.8745 LearningRate 0.0046 Epoch: 15 Global Step: 262300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:55,526-Speed 5151.43 samples/sec Loss 0.8803 LearningRate 0.0046 Epoch: 15 
Global Step: 262310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:57,508-Speed 5170.39 samples/sec Loss 0.9098 LearningRate 0.0046 Epoch: 15 Global Step: 262320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:52:59,505-Speed 5128.66 samples/sec Loss 0.8703 LearningRate 0.0046 Epoch: 15 Global Step: 262330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:01,488-Speed 5167.56 samples/sec Loss 0.9304 LearningRate 0.0046 Epoch: 15 Global Step: 262340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:03,465-Speed 5181.26 samples/sec Loss 0.8756 LearningRate 0.0046 Epoch: 15 Global Step: 262350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:05,461-Speed 5133.01 samples/sec Loss 0.9293 LearningRate 0.0046 Epoch: 15 Global Step: 262360 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:53:07,427-Speed 5209.09 samples/sec Loss 0.9058 LearningRate 0.0046 Epoch: 15 Global Step: 262370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:09,429-Speed 5119.50 samples/sec Loss 0.9261 LearningRate 0.0046 Epoch: 15 Global Step: 262380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:11,408-Speed 5173.33 samples/sec Loss 0.8686 LearningRate 0.0046 Epoch: 15 Global Step: 262390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:13,382-Speed 5190.09 samples/sec Loss 0.9212 LearningRate 0.0046 Epoch: 15 Global Step: 262400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:15,366-Speed 5164.18 samples/sec Loss 0.8737 LearningRate 0.0046 Epoch: 15 Global Step: 262410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:17,338-Speed 5193.36 samples/sec Loss 0.8963 LearningRate 0.0046 Epoch: 15 Global Step: 262420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:19,320-Speed 5170.79 samples/sec Loss 0.9296 LearningRate 0.0046 Epoch: 15 Global Step: 262430 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 16:53:21,322-Speed 5115.74 samples/sec Loss 0.8849 LearningRate 0.0046 Epoch: 15 Global Step: 262440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:23,298-Speed 5186.63 samples/sec Loss 0.8826 LearningRate 0.0046 Epoch: 15 Global Step: 262450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:25,279-Speed 5169.25 samples/sec Loss 0.8659 LearningRate 0.0046 Epoch: 15 Global Step: 262460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:27,262-Speed 5167.38 samples/sec Loss 0.9155 LearningRate 0.0046 Epoch: 15 Global Step: 262470 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:53:29,242-Speed 5172.23 samples/sec Loss 0.8709 LearningRate 0.0046 Epoch: 15 Global Step: 262480 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:53:31,222-Speed 5174.59 samples/sec Loss 0.9029 LearningRate 0.0046 Epoch: 15 Global Step: 262490 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:53:33,193-Speed 5197.15 samples/sec Loss 0.9203 LearningRate 0.0046 Epoch: 15 Global Step: 262500 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:53:35,175-Speed 5167.08 samples/sec Loss 0.8963 LearningRate 0.0046 Epoch: 15 Global Step: 262510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:37,169-Speed 5139.21 samples/sec Loss 0.9100 LearningRate 0.0046 Epoch: 15 Global Step: 262520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:39,141-Speed 5194.32 samples/sec Loss 0.8704 LearningRate 0.0046 Epoch: 15 Global Step: 262530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:41,114-Speed 5191.04 samples/sec Loss 0.8987 LearningRate 0.0046 Epoch: 15 Global Step: 262540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:43,088-Speed 5188.51 samples/sec Loss 0.9164 LearningRate 0.0046 Epoch: 15 Global Step: 262550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
16:53:45,068-Speed 5175.30 samples/sec Loss 0.8797 LearningRate 0.0046 Epoch: 15 Global Step: 262560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:47,049-Speed 5169.54 samples/sec Loss 0.9095 LearningRate 0.0046 Epoch: 15 Global Step: 262570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:49,041-Speed 5143.70 samples/sec Loss 0.8896 LearningRate 0.0046 Epoch: 15 Global Step: 262580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:51,046-Speed 5109.30 samples/sec Loss 0.9092 LearningRate 0.0046 Epoch: 15 Global Step: 262590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:53:53,014-Speed 5204.60 samples/sec Loss 0.9164 LearningRate 0.0046 Epoch: 15 Global Step: 262600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:53:55,003-Speed 5150.33 samples/sec Loss 0.9008 LearningRate 0.0046 Epoch: 15 Global Step: 262610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:53:56,998-Speed 5133.86 samples/sec Loss 0.9199 LearningRate 0.0045 Epoch: 15 Global Step: 262620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:53:58,970-Speed 5195.02 samples/sec Loss 0.8989 LearningRate 0.0045 Epoch: 15 Global Step: 262630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:54:00,949-Speed 5175.90 samples/sec Loss 0.9017 LearningRate 0.0045 Epoch: 15 Global Step: 262640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:54:02,936-Speed 5155.98 samples/sec Loss 0.9010 LearningRate 0.0045 Epoch: 15 Global Step: 262650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:54:04,938-Speed 5117.60 samples/sec Loss 0.8726 LearningRate 0.0045 Epoch: 15 Global Step: 262660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:54:06,910-Speed 5195.31 samples/sec Loss 0.8876 LearningRate 0.0045 Epoch: 15 Global Step: 262670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:54:08,900-Speed 5145.56 samples/sec Loss 
0.9080 LearningRate 0.0045 Epoch: 15 Global Step: 262680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:54:10,878-Speed 5180.48 samples/sec Loss 0.8977 LearningRate 0.0045 Epoch: 15 Global Step: 262690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:54:12,847-Speed 5200.69 samples/sec Loss 0.9012 LearningRate 0.0045 Epoch: 15 Global Step: 262700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:54:14,843-Speed 5133.64 samples/sec Loss 0.8893 LearningRate 0.0045 Epoch: 15 Global Step: 262710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:54:16,817-Speed 5189.84 samples/sec Loss 0.8730 LearningRate 0.0045 Epoch: 15 Global Step: 262720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:54:18,798-Speed 5168.67 samples/sec Loss 0.9298 LearningRate 0.0045 Epoch: 15 Global Step: 262730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:54:20,792-Speed 5136.58 samples/sec Loss 0.9421 LearningRate 0.0045 Epoch: 15 Global Step: 262740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:54:22,782-Speed 5149.27 samples/sec Loss 0.8754 LearningRate 0.0045 Epoch: 15 Global Step: 262750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:54:24,769-Speed 5156.64 samples/sec Loss 0.9363 LearningRate 0.0045 Epoch: 15 Global Step: 262760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:54:26,756-Speed 5155.58 samples/sec Loss 0.9065 LearningRate 0.0045 Epoch: 15 Global Step: 262770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:54:28,738-Speed 5167.73 samples/sec Loss 0.9109 LearningRate 0.0045 Epoch: 15 Global Step: 262780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:54:30,718-Speed 5175.23 samples/sec Loss 0.9051 LearningRate 0.0045 Epoch: 15 Global Step: 262790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:54:32,700-Speed 5167.14 samples/sec Loss 0.9106 LearningRate 0.0045 Epoch: 15 Global 
Step: 262800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:54:34,697-Speed 5130.77 samples/sec Loss 0.8896 LearningRate 0.0045 Epoch: 15 Global Step: 262810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:54:36,672-Speed 5186.19 samples/sec Loss 0.9388 LearningRate 0.0045 Epoch: 15 Global Step: 262820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:54:38,652-Speed 5175.41 samples/sec Loss 0.8952 LearningRate 0.0045 Epoch: 15 Global Step: 262830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:54:40,625-Speed 5191.44 samples/sec Loss 0.8990 LearningRate 0.0045 Epoch: 15 Global Step: 262840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:54:42,595-Speed 5197.49 samples/sec Loss 0.8636 LearningRate 0.0045 Epoch: 15 Global Step: 262850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:54:44,578-Speed 5166.92 samples/sec Loss 0.8923 LearningRate 0.0045 Epoch: 15 Global Step: 262860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:54:46,552-Speed 5190.72 samples/sec Loss 0.9188 LearningRate 0.0045 Epoch: 15 Global Step: 262870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:54:48,529-Speed 5180.64 samples/sec Loss 0.9087 LearningRate 0.0045 Epoch: 15 Global Step: 262880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:54:50,526-Speed 5130.29 samples/sec Loss 0.8954 LearningRate 0.0045 Epoch: 15 Global Step: 262890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:54:52,556-Speed 5045.81 samples/sec Loss 0.9161 LearningRate 0.0045 Epoch: 15 Global Step: 262900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:54:54,582-Speed 5057.51 samples/sec Loss 0.8981 LearningRate 0.0045 Epoch: 15 Global Step: 262910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:54:56,579-Speed 5130.43 samples/sec Loss 0.9285 LearningRate 0.0045 Epoch: 15 Global Step: 262920 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 16:54:58,576-Speed 5127.52 samples/sec Loss 0.8918 LearningRate 0.0045 Epoch: 15 Global Step: 262930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:00,571-Speed 5136.35 samples/sec Loss 0.8725 LearningRate 0.0045 Epoch: 15 Global Step: 262940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:02,560-Speed 5149.06 samples/sec Loss 0.9532 LearningRate 0.0045 Epoch: 15 Global Step: 262950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:04,538-Speed 5180.51 samples/sec Loss 0.8863 LearningRate 0.0045 Epoch: 15 Global Step: 262960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:06,530-Speed 5142.40 samples/sec Loss 0.8862 LearningRate 0.0045 Epoch: 15 Global Step: 262970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:08,523-Speed 5139.65 samples/sec Loss 0.8873 LearningRate 0.0045 Epoch: 15 Global Step: 262980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:55:10,494-Speed 5196.06 samples/sec Loss 0.8670 LearningRate 0.0045 Epoch: 15 Global Step: 262990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:12,475-Speed 5171.44 samples/sec Loss 0.8733 LearningRate 0.0045 Epoch: 15 Global Step: 263000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:14,448-Speed 5193.88 samples/sec Loss 0.9168 LearningRate 0.0045 Epoch: 15 Global Step: 263010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:16,422-Speed 5188.70 samples/sec Loss 0.8772 LearningRate 0.0045 Epoch: 15 Global Step: 263020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:18,399-Speed 5181.12 samples/sec Loss 0.9096 LearningRate 0.0045 Epoch: 15 Global Step: 263030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:20,370-Speed 5195.75 samples/sec Loss 0.8754 LearningRate 0.0045 Epoch: 15 Global Step: 263040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
16:55:22,350-Speed 5173.81 samples/sec Loss 0.9057 LearningRate 0.0045 Epoch: 15 Global Step: 263050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:24,337-Speed 5156.20 samples/sec Loss 0.9112 LearningRate 0.0045 Epoch: 15 Global Step: 263060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:26,312-Speed 5185.52 samples/sec Loss 0.9153 LearningRate 0.0045 Epoch: 15 Global Step: 263070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:28,293-Speed 5172.04 samples/sec Loss 0.8830 LearningRate 0.0045 Epoch: 15 Global Step: 263080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:30,269-Speed 5185.17 samples/sec Loss 0.9259 LearningRate 0.0045 Epoch: 15 Global Step: 263090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:32,248-Speed 5176.39 samples/sec Loss 0.9094 LearningRate 0.0045 Epoch: 15 Global Step: 263100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:34,233-Speed 5159.33 samples/sec Loss 0.8956 LearningRate 0.0045 Epoch: 15 Global Step: 263110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:36,223-Speed 5149.63 samples/sec Loss 0.8878 LearningRate 0.0045 Epoch: 15 Global Step: 263120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:38,221-Speed 5127.03 samples/sec Loss 0.9471 LearningRate 0.0045 Epoch: 15 Global Step: 263130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:40,199-Speed 5178.29 samples/sec Loss 0.9090 LearningRate 0.0045 Epoch: 15 Global Step: 263140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:42,178-Speed 5175.53 samples/sec Loss 0.9133 LearningRate 0.0045 Epoch: 15 Global Step: 263150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:44,156-Speed 5180.35 samples/sec Loss 0.9196 LearningRate 0.0045 Epoch: 15 Global Step: 263160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:46,139-Speed 5166.66 samples/sec Loss 
0.9157 LearningRate 0.0045 Epoch: 15 Global Step: 263170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:48,114-Speed 5186.21 samples/sec Loss 0.9052 LearningRate 0.0045 Epoch: 15 Global Step: 263180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:50,117-Speed 5113.46 samples/sec Loss 0.8834 LearningRate 0.0045 Epoch: 15 Global Step: 263190 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:55:52,102-Speed 5160.93 samples/sec Loss 0.9088 LearningRate 0.0045 Epoch: 15 Global Step: 263200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:54,079-Speed 5182.63 samples/sec Loss 0.8980 LearningRate 0.0045 Epoch: 15 Global Step: 263210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:56,060-Speed 5168.81 samples/sec Loss 0.8701 LearningRate 0.0045 Epoch: 15 Global Step: 263220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:55:58,049-Speed 5151.33 samples/sec Loss 0.8948 LearningRate 0.0045 Epoch: 15 Global Step: 263230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:00,029-Speed 5173.31 samples/sec Loss 0.8904 LearningRate 0.0045 Epoch: 15 Global Step: 263240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:02,021-Speed 5140.94 samples/sec Loss 0.8853 LearningRate 0.0045 Epoch: 15 Global Step: 263250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:04,011-Speed 5149.30 samples/sec Loss 0.9209 LearningRate 0.0045 Epoch: 15 Global Step: 263260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:06,002-Speed 5143.72 samples/sec Loss 0.9038 LearningRate 0.0045 Epoch: 15 Global Step: 263270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:07,978-Speed 5184.45 samples/sec Loss 0.9223 LearningRate 0.0045 Epoch: 15 Global Step: 263280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:09,980-Speed 5115.04 samples/sec Loss 0.9126 LearningRate 0.0045 Epoch: 15 
Global Step: 263290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:11,973-Speed 5141.72 samples/sec Loss 0.9264 LearningRate 0.0045 Epoch: 15 Global Step: 263300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:56:13,950-Speed 5182.11 samples/sec Loss 0.8763 LearningRate 0.0045 Epoch: 15 Global Step: 263310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:15,925-Speed 5187.35 samples/sec Loss 0.8469 LearningRate 0.0045 Epoch: 15 Global Step: 263320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:17,899-Speed 5187.14 samples/sec Loss 0.9026 LearningRate 0.0045 Epoch: 15 Global Step: 263330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:19,876-Speed 5182.87 samples/sec Loss 0.8781 LearningRate 0.0045 Epoch: 15 Global Step: 263340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:21,858-Speed 5165.93 samples/sec Loss 0.9073 LearningRate 0.0045 Epoch: 15 Global Step: 263350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:23,839-Speed 5171.13 samples/sec Loss 0.8943 LearningRate 0.0045 Epoch: 15 Global Step: 263360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:25,832-Speed 5140.53 samples/sec Loss 0.9179 LearningRate 0.0045 Epoch: 15 Global Step: 263370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:27,811-Speed 5176.95 samples/sec Loss 0.8599 LearningRate 0.0045 Epoch: 15 Global Step: 263380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:29,803-Speed 5140.77 samples/sec Loss 0.9252 LearningRate 0.0045 Epoch: 15 Global Step: 263390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:31,775-Speed 5193.51 samples/sec Loss 0.8942 LearningRate 0.0045 Epoch: 15 Global Step: 263400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:33,758-Speed 5168.43 samples/sec Loss 0.9127 LearningRate 0.0044 Epoch: 15 Global Step: 263410 Fp16 Grad Scale: 
131072 Required: 5 hours Training: 2022-04-11 16:56:35,739-Speed 5168.96 samples/sec Loss 0.9111 LearningRate 0.0044 Epoch: 15 Global Step: 263420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:37,725-Speed 5158.99 samples/sec Loss 0.8989 LearningRate 0.0044 Epoch: 15 Global Step: 263430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:39,708-Speed 5165.75 samples/sec Loss 0.8878 LearningRate 0.0044 Epoch: 15 Global Step: 263440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:41,687-Speed 5177.26 samples/sec Loss 0.8755 LearningRate 0.0044 Epoch: 15 Global Step: 263450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:43,665-Speed 5178.20 samples/sec Loss 0.9164 LearningRate 0.0044 Epoch: 15 Global Step: 263460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:45,652-Speed 5155.72 samples/sec Loss 0.9438 LearningRate 0.0044 Epoch: 15 Global Step: 263470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:47,684-Speed 5039.78 samples/sec Loss 0.8734 LearningRate 0.0044 Epoch: 15 Global Step: 263480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:49,674-Speed 5147.44 samples/sec Loss 0.9243 LearningRate 0.0044 Epoch: 15 Global Step: 263490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:51,661-Speed 5154.36 samples/sec Loss 0.8746 LearningRate 0.0044 Epoch: 15 Global Step: 263500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:53,646-Speed 5161.76 samples/sec Loss 0.8919 LearningRate 0.0044 Epoch: 15 Global Step: 263510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:56:55,628-Speed 5168.60 samples/sec Loss 0.8473 LearningRate 0.0044 Epoch: 15 Global Step: 263520 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:56:57,602-Speed 5190.62 samples/sec Loss 0.9339 LearningRate 0.0044 Epoch: 15 Global Step: 263530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 
2022-04-11 16:56:59,584-Speed 5167.27 samples/sec Loss 0.9431 LearningRate 0.0044 Epoch: 15 Global Step: 263540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:01,575-Speed 5145.55 samples/sec Loss 0.9145 LearningRate 0.0044 Epoch: 15 Global Step: 263550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:03,617-Speed 5014.95 samples/sec Loss 0.9187 LearningRate 0.0044 Epoch: 15 Global Step: 263560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:05,600-Speed 5165.89 samples/sec Loss 0.8764 LearningRate 0.0044 Epoch: 15 Global Step: 263570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:07,573-Speed 5191.59 samples/sec Loss 0.8912 LearningRate 0.0044 Epoch: 15 Global Step: 263580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:09,560-Speed 5154.89 samples/sec Loss 0.8801 LearningRate 0.0044 Epoch: 15 Global Step: 263590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:11,537-Speed 5181.90 samples/sec Loss 0.9023 LearningRate 0.0044 Epoch: 15 Global Step: 263600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:13,524-Speed 5156.84 samples/sec Loss 0.9117 LearningRate 0.0044 Epoch: 15 Global Step: 263610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:15,504-Speed 5173.02 samples/sec Loss 0.8985 LearningRate 0.0044 Epoch: 15 Global Step: 263620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:17,488-Speed 5163.87 samples/sec Loss 0.9147 LearningRate 0.0044 Epoch: 15 Global Step: 263630 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:57:19,478-Speed 5146.19 samples/sec Loss 0.9187 LearningRate 0.0044 Epoch: 15 Global Step: 263640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:57:21,480-Speed 5116.81 samples/sec Loss 0.8874 LearningRate 0.0044 Epoch: 15 Global Step: 263650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:23,470-Speed 5146.24 
samples/sec Loss 0.8897 LearningRate 0.0044 Epoch: 15 Global Step: 263660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:25,498-Speed 5050.83 samples/sec Loss 0.9132 LearningRate 0.0044 Epoch: 15 Global Step: 263670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:27,476-Speed 5180.06 samples/sec Loss 0.9160 LearningRate 0.0044 Epoch: 15 Global Step: 263680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:29,468-Speed 5140.77 samples/sec Loss 0.9140 LearningRate 0.0044 Epoch: 15 Global Step: 263690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:31,445-Speed 5182.42 samples/sec Loss 0.8850 LearningRate 0.0044 Epoch: 15 Global Step: 263700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:33,420-Speed 5187.66 samples/sec Loss 0.8909 LearningRate 0.0044 Epoch: 15 Global Step: 263710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:35,397-Speed 5181.23 samples/sec Loss 0.9137 LearningRate 0.0044 Epoch: 15 Global Step: 263720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:37,376-Speed 5177.43 samples/sec Loss 0.8686 LearningRate 0.0044 Epoch: 15 Global Step: 263730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:39,373-Speed 5127.41 samples/sec Loss 0.9094 LearningRate 0.0044 Epoch: 15 Global Step: 263740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:57:41,346-Speed 5191.23 samples/sec Loss 0.9145 LearningRate 0.0044 Epoch: 15 Global Step: 263750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:57:43,334-Speed 5154.54 samples/sec Loss 0.9235 LearningRate 0.0044 Epoch: 15 Global Step: 263760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:57:45,321-Speed 5155.60 samples/sec Loss 0.9014 LearningRate 0.0044 Epoch: 15 Global Step: 263770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:57:47,301-Speed 5172.76 samples/sec Loss 0.8596 LearningRate 0.0044 
Epoch: 15 Global Step: 263780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:57:49,281-Speed 5173.48 samples/sec Loss 0.9064 LearningRate 0.0044 Epoch: 15 Global Step: 263790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:57:51,259-Speed 5179.12 samples/sec Loss 0.9240 LearningRate 0.0044 Epoch: 15 Global Step: 263800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:57:53,272-Speed 5087.83 samples/sec Loss 0.8694 LearningRate 0.0044 Epoch: 15 Global Step: 263810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:57:55,247-Speed 5188.07 samples/sec Loss 0.8265 LearningRate 0.0044 Epoch: 15 Global Step: 263820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:57:57,242-Speed 5134.36 samples/sec Loss 0.9500 LearningRate 0.0044 Epoch: 15 Global Step: 263830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:57:59,237-Speed 5133.34 samples/sec Loss 0.9013 LearningRate 0.0044 Epoch: 15 Global Step: 263840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 16:58:01,214-Speed 5181.37 samples/sec Loss 0.9092 LearningRate 0.0044 Epoch: 15 Global Step: 263850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:58:03,200-Speed 5158.63 samples/sec Loss 0.8670 LearningRate 0.0044 Epoch: 15 Global Step: 263860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:58:05,198-Speed 5125.45 samples/sec Loss 0.9482 LearningRate 0.0044 Epoch: 15 Global Step: 263870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:58:07,172-Speed 5190.50 samples/sec Loss 0.9105 LearningRate 0.0044 Epoch: 15 Global Step: 263880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:58:09,160-Speed 5150.81 samples/sec Loss 0.8873 LearningRate 0.0044 Epoch: 15 Global Step: 263890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:58:11,158-Speed 5128.08 samples/sec Loss 0.8578 LearningRate 0.0044 Epoch: 15 Global Step: 263900 Fp16 Grad 
Scale: 65536 Required: 5 hours Training: 2022-04-11 16:58:13,137-Speed 5176.53 samples/sec Loss 0.9090 LearningRate 0.0044 Epoch: 15 Global Step: 263910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:58:15,135-Speed 5126.58 samples/sec Loss 0.9238 LearningRate 0.0044 Epoch: 15 Global Step: 263920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:58:17,122-Speed 5155.96 samples/sec Loss 0.8870 LearningRate 0.0044 Epoch: 15 Global Step: 263930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:58:19,117-Speed 5134.01 samples/sec Loss 0.9102 LearningRate 0.0044 Epoch: 15 Global Step: 263940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:58:21,094-Speed 5181.16 samples/sec Loss 0.8518 LearningRate 0.0044 Epoch: 15 Global Step: 263950 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 16:58:23,068-Speed 5192.51 samples/sec Loss 0.9014 LearningRate 0.0044 Epoch: 15 Global Step: 263960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:58:25,054-Speed 5156.40 samples/sec Loss 0.8963 LearningRate 0.0044 Epoch: 15 Global Step: 263970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:58:27,046-Speed 5143.59 samples/sec Loss 0.8799 LearningRate 0.0044 Epoch: 15 Global Step: 263980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:58:29,049-Speed 5112.89 samples/sec Loss 0.9320 LearningRate 0.0044 Epoch: 15 Global Step: 263990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:58:31,039-Speed 5148.13 samples/sec Loss 0.9161 LearningRate 0.0044 Epoch: 15 Global Step: 264000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:58:57,906-[lfw][264000]XNorm: 22.479052 Training: 2022-04-11 16:58:57,906-[lfw][264000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 16:58:57,907-[lfw][264000]Accuracy-Highest: 0.99833 Training: 2022-04-11 16:59:28,638-[cfp_fp][264000]XNorm: 22.328141 Training: 2022-04-11 
16:59:28,639-[cfp_fp][264000]Accuracy-Flip: 0.98857+-0.00332 Training: 2022-04-11 16:59:28,639-[cfp_fp][264000]Accuracy-Highest: 0.98914 Training: 2022-04-11 16:59:55,430-[agedb_30][264000]XNorm: 23.060629 Training: 2022-04-11 16:59:55,430-[agedb_30][264000]Accuracy-Flip: 0.98250+-0.00647 Training: 2022-04-11 16:59:55,431-[agedb_30][264000]Accuracy-Highest: 0.98300 Training: 2022-04-11 16:59:57,417-Speed 118.55 samples/sec Loss 0.8998 LearningRate 0.0044 Epoch: 15 Global Step: 264010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 16:59:59,385-Speed 5204.46 samples/sec Loss 0.8728 LearningRate 0.0044 Epoch: 15 Global Step: 264020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:01,350-Speed 5211.40 samples/sec Loss 0.9198 LearningRate 0.0044 Epoch: 15 Global Step: 264030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:03,347-Speed 5131.16 samples/sec Loss 0.9177 LearningRate 0.0044 Epoch: 15 Global Step: 264040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:05,321-Speed 5190.90 samples/sec Loss 0.8971 LearningRate 0.0044 Epoch: 15 Global Step: 264050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:07,309-Speed 5153.86 samples/sec Loss 0.8950 LearningRate 0.0044 Epoch: 15 Global Step: 264060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:00:09,305-Speed 5132.18 samples/sec Loss 0.9070 LearningRate 0.0044 Epoch: 15 Global Step: 264070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:00:11,270-Speed 5211.45 samples/sec Loss 0.8929 LearningRate 0.0044 Epoch: 15 Global Step: 264080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:13,250-Speed 5173.83 samples/sec Loss 0.8909 LearningRate 0.0044 Epoch: 15 Global Step: 264090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:15,228-Speed 5178.75 samples/sec Loss 0.9055 LearningRate 0.0044 Epoch: 15 Global Step: 264100 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 17:00:17,194-Speed 5210.10 samples/sec Loss 0.9231 LearningRate 0.0044 Epoch: 15 Global Step: 264110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:19,169-Speed 5185.77 samples/sec Loss 0.8728 LearningRate 0.0044 Epoch: 15 Global Step: 264120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:21,155-Speed 5157.82 samples/sec Loss 0.8872 LearningRate 0.0044 Epoch: 15 Global Step: 264130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:23,147-Speed 5147.33 samples/sec Loss 0.8681 LearningRate 0.0044 Epoch: 15 Global Step: 264140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:25,130-Speed 5164.81 samples/sec Loss 0.8993 LearningRate 0.0044 Epoch: 15 Global Step: 264150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:27,104-Speed 5189.32 samples/sec Loss 0.8850 LearningRate 0.0044 Epoch: 15 Global Step: 264160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:29,084-Speed 5172.67 samples/sec Loss 0.8938 LearningRate 0.0044 Epoch: 15 Global Step: 264170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:31,062-Speed 5180.11 samples/sec Loss 0.8456 LearningRate 0.0044 Epoch: 15 Global Step: 264180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:00:33,057-Speed 5133.46 samples/sec Loss 0.9010 LearningRate 0.0044 Epoch: 15 Global Step: 264190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:35,045-Speed 5152.95 samples/sec Loss 0.9006 LearningRate 0.0043 Epoch: 15 Global Step: 264200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:37,061-Speed 5083.17 samples/sec Loss 0.9105 LearningRate 0.0043 Epoch: 15 Global Step: 264210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:39,039-Speed 5178.01 samples/sec Loss 0.9032 LearningRate 0.0043 Epoch: 15 Global Step: 264220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:41,015-Speed 
5182.49 samples/sec Loss 0.9032 LearningRate 0.0043 Epoch: 15 Global Step: 264230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:43,027-Speed 5092.07 samples/sec Loss 0.9273 LearningRate 0.0043 Epoch: 15 Global Step: 264240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:45,020-Speed 5143.26 samples/sec Loss 0.9102 LearningRate 0.0043 Epoch: 15 Global Step: 264250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:47,007-Speed 5155.86 samples/sec Loss 0.9442 LearningRate 0.0043 Epoch: 15 Global Step: 264260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:48,986-Speed 5174.42 samples/sec Loss 0.8823 LearningRate 0.0043 Epoch: 15 Global Step: 264270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:50,960-Speed 5188.81 samples/sec Loss 0.9040 LearningRate 0.0043 Epoch: 15 Global Step: 264280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:00:52,960-Speed 5126.50 samples/sec Loss 0.9399 LearningRate 0.0043 Epoch: 15 Global Step: 264290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:00:54,952-Speed 5142.64 samples/sec Loss 0.9273 LearningRate 0.0043 Epoch: 15 Global Step: 264300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:00:56,935-Speed 5166.77 samples/sec Loss 0.9199 LearningRate 0.0043 Epoch: 15 Global Step: 264310 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:00:58,914-Speed 5174.78 samples/sec Loss 0.9349 LearningRate 0.0043 Epoch: 15 Global Step: 264320 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:01:00,893-Speed 5175.31 samples/sec Loss 0.9108 LearningRate 0.0043 Epoch: 15 Global Step: 264330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:02,870-Speed 5181.29 samples/sec Loss 0.9269 LearningRate 0.0043 Epoch: 15 Global Step: 264340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:04,844-Speed 5188.05 samples/sec Loss 0.8807 
LearningRate 0.0043 Epoch: 15 Global Step: 264350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:06,815-Speed 5198.82 samples/sec Loss 0.8855 LearningRate 0.0043 Epoch: 15 Global Step: 264360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:08,807-Speed 5142.66 samples/sec Loss 0.8814 LearningRate 0.0043 Epoch: 15 Global Step: 264370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:10,794-Speed 5155.37 samples/sec Loss 0.9164 LearningRate 0.0043 Epoch: 15 Global Step: 264380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:12,766-Speed 5193.92 samples/sec Loss 0.8613 LearningRate 0.0043 Epoch: 15 Global Step: 264390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:14,762-Speed 5133.85 samples/sec Loss 0.9099 LearningRate 0.0043 Epoch: 15 Global Step: 264400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:16,757-Speed 5134.05 samples/sec Loss 0.8779 LearningRate 0.0043 Epoch: 15 Global Step: 264410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:18,731-Speed 5190.87 samples/sec Loss 0.8777 LearningRate 0.0043 Epoch: 15 Global Step: 264420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:20,696-Speed 5212.20 samples/sec Loss 0.8899 LearningRate 0.0043 Epoch: 15 Global Step: 264430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:22,713-Speed 5079.50 samples/sec Loss 0.9498 LearningRate 0.0043 Epoch: 15 Global Step: 264440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:24,700-Speed 5153.39 samples/sec Loss 0.9260 LearningRate 0.0043 Epoch: 15 Global Step: 264450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:26,683-Speed 5168.06 samples/sec Loss 0.9357 LearningRate 0.0043 Epoch: 15 Global Step: 264460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:28,664-Speed 5170.38 samples/sec Loss 0.9477 LearningRate 0.0043 Epoch: 15 Global Step: 
264470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:30,666-Speed 5117.27 samples/sec Loss 0.8681 LearningRate 0.0043 Epoch: 15 Global Step: 264480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:32,646-Speed 5172.30 samples/sec Loss 0.9066 LearningRate 0.0043 Epoch: 15 Global Step: 264490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:34,624-Speed 5180.28 samples/sec Loss 0.9000 LearningRate 0.0043 Epoch: 15 Global Step: 264500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:36,626-Speed 5116.80 samples/sec Loss 0.9577 LearningRate 0.0043 Epoch: 15 Global Step: 264510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:38,643-Speed 5079.40 samples/sec Loss 0.8799 LearningRate 0.0043 Epoch: 15 Global Step: 264520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:40,626-Speed 5165.35 samples/sec Loss 0.8807 LearningRate 0.0043 Epoch: 15 Global Step: 264530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:01:42,591-Speed 5213.82 samples/sec Loss 0.8875 LearningRate 0.0043 Epoch: 15 Global Step: 264540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:44,573-Speed 5168.60 samples/sec Loss 0.9027 LearningRate 0.0043 Epoch: 15 Global Step: 264550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:46,585-Speed 5091.56 samples/sec Loss 0.9011 LearningRate 0.0043 Epoch: 15 Global Step: 264560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:48,566-Speed 5171.84 samples/sec Loss 0.8963 LearningRate 0.0043 Epoch: 15 Global Step: 264570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:50,554-Speed 5152.62 samples/sec Loss 0.8898 LearningRate 0.0043 Epoch: 15 Global Step: 264580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:52,534-Speed 5171.74 samples/sec Loss 0.9258 LearningRate 0.0043 Epoch: 15 Global Step: 264590 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-11 17:01:54,507-Speed 5193.59 samples/sec Loss 0.8737 LearningRate 0.0043 Epoch: 15 Global Step: 264600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:56,483-Speed 5182.04 samples/sec Loss 0.9076 LearningRate 0.0043 Epoch: 15 Global Step: 264610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:01:58,473-Speed 5149.95 samples/sec Loss 0.8832 LearningRate 0.0043 Epoch: 15 Global Step: 264620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:00,464-Speed 5142.88 samples/sec Loss 0.8882 LearningRate 0.0043 Epoch: 15 Global Step: 264630 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:02,437-Speed 5191.30 samples/sec Loss 0.9018 LearningRate 0.0043 Epoch: 15 Global Step: 264640 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:02:04,428-Speed 5147.22 samples/sec Loss 0.9087 LearningRate 0.0043 Epoch: 15 Global Step: 264650 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:02:06,407-Speed 5175.54 samples/sec Loss 0.9085 LearningRate 0.0043 Epoch: 15 Global Step: 264660 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:02:08,372-Speed 5211.91 samples/sec Loss 0.8741 LearningRate 0.0043 Epoch: 15 Global Step: 264670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:10,360-Speed 5153.71 samples/sec Loss 0.9217 LearningRate 0.0043 Epoch: 15 Global Step: 264680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:12,342-Speed 5167.24 samples/sec Loss 0.8674 LearningRate 0.0043 Epoch: 15 Global Step: 264690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:14,329-Speed 5156.37 samples/sec Loss 0.8834 LearningRate 0.0043 Epoch: 15 Global Step: 264700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:16,316-Speed 5156.62 samples/sec Loss 0.9272 LearningRate 0.0043 Epoch: 15 Global Step: 264710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
17:02:18,352-Speed 5031.33 samples/sec Loss 0.9093 LearningRate 0.0043 Epoch: 15 Global Step: 264720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:20,327-Speed 5187.64 samples/sec Loss 0.9183 LearningRate 0.0043 Epoch: 15 Global Step: 264730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:22,330-Speed 5113.22 samples/sec Loss 0.8975 LearningRate 0.0043 Epoch: 15 Global Step: 264740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:24,335-Speed 5109.49 samples/sec Loss 0.9125 LearningRate 0.0043 Epoch: 15 Global Step: 264750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:26,314-Speed 5177.16 samples/sec Loss 0.8765 LearningRate 0.0043 Epoch: 15 Global Step: 264760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:28,319-Speed 5108.12 samples/sec Loss 0.8963 LearningRate 0.0043 Epoch: 15 Global Step: 264770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:30,300-Speed 5171.11 samples/sec Loss 0.8846 LearningRate 0.0043 Epoch: 15 Global Step: 264780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:32,281-Speed 5169.40 samples/sec Loss 0.9141 LearningRate 0.0043 Epoch: 15 Global Step: 264790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:34,262-Speed 5174.02 samples/sec Loss 0.8951 LearningRate 0.0043 Epoch: 15 Global Step: 264800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:36,264-Speed 5116.01 samples/sec Loss 0.9042 LearningRate 0.0043 Epoch: 15 Global Step: 264810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:38,248-Speed 5163.40 samples/sec Loss 0.9063 LearningRate 0.0043 Epoch: 15 Global Step: 264820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:40,227-Speed 5175.68 samples/sec Loss 0.8803 LearningRate 0.0043 Epoch: 15 Global Step: 264830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:42,228-Speed 5119.92 samples/sec Loss 
0.9177 LearningRate 0.0043 Epoch: 15 Global Step: 264840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:44,208-Speed 5174.00 samples/sec Loss 0.8564 LearningRate 0.0043 Epoch: 15 Global Step: 264850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:46,202-Speed 5136.52 samples/sec Loss 0.9358 LearningRate 0.0043 Epoch: 15 Global Step: 264860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:48,197-Speed 5134.80 samples/sec Loss 0.9157 LearningRate 0.0043 Epoch: 15 Global Step: 264870 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:02:50,169-Speed 5197.30 samples/sec Loss 0.8794 LearningRate 0.0043 Epoch: 15 Global Step: 264880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:52,167-Speed 5125.07 samples/sec Loss 0.9521 LearningRate 0.0043 Epoch: 15 Global Step: 264890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:54,141-Speed 5189.99 samples/sec Loss 0.9043 LearningRate 0.0043 Epoch: 15 Global Step: 264900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:56,115-Speed 5188.16 samples/sec Loss 0.8749 LearningRate 0.0043 Epoch: 15 Global Step: 264910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:02:58,111-Speed 5132.75 samples/sec Loss 0.9408 LearningRate 0.0043 Epoch: 15 Global Step: 264920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:00,082-Speed 5197.28 samples/sec Loss 0.9047 LearningRate 0.0043 Epoch: 15 Global Step: 264930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:02,061-Speed 5178.33 samples/sec Loss 0.8890 LearningRate 0.0043 Epoch: 15 Global Step: 264940 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:04,032-Speed 5195.37 samples/sec Loss 0.8724 LearningRate 0.0043 Epoch: 15 Global Step: 264950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:06,016-Speed 5161.94 samples/sec Loss 0.9186 LearningRate 0.0043 Epoch: 15 
Global Step: 264960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:07,993-Speed 5183.30 samples/sec Loss 0.9090 LearningRate 0.0043 Epoch: 15 Global Step: 264970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:09,977-Speed 5160.85 samples/sec Loss 0.9139 LearningRate 0.0043 Epoch: 15 Global Step: 264980 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:03:11,945-Speed 5207.71 samples/sec Loss 0.9106 LearningRate 0.0043 Epoch: 15 Global Step: 264990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:13,924-Speed 5175.49 samples/sec Loss 0.8993 LearningRate 0.0043 Epoch: 15 Global Step: 265000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:15,921-Speed 5129.37 samples/sec Loss 0.9196 LearningRate 0.0042 Epoch: 15 Global Step: 265010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:17,915-Speed 5137.04 samples/sec Loss 0.9411 LearningRate 0.0042 Epoch: 15 Global Step: 265020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:19,910-Speed 5135.13 samples/sec Loss 0.9216 LearningRate 0.0042 Epoch: 15 Global Step: 265030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:21,921-Speed 5093.94 samples/sec Loss 0.8452 LearningRate 0.0042 Epoch: 15 Global Step: 265040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:23,898-Speed 5179.74 samples/sec Loss 0.9054 LearningRate 0.0042 Epoch: 15 Global Step: 265050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:25,889-Speed 5146.51 samples/sec Loss 0.8885 LearningRate 0.0042 Epoch: 15 Global Step: 265060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:27,869-Speed 5173.71 samples/sec Loss 0.8752 LearningRate 0.0042 Epoch: 15 Global Step: 265070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:29,849-Speed 5172.34 samples/sec Loss 0.9139 LearningRate 0.0042 Epoch: 15 Global Step: 265080 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 17:03:31,837-Speed 5153.06 samples/sec Loss 0.8995 LearningRate 0.0042 Epoch: 15 Global Step: 265090 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:03:33,827-Speed 5147.50 samples/sec Loss 0.9419 LearningRate 0.0042 Epoch: 15 Global Step: 265100 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:03:35,809-Speed 5167.54 samples/sec Loss 0.8838 LearningRate 0.0042 Epoch: 15 Global Step: 265110 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:03:37,798-Speed 5149.25 samples/sec Loss 0.9120 LearningRate 0.0042 Epoch: 15 Global Step: 265120 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:03:39,773-Speed 5186.41 samples/sec Loss 0.8988 LearningRate 0.0042 Epoch: 15 Global Step: 265130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:41,761-Speed 5152.99 samples/sec Loss 0.8971 LearningRate 0.0042 Epoch: 15 Global Step: 265140 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:43,755-Speed 5138.40 samples/sec Loss 0.9095 LearningRate 0.0042 Epoch: 15 Global Step: 265150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:45,741-Speed 5157.95 samples/sec Loss 0.9113 LearningRate 0.0042 Epoch: 15 Global Step: 265160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:47,725-Speed 5163.08 samples/sec Loss 0.9075 LearningRate 0.0042 Epoch: 15 Global Step: 265170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:49,711-Speed 5158.62 samples/sec Loss 0.8947 LearningRate 0.0042 Epoch: 15 Global Step: 265180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:51,688-Speed 5183.04 samples/sec Loss 0.9113 LearningRate 0.0042 Epoch: 15 Global Step: 265190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:53,661-Speed 5191.56 samples/sec Loss 0.8926 LearningRate 0.0042 Epoch: 15 Global Step: 265200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
17:03:55,636-Speed 5186.17 samples/sec Loss 0.8640 LearningRate 0.0042 Epoch: 15 Global Step: 265210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:57,625-Speed 5151.43 samples/sec Loss 0.8803 LearningRate 0.0042 Epoch: 15 Global Step: 265220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:03:59,634-Speed 5097.98 samples/sec Loss 0.8922 LearningRate 0.0042 Epoch: 15 Global Step: 265230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:01,611-Speed 5182.55 samples/sec Loss 0.8961 LearningRate 0.0042 Epoch: 15 Global Step: 265240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:03,591-Speed 5173.91 samples/sec Loss 0.8732 LearningRate 0.0042 Epoch: 15 Global Step: 265250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:05,577-Speed 5158.24 samples/sec Loss 0.8887 LearningRate 0.0042 Epoch: 15 Global Step: 265260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:07,561-Speed 5162.54 samples/sec Loss 0.9156 LearningRate 0.0042 Epoch: 15 Global Step: 265270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:09,543-Speed 5166.71 samples/sec Loss 0.9261 LearningRate 0.0042 Epoch: 15 Global Step: 265280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:11,543-Speed 5123.84 samples/sec Loss 0.9199 LearningRate 0.0042 Epoch: 15 Global Step: 265290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:13,556-Speed 5087.94 samples/sec Loss 0.8837 LearningRate 0.0042 Epoch: 15 Global Step: 265300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:15,538-Speed 5168.37 samples/sec Loss 0.9017 LearningRate 0.0042 Epoch: 15 Global Step: 265310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:17,523-Speed 5161.79 samples/sec Loss 0.9213 LearningRate 0.0042 Epoch: 15 Global Step: 265320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:19,495-Speed 5194.48 samples/sec Loss 
0.9273 LearningRate 0.0042 Epoch: 15 Global Step: 265330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:04:21,473-Speed 5176.93 samples/sec Loss 0.9161 LearningRate 0.0042 Epoch: 15 Global Step: 265340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:04:23,475-Speed 5116.44 samples/sec Loss 0.8786 LearningRate 0.0042 Epoch: 15 Global Step: 265350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:04:25,456-Speed 5172.25 samples/sec Loss 0.8870 LearningRate 0.0042 Epoch: 15 Global Step: 265360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:04:27,439-Speed 5165.48 samples/sec Loss 0.8831 LearningRate 0.0042 Epoch: 15 Global Step: 265370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:04:29,449-Speed 5097.24 samples/sec Loss 0.9002 LearningRate 0.0042 Epoch: 15 Global Step: 265380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:04:31,442-Speed 5137.36 samples/sec Loss 0.9205 LearningRate 0.0042 Epoch: 15 Global Step: 265390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:04:33,433-Speed 5145.98 samples/sec Loss 0.8777 LearningRate 0.0042 Epoch: 15 Global Step: 265400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:04:35,417-Speed 5162.76 samples/sec Loss 0.8715 LearningRate 0.0042 Epoch: 15 Global Step: 265410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:04:37,406-Speed 5152.20 samples/sec Loss 0.8804 LearningRate 0.0042 Epoch: 15 Global Step: 265420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:04:39,386-Speed 5173.07 samples/sec Loss 0.9202 LearningRate 0.0042 Epoch: 15 Global Step: 265430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:41,367-Speed 5168.37 samples/sec Loss 0.8505 LearningRate 0.0042 Epoch: 15 Global Step: 265440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:43,338-Speed 5198.36 samples/sec Loss 0.8727 LearningRate 0.0042 Epoch: 15 Global 
Step: 265450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:45,345-Speed 5104.14 samples/sec Loss 0.8957 LearningRate 0.0042 Epoch: 15 Global Step: 265460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:47,346-Speed 5118.60 samples/sec Loss 0.9442 LearningRate 0.0042 Epoch: 15 Global Step: 265470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:49,325-Speed 5177.45 samples/sec Loss 0.8783 LearningRate 0.0042 Epoch: 15 Global Step: 265480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:51,303-Speed 5180.10 samples/sec Loss 0.8819 LearningRate 0.0042 Epoch: 15 Global Step: 265490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:53,328-Speed 5059.09 samples/sec Loss 0.8542 LearningRate 0.0042 Epoch: 15 Global Step: 265500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:55,302-Speed 5188.78 samples/sec Loss 0.8943 LearningRate 0.0042 Epoch: 15 Global Step: 265510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:57,278-Speed 5183.53 samples/sec Loss 0.8810 LearningRate 0.0042 Epoch: 15 Global Step: 265520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:04:59,261-Speed 5166.35 samples/sec Loss 0.9361 LearningRate 0.0042 Epoch: 15 Global Step: 265530 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:05:01,282-Speed 5068.62 samples/sec Loss 0.9100 LearningRate 0.0042 Epoch: 15 Global Step: 265540 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:05:03,271-Speed 5150.00 samples/sec Loss 0.8931 LearningRate 0.0042 Epoch: 15 Global Step: 265550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:05:05,258-Speed 5156.01 samples/sec Loss 0.8854 LearningRate 0.0042 Epoch: 15 Global Step: 265560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:05:07,242-Speed 5162.71 samples/sec Loss 0.9247 LearningRate 0.0042 Epoch: 15 Global Step: 265570 Fp16 Grad Scale: 65536 
Required: 5 hours Training: 2022-04-11 17:05:09,231-Speed 5150.48 samples/sec Loss 0.9379 LearningRate 0.0042 Epoch: 15 Global Step: 265580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:05:11,221-Speed 5145.51 samples/sec Loss 0.8900 LearningRate 0.0042 Epoch: 15 Global Step: 265590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:05:13,211-Speed 5149.08 samples/sec Loss 0.8628 LearningRate 0.0042 Epoch: 15 Global Step: 265600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:05:15,207-Speed 5132.02 samples/sec Loss 0.9286 LearningRate 0.0042 Epoch: 15 Global Step: 265610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:05:17,204-Speed 5128.49 samples/sec Loss 0.8997 LearningRate 0.0042 Epoch: 15 Global Step: 265620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:05:19,178-Speed 5190.56 samples/sec Loss 0.9163 LearningRate 0.0042 Epoch: 15 Global Step: 265630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:05:21,152-Speed 5188.36 samples/sec Loss 0.8643 LearningRate 0.0042 Epoch: 15 Global Step: 265640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:05:23,132-Speed 5174.42 samples/sec Loss 0.8792 LearningRate 0.0042 Epoch: 15 Global Step: 265650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:05:25,116-Speed 5162.69 samples/sec Loss 0.9470 LearningRate 0.0042 Epoch: 15 Global Step: 265660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:05:27,117-Speed 5118.25 samples/sec Loss 0.9050 LearningRate 0.0042 Epoch: 15 Global Step: 265670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:05:29,100-Speed 5165.70 samples/sec Loss 0.9172 LearningRate 0.0042 Epoch: 15 Global Step: 265680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:05:31,080-Speed 5173.14 samples/sec Loss 0.8889 LearningRate 0.0042 Epoch: 15 Global Step: 265690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 
17:05:33,061-Speed 5171.45 samples/sec Loss 0.9072 LearningRate 0.0042 Epoch: 15 Global Step: 265700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:05:35,046-Speed 5160.36 samples/sec Loss 0.8999 LearningRate 0.0042 Epoch: 15 Global Step: 265710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:05:37,023-Speed 5181.52 samples/sec Loss 0.8943 LearningRate 0.0042 Epoch: 15 Global Step: 265720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-11 17:05:39,003-Speed 5173.91 samples/sec Loss 0.9315 LearningRate 0.0042 Epoch: 15 Global Step: 265730 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:05:40,988-Speed 5158.71 samples/sec Loss 0.8714 LearningRate 0.0042 Epoch: 15 Global Step: 265740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:05:42,964-Speed 5187.02 samples/sec Loss 0.8596 LearningRate 0.0042 Epoch: 15 Global Step: 265750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:05:44,944-Speed 5171.62 samples/sec Loss 0.8850 LearningRate 0.0042 Epoch: 15 Global Step: 265760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:05:46,948-Speed 5110.70 samples/sec Loss 0.8931 LearningRate 0.0042 Epoch: 15 Global Step: 265770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:05:48,941-Speed 5140.49 samples/sec Loss 0.8539 LearningRate 0.0042 Epoch: 15 Global Step: 265780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:05:50,915-Speed 5189.03 samples/sec Loss 0.9163 LearningRate 0.0042 Epoch: 15 Global Step: 265790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:05:52,913-Speed 5128.91 samples/sec Loss 0.8996 LearningRate 0.0042 Epoch: 15 Global Step: 265800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:05:54,885-Speed 5191.97 samples/sec Loss 0.8783 LearningRate 0.0042 Epoch: 15 Global Step: 265810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:05:56,862-Speed 5181.63 samples/sec Loss 
0.8648 LearningRate 0.0041 Epoch: 15 Global Step: 265820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:05:58,856-Speed 5138.23 samples/sec Loss 0.9159 LearningRate 0.0041 Epoch: 15 Global Step: 265830 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:06:00,860-Speed 5112.47 samples/sec Loss 0.8789 LearningRate 0.0041 Epoch: 15 Global Step: 265840 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:06:02,848-Speed 5153.33 samples/sec Loss 0.8927 LearningRate 0.0041 Epoch: 15 Global Step: 265850 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:06:04,842-Speed 5136.60 samples/sec Loss 0.9283 LearningRate 0.0041 Epoch: 15 Global Step: 265860 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:06:06,820-Speed 5179.88 samples/sec Loss 0.9252 LearningRate 0.0041 Epoch: 15 Global Step: 265870 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:06:08,797-Speed 5179.66 samples/sec Loss 0.8765 LearningRate 0.0041 Epoch: 15 Global Step: 265880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:06:10,771-Speed 5190.82 samples/sec Loss 0.8740 LearningRate 0.0041 Epoch: 15 Global Step: 265890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:06:12,746-Speed 5186.23 samples/sec Loss 0.9286 LearningRate 0.0041 Epoch: 15 Global Step: 265900 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:06:14,751-Speed 5108.46 samples/sec Loss 0.8981 LearningRate 0.0041 Epoch: 15 Global Step: 265910 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:06:16,734-Speed 5166.71 samples/sec Loss 0.8933 LearningRate 0.0041 Epoch: 15 Global Step: 265920 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:06:18,713-Speed 5175.54 samples/sec Loss 0.9146 LearningRate 0.0041 Epoch: 15 Global Step: 265930 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:06:20,701-Speed 5153.75 samples/sec Loss 0.9035 LearningRate 0.0041 Epoch: 15 
Global Step: 265940 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:06:22,679-Speed 5179.03 samples/sec Loss 0.9226 LearningRate 0.0041 Epoch: 15 Global Step: 265950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:06:24,714-Speed 5033.60 samples/sec Loss 0.9384 LearningRate 0.0041 Epoch: 15 Global Step: 265960 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:06:26,697-Speed 5165.05 samples/sec Loss 0.9077 LearningRate 0.0041 Epoch: 15 Global Step: 265970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:06:28,672-Speed 5185.64 samples/sec Loss 0.8844 LearningRate 0.0041 Epoch: 15 Global Step: 265980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:06:30,667-Speed 5134.53 samples/sec Loss 0.8570 LearningRate 0.0041 Epoch: 15 Global Step: 265990 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:06:32,661-Speed 5137.76 samples/sec Loss 0.8922 LearningRate 0.0041 Epoch: 15 Global Step: 266000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:06:59,507-[lfw][266000]XNorm: 21.262057 Training: 2022-04-11 17:06:59,508-[lfw][266000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 17:06:59,508-[lfw][266000]Accuracy-Highest: 0.99833 Training: 2022-04-11 17:07:30,232-[cfp_fp][266000]XNorm: 21.196214 Training: 2022-04-11 17:07:30,232-[cfp_fp][266000]Accuracy-Flip: 0.98829+-0.00349 Training: 2022-04-11 17:07:30,233-[cfp_fp][266000]Accuracy-Highest: 0.98914 Training: 2022-04-11 17:07:56,867-[agedb_30][266000]XNorm: 22.113255 Training: 2022-04-11 17:07:56,867-[agedb_30][266000]Accuracy-Flip: 0.98250+-0.00704 Training: 2022-04-11 17:07:56,868-[agedb_30][266000]Accuracy-Highest: 0.98300 Training: 2022-04-11 17:07:58,852-Speed 118.81 samples/sec Loss 0.8902 LearningRate 0.0041 Epoch: 15 Global Step: 266010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:00,838-Speed 5157.76 samples/sec Loss 0.9041 LearningRate 0.0041 Epoch: 15 Global Step: 266020 
Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:02,818-Speed 5172.50 samples/sec Loss 0.8810 LearningRate 0.0041 Epoch: 15 Global Step: 266030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:04,803-Speed 5160.35 samples/sec Loss 0.9314 LearningRate 0.0041 Epoch: 15 Global Step: 266040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:06,782-Speed 5175.73 samples/sec Loss 0.8696 LearningRate 0.0041 Epoch: 15 Global Step: 266050 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:08:08,762-Speed 5173.27 samples/sec Loss 0.8657 LearningRate 0.0041 Epoch: 15 Global Step: 266060 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:08:10,734-Speed 5196.33 samples/sec Loss 0.8658 LearningRate 0.0041 Epoch: 15 Global Step: 266070 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:08:12,717-Speed 5165.51 samples/sec Loss 0.8531 LearningRate 0.0041 Epoch: 15 Global Step: 266080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:14,688-Speed 5194.87 samples/sec Loss 0.9149 LearningRate 0.0041 Epoch: 15 Global Step: 266090 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:16,673-Speed 5161.54 samples/sec Loss 0.9316 LearningRate 0.0041 Epoch: 15 Global Step: 266100 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:18,648-Speed 5185.74 samples/sec Loss 0.8650 LearningRate 0.0041 Epoch: 15 Global Step: 266110 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:20,639-Speed 5145.89 samples/sec Loss 0.8677 LearningRate 0.0041 Epoch: 15 Global Step: 266120 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:22,625-Speed 5158.83 samples/sec Loss 0.9133 LearningRate 0.0041 Epoch: 15 Global Step: 266130 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:24,616-Speed 5147.14 samples/sec Loss 0.8875 LearningRate 0.0041 Epoch: 15 Global Step: 266140 Fp16 Grad Scale: 65536 Required: 5 hours 
Training: 2022-04-11 17:08:26,596-Speed 5173.21 samples/sec Loss 0.8428 LearningRate 0.0041 Epoch: 15 Global Step: 266150 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:28,574-Speed 5178.67 samples/sec Loss 0.8668 LearningRate 0.0041 Epoch: 15 Global Step: 266160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:30,554-Speed 5174.69 samples/sec Loss 0.8952 LearningRate 0.0041 Epoch: 15 Global Step: 266170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:32,550-Speed 5131.13 samples/sec Loss 0.8622 LearningRate 0.0041 Epoch: 15 Global Step: 266180 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:08:34,544-Speed 5136.95 samples/sec Loss 0.8433 LearningRate 0.0041 Epoch: 15 Global Step: 266190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:36,526-Speed 5169.38 samples/sec Loss 0.8847 LearningRate 0.0041 Epoch: 15 Global Step: 266200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:38,512-Speed 5155.52 samples/sec Loss 0.8973 LearningRate 0.0041 Epoch: 15 Global Step: 266210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:40,517-Speed 5110.16 samples/sec Loss 0.9172 LearningRate 0.0041 Epoch: 15 Global Step: 266220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:42,510-Speed 5140.57 samples/sec Loss 0.8601 LearningRate 0.0041 Epoch: 15 Global Step: 266230 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:44,494-Speed 5161.17 samples/sec Loss 0.8680 LearningRate 0.0041 Epoch: 15 Global Step: 266240 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:46,487-Speed 5139.89 samples/sec Loss 0.8688 LearningRate 0.0041 Epoch: 15 Global Step: 266250 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:48,466-Speed 5178.15 samples/sec Loss 0.8830 LearningRate 0.0041 Epoch: 15 Global Step: 266260 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:50,456-Speed 
5146.23 samples/sec Loss 0.9076 LearningRate 0.0041 Epoch: 15 Global Step: 266270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:52,456-Speed 5122.89 samples/sec Loss 0.8933 LearningRate 0.0041 Epoch: 15 Global Step: 266280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:08:54,437-Speed 5171.55 samples/sec Loss 0.8654 LearningRate 0.0041 Epoch: 15 Global Step: 266290 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:08:56,471-Speed 5037.53 samples/sec Loss 0.8973 LearningRate 0.0041 Epoch: 15 Global Step: 266300 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:08:58,451-Speed 5174.32 samples/sec Loss 0.8892 LearningRate 0.0041 Epoch: 15 Global Step: 266310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:00,430-Speed 5175.19 samples/sec Loss 0.8688 LearningRate 0.0041 Epoch: 15 Global Step: 266320 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:02,423-Speed 5141.65 samples/sec Loss 0.8614 LearningRate 0.0041 Epoch: 15 Global Step: 266330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:04,404-Speed 5170.69 samples/sec Loss 0.8760 LearningRate 0.0041 Epoch: 15 Global Step: 266340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:06,387-Speed 5164.85 samples/sec Loss 0.9034 LearningRate 0.0041 Epoch: 15 Global Step: 266350 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:08,362-Speed 5187.36 samples/sec Loss 0.8654 LearningRate 0.0041 Epoch: 15 Global Step: 266360 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:10,349-Speed 5155.53 samples/sec Loss 0.8755 LearningRate 0.0041 Epoch: 15 Global Step: 266370 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:12,349-Speed 5121.74 samples/sec Loss 0.8685 LearningRate 0.0041 Epoch: 15 Global Step: 266380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:14,348-Speed 5123.82 samples/sec Loss 0.9116 
LearningRate 0.0041 Epoch: 15 Global Step: 266390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:16,354-Speed 5107.20 samples/sec Loss 0.8995 LearningRate 0.0041 Epoch: 15 Global Step: 266400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:18,331-Speed 5180.69 samples/sec Loss 0.9018 LearningRate 0.0041 Epoch: 15 Global Step: 266410 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:09:20,303-Speed 5195.85 samples/sec Loss 0.9108 LearningRate 0.0041 Epoch: 15 Global Step: 266420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:22,294-Speed 5144.67 samples/sec Loss 0.9026 LearningRate 0.0041 Epoch: 15 Global Step: 266430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:24,278-Speed 5160.88 samples/sec Loss 0.8631 LearningRate 0.0041 Epoch: 15 Global Step: 266440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:26,253-Speed 5188.25 samples/sec Loss 0.9022 LearningRate 0.0041 Epoch: 15 Global Step: 266450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:28,248-Speed 5135.37 samples/sec Loss 0.8603 LearningRate 0.0041 Epoch: 15 Global Step: 266460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:30,227-Speed 5176.21 samples/sec Loss 0.8872 LearningRate 0.0041 Epoch: 15 Global Step: 266470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:32,197-Speed 5199.47 samples/sec Loss 0.8988 LearningRate 0.0041 Epoch: 15 Global Step: 266480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:34,169-Speed 5194.57 samples/sec Loss 0.8947 LearningRate 0.0041 Epoch: 15 Global Step: 266490 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:36,151-Speed 5169.49 samples/sec Loss 0.8954 LearningRate 0.0041 Epoch: 15 Global Step: 266500 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:38,126-Speed 5184.51 samples/sec Loss 0.8748 LearningRate 0.0041 Epoch: 15 Global Step: 
266510 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:40,118-Speed 5143.40 samples/sec Loss 0.9431 LearningRate 0.0041 Epoch: 15 Global Step: 266520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:42,093-Speed 5185.16 samples/sec Loss 0.8933 LearningRate 0.0041 Epoch: 15 Global Step: 266530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:44,073-Speed 5173.40 samples/sec Loss 0.8854 LearningRate 0.0041 Epoch: 15 Global Step: 266540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:46,059-Speed 5157.40 samples/sec Loss 0.9152 LearningRate 0.0041 Epoch: 15 Global Step: 266550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:48,081-Speed 5067.41 samples/sec Loss 0.8955 LearningRate 0.0041 Epoch: 15 Global Step: 266560 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:50,061-Speed 5173.90 samples/sec Loss 0.8977 LearningRate 0.0041 Epoch: 15 Global Step: 266570 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:52,041-Speed 5174.06 samples/sec Loss 0.9125 LearningRate 0.0041 Epoch: 15 Global Step: 266580 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:54,017-Speed 5184.81 samples/sec Loss 0.9062 LearningRate 0.0041 Epoch: 15 Global Step: 266590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:55,991-Speed 5188.67 samples/sec Loss 0.8758 LearningRate 0.0041 Epoch: 15 Global Step: 266600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:57,969-Speed 5179.67 samples/sec Loss 0.8927 LearningRate 0.0041 Epoch: 15 Global Step: 266610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:09:59,957-Speed 5150.57 samples/sec Loss 0.9281 LearningRate 0.0041 Epoch: 15 Global Step: 266620 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:10:01,949-Speed 5143.36 samples/sec Loss 0.8769 LearningRate 0.0041 Epoch: 15 Global Step: 266630 Fp16 Grad Scale: 65536 Required: 5 
hours Training: 2022-04-11 17:10:03,924-Speed 5186.42 samples/sec Loss 0.8935 LearningRate 0.0041 Epoch: 15 Global Step: 266640 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:10:05,900-Speed 5182.96 samples/sec Loss 0.9075 LearningRate 0.0040 Epoch: 15 Global Step: 266650 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:10:07,879-Speed 5175.73 samples/sec Loss 0.8886 LearningRate 0.0040 Epoch: 15 Global Step: 266660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:10:09,872-Speed 5140.08 samples/sec Loss 0.8907 LearningRate 0.0040 Epoch: 15 Global Step: 266670 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:10:11,871-Speed 5124.50 samples/sec Loss 0.8913 LearningRate 0.0040 Epoch: 15 Global Step: 266680 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:10:13,860-Speed 5150.65 samples/sec Loss 0.9148 LearningRate 0.0040 Epoch: 15 Global Step: 266690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:10:15,837-Speed 5183.00 samples/sec Loss 0.8870 LearningRate 0.0040 Epoch: 15 Global Step: 266700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:10:17,853-Speed 5081.86 samples/sec Loss 0.8484 LearningRate 0.0040 Epoch: 15 Global Step: 266710 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:10:19,838-Speed 5159.86 samples/sec Loss 0.9457 LearningRate 0.0040 Epoch: 15 Global Step: 266720 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:10:21,842-Speed 5113.40 samples/sec Loss 0.9007 LearningRate 0.0040 Epoch: 15 Global Step: 266730 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:10:23,828-Speed 5157.49 samples/sec Loss 0.8824 LearningRate 0.0040 Epoch: 15 Global Step: 266740 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:10:25,803-Speed 5185.89 samples/sec Loss 0.8586 LearningRate 0.0040 Epoch: 15 Global Step: 266750 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 
17:10:27,821-Speed 5076.09 samples/sec Loss 0.8688 LearningRate 0.0040 Epoch: 15 Global Step: 266760 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:10:29,802-Speed 5172.98 samples/sec Loss 0.9073 LearningRate 0.0040 Epoch: 15 Global Step: 266770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:10:31,812-Speed 5095.61 samples/sec Loss 0.9251 LearningRate 0.0040 Epoch: 15 Global Step: 266780 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:10:38,077-Speed 1634.63 samples/sec Loss 0.9071 LearningRate 0.0040 Epoch: 15 Global Step: 266790 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:10:40,050-Speed 5193.61 samples/sec Loss 0.8940 LearningRate 0.0040 Epoch: 15 Global Step: 266800 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:10:42,033-Speed 5164.71 samples/sec Loss 0.8878 LearningRate 0.0040 Epoch: 15 Global Step: 266810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:10:44,023-Speed 5146.64 samples/sec Loss 0.8886 LearningRate 0.0040 Epoch: 15 Global Step: 266820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:10:46,012-Speed 5150.34 samples/sec Loss 0.9109 LearningRate 0.0040 Epoch: 15 Global Step: 266830 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-11 17:10:48,031-Speed 5075.02 samples/sec Loss 0.9049 LearningRate 0.0040 Epoch: 15 Global Step: 266840 Fp16 Grad Scale: 131072 Required: 5 hours Training: 2022-04-11 17:10:50,026-Speed 5133.19 samples/sec Loss 0.8869 LearningRate 0.0040 Epoch: 15 Global Step: 266850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:10:52,035-Speed 5101.58 samples/sec Loss 0.9029 LearningRate 0.0040 Epoch: 15 Global Step: 266860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:10:54,009-Speed 5187.42 samples/sec Loss 0.9011 LearningRate 0.0040 Epoch: 15 Global Step: 266870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:10:55,992-Speed 5167.51 samples/sec 
Loss 0.8921 LearningRate 0.0040 Epoch: 15 Global Step: 266880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:10:57,974-Speed 5166.03 samples/sec Loss 0.8756 LearningRate 0.0040 Epoch: 15 Global Step: 266890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:10:59,979-Speed 5109.94 samples/sec Loss 0.8923 LearningRate 0.0040 Epoch: 15 Global Step: 266900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:11:01,986-Speed 5105.68 samples/sec Loss 0.8733 LearningRate 0.0040 Epoch: 15 Global Step: 266910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:11:03,970-Speed 5163.14 samples/sec Loss 0.8812 LearningRate 0.0040 Epoch: 15 Global Step: 266920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:11:05,965-Speed 5135.24 samples/sec Loss 0.8773 LearningRate 0.0040 Epoch: 15 Global Step: 266930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:11:07,964-Speed 5126.36 samples/sec Loss 0.8743 LearningRate 0.0040 Epoch: 15 Global Step: 266940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:11:09,997-Speed 5038.40 samples/sec Loss 0.9275 LearningRate 0.0040 Epoch: 15 Global Step: 266950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:11:11,980-Speed 5165.53 samples/sec Loss 0.8777 LearningRate 0.0040 Epoch: 15 Global Step: 266960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:11:13,982-Speed 5116.60 samples/sec Loss 0.9163 LearningRate 0.0040 Epoch: 15 Global Step: 266970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:11:16,009-Speed 5054.77 samples/sec Loss 0.8776 LearningRate 0.0040 Epoch: 15 Global Step: 266980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:11:17,988-Speed 5175.33 samples/sec Loss 0.9140 LearningRate 0.0040 Epoch: 15 Global Step: 266990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:11:19,974-Speed 5165.14 samples/sec Loss 0.8804 LearningRate 0.0040 Epoch: 15 
Global Step: 267000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:11:21,956-Speed 5168.66 samples/sec Loss 0.8970 LearningRate 0.0040 Epoch: 15 Global Step: 267010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:11:23,951-Speed 5132.94 samples/sec Loss 0.9051 LearningRate 0.0040 Epoch: 15 Global Step: 267020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:11:25,957-Speed 5108.22 samples/sec Loss 0.8963 LearningRate 0.0040 Epoch: 15 Global Step: 267030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:11:27,934-Speed 5181.93 samples/sec Loss 0.8684 LearningRate 0.0040 Epoch: 15 Global Step: 267040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:11:30,179-Speed 4561.86 samples/sec Loss 0.9116 LearningRate 0.0040 Epoch: 15 Global Step: 267050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:01,301-Speed 329.05 samples/sec Loss 0.7689 LearningRate 0.0040 Epoch: 16 Global Step: 267060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:03,325-Speed 5061.55 samples/sec Loss 0.6025 LearningRate 0.0040 Epoch: 16 Global Step: 267070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:12:05,296-Speed 5196.82 samples/sec Loss 0.6113 LearningRate 0.0040 Epoch: 16 Global Step: 267080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:12:07,597-Speed 4453.23 samples/sec Loss 0.6327 LearningRate 0.0040 Epoch: 16 Global Step: 267090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:12:09,854-Speed 4539.62 samples/sec Loss 0.6243 LearningRate 0.0040 Epoch: 16 Global Step: 267100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:11,853-Speed 5122.02 samples/sec Loss 0.6065 LearningRate 0.0040 Epoch: 16 Global Step: 267110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:13,839-Speed 5170.50 samples/sec Loss 0.6102 LearningRate 0.0040 Epoch: 16 Global Step: 267120 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-04-11 17:12:15,814-Speed 5191.86 samples/sec Loss 0.6488 LearningRate 0.0040 Epoch: 16 Global Step: 267130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:17,823-Speed 5099.33 samples/sec Loss 0.6246 LearningRate 0.0040 Epoch: 16 Global Step: 267140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:19,784-Speed 5226.49 samples/sec Loss 0.6093 LearningRate 0.0040 Epoch: 16 Global Step: 267150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:21,764-Speed 5171.79 samples/sec Loss 0.6188 LearningRate 0.0040 Epoch: 16 Global Step: 267160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:23,751-Speed 5156.85 samples/sec Loss 0.6357 LearningRate 0.0040 Epoch: 16 Global Step: 267170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:25,760-Speed 5098.60 samples/sec Loss 0.6048 LearningRate 0.0040 Epoch: 16 Global Step: 267180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:27,760-Speed 5122.81 samples/sec Loss 0.6205 LearningRate 0.0040 Epoch: 16 Global Step: 267190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:29,734-Speed 5187.50 samples/sec Loss 0.6253 LearningRate 0.0040 Epoch: 16 Global Step: 267200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:12:31,715-Speed 5171.85 samples/sec Loss 0.5974 LearningRate 0.0040 Epoch: 16 Global Step: 267210 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:12:33,684-Speed 5204.17 samples/sec Loss 0.5818 LearningRate 0.0040 Epoch: 16 Global Step: 267220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:35,683-Speed 5125.73 samples/sec Loss 0.6373 LearningRate 0.0040 Epoch: 16 Global Step: 267230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:37,690-Speed 5106.09 samples/sec Loss 0.5732 LearningRate 0.0040 Epoch: 16 Global Step: 267240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 17:12:39,663-Speed 5192.01 samples/sec Loss 0.6264 LearningRate 0.0040 Epoch: 16 Global Step: 267250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:41,649-Speed 5158.20 samples/sec Loss 0.6164 LearningRate 0.0040 Epoch: 16 Global Step: 267260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:43,628-Speed 5177.43 samples/sec Loss 0.6012 LearningRate 0.0040 Epoch: 16 Global Step: 267270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:45,608-Speed 5172.24 samples/sec Loss 0.6064 LearningRate 0.0040 Epoch: 16 Global Step: 267280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:47,630-Speed 5067.46 samples/sec Loss 0.5819 LearningRate 0.0040 Epoch: 16 Global Step: 267290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:49,648-Speed 5077.09 samples/sec Loss 0.6136 LearningRate 0.0040 Epoch: 16 Global Step: 267300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:51,641-Speed 5142.63 samples/sec Loss 0.6050 LearningRate 0.0040 Epoch: 16 Global Step: 267310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:12:53,609-Speed 5203.65 samples/sec Loss 0.5915 LearningRate 0.0040 Epoch: 16 Global Step: 267320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:12:55,579-Speed 5201.60 samples/sec Loss 0.6143 LearningRate 0.0040 Epoch: 16 Global Step: 267330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:12:57,604-Speed 5057.51 samples/sec Loss 0.6303 LearningRate 0.0040 Epoch: 16 Global Step: 267340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:12:59,606-Speed 5117.56 samples/sec Loss 0.6317 LearningRate 0.0040 Epoch: 16 Global Step: 267350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:13:01,620-Speed 5087.05 samples/sec Loss 0.6140 LearningRate 0.0040 Epoch: 16 Global Step: 267360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:03,591-Speed 5197.39 
samples/sec Loss 0.6128 LearningRate 0.0040 Epoch: 16 Global Step: 267370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:05,594-Speed 5114.64 samples/sec Loss 0.6005 LearningRate 0.0040 Epoch: 16 Global Step: 267380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:07,566-Speed 5193.13 samples/sec Loss 0.6092 LearningRate 0.0040 Epoch: 16 Global Step: 267390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:09,542-Speed 5184.99 samples/sec Loss 0.6060 LearningRate 0.0040 Epoch: 16 Global Step: 267400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:11,547-Speed 5109.85 samples/sec Loss 0.6292 LearningRate 0.0040 Epoch: 16 Global Step: 267410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:13,524-Speed 5180.55 samples/sec Loss 0.6439 LearningRate 0.0040 Epoch: 16 Global Step: 267420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:15,506-Speed 5169.90 samples/sec Loss 0.6447 LearningRate 0.0040 Epoch: 16 Global Step: 267430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:17,783-Speed 4498.04 samples/sec Loss 0.5933 LearningRate 0.0040 Epoch: 16 Global Step: 267440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:19,786-Speed 5113.28 samples/sec Loss 0.5705 LearningRate 0.0040 Epoch: 16 Global Step: 267450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:21,795-Speed 5100.38 samples/sec Loss 0.5893 LearningRate 0.0040 Epoch: 16 Global Step: 267460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:13:23,817-Speed 5066.49 samples/sec Loss 0.5837 LearningRate 0.0040 Epoch: 16 Global Step: 267470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:13:25,797-Speed 5173.10 samples/sec Loss 0.6030 LearningRate 0.0039 Epoch: 16 Global Step: 267480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:13:27,775-Speed 5179.46 samples/sec Loss 0.6046 LearningRate 
0.0039 Epoch: 16 Global Step: 267490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:13:29,909-Speed 4801.73 samples/sec Loss 0.6297 LearningRate 0.0039 Epoch: 16 Global Step: 267500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:13:31,882-Speed 5191.34 samples/sec Loss 0.6486 LearningRate 0.0039 Epoch: 16 Global Step: 267510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:13:33,876-Speed 5136.58 samples/sec Loss 0.5831 LearningRate 0.0039 Epoch: 16 Global Step: 267520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:35,861-Speed 5160.76 samples/sec Loss 0.6241 LearningRate 0.0039 Epoch: 16 Global Step: 267530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:37,870-Speed 5098.73 samples/sec Loss 0.6273 LearningRate 0.0039 Epoch: 16 Global Step: 267540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:39,874-Speed 5113.80 samples/sec Loss 0.6137 LearningRate 0.0039 Epoch: 16 Global Step: 267550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:41,858-Speed 5164.02 samples/sec Loss 0.6328 LearningRate 0.0039 Epoch: 16 Global Step: 267560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:43,832-Speed 5188.98 samples/sec Loss 0.5837 LearningRate 0.0039 Epoch: 16 Global Step: 267570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:45,806-Speed 5189.01 samples/sec Loss 0.6086 LearningRate 0.0039 Epoch: 16 Global Step: 267580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:47,801-Speed 5132.31 samples/sec Loss 0.5916 LearningRate 0.0039 Epoch: 16 Global Step: 267590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:49,777-Speed 5184.60 samples/sec Loss 0.6238 LearningRate 0.0039 Epoch: 16 Global Step: 267600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:51,782-Speed 5109.56 samples/sec Loss 0.6211 LearningRate 0.0039 Epoch: 16 Global Step: 267610 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:13:53,761-Speed 5178.07 samples/sec Loss 0.6181 LearningRate 0.0039 Epoch: 16 Global Step: 267620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:13:55,734-Speed 5190.90 samples/sec Loss 0.6229 LearningRate 0.0039 Epoch: 16 Global Step: 267630 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:13:57,762-Speed 5051.03 samples/sec Loss 0.6551 LearningRate 0.0039 Epoch: 16 Global Step: 267640 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:13:59,734-Speed 5195.94 samples/sec Loss 0.6337 LearningRate 0.0039 Epoch: 16 Global Step: 267650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:01,741-Speed 5102.99 samples/sec Loss 0.6101 LearningRate 0.0039 Epoch: 16 Global Step: 267660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:03,718-Speed 5184.27 samples/sec Loss 0.6543 LearningRate 0.0039 Epoch: 16 Global Step: 267670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:05,701-Speed 5164.34 samples/sec Loss 0.6389 LearningRate 0.0039 Epoch: 16 Global Step: 267680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:07,707-Speed 5107.52 samples/sec Loss 0.6020 LearningRate 0.0039 Epoch: 16 Global Step: 267690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:09,705-Speed 5127.19 samples/sec Loss 0.6513 LearningRate 0.0039 Epoch: 16 Global Step: 267700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:11,718-Speed 5090.39 samples/sec Loss 0.5869 LearningRate 0.0039 Epoch: 16 Global Step: 267710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:13,721-Speed 5113.71 samples/sec Loss 0.6055 LearningRate 0.0039 Epoch: 16 Global Step: 267720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:15,752-Speed 5042.89 samples/sec Loss 0.6001 LearningRate 0.0039 Epoch: 16 Global Step: 267730 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 17:14:17,730-Speed 5181.32 samples/sec Loss 0.6007 LearningRate 0.0039 Epoch: 16 Global Step: 267740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:19,710-Speed 5173.22 samples/sec Loss 0.5960 LearningRate 0.0039 Epoch: 16 Global Step: 267750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:14:21,709-Speed 5124.24 samples/sec Loss 0.6386 LearningRate 0.0039 Epoch: 16 Global Step: 267760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:14:23,685-Speed 5183.91 samples/sec Loss 0.5947 LearningRate 0.0039 Epoch: 16 Global Step: 267770 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:14:25,660-Speed 5187.08 samples/sec Loss 0.5971 LearningRate 0.0039 Epoch: 16 Global Step: 267780 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:14:27,661-Speed 5119.58 samples/sec Loss 0.6242 LearningRate 0.0039 Epoch: 16 Global Step: 267790 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:14:29,661-Speed 5120.96 samples/sec Loss 0.6531 LearningRate 0.0039 Epoch: 16 Global Step: 267800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:31,635-Speed 5189.59 samples/sec Loss 0.6234 LearningRate 0.0039 Epoch: 16 Global Step: 267810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:33,610-Speed 5186.83 samples/sec Loss 0.6119 LearningRate 0.0039 Epoch: 16 Global Step: 267820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:35,601-Speed 5144.85 samples/sec Loss 0.6516 LearningRate 0.0039 Epoch: 16 Global Step: 267830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:37,578-Speed 5182.38 samples/sec Loss 0.6043 LearningRate 0.0039 Epoch: 16 Global Step: 267840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:39,549-Speed 5195.62 samples/sec Loss 0.5987 LearningRate 0.0039 Epoch: 16 Global Step: 267850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
17:14:41,531-Speed 5168.31 samples/sec Loss 0.6552 LearningRate 0.0039 Epoch: 16 Global Step: 267860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:43,518-Speed 5157.66 samples/sec Loss 0.6414 LearningRate 0.0039 Epoch: 16 Global Step: 267870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:45,494-Speed 5184.19 samples/sec Loss 0.5677 LearningRate 0.0039 Epoch: 16 Global Step: 267880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:47,509-Speed 5083.79 samples/sec Loss 0.6152 LearningRate 0.0039 Epoch: 16 Global Step: 267890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:49,511-Speed 5117.59 samples/sec Loss 0.5723 LearningRate 0.0039 Epoch: 16 Global Step: 267900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:14:51,510-Speed 5126.41 samples/sec Loss 0.6146 LearningRate 0.0039 Epoch: 16 Global Step: 267910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:53,489-Speed 5176.45 samples/sec Loss 0.6297 LearningRate 0.0039 Epoch: 16 Global Step: 267920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:55,461-Speed 5193.40 samples/sec Loss 0.6184 LearningRate 0.0039 Epoch: 16 Global Step: 267930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:57,449-Speed 5154.10 samples/sec Loss 0.6065 LearningRate 0.0039 Epoch: 16 Global Step: 267940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:14:59,420-Speed 5196.73 samples/sec Loss 0.6093 LearningRate 0.0039 Epoch: 16 Global Step: 267950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:15:01,399-Speed 5174.86 samples/sec Loss 0.6167 LearningRate 0.0039 Epoch: 16 Global Step: 267960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:15:03,402-Speed 5115.68 samples/sec Loss 0.6212 LearningRate 0.0039 Epoch: 16 Global Step: 267970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:15:05,388-Speed 5157.73 samples/sec 
Loss 0.6050 LearningRate 0.0039 Epoch: 16 Global Step: 267980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:15:07,360-Speed 5194.19 samples/sec Loss 0.6270 LearningRate 0.0039 Epoch: 16 Global Step: 267990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:15:09,339-Speed 5178.12 samples/sec Loss 0.6003 LearningRate 0.0039 Epoch: 16 Global Step: 268000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:15:36,424-[lfw][268000]XNorm: 22.223501 Training: 2022-04-11 17:15:36,424-[lfw][268000]Accuracy-Flip: 0.99783+-0.00259 Training: 2022-04-11 17:15:36,424-[lfw][268000]Accuracy-Highest: 0.99833 Training: 2022-04-11 17:16:07,825-[cfp_fp][268000]XNorm: 22.449235 Training: 2022-04-11 17:16:07,826-[cfp_fp][268000]Accuracy-Flip: 0.98814+-0.00378 Training: 2022-04-11 17:16:07,826-[cfp_fp][268000]Accuracy-Highest: 0.98914 Training: 2022-04-11 17:16:34,603-[agedb_30][268000]XNorm: 23.241685 Training: 2022-04-11 17:16:34,603-[agedb_30][268000]Accuracy-Flip: 0.98267+-0.00638 Training: 2022-04-11 17:16:34,604-[agedb_30][268000]Accuracy-Highest: 0.98300 Training: 2022-04-11 17:16:36,618-Speed 117.33 samples/sec Loss 0.6565 LearningRate 0.0039 Epoch: 16 Global Step: 268010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:16:38,604-Speed 5157.26 samples/sec Loss 0.6417 LearningRate 0.0039 Epoch: 16 Global Step: 268020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:16:40,572-Speed 5205.78 samples/sec Loss 0.6456 LearningRate 0.0039 Epoch: 16 Global Step: 268030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:16:42,543-Speed 5198.72 samples/sec Loss 0.6029 LearningRate 0.0039 Epoch: 16 Global Step: 268040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:16:44,530-Speed 5154.30 samples/sec Loss 0.6310 LearningRate 0.0039 Epoch: 16 Global Step: 268050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:16:46,511-Speed 5170.45 samples/sec Loss 0.5959 
LearningRate 0.0039 Epoch: 16 Global Step: 268060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:16:48,509-Speed 5128.09 samples/sec Loss 0.6212 LearningRate 0.0039 Epoch: 16 Global Step: 268070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:16:50,483-Speed 5189.69 samples/sec Loss 0.6200 LearningRate 0.0039 Epoch: 16 Global Step: 268080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:16:52,466-Speed 5164.99 samples/sec Loss 0.6043 LearningRate 0.0039 Epoch: 16 Global Step: 268090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:16:54,437-Speed 5198.54 samples/sec Loss 0.6363 LearningRate 0.0039 Epoch: 16 Global Step: 268100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:16:56,403-Speed 5210.31 samples/sec Loss 0.6305 LearningRate 0.0039 Epoch: 16 Global Step: 268110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:16:58,387-Speed 5164.01 samples/sec Loss 0.5979 LearningRate 0.0039 Epoch: 16 Global Step: 268120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:17:00,355-Speed 5203.91 samples/sec Loss 0.5952 LearningRate 0.0039 Epoch: 16 Global Step: 268130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:17:02,316-Speed 5223.42 samples/sec Loss 0.6114 LearningRate 0.0039 Epoch: 16 Global Step: 268140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:04,292-Speed 5184.95 samples/sec Loss 0.5964 LearningRate 0.0039 Epoch: 16 Global Step: 268150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:06,266-Speed 5188.33 samples/sec Loss 0.6416 LearningRate 0.0039 Epoch: 16 Global Step: 268160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:08,233-Speed 5207.52 samples/sec Loss 0.6571 LearningRate 0.0039 Epoch: 16 Global Step: 268170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:10,218-Speed 5160.45 samples/sec Loss 0.5896 LearningRate 0.0039 Epoch: 16 Global 
Step: 268180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:12,207-Speed 5150.56 samples/sec Loss 0.6021 LearningRate 0.0039 Epoch: 16 Global Step: 268190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:14,227-Speed 5073.40 samples/sec Loss 0.6050 LearningRate 0.0039 Epoch: 16 Global Step: 268200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:16,196-Speed 5200.95 samples/sec Loss 0.6038 LearningRate 0.0039 Epoch: 16 Global Step: 268210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:18,180-Speed 5163.54 samples/sec Loss 0.6242 LearningRate 0.0039 Epoch: 16 Global Step: 268220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:20,192-Speed 5210.36 samples/sec Loss 0.6179 LearningRate 0.0039 Epoch: 16 Global Step: 268230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:22,166-Speed 5188.61 samples/sec Loss 0.6102 LearningRate 0.0039 Epoch: 16 Global Step: 268240 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:17:24,150-Speed 5162.16 samples/sec Loss 0.6067 LearningRate 0.0039 Epoch: 16 Global Step: 268250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:17:26,114-Speed 5215.42 samples/sec Loss 0.6490 LearningRate 0.0039 Epoch: 16 Global Step: 268260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:28,085-Speed 5197.03 samples/sec Loss 0.6394 LearningRate 0.0039 Epoch: 16 Global Step: 268270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:30,056-Speed 5197.22 samples/sec Loss 0.6185 LearningRate 0.0039 Epoch: 16 Global Step: 268280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:32,035-Speed 5176.52 samples/sec Loss 0.6320 LearningRate 0.0039 Epoch: 16 Global Step: 268290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:34,008-Speed 5194.12 samples/sec Loss 0.6386 LearningRate 0.0039 Epoch: 16 Global Step: 268300 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 17:17:35,989-Speed 5170.78 samples/sec Loss 0.6264 LearningRate 0.0039 Epoch: 16 Global Step: 268310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:37,969-Speed 5173.96 samples/sec Loss 0.6432 LearningRate 0.0038 Epoch: 16 Global Step: 268320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:39,944-Speed 5184.99 samples/sec Loss 0.6110 LearningRate 0.0038 Epoch: 16 Global Step: 268330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:41,923-Speed 5178.46 samples/sec Loss 0.6214 LearningRate 0.0038 Epoch: 16 Global Step: 268340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:43,904-Speed 5170.07 samples/sec Loss 0.6479 LearningRate 0.0038 Epoch: 16 Global Step: 268350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:45,883-Speed 5174.40 samples/sec Loss 0.6602 LearningRate 0.0038 Epoch: 16 Global Step: 268360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:17:47,871-Speed 5153.98 samples/sec Loss 0.6328 LearningRate 0.0038 Epoch: 16 Global Step: 268370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:17:49,871-Speed 5121.84 samples/sec Loss 0.6065 LearningRate 0.0038 Epoch: 16 Global Step: 268380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:17:51,847-Speed 5183.83 samples/sec Loss 0.6228 LearningRate 0.0038 Epoch: 16 Global Step: 268390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:53,825-Speed 5180.40 samples/sec Loss 0.6205 LearningRate 0.0038 Epoch: 16 Global Step: 268400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:55,796-Speed 5196.17 samples/sec Loss 0.6340 LearningRate 0.0038 Epoch: 16 Global Step: 268410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:17:57,776-Speed 5174.42 samples/sec Loss 0.6007 LearningRate 0.0038 Epoch: 16 Global Step: 268420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
17:17:59,760-Speed 5161.37 samples/sec Loss 0.6475 LearningRate 0.0038 Epoch: 16 Global Step: 268430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:18:01,738-Speed 5178.28 samples/sec Loss 0.6047 LearningRate 0.0038 Epoch: 16 Global Step: 268440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:18:03,727-Speed 5152.55 samples/sec Loss 0.6221 LearningRate 0.0038 Epoch: 16 Global Step: 268450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:18:05,722-Speed 5134.05 samples/sec Loss 0.6457 LearningRate 0.0038 Epoch: 16 Global Step: 268460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:18:07,714-Speed 5142.81 samples/sec Loss 0.6263 LearningRate 0.0038 Epoch: 16 Global Step: 268470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:18:09,689-Speed 5185.54 samples/sec Loss 0.6071 LearningRate 0.0038 Epoch: 16 Global Step: 268480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:18:11,663-Speed 5190.87 samples/sec Loss 0.6172 LearningRate 0.0038 Epoch: 16 Global Step: 268490 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:18:13,640-Speed 5181.76 samples/sec Loss 0.6105 LearningRate 0.0038 Epoch: 16 Global Step: 268500 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:18:15,616-Speed 5183.61 samples/sec Loss 0.6318 LearningRate 0.0038 Epoch: 16 Global Step: 268510 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:18:17,601-Speed 5160.78 samples/sec Loss 0.6039 LearningRate 0.0038 Epoch: 16 Global Step: 268520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:18:19,587-Speed 5158.68 samples/sec Loss 0.6451 LearningRate 0.0038 Epoch: 16 Global Step: 268530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:18:21,562-Speed 5186.91 samples/sec Loss 0.6298 LearningRate 0.0038 Epoch: 16 Global Step: 268540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:18:23,572-Speed 5095.64 
samples/sec Loss 0.6220 LearningRate 0.0038 Epoch: 16 Global Step: 268550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:18:25,597-Speed 5057.75 samples/sec Loss 0.6286 LearningRate 0.0038 Epoch: 16 Global Step: 268560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:18:27,582-Speed 5161.39 samples/sec Loss 0.6155 LearningRate 0.0038 Epoch: 16 Global Step: 268570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:18:29,555-Speed 5193.09 samples/sec Loss 0.6217 LearningRate 0.0038 Epoch: 16 Global Step: 268580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:18:31,520-Speed 5212.74 samples/sec Loss 0.6292 LearningRate 0.0038 Epoch: 16 Global Step: 268590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:18:33,500-Speed 5172.27 samples/sec Loss 0.6796 LearningRate 0.0038 Epoch: 16 Global Step: 268600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:18:35,484-Speed 5162.75 samples/sec Loss 0.6132 LearningRate 0.0038 Epoch: 16 Global Step: 268610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:18:37,473-Speed 5151.65 samples/sec Loss 0.6166 LearningRate 0.0038 Epoch: 16 Global Step: 268620 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:18:39,445-Speed 5193.05 samples/sec Loss 0.6451 LearningRate 0.0038 Epoch: 16 Global Step: 268630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:18:41,441-Speed 5133.56 samples/sec Loss 0.6140 LearningRate 0.0038 Epoch: 16 Global Step: 268640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:18:43,428-Speed 5156.47 samples/sec Loss 0.6179 LearningRate 0.0038 Epoch: 16 Global Step: 268650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:18:45,398-Speed 5198.27 samples/sec Loss 0.6575 LearningRate 0.0038 Epoch: 16 Global Step: 268660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:18:47,375-Speed 5182.35 samples/sec Loss 0.6352 
LearningRate 0.0038 Epoch: 16 Global Step: 268670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:18:49,511-Speed 5178.56 samples/sec Loss 0.6260 LearningRate 0.0038 Epoch: 16 Global Step: 268680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:18:51,497-Speed 5156.69 samples/sec Loss 0.6514 LearningRate 0.0038 Epoch: 16 Global Step: 268690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:18:53,471-Speed 5189.34 samples/sec Loss 0.6255 LearningRate 0.0038 Epoch: 16 Global Step: 268700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:18:55,441-Speed 5199.79 samples/sec Loss 0.6249 LearningRate 0.0038 Epoch: 16 Global Step: 268710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:18:57,413-Speed 5193.23 samples/sec Loss 0.6282 LearningRate 0.0038 Epoch: 16 Global Step: 268720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:18:59,406-Speed 5140.29 samples/sec Loss 0.6182 LearningRate 0.0038 Epoch: 16 Global Step: 268730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:01,377-Speed 5198.12 samples/sec Loss 0.6036 LearningRate 0.0038 Epoch: 16 Global Step: 268740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:03,360-Speed 5167.18 samples/sec Loss 0.5993 LearningRate 0.0038 Epoch: 16 Global Step: 268750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:05,333-Speed 5190.03 samples/sec Loss 0.6446 LearningRate 0.0038 Epoch: 16 Global Step: 268760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:07,306-Speed 5192.30 samples/sec Loss 0.6369 LearningRate 0.0038 Epoch: 16 Global Step: 268770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:09,271-Speed 5213.88 samples/sec Loss 0.6493 LearningRate 0.0038 Epoch: 16 Global Step: 268780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:11,255-Speed 5163.17 samples/sec Loss 0.6057 LearningRate 0.0038 Epoch: 16 Global Step: 
268790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:13,234-Speed 5175.99 samples/sec Loss 0.6184 LearningRate 0.0038 Epoch: 16 Global Step: 268800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:15,220-Speed 5156.47 samples/sec Loss 0.6304 LearningRate 0.0038 Epoch: 16 Global Step: 268810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:17,206-Speed 5158.91 samples/sec Loss 0.6232 LearningRate 0.0038 Epoch: 16 Global Step: 268820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:19,185-Speed 5177.21 samples/sec Loss 0.6113 LearningRate 0.0038 Epoch: 16 Global Step: 268830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:19:21,191-Speed 5105.47 samples/sec Loss 0.6449 LearningRate 0.0038 Epoch: 16 Global Step: 268840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:19:23,179-Speed 5153.95 samples/sec Loss 0.6264 LearningRate 0.0038 Epoch: 16 Global Step: 268850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:19:25,151-Speed 5193.73 samples/sec Loss 0.6233 LearningRate 0.0038 Epoch: 16 Global Step: 268860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:27,128-Speed 5182.84 samples/sec Loss 0.6384 LearningRate 0.0038 Epoch: 16 Global Step: 268870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:29,103-Speed 5185.91 samples/sec Loss 0.6692 LearningRate 0.0038 Epoch: 16 Global Step: 268880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:31,072-Speed 5200.19 samples/sec Loss 0.6241 LearningRate 0.0038 Epoch: 16 Global Step: 268890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:33,049-Speed 5182.32 samples/sec Loss 0.6160 LearningRate 0.0038 Epoch: 16 Global Step: 268900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:35,045-Speed 5131.53 samples/sec Loss 0.6596 LearningRate 0.0038 Epoch: 16 Global Step: 268910 Fp16 Grad Scale: 65536 Required: 
4 hours Training: 2022-04-11 17:19:37,032-Speed 5156.64 samples/sec Loss 0.6197 LearningRate 0.0038 Epoch: 16 Global Step: 268920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:39,020-Speed 5151.93 samples/sec Loss 0.6556 LearningRate 0.0038 Epoch: 16 Global Step: 268930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:41,065-Speed 5010.03 samples/sec Loss 0.6287 LearningRate 0.0038 Epoch: 16 Global Step: 268940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:43,045-Speed 5175.57 samples/sec Loss 0.6482 LearningRate 0.0038 Epoch: 16 Global Step: 268950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:19:45,013-Speed 5203.59 samples/sec Loss 0.5951 LearningRate 0.0038 Epoch: 16 Global Step: 268960 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:19:47,016-Speed 5115.46 samples/sec Loss 0.6249 LearningRate 0.0038 Epoch: 16 Global Step: 268970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:19:48,987-Speed 5195.57 samples/sec Loss 0.6243 LearningRate 0.0038 Epoch: 16 Global Step: 268980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:19:50,967-Speed 5174.31 samples/sec Loss 0.6507 LearningRate 0.0038 Epoch: 16 Global Step: 268990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:19:52,943-Speed 5185.89 samples/sec Loss 0.6145 LearningRate 0.0038 Epoch: 16 Global Step: 269000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:19:54,915-Speed 5194.20 samples/sec Loss 0.6734 LearningRate 0.0038 Epoch: 16 Global Step: 269010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:19:56,892-Speed 5181.25 samples/sec Loss 0.6311 LearningRate 0.0038 Epoch: 16 Global Step: 269020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:19:58,881-Speed 5148.54 samples/sec Loss 0.6311 LearningRate 0.0038 Epoch: 16 Global Step: 269030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
17:20:00,859-Speed 5179.95 samples/sec Loss 0.6144 LearningRate 0.0038 Epoch: 16 Global Step: 269040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:02,839-Speed 5172.62 samples/sec Loss 0.6279 LearningRate 0.0038 Epoch: 16 Global Step: 269050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:04,815-Speed 5184.49 samples/sec Loss 0.6710 LearningRate 0.0038 Epoch: 16 Global Step: 269060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:06,787-Speed 5195.65 samples/sec Loss 0.6214 LearningRate 0.0038 Epoch: 16 Global Step: 269070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:08,759-Speed 5193.58 samples/sec Loss 0.6371 LearningRate 0.0038 Epoch: 16 Global Step: 269080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:10,756-Speed 5130.01 samples/sec Loss 0.6304 LearningRate 0.0038 Epoch: 16 Global Step: 269090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:12,752-Speed 5133.94 samples/sec Loss 0.6394 LearningRate 0.0038 Epoch: 16 Global Step: 269100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:14,728-Speed 5181.92 samples/sec Loss 0.6263 LearningRate 0.0038 Epoch: 16 Global Step: 269110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:16,708-Speed 5175.17 samples/sec Loss 0.6355 LearningRate 0.0038 Epoch: 16 Global Step: 269120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:18,688-Speed 5172.08 samples/sec Loss 0.6196 LearningRate 0.0038 Epoch: 16 Global Step: 269130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:20:20,662-Speed 5189.04 samples/sec Loss 0.6486 LearningRate 0.0038 Epoch: 16 Global Step: 269140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:20:22,639-Speed 5182.97 samples/sec Loss 0.6496 LearningRate 0.0038 Epoch: 16 Global Step: 269150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:24,623-Speed 5160.67 samples/sec 
Loss 0.6687 LearningRate 0.0038 Epoch: 16 Global Step: 269160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:26,619-Speed 5134.17 samples/sec Loss 0.6352 LearningRate 0.0038 Epoch: 16 Global Step: 269170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:28,591-Speed 5193.69 samples/sec Loss 0.6266 LearningRate 0.0037 Epoch: 16 Global Step: 269180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:30,576-Speed 5161.45 samples/sec Loss 0.6311 LearningRate 0.0037 Epoch: 16 Global Step: 269190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:32,547-Speed 5197.11 samples/sec Loss 0.6766 LearningRate 0.0037 Epoch: 16 Global Step: 269200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:34,558-Speed 5092.99 samples/sec Loss 0.6177 LearningRate 0.0037 Epoch: 16 Global Step: 269210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:36,552-Speed 5137.07 samples/sec Loss 0.6179 LearningRate 0.0037 Epoch: 16 Global Step: 269220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:38,590-Speed 5027.55 samples/sec Loss 0.6422 LearningRate 0.0037 Epoch: 16 Global Step: 269230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:40,564-Speed 5190.57 samples/sec Loss 0.6404 LearningRate 0.0037 Epoch: 16 Global Step: 269240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:42,533-Speed 5200.68 samples/sec Loss 0.6683 LearningRate 0.0037 Epoch: 16 Global Step: 269250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:44,526-Speed 5142.58 samples/sec Loss 0.6380 LearningRate 0.0037 Epoch: 16 Global Step: 269260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:46,500-Speed 5189.07 samples/sec Loss 0.6132 LearningRate 0.0037 Epoch: 16 Global Step: 269270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:48,488-Speed 5151.02 samples/sec Loss 0.6062 LearningRate 0.0037 Epoch: 16 
Global Step: 269280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:50,462-Speed 5191.59 samples/sec Loss 0.6232 LearningRate 0.0037 Epoch: 16 Global Step: 269290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:52,445-Speed 5165.29 samples/sec Loss 0.6073 LearningRate 0.0037 Epoch: 16 Global Step: 269300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:54,446-Speed 5118.73 samples/sec Loss 0.6093 LearningRate 0.0037 Epoch: 16 Global Step: 269310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:56,439-Speed 5141.06 samples/sec Loss 0.6295 LearningRate 0.0037 Epoch: 16 Global Step: 269320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:20:58,450-Speed 5091.53 samples/sec Loss 0.6339 LearningRate 0.0037 Epoch: 16 Global Step: 269330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:00,460-Speed 5098.56 samples/sec Loss 0.6727 LearningRate 0.0037 Epoch: 16 Global Step: 269340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:02,441-Speed 5169.72 samples/sec Loss 0.6209 LearningRate 0.0037 Epoch: 16 Global Step: 269350 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:21:04,420-Speed 5178.18 samples/sec Loss 0.6278 LearningRate 0.0037 Epoch: 16 Global Step: 269360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:06,415-Speed 5134.38 samples/sec Loss 0.6239 LearningRate 0.0037 Epoch: 16 Global Step: 269370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:08,386-Speed 5196.91 samples/sec Loss 0.6522 LearningRate 0.0037 Epoch: 16 Global Step: 269380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:10,380-Speed 5138.30 samples/sec Loss 0.6158 LearningRate 0.0037 Epoch: 16 Global Step: 269390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:12,359-Speed 5175.96 samples/sec Loss 0.6467 LearningRate 0.0037 Epoch: 16 Global Step: 269400 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 17:21:14,375-Speed 5080.14 samples/sec Loss 0.6658 LearningRate 0.0037 Epoch: 16 Global Step: 269410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:16,360-Speed 5163.12 samples/sec Loss 0.6007 LearningRate 0.0037 Epoch: 16 Global Step: 269420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:18,344-Speed 5164.03 samples/sec Loss 0.6611 LearningRate 0.0037 Epoch: 16 Global Step: 269430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:20,318-Speed 5189.89 samples/sec Loss 0.6370 LearningRate 0.0037 Epoch: 16 Global Step: 269440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:22,301-Speed 5165.60 samples/sec Loss 0.6677 LearningRate 0.0037 Epoch: 16 Global Step: 269450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:24,285-Speed 5161.64 samples/sec Loss 0.6442 LearningRate 0.0037 Epoch: 16 Global Step: 269460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:21:26,278-Speed 5141.01 samples/sec Loss 0.6403 LearningRate 0.0037 Epoch: 16 Global Step: 269470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:21:28,249-Speed 5199.03 samples/sec Loss 0.6157 LearningRate 0.0037 Epoch: 16 Global Step: 269480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:21:30,223-Speed 5186.88 samples/sec Loss 0.6568 LearningRate 0.0037 Epoch: 16 Global Step: 269490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:32,201-Speed 5181.44 samples/sec Loss 0.6466 LearningRate 0.0037 Epoch: 16 Global Step: 269500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:34,190-Speed 5149.97 samples/sec Loss 0.6366 LearningRate 0.0037 Epoch: 16 Global Step: 269510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:36,234-Speed 5012.54 samples/sec Loss 0.6709 LearningRate 0.0037 Epoch: 16 Global Step: 269520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
17:21:38,247-Speed 5089.68 samples/sec Loss 0.6442 LearningRate 0.0037 Epoch: 16 Global Step: 269530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:40,308-Speed 4969.03 samples/sec Loss 0.6176 LearningRate 0.0037 Epoch: 16 Global Step: 269540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:42,286-Speed 5180.49 samples/sec Loss 0.6477 LearningRate 0.0037 Epoch: 16 Global Step: 269550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:44,272-Speed 5157.58 samples/sec Loss 0.6813 LearningRate 0.0037 Epoch: 16 Global Step: 269560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:46,261-Speed 5150.42 samples/sec Loss 0.6489 LearningRate 0.0037 Epoch: 16 Global Step: 269570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:48,234-Speed 5192.67 samples/sec Loss 0.6518 LearningRate 0.0037 Epoch: 16 Global Step: 269580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:50,216-Speed 5170.85 samples/sec Loss 0.6450 LearningRate 0.0037 Epoch: 16 Global Step: 269590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:21:52,198-Speed 5168.03 samples/sec Loss 0.6588 LearningRate 0.0037 Epoch: 16 Global Step: 269600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:54,194-Speed 5132.84 samples/sec Loss 0.6385 LearningRate 0.0037 Epoch: 16 Global Step: 269610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:56,165-Speed 5195.31 samples/sec Loss 0.6572 LearningRate 0.0037 Epoch: 16 Global Step: 269620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:21:58,208-Speed 5015.46 samples/sec Loss 0.6340 LearningRate 0.0037 Epoch: 16 Global Step: 269630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:00,188-Speed 5176.34 samples/sec Loss 0.6415 LearningRate 0.0037 Epoch: 16 Global Step: 269640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:02,211-Speed 5063.93 samples/sec 
Loss 0.6905 LearningRate 0.0037 Epoch: 16 Global Step: 269650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:04,210-Speed 5125.44 samples/sec Loss 0.6535 LearningRate 0.0037 Epoch: 16 Global Step: 269660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:06,189-Speed 5175.26 samples/sec Loss 0.6534 LearningRate 0.0037 Epoch: 16 Global Step: 269670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:08,184-Speed 5136.23 samples/sec Loss 0.6398 LearningRate 0.0037 Epoch: 16 Global Step: 269680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:10,161-Speed 5183.71 samples/sec Loss 0.6574 LearningRate 0.0037 Epoch: 16 Global Step: 269690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:12,182-Speed 5068.21 samples/sec Loss 0.6530 LearningRate 0.0037 Epoch: 16 Global Step: 269700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:14,160-Speed 5180.52 samples/sec Loss 0.6395 LearningRate 0.0037 Epoch: 16 Global Step: 269710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:16,193-Speed 5039.22 samples/sec Loss 0.6412 LearningRate 0.0037 Epoch: 16 Global Step: 269720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:18,178-Speed 5161.54 samples/sec Loss 0.6788 LearningRate 0.0037 Epoch: 16 Global Step: 269730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:20,149-Speed 5197.19 samples/sec Loss 0.6462 LearningRate 0.0037 Epoch: 16 Global Step: 269740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:22,154-Speed 5111.27 samples/sec Loss 0.6372 LearningRate 0.0037 Epoch: 16 Global Step: 269750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:24,130-Speed 5184.04 samples/sec Loss 0.6464 LearningRate 0.0037 Epoch: 16 Global Step: 269760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:26,137-Speed 5104.73 samples/sec Loss 0.6582 LearningRate 0.0037 Epoch: 16 
Global Step: 269770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:28,111-Speed 5189.46 samples/sec Loss 0.6882 LearningRate 0.0037 Epoch: 16 Global Step: 269780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:30,118-Speed 5102.49 samples/sec Loss 0.6653 LearningRate 0.0037 Epoch: 16 Global Step: 269790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:32,091-Speed 5192.93 samples/sec Loss 0.6878 LearningRate 0.0037 Epoch: 16 Global Step: 269800 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:22:34,088-Speed 5129.49 samples/sec Loss 0.6361 LearningRate 0.0037 Epoch: 16 Global Step: 269810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:22:36,099-Speed 5095.10 samples/sec Loss 0.6547 LearningRate 0.0037 Epoch: 16 Global Step: 269820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:38,075-Speed 5183.61 samples/sec Loss 0.6414 LearningRate 0.0037 Epoch: 16 Global Step: 269830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:40,069-Speed 5138.73 samples/sec Loss 0.6562 LearningRate 0.0037 Epoch: 16 Global Step: 269840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:42,246-Speed 5137.36 samples/sec Loss 0.6531 LearningRate 0.0037 Epoch: 16 Global Step: 269850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:44,224-Speed 5178.08 samples/sec Loss 0.6447 LearningRate 0.0037 Epoch: 16 Global Step: 269860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:46,216-Speed 5143.63 samples/sec Loss 0.6294 LearningRate 0.0037 Epoch: 16 Global Step: 269870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:48,224-Speed 5103.61 samples/sec Loss 0.6701 LearningRate 0.0037 Epoch: 16 Global Step: 269880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:50,233-Speed 5098.32 samples/sec Loss 0.6565 LearningRate 0.0037 Epoch: 16 Global Step: 269890 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-04-11 17:22:52,219-Speed 5158.21 samples/sec Loss 0.6293 LearningRate 0.0037 Epoch: 16 Global Step: 269900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:54,193-Speed 5188.96 samples/sec Loss 0.6376 LearningRate 0.0037 Epoch: 16 Global Step: 269910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:22:56,190-Speed 5130.52 samples/sec Loss 0.6665 LearningRate 0.0037 Epoch: 16 Global Step: 269920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:22:58,162-Speed 5194.61 samples/sec Loss 0.6384 LearningRate 0.0037 Epoch: 16 Global Step: 269930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:23:00,149-Speed 5154.95 samples/sec Loss 0.6434 LearningRate 0.0037 Epoch: 16 Global Step: 269940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:23:02,126-Speed 5181.28 samples/sec Loss 0.6180 LearningRate 0.0037 Epoch: 16 Global Step: 269950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:23:04,124-Speed 5127.37 samples/sec Loss 0.6697 LearningRate 0.0037 Epoch: 16 Global Step: 269960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:23:06,097-Speed 5192.99 samples/sec Loss 0.6762 LearningRate 0.0037 Epoch: 16 Global Step: 269970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:23:08,109-Speed 5093.19 samples/sec Loss 0.6535 LearningRate 0.0037 Epoch: 16 Global Step: 269980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:23:10,101-Speed 5140.51 samples/sec Loss 0.6565 LearningRate 0.0037 Epoch: 16 Global Step: 269990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:23:12,119-Speed 5078.51 samples/sec Loss 0.6506 LearningRate 0.0037 Epoch: 16 Global Step: 270000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:23:38,872-[lfw][270000]XNorm: 22.591805 Training: 2022-04-11 17:23:38,872-[lfw][270000]Accuracy-Flip: 0.99800+-0.00267 Training: 2022-04-11 
17:23:38,873-[lfw][270000]Accuracy-Highest: 0.99833 Training: 2022-04-11 17:24:09,602-[cfp_fp][270000]XNorm: 22.520252 Training: 2022-04-11 17:24:09,602-[cfp_fp][270000]Accuracy-Flip: 0.98929+-0.00444 Training: 2022-04-11 17:24:09,602-[cfp_fp][270000]Accuracy-Highest: 0.98929 Training: 2022-04-11 17:24:36,298-[agedb_30][270000]XNorm: 23.511905 Training: 2022-04-11 17:24:36,298-[agedb_30][270000]Accuracy-Flip: 0.98083+-0.00727 Training: 2022-04-11 17:24:36,299-[agedb_30][270000]Accuracy-Highest: 0.98300 Training: 2022-04-11 17:24:38,335-Speed 118.77 samples/sec Loss 0.6700 LearningRate 0.0037 Epoch: 16 Global Step: 270010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:24:40,301-Speed 5210.20 samples/sec Loss 0.6505 LearningRate 0.0037 Epoch: 16 Global Step: 270020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:24:42,269-Speed 5205.98 samples/sec Loss 0.6551 LearningRate 0.0037 Epoch: 16 Global Step: 270030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:24:44,243-Speed 5190.57 samples/sec Loss 0.6613 LearningRate 0.0037 Epoch: 16 Global Step: 270040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:24:46,212-Speed 5202.69 samples/sec Loss 0.6112 LearningRate 0.0036 Epoch: 16 Global Step: 270050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:24:48,193-Speed 5169.37 samples/sec Loss 0.6187 LearningRate 0.0036 Epoch: 16 Global Step: 270060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:24:50,201-Speed 5102.23 samples/sec Loss 0.6612 LearningRate 0.0036 Epoch: 16 Global Step: 270070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:24:52,177-Speed 5185.46 samples/sec Loss 0.6564 LearningRate 0.0036 Epoch: 16 Global Step: 270080 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:24:54,160-Speed 5166.69 samples/sec Loss 0.6328 LearningRate 0.0036 Epoch: 16 Global Step: 270090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 
17:24:56,155-Speed 5133.40 samples/sec Loss 0.6411 LearningRate 0.0036 Epoch: 16 Global Step: 270100 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:24:58,133-Speed 5179.47 samples/sec Loss 0.6420 LearningRate 0.0036 Epoch: 16 Global Step: 270110 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:25:00,126-Speed 5140.31 samples/sec Loss 0.6151 LearningRate 0.0036 Epoch: 16 Global Step: 270120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:25:02,104-Speed 5179.70 samples/sec Loss 0.6612 LearningRate 0.0036 Epoch: 16 Global Step: 270130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:04,108-Speed 5111.97 samples/sec Loss 0.6429 LearningRate 0.0036 Epoch: 16 Global Step: 270140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:06,075-Speed 5209.11 samples/sec Loss 0.6349 LearningRate 0.0036 Epoch: 16 Global Step: 270150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:08,043-Speed 5205.66 samples/sec Loss 0.6703 LearningRate 0.0036 Epoch: 16 Global Step: 270160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:10,036-Speed 5138.58 samples/sec Loss 0.6537 LearningRate 0.0036 Epoch: 16 Global Step: 270170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:12,024-Speed 5193.65 samples/sec Loss 0.6281 LearningRate 0.0036 Epoch: 16 Global Step: 270180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:13,990-Speed 5209.18 samples/sec Loss 0.6424 LearningRate 0.0036 Epoch: 16 Global Step: 270190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:15,958-Speed 5204.45 samples/sec Loss 0.6344 LearningRate 0.0036 Epoch: 16 Global Step: 270200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:17,924-Speed 5211.69 samples/sec Loss 0.6307 LearningRate 0.0036 Epoch: 16 Global Step: 270210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:19,891-Speed 5206.85 samples/sec 
Loss 0.6771 LearningRate 0.0036 Epoch: 16 Global Step: 270220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:21,888-Speed 5129.73 samples/sec Loss 0.6663 LearningRate 0.0036 Epoch: 16 Global Step: 270230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:23,868-Speed 5174.48 samples/sec Loss 0.6464 LearningRate 0.0036 Epoch: 16 Global Step: 270240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:25,853-Speed 5160.53 samples/sec Loss 0.6458 LearningRate 0.0036 Epoch: 16 Global Step: 270250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:27,824-Speed 5197.68 samples/sec Loss 0.6089 LearningRate 0.0036 Epoch: 16 Global Step: 270260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:29,799-Speed 5185.48 samples/sec Loss 0.6378 LearningRate 0.0036 Epoch: 16 Global Step: 270270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:31,775-Speed 5184.52 samples/sec Loss 0.6345 LearningRate 0.0036 Epoch: 16 Global Step: 270280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:33,777-Speed 5117.00 samples/sec Loss 0.6419 LearningRate 0.0036 Epoch: 16 Global Step: 270290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:35,765-Speed 5151.85 samples/sec Loss 0.6389 LearningRate 0.0036 Epoch: 16 Global Step: 270300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:37,734-Speed 5204.75 samples/sec Loss 0.6234 LearningRate 0.0036 Epoch: 16 Global Step: 270310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:39,764-Speed 5045.04 samples/sec Loss 0.6709 LearningRate 0.0036 Epoch: 16 Global Step: 270320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:41,734-Speed 5201.60 samples/sec Loss 0.6433 LearningRate 0.0036 Epoch: 16 Global Step: 270330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:25:43,720-Speed 5158.30 samples/sec Loss 0.6484 LearningRate 0.0036 Epoch: 16 
Global Step: 270340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:45,698-Speed 5179.59 samples/sec Loss 0.6417 LearningRate 0.0036 Epoch: 16 Global Step: 270350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:47,682-Speed 5162.16 samples/sec Loss 0.6402 LearningRate 0.0036 Epoch: 16 Global Step: 270360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:49,656-Speed 5190.65 samples/sec Loss 0.6534 LearningRate 0.0036 Epoch: 16 Global Step: 270370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:51,636-Speed 5174.70 samples/sec Loss 0.6392 LearningRate 0.0036 Epoch: 16 Global Step: 270380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:53,617-Speed 5170.60 samples/sec Loss 0.7246 LearningRate 0.0036 Epoch: 16 Global Step: 270390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:55,601-Speed 5161.22 samples/sec Loss 0.6572 LearningRate 0.0036 Epoch: 16 Global Step: 270400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:57,597-Speed 5133.30 samples/sec Loss 0.6543 LearningRate 0.0036 Epoch: 16 Global Step: 270410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:25:59,566-Speed 5201.70 samples/sec Loss 0.6295 LearningRate 0.0036 Epoch: 16 Global Step: 270420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:01,552-Speed 5158.48 samples/sec Loss 0.6503 LearningRate 0.0036 Epoch: 16 Global Step: 270430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:03,525-Speed 5191.83 samples/sec Loss 0.6702 LearningRate 0.0036 Epoch: 16 Global Step: 270440 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:26:05,493-Speed 5205.80 samples/sec Loss 0.6547 LearningRate 0.0036 Epoch: 16 Global Step: 270450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:26:07,467-Speed 5189.38 samples/sec Loss 0.6559 LearningRate 0.0036 Epoch: 16 Global Step: 270460 Fp16 Grad Scale: 
131072 Required: 4 hours Training: 2022-04-11 17:26:09,430-Speed 5216.26 samples/sec Loss 0.6606 LearningRate 0.0036 Epoch: 16 Global Step: 270470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:11,398-Speed 5206.37 samples/sec Loss 0.6482 LearningRate 0.0036 Epoch: 16 Global Step: 270480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:13,368-Speed 5198.97 samples/sec Loss 0.6769 LearningRate 0.0036 Epoch: 16 Global Step: 270490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:15,342-Speed 5189.22 samples/sec Loss 0.6548 LearningRate 0.0036 Epoch: 16 Global Step: 270500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:17,314-Speed 5196.69 samples/sec Loss 0.6504 LearningRate 0.0036 Epoch: 16 Global Step: 270510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:19,289-Speed 5186.09 samples/sec Loss 0.6539 LearningRate 0.0036 Epoch: 16 Global Step: 270520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:21,264-Speed 5187.57 samples/sec Loss 0.6627 LearningRate 0.0036 Epoch: 16 Global Step: 270530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:23,237-Speed 5191.46 samples/sec Loss 0.6573 LearningRate 0.0036 Epoch: 16 Global Step: 270540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:25,209-Speed 5193.81 samples/sec Loss 0.6162 LearningRate 0.0036 Epoch: 16 Global Step: 270550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:27,181-Speed 5193.23 samples/sec Loss 0.6648 LearningRate 0.0036 Epoch: 16 Global Step: 270560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:29,150-Speed 5202.94 samples/sec Loss 0.6459 LearningRate 0.0036 Epoch: 16 Global Step: 270570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:26:31,122-Speed 5194.51 samples/sec Loss 0.6464 LearningRate 0.0036 Epoch: 16 Global Step: 270580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-04-11 17:26:33,112-Speed 5148.28 samples/sec Loss 0.6802 LearningRate 0.0036 Epoch: 16 Global Step: 270590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:26:35,093-Speed 5171.22 samples/sec Loss 0.6255 LearningRate 0.0036 Epoch: 16 Global Step: 270600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:26:37,068-Speed 5185.65 samples/sec Loss 0.6633 LearningRate 0.0036 Epoch: 16 Global Step: 270610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:39,042-Speed 5190.79 samples/sec Loss 0.6385 LearningRate 0.0036 Epoch: 16 Global Step: 270620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:41,017-Speed 5186.31 samples/sec Loss 0.6641 LearningRate 0.0036 Epoch: 16 Global Step: 270630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:43,015-Speed 5128.34 samples/sec Loss 0.6465 LearningRate 0.0036 Epoch: 16 Global Step: 270640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:44,998-Speed 5166.83 samples/sec Loss 0.6464 LearningRate 0.0036 Epoch: 16 Global Step: 270650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:47,018-Speed 5071.97 samples/sec Loss 0.6406 LearningRate 0.0036 Epoch: 16 Global Step: 270660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:48,998-Speed 5173.91 samples/sec Loss 0.6675 LearningRate 0.0036 Epoch: 16 Global Step: 270670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:50,967-Speed 5201.51 samples/sec Loss 0.6238 LearningRate 0.0036 Epoch: 16 Global Step: 270680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:52,961-Speed 5136.28 samples/sec Loss 0.6515 LearningRate 0.0036 Epoch: 16 Global Step: 270690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:54,953-Speed 5143.62 samples/sec Loss 0.6484 LearningRate 0.0036 Epoch: 16 Global Step: 270700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:56,927-Speed 5190.59 
samples/sec Loss 0.6675 LearningRate 0.0036 Epoch: 16 Global Step: 270710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:26:58,911-Speed 5163.29 samples/sec Loss 0.6639 LearningRate 0.0036 Epoch: 16 Global Step: 270720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:00,883-Speed 5192.28 samples/sec Loss 0.6195 LearningRate 0.0036 Epoch: 16 Global Step: 270730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:02,863-Speed 5174.13 samples/sec Loss 0.6478 LearningRate 0.0036 Epoch: 16 Global Step: 270740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:04,836-Speed 5191.88 samples/sec Loss 0.6979 LearningRate 0.0036 Epoch: 16 Global Step: 270750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:06,812-Speed 5183.90 samples/sec Loss 0.6472 LearningRate 0.0036 Epoch: 16 Global Step: 270760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:08,813-Speed 5120.16 samples/sec Loss 0.6451 LearningRate 0.0036 Epoch: 16 Global Step: 270770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:10,795-Speed 5170.63 samples/sec Loss 0.6836 LearningRate 0.0036 Epoch: 16 Global Step: 270780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:12,772-Speed 5181.01 samples/sec Loss 0.6258 LearningRate 0.0036 Epoch: 16 Global Step: 270790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:14,764-Speed 5141.41 samples/sec Loss 0.6617 LearningRate 0.0036 Epoch: 16 Global Step: 270800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:16,733-Speed 5202.59 samples/sec Loss 0.6738 LearningRate 0.0036 Epoch: 16 Global Step: 270810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:27:18,699-Speed 5211.44 samples/sec Loss 0.6546 LearningRate 0.0036 Epoch: 16 Global Step: 270820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:20,674-Speed 5185.80 samples/sec Loss 0.6652 LearningRate 
0.0036 Epoch: 16 Global Step: 270830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:22,667-Speed 5140.27 samples/sec Loss 0.6632 LearningRate 0.0036 Epoch: 16 Global Step: 270840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:24,697-Speed 5046.28 samples/sec Loss 0.6555 LearningRate 0.0036 Epoch: 16 Global Step: 270850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:26,664-Speed 5207.67 samples/sec Loss 0.6563 LearningRate 0.0036 Epoch: 16 Global Step: 270860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:28,648-Speed 5164.82 samples/sec Loss 0.6446 LearningRate 0.0036 Epoch: 16 Global Step: 270870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:30,635-Speed 5155.25 samples/sec Loss 0.6150 LearningRate 0.0036 Epoch: 16 Global Step: 270880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:32,630-Speed 5133.69 samples/sec Loss 0.6345 LearningRate 0.0036 Epoch: 16 Global Step: 270890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:34,626-Speed 5133.71 samples/sec Loss 0.6655 LearningRate 0.0036 Epoch: 16 Global Step: 270900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:36,624-Speed 5125.87 samples/sec Loss 0.6707 LearningRate 0.0036 Epoch: 16 Global Step: 270910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:38,624-Speed 5121.75 samples/sec Loss 0.6576 LearningRate 0.0036 Epoch: 16 Global Step: 270920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:27:40,594-Speed 5199.67 samples/sec Loss 0.6466 LearningRate 0.0035 Epoch: 16 Global Step: 270930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:27:42,566-Speed 5195.05 samples/sec Loss 0.6413 LearningRate 0.0035 Epoch: 16 Global Step: 270940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:27:44,531-Speed 5213.15 samples/sec Loss 0.6867 LearningRate 0.0035 Epoch: 16 Global Step: 270950 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:46,498-Speed 5207.16 samples/sec Loss 0.6309 LearningRate 0.0035 Epoch: 16 Global Step: 270960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:48,465-Speed 5209.42 samples/sec Loss 0.6508 LearningRate 0.0035 Epoch: 16 Global Step: 270970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:50,439-Speed 5188.60 samples/sec Loss 0.6788 LearningRate 0.0035 Epoch: 16 Global Step: 270980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:52,410-Speed 5195.13 samples/sec Loss 0.6649 LearningRate 0.0035 Epoch: 16 Global Step: 270990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:54,380-Speed 5201.17 samples/sec Loss 0.6351 LearningRate 0.0035 Epoch: 16 Global Step: 271000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:56,354-Speed 5190.10 samples/sec Loss 0.6588 LearningRate 0.0035 Epoch: 16 Global Step: 271010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:27:58,333-Speed 5175.59 samples/sec Loss 0.6574 LearningRate 0.0035 Epoch: 16 Global Step: 271020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:00,334-Speed 5120.48 samples/sec Loss 0.6480 LearningRate 0.0035 Epoch: 16 Global Step: 271030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:02,320-Speed 5157.58 samples/sec Loss 0.6829 LearningRate 0.0035 Epoch: 16 Global Step: 271040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:04,333-Speed 5088.14 samples/sec Loss 0.6710 LearningRate 0.0035 Epoch: 16 Global Step: 271050 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:28:06,302-Speed 5202.82 samples/sec Loss 0.6734 LearningRate 0.0035 Epoch: 16 Global Step: 271060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:08,280-Speed 5177.59 samples/sec Loss 0.6686 LearningRate 0.0035 Epoch: 16 Global Step: 271070 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 17:28:10,254-Speed 5191.00 samples/sec Loss 0.6636 LearningRate 0.0035 Epoch: 16 Global Step: 271080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:12,227-Speed 5191.21 samples/sec Loss 0.6606 LearningRate 0.0035 Epoch: 16 Global Step: 271090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:14,206-Speed 5175.63 samples/sec Loss 0.6335 LearningRate 0.0035 Epoch: 16 Global Step: 271100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:16,186-Speed 5172.26 samples/sec Loss 0.6507 LearningRate 0.0035 Epoch: 16 Global Step: 271110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:18,158-Speed 5195.40 samples/sec Loss 0.6635 LearningRate 0.0035 Epoch: 16 Global Step: 271120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:20,155-Speed 5129.26 samples/sec Loss 0.6877 LearningRate 0.0035 Epoch: 16 Global Step: 271130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:22,125-Speed 5199.86 samples/sec Loss 0.6969 LearningRate 0.0035 Epoch: 16 Global Step: 271140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:24,111-Speed 5159.93 samples/sec Loss 0.6555 LearningRate 0.0035 Epoch: 16 Global Step: 271150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:26,080-Speed 5201.52 samples/sec Loss 0.6596 LearningRate 0.0035 Epoch: 16 Global Step: 271160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:28,057-Speed 5180.51 samples/sec Loss 0.6670 LearningRate 0.0035 Epoch: 16 Global Step: 271170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:30,028-Speed 5197.60 samples/sec Loss 0.6502 LearningRate 0.0035 Epoch: 16 Global Step: 271180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:32,002-Speed 5188.64 samples/sec Loss 0.6633 LearningRate 0.0035 Epoch: 16 Global Step: 271190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:33,972-Speed 
5200.54 samples/sec Loss 0.6519 LearningRate 0.0035 Epoch: 16 Global Step: 271200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:35,956-Speed 5161.65 samples/sec Loss 0.7183 LearningRate 0.0035 Epoch: 16 Global Step: 271210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:37,950-Speed 5137.81 samples/sec Loss 0.6663 LearningRate 0.0035 Epoch: 16 Global Step: 271220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:39,956-Speed 5108.43 samples/sec Loss 0.6616 LearningRate 0.0035 Epoch: 16 Global Step: 271230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:41,926-Speed 5198.72 samples/sec Loss 0.6638 LearningRate 0.0035 Epoch: 16 Global Step: 271240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:43,899-Speed 5193.40 samples/sec Loss 0.6834 LearningRate 0.0035 Epoch: 16 Global Step: 271250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:45,873-Speed 5189.97 samples/sec Loss 0.6406 LearningRate 0.0035 Epoch: 16 Global Step: 271260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:28:47,862-Speed 5148.72 samples/sec Loss 0.6417 LearningRate 0.0035 Epoch: 16 Global Step: 271270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:49,845-Speed 5167.03 samples/sec Loss 0.6603 LearningRate 0.0035 Epoch: 16 Global Step: 271280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:51,821-Speed 5182.63 samples/sec Loss 0.6167 LearningRate 0.0035 Epoch: 16 Global Step: 271290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:53,804-Speed 5165.96 samples/sec Loss 0.6549 LearningRate 0.0035 Epoch: 16 Global Step: 271300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:55,810-Speed 5108.06 samples/sec Loss 0.6197 LearningRate 0.0035 Epoch: 16 Global Step: 271310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:57,785-Speed 5186.97 samples/sec Loss 0.6567 
LearningRate 0.0035 Epoch: 16 Global Step: 271320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:28:59,766-Speed 5170.45 samples/sec Loss 0.6633 LearningRate 0.0035 Epoch: 16 Global Step: 271330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:01,758-Speed 5141.12 samples/sec Loss 0.6698 LearningRate 0.0035 Epoch: 16 Global Step: 271340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:03,740-Speed 5168.46 samples/sec Loss 0.6764 LearningRate 0.0035 Epoch: 16 Global Step: 271350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:05,735-Speed 5135.88 samples/sec Loss 0.6215 LearningRate 0.0035 Epoch: 16 Global Step: 271360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:07,704-Speed 5202.21 samples/sec Loss 0.6278 LearningRate 0.0035 Epoch: 16 Global Step: 271370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:09,680-Speed 5183.92 samples/sec Loss 0.6594 LearningRate 0.0035 Epoch: 16 Global Step: 271380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:11,680-Speed 5121.96 samples/sec Loss 0.6620 LearningRate 0.0035 Epoch: 16 Global Step: 271390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:13,676-Speed 5133.33 samples/sec Loss 0.6685 LearningRate 0.0035 Epoch: 16 Global Step: 271400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:15,673-Speed 5128.41 samples/sec Loss 0.6268 LearningRate 0.0035 Epoch: 16 Global Step: 271410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:17,650-Speed 5182.98 samples/sec Loss 0.6613 LearningRate 0.0035 Epoch: 16 Global Step: 271420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:19,624-Speed 5188.76 samples/sec Loss 0.6903 LearningRate 0.0035 Epoch: 16 Global Step: 271430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:21,618-Speed 5136.59 samples/sec Loss 0.6577 LearningRate 0.0035 Epoch: 16 Global Step: 
271440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:23,605-Speed 5156.75 samples/sec Loss 0.6311 LearningRate 0.0035 Epoch: 16 Global Step: 271450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:25,596-Speed 5146.38 samples/sec Loss 0.6642 LearningRate 0.0035 Epoch: 16 Global Step: 271460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:27,611-Speed 5082.78 samples/sec Loss 0.6638 LearningRate 0.0035 Epoch: 16 Global Step: 271470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:29:29,611-Speed 5122.58 samples/sec Loss 0.6504 LearningRate 0.0035 Epoch: 16 Global Step: 271480 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:29:31,599-Speed 5152.82 samples/sec Loss 0.6418 LearningRate 0.0035 Epoch: 16 Global Step: 271490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:33,592-Speed 5138.87 samples/sec Loss 0.6883 LearningRate 0.0035 Epoch: 16 Global Step: 271500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:35,578-Speed 5158.74 samples/sec Loss 0.6627 LearningRate 0.0035 Epoch: 16 Global Step: 271510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:37,558-Speed 5172.36 samples/sec Loss 0.6531 LearningRate 0.0035 Epoch: 16 Global Step: 271520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:39,529-Speed 5199.79 samples/sec Loss 0.6828 LearningRate 0.0035 Epoch: 16 Global Step: 271530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:41,514-Speed 5158.08 samples/sec Loss 0.6419 LearningRate 0.0035 Epoch: 16 Global Step: 271540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:43,499-Speed 5162.09 samples/sec Loss 0.6211 LearningRate 0.0035 Epoch: 16 Global Step: 271550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:45,495-Speed 5132.22 samples/sec Loss 0.6559 LearningRate 0.0035 Epoch: 16 Global Step: 271560 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 17:29:47,472-Speed 5182.71 samples/sec Loss 0.6690 LearningRate 0.0035 Epoch: 16 Global Step: 271570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:49,445-Speed 5190.72 samples/sec Loss 0.6762 LearningRate 0.0035 Epoch: 16 Global Step: 271580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:51,418-Speed 5190.90 samples/sec Loss 0.6642 LearningRate 0.0035 Epoch: 16 Global Step: 271590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:29:53,392-Speed 5188.60 samples/sec Loss 0.6518 LearningRate 0.0035 Epoch: 16 Global Step: 271600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:29:55,366-Speed 5190.53 samples/sec Loss 0.6618 LearningRate 0.0035 Epoch: 16 Global Step: 271610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:57,339-Speed 5191.46 samples/sec Loss 0.6681 LearningRate 0.0035 Epoch: 16 Global Step: 271620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:29:59,333-Speed 5136.23 samples/sec Loss 0.6458 LearningRate 0.0035 Epoch: 16 Global Step: 271630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:01,321-Speed 5154.02 samples/sec Loss 0.6672 LearningRate 0.0035 Epoch: 16 Global Step: 271640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:03,312-Speed 5144.65 samples/sec Loss 0.6541 LearningRate 0.0035 Epoch: 16 Global Step: 271650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:05,314-Speed 5117.19 samples/sec Loss 0.6514 LearningRate 0.0035 Epoch: 16 Global Step: 271660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:07,292-Speed 5177.36 samples/sec Loss 0.6637 LearningRate 0.0035 Epoch: 16 Global Step: 271670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:09,304-Speed 5092.36 samples/sec Loss 0.6610 LearningRate 0.0035 Epoch: 16 Global Step: 271680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
17:30:11,283-Speed 5178.72 samples/sec Loss 0.6547 LearningRate 0.0035 Epoch: 16 Global Step: 271690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:13,292-Speed 5098.63 samples/sec Loss 0.6253 LearningRate 0.0035 Epoch: 16 Global Step: 271700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:15,280-Speed 5152.54 samples/sec Loss 0.6501 LearningRate 0.0035 Epoch: 16 Global Step: 271710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:17,252-Speed 5194.36 samples/sec Loss 0.6636 LearningRate 0.0035 Epoch: 16 Global Step: 271720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:19,221-Speed 5202.42 samples/sec Loss 0.6980 LearningRate 0.0035 Epoch: 16 Global Step: 271730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:21,199-Speed 5180.14 samples/sec Loss 0.6799 LearningRate 0.0035 Epoch: 16 Global Step: 271740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:23,169-Speed 5197.11 samples/sec Loss 0.6683 LearningRate 0.0035 Epoch: 16 Global Step: 271750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:25,158-Speed 5151.72 samples/sec Loss 0.6800 LearningRate 0.0035 Epoch: 16 Global Step: 271760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:27,180-Speed 5067.96 samples/sec Loss 0.6376 LearningRate 0.0035 Epoch: 16 Global Step: 271770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:29,164-Speed 5160.91 samples/sec Loss 0.6887 LearningRate 0.0035 Epoch: 16 Global Step: 271780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:31,134-Speed 5200.16 samples/sec Loss 0.7178 LearningRate 0.0035 Epoch: 16 Global Step: 271790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:33,119-Speed 5159.59 samples/sec Loss 0.6707 LearningRate 0.0035 Epoch: 16 Global Step: 271800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:35,117-Speed 5127.48 samples/sec Loss 
0.6364 LearningRate 0.0035 Epoch: 16 Global Step: 271810 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:30:37,082-Speed 5214.18 samples/sec Loss 0.6549 LearningRate 0.0034 Epoch: 16 Global Step: 271820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:39,063-Speed 5170.74 samples/sec Loss 0.7055 LearningRate 0.0034 Epoch: 16 Global Step: 271830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:41,051-Speed 5152.02 samples/sec Loss 0.6734 LearningRate 0.0034 Epoch: 16 Global Step: 271840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:43,026-Speed 5188.14 samples/sec Loss 0.6812 LearningRate 0.0034 Epoch: 16 Global Step: 271850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:30:45,018-Speed 5142.61 samples/sec Loss 0.6889 LearningRate 0.0034 Epoch: 16 Global Step: 271860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:30:47,005-Speed 5153.84 samples/sec Loss 0.6516 LearningRate 0.0034 Epoch: 16 Global Step: 271870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:30:48,988-Speed 5166.81 samples/sec Loss 0.6894 LearningRate 0.0034 Epoch: 16 Global Step: 271880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:30:50,990-Speed 5118.17 samples/sec Loss 0.6697 LearningRate 0.0034 Epoch: 16 Global Step: 271890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:30:52,961-Speed 5196.43 samples/sec Loss 0.6613 LearningRate 0.0034 Epoch: 16 Global Step: 271900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:30:54,933-Speed 5195.01 samples/sec Loss 0.6233 LearningRate 0.0034 Epoch: 16 Global Step: 271910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:30:56,924-Speed 5146.04 samples/sec Loss 0.6638 LearningRate 0.0034 Epoch: 16 Global Step: 271920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:30:58,910-Speed 5158.81 samples/sec Loss 0.6511 LearningRate 0.0034 Epoch: 16 
Global Step: 271930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:31:00,886-Speed 5183.59 samples/sec Loss 0.6615 LearningRate 0.0034 Epoch: 16 Global Step: 271940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:31:02,880-Speed 5137.77 samples/sec Loss 0.6675 LearningRate 0.0034 Epoch: 16 Global Step: 271950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:31:04,866-Speed 5157.50 samples/sec Loss 0.6912 LearningRate 0.0034 Epoch: 16 Global Step: 271960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:31:06,852-Speed 5159.66 samples/sec Loss 0.6201 LearningRate 0.0034 Epoch: 16 Global Step: 271970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:31:08,834-Speed 5168.39 samples/sec Loss 0.6563 LearningRate 0.0034 Epoch: 16 Global Step: 271980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:31:10,810-Speed 5185.63 samples/sec Loss 0.6747 LearningRate 0.0034 Epoch: 16 Global Step: 271990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:31:12,802-Speed 5141.85 samples/sec Loss 0.6629 LearningRate 0.0034 Epoch: 16 Global Step: 272000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:31:39,805-[lfw][272000]XNorm: 22.444066 Training: 2022-04-11 17:31:39,806-[lfw][272000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 17:31:39,807-[lfw][272000]Accuracy-Highest: 0.99833 Training: 2022-04-11 17:32:10,825-[cfp_fp][272000]XNorm: 22.398544 Training: 2022-04-11 17:32:10,826-[cfp_fp][272000]Accuracy-Flip: 0.98914+-0.00512 Training: 2022-04-11 17:32:10,827-[cfp_fp][272000]Accuracy-Highest: 0.98929 Training: 2022-04-11 17:32:37,511-[agedb_30][272000]XNorm: 23.342252 Training: 2022-04-11 17:32:37,512-[agedb_30][272000]Accuracy-Flip: 0.98333+-0.00650 Training: 2022-04-11 17:32:37,512-[agedb_30][272000]Accuracy-Highest: 0.98333 Training: 2022-04-11 17:32:39,488-Speed 118.13 samples/sec Loss 0.6702 LearningRate 0.0034 Epoch: 16 Global Step: 272010 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:32:41,456-Speed 5207.15 samples/sec Loss 0.6653 LearningRate 0.0034 Epoch: 16 Global Step: 272020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:32:43,461-Speed 5108.19 samples/sec Loss 0.6430 LearningRate 0.0034 Epoch: 16 Global Step: 272030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:32:45,438-Speed 5181.65 samples/sec Loss 0.6731 LearningRate 0.0034 Epoch: 16 Global Step: 272040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:32:47,421-Speed 5167.18 samples/sec Loss 0.6669 LearningRate 0.0034 Epoch: 16 Global Step: 272050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:32:49,393-Speed 5193.73 samples/sec Loss 0.6932 LearningRate 0.0034 Epoch: 16 Global Step: 272060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:32:51,357-Speed 5217.05 samples/sec Loss 0.6646 LearningRate 0.0034 Epoch: 16 Global Step: 272070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:32:53,333-Speed 5182.83 samples/sec Loss 0.6599 LearningRate 0.0034 Epoch: 16 Global Step: 272080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:32:55,313-Speed 5173.28 samples/sec Loss 0.6503 LearningRate 0.0034 Epoch: 16 Global Step: 272090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:32:57,338-Speed 5059.30 samples/sec Loss 0.6685 LearningRate 0.0034 Epoch: 16 Global Step: 272100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:32:59,316-Speed 5180.04 samples/sec Loss 0.6892 LearningRate 0.0034 Epoch: 16 Global Step: 272110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:01,295-Speed 5176.48 samples/sec Loss 0.6640 LearningRate 0.0034 Epoch: 16 Global Step: 272120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:03,289-Speed 5137.18 samples/sec Loss 0.6717 LearningRate 0.0034 Epoch: 16 Global Step: 272130 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 17:33:05,295-Speed 5108.35 samples/sec Loss 0.6553 LearningRate 0.0034 Epoch: 16 Global Step: 272140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:07,272-Speed 5181.92 samples/sec Loss 0.6288 LearningRate 0.0034 Epoch: 16 Global Step: 272150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:09,252-Speed 5173.58 samples/sec Loss 0.6700 LearningRate 0.0034 Epoch: 16 Global Step: 272160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:11,228-Speed 5182.83 samples/sec Loss 0.6705 LearningRate 0.0034 Epoch: 16 Global Step: 272170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:13,217-Speed 5151.62 samples/sec Loss 0.6599 LearningRate 0.0034 Epoch: 16 Global Step: 272180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:15,206-Speed 5149.66 samples/sec Loss 0.6601 LearningRate 0.0034 Epoch: 16 Global Step: 272190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:17,189-Speed 5165.78 samples/sec Loss 0.6795 LearningRate 0.0034 Epoch: 16 Global Step: 272200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:19,157-Speed 5204.92 samples/sec Loss 0.7025 LearningRate 0.0034 Epoch: 16 Global Step: 272210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:21,125-Speed 5206.12 samples/sec Loss 0.7113 LearningRate 0.0034 Epoch: 16 Global Step: 272220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:23,095-Speed 5200.86 samples/sec Loss 0.6753 LearningRate 0.0034 Epoch: 16 Global Step: 272230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:25,061-Speed 5210.24 samples/sec Loss 0.6558 LearningRate 0.0034 Epoch: 16 Global Step: 272240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:27,041-Speed 5173.30 samples/sec Loss 0.6839 LearningRate 0.0034 Epoch: 16 Global Step: 272250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:29,015-Speed 
5187.91 samples/sec Loss 0.6683 LearningRate 0.0034 Epoch: 16 Global Step: 272260 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:33:30,978-Speed 5218.68 samples/sec Loss 0.6261 LearningRate 0.0034 Epoch: 16 Global Step: 272270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:32,970-Speed 5143.30 samples/sec Loss 0.6510 LearningRate 0.0034 Epoch: 16 Global Step: 272280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:34,953-Speed 5166.57 samples/sec Loss 0.6420 LearningRate 0.0034 Epoch: 16 Global Step: 272290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:36,936-Speed 5165.59 samples/sec Loss 0.6150 LearningRate 0.0034 Epoch: 16 Global Step: 272300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:38,902-Speed 5207.96 samples/sec Loss 0.6768 LearningRate 0.0034 Epoch: 16 Global Step: 272310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:40,872-Speed 5199.93 samples/sec Loss 0.6747 LearningRate 0.0034 Epoch: 16 Global Step: 272320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:42,850-Speed 5181.06 samples/sec Loss 0.6560 LearningRate 0.0034 Epoch: 16 Global Step: 272330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:44,823-Speed 5190.82 samples/sec Loss 0.6622 LearningRate 0.0034 Epoch: 16 Global Step: 272340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:46,800-Speed 5182.20 samples/sec Loss 0.6418 LearningRate 0.0034 Epoch: 16 Global Step: 272350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:48,782-Speed 5167.78 samples/sec Loss 0.6349 LearningRate 0.0034 Epoch: 16 Global Step: 272360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:50,778-Speed 5134.53 samples/sec Loss 0.7002 LearningRate 0.0034 Epoch: 16 Global Step: 272370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:33:52,769-Speed 5143.85 samples/sec Loss 0.6865 
LearningRate 0.0034 Epoch: 16 Global Step: 272380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:33:54,733-Speed 5215.48 samples/sec Loss 0.6674 LearningRate 0.0034 Epoch: 16 Global Step: 272390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:33:56,735-Speed 5118.11 samples/sec Loss 0.6966 LearningRate 0.0034 Epoch: 16 Global Step: 272400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:33:58,699-Speed 5215.48 samples/sec Loss 0.6775 LearningRate 0.0034 Epoch: 16 Global Step: 272410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:34:00,680-Speed 5172.16 samples/sec Loss 0.6749 LearningRate 0.0034 Epoch: 16 Global Step: 272420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:34:02,675-Speed 5134.71 samples/sec Loss 0.6776 LearningRate 0.0034 Epoch: 16 Global Step: 272430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:34:04,648-Speed 5192.50 samples/sec Loss 0.6497 LearningRate 0.0034 Epoch: 16 Global Step: 272440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:34:06,633-Speed 5161.36 samples/sec Loss 0.6736 LearningRate 0.0034 Epoch: 16 Global Step: 272450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:34:08,617-Speed 5163.41 samples/sec Loss 0.6438 LearningRate 0.0034 Epoch: 16 Global Step: 272460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:34:10,593-Speed 5184.88 samples/sec Loss 0.6419 LearningRate 0.0034 Epoch: 16 Global Step: 272470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:34:12,590-Speed 5130.05 samples/sec Loss 0.6210 LearningRate 0.0034 Epoch: 16 Global Step: 272480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:34:14,561-Speed 5198.14 samples/sec Loss 0.6483 LearningRate 0.0034 Epoch: 16 Global Step: 272490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:34:16,556-Speed 5134.79 samples/sec Loss 0.6620 LearningRate 0.0034 Epoch: 16 Global Step: 
272500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:34:18,571-Speed 5083.27 samples/sec Loss 0.6367 LearningRate 0.0034 Epoch: 16 Global Step: 272510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:34:20,541-Speed 5201.16 samples/sec Loss 0.6633 LearningRate 0.0034 Epoch: 16 Global Step: 272520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:34:22,537-Speed 5133.09 samples/sec Loss 0.6803 LearningRate 0.0034 Epoch: 16 Global Step: 272530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:34:24,512-Speed 5185.63 samples/sec Loss 0.6625 LearningRate 0.0034 Epoch: 16 Global Step: 272540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:34:26,500-Speed 5152.26 samples/sec Loss 0.6895 LearningRate 0.0034 Epoch: 16 Global Step: 272550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:34:28,495-Speed 5136.08 samples/sec Loss 0.6867 LearningRate 0.0034 Epoch: 16 Global Step: 272560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:34:30,467-Speed 5194.04 samples/sec Loss 0.6865 LearningRate 0.0034 Epoch: 16 Global Step: 272570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:34:32,445-Speed 5179.86 samples/sec Loss 0.6674 LearningRate 0.0034 Epoch: 16 Global Step: 272580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:34:34,415-Speed 5197.30 samples/sec Loss 0.6160 LearningRate 0.0034 Epoch: 16 Global Step: 272590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:34:36,395-Speed 5175.82 samples/sec Loss 0.7079 LearningRate 0.0034 Epoch: 16 Global Step: 272600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:34:38,392-Speed 5127.68 samples/sec Loss 0.6785 LearningRate 0.0034 Epoch: 16 Global Step: 272610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:34:40,362-Speed 5199.86 samples/sec Loss 0.6879 LearningRate 0.0034 Epoch: 16 Global Step: 272620 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-04-11 17:34:42,341-Speed 5176.71 samples/sec Loss 0.6922 LearningRate 0.0034 Epoch: 16 Global Step: 272630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:34:44,353-Speed 5091.46 samples/sec Loss 0.6743 LearningRate 0.0034 Epoch: 16 Global Step: 272640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:34:46,359-Speed 5106.64 samples/sec Loss 0.6579 LearningRate 0.0034 Epoch: 16 Global Step: 272650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:34:48,324-Speed 5213.14 samples/sec Loss 0.6810 LearningRate 0.0034 Epoch: 16 Global Step: 272660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:34:50,315-Speed 5145.80 samples/sec Loss 0.6908 LearningRate 0.0034 Epoch: 16 Global Step: 272670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:34:52,301-Speed 5157.68 samples/sec Loss 0.6843 LearningRate 0.0034 Epoch: 16 Global Step: 272680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:34:54,293-Speed 5140.96 samples/sec Loss 0.6780 LearningRate 0.0034 Epoch: 16 Global Step: 272690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:34:56,265-Speed 5196.17 samples/sec Loss 0.6604 LearningRate 0.0034 Epoch: 16 Global Step: 272700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:34:58,275-Speed 5094.56 samples/sec Loss 0.6593 LearningRate 0.0034 Epoch: 16 Global Step: 272710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:00,263-Speed 5152.55 samples/sec Loss 0.6694 LearningRate 0.0034 Epoch: 16 Global Step: 272720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:02,254-Speed 5146.10 samples/sec Loss 0.6798 LearningRate 0.0033 Epoch: 16 Global Step: 272730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:04,232-Speed 5177.81 samples/sec Loss 0.6461 LearningRate 0.0033 Epoch: 16 Global Step: 272740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
17:35:06,229-Speed 5129.42 samples/sec Loss 0.6584 LearningRate 0.0033 Epoch: 16 Global Step: 272750 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:35:08,210-Speed 5172.85 samples/sec Loss 0.7121 LearningRate 0.0033 Epoch: 16 Global Step: 272760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:10,187-Speed 5180.80 samples/sec Loss 0.6575 LearningRate 0.0033 Epoch: 16 Global Step: 272770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:12,166-Speed 5175.95 samples/sec Loss 0.6791 LearningRate 0.0033 Epoch: 16 Global Step: 272780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:14,156-Speed 5147.35 samples/sec Loss 0.6652 LearningRate 0.0033 Epoch: 16 Global Step: 272790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:16,141-Speed 5160.50 samples/sec Loss 0.6735 LearningRate 0.0033 Epoch: 16 Global Step: 272800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:18,116-Speed 5186.29 samples/sec Loss 0.7017 LearningRate 0.0033 Epoch: 16 Global Step: 272810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:20,110-Speed 5137.75 samples/sec Loss 0.6680 LearningRate 0.0033 Epoch: 16 Global Step: 272820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:22,087-Speed 5180.92 samples/sec Loss 0.6804 LearningRate 0.0033 Epoch: 16 Global Step: 272830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:24,095-Speed 5100.93 samples/sec Loss 0.6756 LearningRate 0.0033 Epoch: 16 Global Step: 272840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:26,098-Speed 5116.55 samples/sec Loss 0.6939 LearningRate 0.0033 Epoch: 16 Global Step: 272850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:28,075-Speed 5180.40 samples/sec Loss 0.6569 LearningRate 0.0033 Epoch: 16 Global Step: 272860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:35:30,073-Speed 5128.41 samples/sec 
Loss 0.6510 LearningRate 0.0033 Epoch: 16 Global Step: 272870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:32,041-Speed 5206.45 samples/sec Loss 0.6521 LearningRate 0.0033 Epoch: 16 Global Step: 272880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:34,066-Speed 5058.72 samples/sec Loss 0.6998 LearningRate 0.0033 Epoch: 16 Global Step: 272890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:36,041-Speed 5186.50 samples/sec Loss 0.6652 LearningRate 0.0033 Epoch: 16 Global Step: 272900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:38,059-Speed 5075.76 samples/sec Loss 0.6770 LearningRate 0.0033 Epoch: 16 Global Step: 272910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:40,052-Speed 5141.34 samples/sec Loss 0.6538 LearningRate 0.0033 Epoch: 16 Global Step: 272920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:42,033-Speed 5171.25 samples/sec Loss 0.6975 LearningRate 0.0033 Epoch: 16 Global Step: 272930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:44,013-Speed 5173.27 samples/sec Loss 0.6373 LearningRate 0.0033 Epoch: 16 Global Step: 272940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:45,984-Speed 5199.99 samples/sec Loss 0.6547 LearningRate 0.0033 Epoch: 16 Global Step: 272950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:47,963-Speed 5174.35 samples/sec Loss 0.6730 LearningRate 0.0033 Epoch: 16 Global Step: 272960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:49,948-Speed 5162.45 samples/sec Loss 0.7054 LearningRate 0.0033 Epoch: 16 Global Step: 272970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:35:51,935-Speed 5155.95 samples/sec Loss 0.6822 LearningRate 0.0033 Epoch: 16 Global Step: 272980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:35:53,909-Speed 5188.86 samples/sec Loss 0.6490 LearningRate 0.0033 Epoch: 16 
Global Step: 272990 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:35:55,876-Speed 5207.71 samples/sec Loss 0.7211 LearningRate 0.0033 Epoch: 16 Global Step: 273000 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:35:57,846-Speed 5200.85 samples/sec Loss 0.6343 LearningRate 0.0033 Epoch: 16 Global Step: 273010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:35:59,859-Speed 5086.40 samples/sec Loss 0.6738 LearningRate 0.0033 Epoch: 16 Global Step: 273020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:01,848-Speed 5150.50 samples/sec Loss 0.6713 LearningRate 0.0033 Epoch: 16 Global Step: 273030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:03,836-Speed 5154.15 samples/sec Loss 0.6993 LearningRate 0.0033 Epoch: 16 Global Step: 273040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:05,814-Speed 5177.85 samples/sec Loss 0.7000 LearningRate 0.0033 Epoch: 16 Global Step: 273050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:07,790-Speed 5184.64 samples/sec Loss 0.6442 LearningRate 0.0033 Epoch: 16 Global Step: 273060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:09,770-Speed 5172.62 samples/sec Loss 0.6791 LearningRate 0.0033 Epoch: 16 Global Step: 273070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:11,747-Speed 5181.61 samples/sec Loss 0.6552 LearningRate 0.0033 Epoch: 16 Global Step: 273080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:13,713-Speed 5210.12 samples/sec Loss 0.6925 LearningRate 0.0033 Epoch: 16 Global Step: 273090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:15,692-Speed 5175.84 samples/sec Loss 0.6583 LearningRate 0.0033 Epoch: 16 Global Step: 273100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:17,680-Speed 5153.16 samples/sec Loss 0.6538 LearningRate 0.0033 Epoch: 16 Global Step: 273110 Fp16 Grad Scale: 
131072 Required: 4 hours Training: 2022-04-11 17:36:19,680-Speed 5122.07 samples/sec Loss 0.6905 LearningRate 0.0033 Epoch: 16 Global Step: 273120 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:36:21,654-Speed 5191.36 samples/sec Loss 0.6970 LearningRate 0.0033 Epoch: 16 Global Step: 273130 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:36:23,630-Speed 5183.46 samples/sec Loss 0.6304 LearningRate 0.0033 Epoch: 16 Global Step: 273140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:36:25,611-Speed 5170.75 samples/sec Loss 0.6349 LearningRate 0.0033 Epoch: 16 Global Step: 273150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:36:27,586-Speed 5185.79 samples/sec Loss 0.7009 LearningRate 0.0033 Epoch: 16 Global Step: 273160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:29,553-Speed 5207.51 samples/sec Loss 0.6725 LearningRate 0.0033 Epoch: 16 Global Step: 273170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:31,543-Speed 5147.43 samples/sec Loss 0.6571 LearningRate 0.0033 Epoch: 16 Global Step: 273180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:33,522-Speed 5174.64 samples/sec Loss 0.6754 LearningRate 0.0033 Epoch: 16 Global Step: 273190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:35,530-Speed 5103.64 samples/sec Loss 0.6465 LearningRate 0.0033 Epoch: 16 Global Step: 273200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:37,542-Speed 5089.75 samples/sec Loss 0.6829 LearningRate 0.0033 Epoch: 16 Global Step: 273210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:39,523-Speed 5172.41 samples/sec Loss 0.6710 LearningRate 0.0033 Epoch: 16 Global Step: 273220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:41,524-Speed 5120.32 samples/sec Loss 0.6582 LearningRate 0.0033 Epoch: 16 Global Step: 273230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 17:36:43,501-Speed 5182.21 samples/sec Loss 0.6950 LearningRate 0.0033 Epoch: 16 Global Step: 273240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:45,471-Speed 5200.59 samples/sec Loss 0.6532 LearningRate 0.0033 Epoch: 16 Global Step: 273250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:47,472-Speed 5117.02 samples/sec Loss 0.6851 LearningRate 0.0033 Epoch: 16 Global Step: 273260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:49,475-Speed 5115.82 samples/sec Loss 0.6865 LearningRate 0.0033 Epoch: 16 Global Step: 273270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:51,447-Speed 5196.34 samples/sec Loss 0.6700 LearningRate 0.0033 Epoch: 16 Global Step: 273280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:53,417-Speed 5200.02 samples/sec Loss 0.6703 LearningRate 0.0033 Epoch: 16 Global Step: 273290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:55,390-Speed 5190.70 samples/sec Loss 0.6965 LearningRate 0.0033 Epoch: 16 Global Step: 273300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:57,418-Speed 5051.47 samples/sec Loss 0.6496 LearningRate 0.0033 Epoch: 16 Global Step: 273310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:36:59,403-Speed 5160.45 samples/sec Loss 0.6571 LearningRate 0.0033 Epoch: 16 Global Step: 273320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:01,379-Speed 5183.70 samples/sec Loss 0.6851 LearningRate 0.0033 Epoch: 16 Global Step: 273330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:03,371-Speed 5143.11 samples/sec Loss 0.6826 LearningRate 0.0033 Epoch: 16 Global Step: 273340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:05,351-Speed 5173.29 samples/sec Loss 0.6485 LearningRate 0.0033 Epoch: 16 Global Step: 273350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:07,320-Speed 5203.26 
samples/sec Loss 0.6889 LearningRate 0.0033 Epoch: 16 Global Step: 273360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:37:09,310-Speed 5147.03 samples/sec Loss 0.6802 LearningRate 0.0033 Epoch: 16 Global Step: 273370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:11,297-Speed 5156.63 samples/sec Loss 0.6916 LearningRate 0.0033 Epoch: 16 Global Step: 273380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:13,276-Speed 5176.77 samples/sec Loss 0.6549 LearningRate 0.0033 Epoch: 16 Global Step: 273390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:15,248-Speed 5193.44 samples/sec Loss 0.6765 LearningRate 0.0033 Epoch: 16 Global Step: 273400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:17,234-Speed 5159.67 samples/sec Loss 0.7036 LearningRate 0.0033 Epoch: 16 Global Step: 273410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:19,202-Speed 5204.75 samples/sec Loss 0.6963 LearningRate 0.0033 Epoch: 16 Global Step: 273420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:21,173-Speed 5195.31 samples/sec Loss 0.7038 LearningRate 0.0033 Epoch: 16 Global Step: 273430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:23,164-Speed 5144.73 samples/sec Loss 0.6432 LearningRate 0.0033 Epoch: 16 Global Step: 273440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:25,158-Speed 5138.66 samples/sec Loss 0.6808 LearningRate 0.0033 Epoch: 16 Global Step: 273450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:27,178-Speed 5070.87 samples/sec Loss 0.7119 LearningRate 0.0033 Epoch: 16 Global Step: 273460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:29,161-Speed 5166.18 samples/sec Loss 0.6564 LearningRate 0.0033 Epoch: 16 Global Step: 273470 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:37:31,130-Speed 5203.07 samples/sec Loss 0.6820 LearningRate 
0.0033 Epoch: 16 Global Step: 273480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:33,129-Speed 5124.73 samples/sec Loss 0.6942 LearningRate 0.0033 Epoch: 16 Global Step: 273490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:35,105-Speed 5184.24 samples/sec Loss 0.6952 LearningRate 0.0033 Epoch: 16 Global Step: 273500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:37,087-Speed 5167.04 samples/sec Loss 0.6803 LearningRate 0.0033 Epoch: 16 Global Step: 273510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:39,097-Speed 5097.97 samples/sec Loss 0.6838 LearningRate 0.0033 Epoch: 16 Global Step: 273520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:41,069-Speed 5194.46 samples/sec Loss 0.6769 LearningRate 0.0033 Epoch: 16 Global Step: 273530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:43,044-Speed 5188.06 samples/sec Loss 0.6946 LearningRate 0.0033 Epoch: 16 Global Step: 273540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:45,026-Speed 5168.24 samples/sec Loss 0.6782 LearningRate 0.0033 Epoch: 16 Global Step: 273550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:47,028-Speed 5116.32 samples/sec Loss 0.6992 LearningRate 0.0033 Epoch: 16 Global Step: 273560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:49,010-Speed 5169.21 samples/sec Loss 0.6848 LearningRate 0.0033 Epoch: 16 Global Step: 273570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:37:50,984-Speed 5190.27 samples/sec Loss 0.7078 LearningRate 0.0033 Epoch: 16 Global Step: 273580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:37:52,968-Speed 5162.57 samples/sec Loss 0.6588 LearningRate 0.0033 Epoch: 16 Global Step: 273590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:37:54,945-Speed 5180.35 samples/sec Loss 0.6754 LearningRate 0.0033 Epoch: 16 Global Step: 273600 Fp16 
Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:37:56,956-Speed 5093.93 samples/sec Loss 0.6401 LearningRate 0.0033 Epoch: 16 Global Step: 273610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:37:58,943-Speed 5155.61 samples/sec Loss 0.6482 LearningRate 0.0033 Epoch: 16 Global Step: 273620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:00,928-Speed 5161.49 samples/sec Loss 0.6429 LearningRate 0.0033 Epoch: 16 Global Step: 273630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:02,922-Speed 5137.59 samples/sec Loss 0.6512 LearningRate 0.0032 Epoch: 16 Global Step: 273640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:04,903-Speed 5172.76 samples/sec Loss 0.6991 LearningRate 0.0032 Epoch: 16 Global Step: 273650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:06,897-Speed 5136.23 samples/sec Loss 0.6693 LearningRate 0.0032 Epoch: 16 Global Step: 273660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:08,872-Speed 5188.86 samples/sec Loss 0.6842 LearningRate 0.0032 Epoch: 16 Global Step: 273670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:10,845-Speed 5191.57 samples/sec Loss 0.6508 LearningRate 0.0032 Epoch: 16 Global Step: 273680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:12,824-Speed 5174.93 samples/sec Loss 0.6799 LearningRate 0.0032 Epoch: 16 Global Step: 273690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:14,830-Speed 5106.93 samples/sec Loss 0.6843 LearningRate 0.0032 Epoch: 16 Global Step: 273700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:16,802-Speed 5195.14 samples/sec Loss 0.6836 LearningRate 0.0032 Epoch: 16 Global Step: 273710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:18,779-Speed 5180.68 samples/sec Loss 0.6589 LearningRate 0.0032 Epoch: 16 Global Step: 273720 Fp16 Grad Scale: 131072 Required: 4 hours 
Training: 2022-04-11 17:38:20,745-Speed 5211.03 samples/sec Loss 0.6963 LearningRate 0.0032 Epoch: 16 Global Step: 273730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:22,736-Speed 5146.87 samples/sec Loss 0.7006 LearningRate 0.0032 Epoch: 16 Global Step: 273740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:24,717-Speed 5169.75 samples/sec Loss 0.6800 LearningRate 0.0032 Epoch: 16 Global Step: 273750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:26,723-Speed 5105.85 samples/sec Loss 0.6724 LearningRate 0.0032 Epoch: 16 Global Step: 273760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:28,704-Speed 5170.80 samples/sec Loss 0.6765 LearningRate 0.0032 Epoch: 16 Global Step: 273770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:30,689-Speed 5162.51 samples/sec Loss 0.6358 LearningRate 0.0032 Epoch: 16 Global Step: 273780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:32,658-Speed 5201.74 samples/sec Loss 0.6908 LearningRate 0.0032 Epoch: 16 Global Step: 273790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:34,629-Speed 5197.68 samples/sec Loss 0.6676 LearningRate 0.0032 Epoch: 16 Global Step: 273800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:36,604-Speed 5186.62 samples/sec Loss 0.6421 LearningRate 0.0032 Epoch: 16 Global Step: 273810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:38,600-Speed 5132.39 samples/sec Loss 0.6524 LearningRate 0.0032 Epoch: 16 Global Step: 273820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:38:40,572-Speed 5194.87 samples/sec Loss 0.6528 LearningRate 0.0032 Epoch: 16 Global Step: 273830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:38:42,551-Speed 5174.91 samples/sec Loss 0.6753 LearningRate 0.0032 Epoch: 16 Global Step: 273840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:38:44,557-Speed 
5109.37 samples/sec Loss 0.6655 LearningRate 0.0032 Epoch: 16 Global Step: 273850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:38:46,543-Speed 5157.00 samples/sec Loss 0.6804 LearningRate 0.0032 Epoch: 16 Global Step: 273860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:38:48,517-Speed 5188.63 samples/sec Loss 0.6686 LearningRate 0.0032 Epoch: 16 Global Step: 273870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:38:50,503-Speed 5158.99 samples/sec Loss 0.6848 LearningRate 0.0032 Epoch: 16 Global Step: 273880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:38:52,482-Speed 5176.14 samples/sec Loss 0.6926 LearningRate 0.0032 Epoch: 16 Global Step: 273890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:38:54,471-Speed 5150.00 samples/sec Loss 0.6786 LearningRate 0.0032 Epoch: 16 Global Step: 273900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:38:56,449-Speed 5179.65 samples/sec Loss 0.6689 LearningRate 0.0032 Epoch: 16 Global Step: 273910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:38:58,424-Speed 5185.92 samples/sec Loss 0.6967 LearningRate 0.0032 Epoch: 16 Global Step: 273920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:39:00,411-Speed 5156.92 samples/sec Loss 0.6928 LearningRate 0.0032 Epoch: 16 Global Step: 273930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:39:02,382-Speed 5196.46 samples/sec Loss 0.6492 LearningRate 0.0032 Epoch: 16 Global Step: 273940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:39:04,355-Speed 5191.44 samples/sec Loss 0.6683 LearningRate 0.0032 Epoch: 16 Global Step: 273950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:39:06,335-Speed 5172.48 samples/sec Loss 0.6389 LearningRate 0.0032 Epoch: 16 Global Step: 273960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:39:08,313-Speed 5180.60 samples/sec Loss 0.6816 
LearningRate 0.0032 Epoch: 16 Global Step: 273970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:39:10,302-Speed 5150.77 samples/sec Loss 0.6853 LearningRate 0.0032 Epoch: 16 Global Step: 273980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:39:12,308-Speed 5107.20 samples/sec Loss 0.6617 LearningRate 0.0032 Epoch: 16 Global Step: 273990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:39:14,300-Speed 5144.25 samples/sec Loss 0.6920 LearningRate 0.0032 Epoch: 16 Global Step: 274000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:39:41,131-[lfw][274000]XNorm: 21.309907 Training: 2022-04-11 17:39:41,131-[lfw][274000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-04-11 17:39:41,132-[lfw][274000]Accuracy-Highest: 0.99833 Training: 2022-04-11 17:40:12,082-[cfp_fp][274000]XNorm: 21.601243 Training: 2022-04-11 17:40:12,083-[cfp_fp][274000]Accuracy-Flip: 0.98886+-0.00494 Training: 2022-04-11 17:40:12,083-[cfp_fp][274000]Accuracy-Highest: 0.98929 Training: 2022-04-11 17:40:38,745-[agedb_30][274000]XNorm: 22.433696 Training: 2022-04-11 17:40:38,745-[agedb_30][274000]Accuracy-Flip: 0.98133+-0.00702 Training: 2022-04-11 17:40:38,746-[agedb_30][274000]Accuracy-Highest: 0.98333 Training: 2022-04-11 17:40:40,731-Speed 118.48 samples/sec Loss 0.6889 LearningRate 0.0032 Epoch: 16 Global Step: 274010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:40:42,713-Speed 5169.49 samples/sec Loss 0.6993 LearningRate 0.0032 Epoch: 16 Global Step: 274020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:40:44,713-Speed 5121.09 samples/sec Loss 0.6829 LearningRate 0.0032 Epoch: 16 Global Step: 274030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:40:46,673-Speed 5229.09 samples/sec Loss 0.6633 LearningRate 0.0032 Epoch: 16 Global Step: 274040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:40:48,687-Speed 5085.64 samples/sec Loss 0.6833 LearningRate 0.0032 
Epoch: 16 Global Step: 274050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:40:50,660-Speed 5191.27 samples/sec Loss 0.6996 LearningRate 0.0032 Epoch: 16 Global Step: 274060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:40:52,634-Speed 5192.08 samples/sec Loss 0.6735 LearningRate 0.0032 Epoch: 16 Global Step: 274070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:40:54,619-Speed 5159.54 samples/sec Loss 0.6751 LearningRate 0.0032 Epoch: 16 Global Step: 274080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:40:56,611-Speed 5143.30 samples/sec Loss 0.6645 LearningRate 0.0032 Epoch: 16 Global Step: 274090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:40:58,589-Speed 5178.04 samples/sec Loss 0.7021 LearningRate 0.0032 Epoch: 16 Global Step: 274100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:00,559-Speed 5199.22 samples/sec Loss 0.6752 LearningRate 0.0032 Epoch: 16 Global Step: 274110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:02,533-Speed 5188.38 samples/sec Loss 0.6920 LearningRate 0.0032 Epoch: 16 Global Step: 274120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:04,516-Speed 5167.47 samples/sec Loss 0.6646 LearningRate 0.0032 Epoch: 16 Global Step: 274130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:06,499-Speed 5164.85 samples/sec Loss 0.6780 LearningRate 0.0032 Epoch: 16 Global Step: 274140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:41:08,461-Speed 5221.96 samples/sec Loss 0.6819 LearningRate 0.0032 Epoch: 16 Global Step: 274150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:10,432-Speed 5196.47 samples/sec Loss 0.6796 LearningRate 0.0032 Epoch: 16 Global Step: 274160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:12,402-Speed 5200.60 samples/sec Loss 0.6633 LearningRate 0.0032 Epoch: 16 Global Step: 274170 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:14,397-Speed 5135.59 samples/sec Loss 0.6903 LearningRate 0.0032 Epoch: 16 Global Step: 274180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:16,382-Speed 5158.85 samples/sec Loss 0.6850 LearningRate 0.0032 Epoch: 16 Global Step: 274190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:18,356-Speed 5188.93 samples/sec Loss 0.6436 LearningRate 0.0032 Epoch: 16 Global Step: 274200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:20,328-Speed 5194.42 samples/sec Loss 0.6673 LearningRate 0.0032 Epoch: 16 Global Step: 274210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:22,315-Speed 5154.92 samples/sec Loss 0.6913 LearningRate 0.0032 Epoch: 16 Global Step: 274220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:24,312-Speed 5129.75 samples/sec Loss 0.6772 LearningRate 0.0032 Epoch: 16 Global Step: 274230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:26,296-Speed 5164.60 samples/sec Loss 0.6721 LearningRate 0.0032 Epoch: 16 Global Step: 274240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:28,279-Speed 5164.33 samples/sec Loss 0.6818 LearningRate 0.0032 Epoch: 16 Global Step: 274250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:41:30,262-Speed 5166.36 samples/sec Loss 0.6999 LearningRate 0.0032 Epoch: 16 Global Step: 274260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:32,232-Speed 5199.11 samples/sec Loss 0.6796 LearningRate 0.0032 Epoch: 16 Global Step: 274270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:34,221-Speed 5150.85 samples/sec Loss 0.6563 LearningRate 0.0032 Epoch: 16 Global Step: 274280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:36,190-Speed 5202.64 samples/sec Loss 0.6753 LearningRate 0.0032 Epoch: 16 Global Step: 274290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 17:41:38,178-Speed 5152.81 samples/sec Loss 0.6776 LearningRate 0.0032 Epoch: 16 Global Step: 274300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:40,162-Speed 5162.77 samples/sec Loss 0.6684 LearningRate 0.0032 Epoch: 16 Global Step: 274310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:42,141-Speed 5175.66 samples/sec Loss 0.6829 LearningRate 0.0032 Epoch: 16 Global Step: 274320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:44,110-Speed 5203.28 samples/sec Loss 0.6923 LearningRate 0.0032 Epoch: 16 Global Step: 274330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:46,102-Speed 5142.95 samples/sec Loss 0.6992 LearningRate 0.0032 Epoch: 16 Global Step: 274340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:48,085-Speed 5164.94 samples/sec Loss 0.6660 LearningRate 0.0032 Epoch: 16 Global Step: 274350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:50,052-Speed 5207.37 samples/sec Loss 0.6586 LearningRate 0.0032 Epoch: 16 Global Step: 274360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:52,025-Speed 5192.48 samples/sec Loss 0.6830 LearningRate 0.0032 Epoch: 16 Global Step: 274370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:53,997-Speed 5194.64 samples/sec Loss 0.7129 LearningRate 0.0032 Epoch: 16 Global Step: 274380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:56,017-Speed 5072.31 samples/sec Loss 0.6861 LearningRate 0.0032 Epoch: 16 Global Step: 274390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:57,997-Speed 5173.83 samples/sec Loss 0.6593 LearningRate 0.0032 Epoch: 16 Global Step: 274400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:41:59,995-Speed 5125.78 samples/sec Loss 0.6914 LearningRate 0.0032 Epoch: 16 Global Step: 274410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:02,008-Speed 5090.42 
samples/sec Loss 0.6956 LearningRate 0.0032 Epoch: 16 Global Step: 274420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:03,980-Speed 5195.60 samples/sec Loss 0.6577 LearningRate 0.0032 Epoch: 16 Global Step: 274430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:05,954-Speed 5189.64 samples/sec Loss 0.6590 LearningRate 0.0032 Epoch: 16 Global Step: 274440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:07,926-Speed 5192.74 samples/sec Loss 0.6827 LearningRate 0.0032 Epoch: 16 Global Step: 274450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:09,915-Speed 5149.43 samples/sec Loss 0.6673 LearningRate 0.0032 Epoch: 16 Global Step: 274460 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:42:11,921-Speed 5108.78 samples/sec Loss 0.6757 LearningRate 0.0032 Epoch: 16 Global Step: 274470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:13,897-Speed 5185.17 samples/sec Loss 0.6770 LearningRate 0.0032 Epoch: 16 Global Step: 274480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:15,868-Speed 5196.74 samples/sec Loss 0.6913 LearningRate 0.0032 Epoch: 16 Global Step: 274490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:17,838-Speed 5198.82 samples/sec Loss 0.6992 LearningRate 0.0032 Epoch: 16 Global Step: 274500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:19,804-Speed 5210.00 samples/sec Loss 0.6502 LearningRate 0.0032 Epoch: 16 Global Step: 274510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:21,784-Speed 5174.24 samples/sec Loss 0.6935 LearningRate 0.0032 Epoch: 16 Global Step: 274520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:23,764-Speed 5174.42 samples/sec Loss 0.6815 LearningRate 0.0032 Epoch: 16 Global Step: 274530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:25,755-Speed 5144.29 samples/sec Loss 0.6540 LearningRate 
0.0032 Epoch: 16 Global Step: 274540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:27,740-Speed 5160.61 samples/sec Loss 0.6833 LearningRate 0.0032 Epoch: 16 Global Step: 274550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:29,706-Speed 5207.90 samples/sec Loss 0.6746 LearningRate 0.0032 Epoch: 16 Global Step: 274560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:31,677-Speed 5198.22 samples/sec Loss 0.6619 LearningRate 0.0032 Epoch: 16 Global Step: 274570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:42:33,647-Speed 5200.33 samples/sec Loss 0.6975 LearningRate 0.0031 Epoch: 16 Global Step: 274580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:42:35,649-Speed 5118.35 samples/sec Loss 0.6997 LearningRate 0.0031 Epoch: 16 Global Step: 274590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:42:37,611-Speed 5220.28 samples/sec Loss 0.6537 LearningRate 0.0031 Epoch: 16 Global Step: 274600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:39,602-Speed 5144.69 samples/sec Loss 0.6876 LearningRate 0.0031 Epoch: 16 Global Step: 274610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:41,595-Speed 5138.61 samples/sec Loss 0.7051 LearningRate 0.0031 Epoch: 16 Global Step: 274620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:43,560-Speed 5213.31 samples/sec Loss 0.6841 LearningRate 0.0031 Epoch: 16 Global Step: 274630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:45,532-Speed 5195.06 samples/sec Loss 0.6548 LearningRate 0.0031 Epoch: 16 Global Step: 274640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:47,513-Speed 5171.39 samples/sec Loss 0.6844 LearningRate 0.0031 Epoch: 16 Global Step: 274650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:49,501-Speed 5153.34 samples/sec Loss 0.6880 LearningRate 0.0031 Epoch: 16 Global Step: 274660 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:51,486-Speed 5158.96 samples/sec Loss 0.6675 LearningRate 0.0031 Epoch: 16 Global Step: 274670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:53,468-Speed 5171.43 samples/sec Loss 0.6709 LearningRate 0.0031 Epoch: 16 Global Step: 274680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:55,440-Speed 5195.14 samples/sec Loss 0.6655 LearningRate 0.0031 Epoch: 16 Global Step: 274690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:57,404-Speed 5213.88 samples/sec Loss 0.6721 LearningRate 0.0031 Epoch: 16 Global Step: 274700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:42:59,388-Speed 5164.05 samples/sec Loss 0.6667 LearningRate 0.0031 Epoch: 16 Global Step: 274710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:01,357-Speed 5202.94 samples/sec Loss 0.6606 LearningRate 0.0031 Epoch: 16 Global Step: 274720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:03,326-Speed 5200.55 samples/sec Loss 0.7039 LearningRate 0.0031 Epoch: 16 Global Step: 274730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:05,315-Speed 5150.44 samples/sec Loss 0.6919 LearningRate 0.0031 Epoch: 16 Global Step: 274740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:07,283-Speed 5205.88 samples/sec Loss 0.6866 LearningRate 0.0031 Epoch: 16 Global Step: 274750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:09,270-Speed 5154.57 samples/sec Loss 0.6748 LearningRate 0.0031 Epoch: 16 Global Step: 274760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:11,261-Speed 5146.43 samples/sec Loss 0.6686 LearningRate 0.0031 Epoch: 16 Global Step: 274770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:13,261-Speed 5119.82 samples/sec Loss 0.6481 LearningRate 0.0031 Epoch: 16 Global Step: 274780 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 17:43:15,260-Speed 5125.64 samples/sec Loss 0.6810 LearningRate 0.0031 Epoch: 16 Global Step: 274790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:17,223-Speed 5219.12 samples/sec Loss 0.6642 LearningRate 0.0031 Epoch: 16 Global Step: 274800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:19,197-Speed 5189.72 samples/sec Loss 0.6648 LearningRate 0.0031 Epoch: 16 Global Step: 274810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:21,171-Speed 5189.21 samples/sec Loss 0.6922 LearningRate 0.0031 Epoch: 16 Global Step: 274820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:23,144-Speed 5191.54 samples/sec Loss 0.7091 LearningRate 0.0031 Epoch: 16 Global Step: 274830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:25,113-Speed 5203.43 samples/sec Loss 0.6709 LearningRate 0.0031 Epoch: 16 Global Step: 274840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:27,083-Speed 5199.26 samples/sec Loss 0.6777 LearningRate 0.0031 Epoch: 16 Global Step: 274850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:29,051-Speed 5204.82 samples/sec Loss 0.6805 LearningRate 0.0031 Epoch: 16 Global Step: 274860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:31,035-Speed 5163.16 samples/sec Loss 0.6844 LearningRate 0.0031 Epoch: 16 Global Step: 274870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:33,021-Speed 5158.35 samples/sec Loss 0.6640 LearningRate 0.0031 Epoch: 16 Global Step: 274880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:35,011-Speed 5147.53 samples/sec Loss 0.6989 LearningRate 0.0031 Epoch: 16 Global Step: 274890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:36,983-Speed 5194.65 samples/sec Loss 0.6791 LearningRate 0.0031 Epoch: 16 Global Step: 274900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:43:38,964-Speed 
5170.29 samples/sec Loss 0.6554 LearningRate 0.0031 Epoch: 16 Global Step: 274910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:43:40,938-Speed 5189.26 samples/sec Loss 0.6745 LearningRate 0.0031 Epoch: 16 Global Step: 274920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:43:42,906-Speed 5206.82 samples/sec Loss 0.6485 LearningRate 0.0031 Epoch: 16 Global Step: 274930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:43:44,865-Speed 5227.03 samples/sec Loss 0.6885 LearningRate 0.0031 Epoch: 16 Global Step: 274940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:46,842-Speed 5182.70 samples/sec Loss 0.6696 LearningRate 0.0031 Epoch: 16 Global Step: 274950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:48,819-Speed 5180.77 samples/sec Loss 0.6837 LearningRate 0.0031 Epoch: 16 Global Step: 274960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:50,797-Speed 5178.38 samples/sec Loss 0.6949 LearningRate 0.0031 Epoch: 16 Global Step: 274970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:52,784-Speed 5156.51 samples/sec Loss 0.6863 LearningRate 0.0031 Epoch: 16 Global Step: 274980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:54,762-Speed 5179.20 samples/sec Loss 0.6942 LearningRate 0.0031 Epoch: 16 Global Step: 274990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:56,750-Speed 5153.31 samples/sec Loss 0.7091 LearningRate 0.0031 Epoch: 16 Global Step: 275000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:43:58,721-Speed 5195.38 samples/sec Loss 0.7160 LearningRate 0.0031 Epoch: 16 Global Step: 275010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:00,698-Speed 5181.06 samples/sec Loss 0.7086 LearningRate 0.0031 Epoch: 16 Global Step: 275020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:02,675-Speed 5182.98 samples/sec Loss 0.7066 
LearningRate 0.0031 Epoch: 16 Global Step: 275030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:04,648-Speed 5191.83 samples/sec Loss 0.6680 LearningRate 0.0031 Epoch: 16 Global Step: 275040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:06,620-Speed 5195.40 samples/sec Loss 0.6608 LearningRate 0.0031 Epoch: 16 Global Step: 275050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:08,589-Speed 5201.99 samples/sec Loss 0.6745 LearningRate 0.0031 Epoch: 16 Global Step: 275060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:10,559-Speed 5199.23 samples/sec Loss 0.6470 LearningRate 0.0031 Epoch: 16 Global Step: 275070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:12,530-Speed 5196.32 samples/sec Loss 0.6731 LearningRate 0.0031 Epoch: 16 Global Step: 275080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:14,510-Speed 5175.30 samples/sec Loss 0.6828 LearningRate 0.0031 Epoch: 16 Global Step: 275090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:16,477-Speed 5209.86 samples/sec Loss 0.6776 LearningRate 0.0031 Epoch: 16 Global Step: 275100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:18,457-Speed 5172.07 samples/sec Loss 0.6818 LearningRate 0.0031 Epoch: 16 Global Step: 275110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:20,430-Speed 5192.40 samples/sec Loss 0.6862 LearningRate 0.0031 Epoch: 16 Global Step: 275120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:22,401-Speed 5198.00 samples/sec Loss 0.7063 LearningRate 0.0031 Epoch: 16 Global Step: 275130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:24,375-Speed 5188.79 samples/sec Loss 0.7053 LearningRate 0.0031 Epoch: 16 Global Step: 275140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:44:26,356-Speed 5168.57 samples/sec Loss 0.6658 LearningRate 0.0031 Epoch: 16 Global Step: 
275150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:44:28,352-Speed 5134.13 samples/sec Loss 0.6611 LearningRate 0.0031 Epoch: 16 Global Step: 275160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:44:30,321-Speed 5201.18 samples/sec Loss 0.6740 LearningRate 0.0031 Epoch: 16 Global Step: 275170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:44:32,289-Speed 5205.11 samples/sec Loss 0.6783 LearningRate 0.0031 Epoch: 16 Global Step: 275180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:34,266-Speed 5182.51 samples/sec Loss 0.6822 LearningRate 0.0031 Epoch: 16 Global Step: 275190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:36,234-Speed 5203.31 samples/sec Loss 0.6542 LearningRate 0.0031 Epoch: 16 Global Step: 275200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:38,220-Speed 5159.00 samples/sec Loss 0.7162 LearningRate 0.0031 Epoch: 16 Global Step: 275210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:40,191-Speed 5196.07 samples/sec Loss 0.6679 LearningRate 0.0031 Epoch: 16 Global Step: 275220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:42,165-Speed 5191.06 samples/sec Loss 0.6578 LearningRate 0.0031 Epoch: 16 Global Step: 275230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:44,136-Speed 5196.43 samples/sec Loss 0.6781 LearningRate 0.0031 Epoch: 16 Global Step: 275240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:46,116-Speed 5173.72 samples/sec Loss 0.6634 LearningRate 0.0031 Epoch: 16 Global Step: 275250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:48,096-Speed 5173.87 samples/sec Loss 0.6537 LearningRate 0.0031 Epoch: 16 Global Step: 275260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:50,068-Speed 5194.19 samples/sec Loss 0.6983 LearningRate 0.0031 Epoch: 16 Global Step: 275270 Fp16 Grad Scale: 65536 Required: 
4 hours Training: 2022-04-11 17:44:52,058-Speed 5147.42 samples/sec Loss 0.6868 LearningRate 0.0031 Epoch: 16 Global Step: 275280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:44:54,036-Speed 5180.59 samples/sec Loss 0.6551 LearningRate 0.0031 Epoch: 16 Global Step: 275290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:56,005-Speed 5202.23 samples/sec Loss 0.6549 LearningRate 0.0031 Epoch: 16 Global Step: 275300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:57,991-Speed 5156.87 samples/sec Loss 0.7062 LearningRate 0.0031 Epoch: 16 Global Step: 275310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:44:59,961-Speed 5198.72 samples/sec Loss 0.7115 LearningRate 0.0031 Epoch: 16 Global Step: 275320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:45:01,930-Speed 5202.19 samples/sec Loss 0.6872 LearningRate 0.0031 Epoch: 16 Global Step: 275330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:45:03,908-Speed 5179.52 samples/sec Loss 0.6804 LearningRate 0.0031 Epoch: 16 Global Step: 275340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:45:05,878-Speed 5199.13 samples/sec Loss 0.6654 LearningRate 0.0031 Epoch: 16 Global Step: 275350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:45:07,855-Speed 5181.48 samples/sec Loss 0.6418 LearningRate 0.0031 Epoch: 16 Global Step: 275360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:45:09,844-Speed 5151.58 samples/sec Loss 0.6933 LearningRate 0.0031 Epoch: 16 Global Step: 275370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:45:11,823-Speed 5174.50 samples/sec Loss 0.6730 LearningRate 0.0031 Epoch: 16 Global Step: 275380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:45:13,808-Speed 5162.04 samples/sec Loss 0.6568 LearningRate 0.0031 Epoch: 16 Global Step: 275390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 
17:45:15,790-Speed 5169.40 samples/sec Loss 0.6532 LearningRate 0.0031 Epoch: 16 Global Step: 275400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:45:17,773-Speed 5165.34 samples/sec Loss 0.6619 LearningRate 0.0031 Epoch: 16 Global Step: 275410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:45:19,776-Speed 5112.94 samples/sec Loss 0.7185 LearningRate 0.0031 Epoch: 16 Global Step: 275420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:45:21,762-Speed 5159.13 samples/sec Loss 0.7171 LearningRate 0.0031 Epoch: 16 Global Step: 275430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:45:23,741-Speed 5175.75 samples/sec Loss 0.6552 LearningRate 0.0031 Epoch: 16 Global Step: 275440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:45:25,739-Speed 5128.52 samples/sec Loss 0.6563 LearningRate 0.0031 Epoch: 16 Global Step: 275450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:45:27,724-Speed 5160.48 samples/sec Loss 0.6864 LearningRate 0.0031 Epoch: 16 Global Step: 275460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:45:29,710-Speed 5156.73 samples/sec Loss 0.7301 LearningRate 0.0031 Epoch: 16 Global Step: 275470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:45:31,727-Speed 5081.30 samples/sec Loss 0.6860 LearningRate 0.0031 Epoch: 16 Global Step: 275480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:45:33,703-Speed 5184.61 samples/sec Loss 0.7089 LearningRate 0.0031 Epoch: 16 Global Step: 275490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:45:35,721-Speed 5078.06 samples/sec Loss 0.6823 LearningRate 0.0031 Epoch: 16 Global Step: 275500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:45:37,703-Speed 5167.65 samples/sec Loss 0.7020 LearningRate 0.0031 Epoch: 16 Global Step: 275510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:45:39,698-Speed 5133.20 samples/sec Loss 
0.6698 LearningRate 0.0031 Epoch: 16 Global Step: 275520 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:45:41,704-Speed 5108.47 samples/sec Loss 0.7008 LearningRate 0.0030 Epoch: 16 Global Step: 275530 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:45:43,685-Speed 5170.44 samples/sec Loss 0.6630 LearningRate 0.0030 Epoch: 16 Global Step: 275540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:45:45,652-Speed 5207.59 samples/sec Loss 0.7237 LearningRate 0.0030 Epoch: 16 Global Step: 275550 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:45:47,632-Speed 5172.70 samples/sec Loss 0.6701 LearningRate 0.0030 Epoch: 16 Global Step: 275560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:45:49,619-Speed 5155.75 samples/sec Loss 0.7051 LearningRate 0.0030 Epoch: 16 Global Step: 275570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:45:51,614-Speed 5135.45 samples/sec Loss 0.6640 LearningRate 0.0030 Epoch: 16 Global Step: 275580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:45:53,585-Speed 5197.48 samples/sec Loss 0.6765 LearningRate 0.0030 Epoch: 16 Global Step: 275590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:45:55,557-Speed 5194.67 samples/sec Loss 0.7122 LearningRate 0.0030 Epoch: 16 Global Step: 275600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:45:57,530-Speed 5192.68 samples/sec Loss 0.6607 LearningRate 0.0030 Epoch: 16 Global Step: 275610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:45:59,520-Speed 5147.28 samples/sec Loss 0.6931 LearningRate 0.0030 Epoch: 16 Global Step: 275620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:46:01,510-Speed 5145.54 samples/sec Loss 0.6932 LearningRate 0.0030 Epoch: 16 Global Step: 275630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:46:03,517-Speed 5106.01 samples/sec Loss 0.7217 LearningRate 0.0030 Epoch: 
16 Global Step: 275640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:46:05,499-Speed 5167.43 samples/sec Loss 0.7121 LearningRate 0.0030 Epoch: 16 Global Step: 275650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:46:07,466-Speed 5207.56 samples/sec Loss 0.6893 LearningRate 0.0030 Epoch: 16 Global Step: 275660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:46:09,447-Speed 5171.89 samples/sec Loss 0.6545 LearningRate 0.0030 Epoch: 16 Global Step: 275670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:46:11,431-Speed 5161.79 samples/sec Loss 0.7010 LearningRate 0.0030 Epoch: 16 Global Step: 275680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:46:13,426-Speed 5136.05 samples/sec Loss 0.7106 LearningRate 0.0030 Epoch: 16 Global Step: 275690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:46:15,480-Speed 4985.70 samples/sec Loss 0.7055 LearningRate 0.0030 Epoch: 16 Global Step: 275700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:46:17,493-Speed 5091.35 samples/sec Loss 0.6803 LearningRate 0.0030 Epoch: 16 Global Step: 275710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:46:19,466-Speed 5191.04 samples/sec Loss 0.6800 LearningRate 0.0030 Epoch: 16 Global Step: 275720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:46:21,456-Speed 5150.71 samples/sec Loss 0.6694 LearningRate 0.0030 Epoch: 16 Global Step: 275730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:46:23,434-Speed 5179.32 samples/sec Loss 0.6952 LearningRate 0.0030 Epoch: 16 Global Step: 275740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:46:25,405-Speed 5196.84 samples/sec Loss 0.6425 LearningRate 0.0030 Epoch: 16 Global Step: 275750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:46:27,406-Speed 5117.43 samples/sec Loss 0.7082 LearningRate 0.0030 Epoch: 16 Global Step: 275760 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-04-11 17:46:29,377-Speed 5196.81 samples/sec Loss 0.6767 LearningRate 0.0030 Epoch: 16 Global Step: 275770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:46:31,350-Speed 5191.77 samples/sec Loss 0.6907 LearningRate 0.0030 Epoch: 16 Global Step: 275780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:46:33,352-Speed 5118.27 samples/sec Loss 0.6922 LearningRate 0.0030 Epoch: 16 Global Step: 275790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:46:35,341-Speed 5151.07 samples/sec Loss 0.7092 LearningRate 0.0030 Epoch: 16 Global Step: 275800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:46:37,320-Speed 5175.55 samples/sec Loss 0.6903 LearningRate 0.0030 Epoch: 16 Global Step: 275810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:46:39,299-Speed 5176.42 samples/sec Loss 0.7271 LearningRate 0.0030 Epoch: 16 Global Step: 275820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:46:41,271-Speed 5193.18 samples/sec Loss 0.6963 LearningRate 0.0030 Epoch: 16 Global Step: 275830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:46:43,248-Speed 5182.90 samples/sec Loss 0.6915 LearningRate 0.0030 Epoch: 16 Global Step: 275840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:46:45,232-Speed 5162.62 samples/sec Loss 0.6697 LearningRate 0.0030 Epoch: 16 Global Step: 275850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:46:47,236-Speed 5111.34 samples/sec Loss 0.6935 LearningRate 0.0030 Epoch: 16 Global Step: 275860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:46:49,251-Speed 5083.50 samples/sec Loss 0.6771 LearningRate 0.0030 Epoch: 16 Global Step: 275870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:46:51,223-Speed 5196.61 samples/sec Loss 0.6465 LearningRate 0.0030 Epoch: 16 Global Step: 275880 Fp16 Grad Scale: 131072 Required: 4 hours Training: 
2022-04-11 17:46:53,200-Speed 5180.30 samples/sec Loss 0.7032 LearningRate 0.0030 Epoch: 16 Global Step: 275890 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:46:55,178-Speed 5180.54 samples/sec Loss 0.6846 LearningRate 0.0030 Epoch: 16 Global Step: 275900 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:46:57,141-Speed 5216.75 samples/sec Loss 0.7070 LearningRate 0.0030 Epoch: 16 Global Step: 275910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:46:59,183-Speed 5016.99 samples/sec Loss 0.6838 LearningRate 0.0030 Epoch: 16 Global Step: 275920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:47:01,174-Speed 5143.64 samples/sec Loss 0.6553 LearningRate 0.0030 Epoch: 16 Global Step: 275930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:47:03,148-Speed 5189.09 samples/sec Loss 0.6977 LearningRate 0.0030 Epoch: 16 Global Step: 275940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:47:05,138-Speed 5148.71 samples/sec Loss 0.6745 LearningRate 0.0030 Epoch: 16 Global Step: 275950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:47:07,134-Speed 5132.24 samples/sec Loss 0.6925 LearningRate 0.0030 Epoch: 16 Global Step: 275960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:47:09,112-Speed 5177.23 samples/sec Loss 0.6658 LearningRate 0.0030 Epoch: 16 Global Step: 275970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:47:11,127-Speed 5085.69 samples/sec Loss 0.7176 LearningRate 0.0030 Epoch: 16 Global Step: 275980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:47:13,107-Speed 5174.14 samples/sec Loss 0.6607 LearningRate 0.0030 Epoch: 16 Global Step: 275990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:47:15,151-Speed 5012.08 samples/sec Loss 0.6916 LearningRate 0.0030 Epoch: 16 Global Step: 276000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
17:47:42,080-[lfw][276000]XNorm: 21.836149 Training: 2022-04-11 17:47:42,080-[lfw][276000]Accuracy-Flip: 0.99783+-0.00289 Training: 2022-04-11 17:47:42,081-[lfw][276000]Accuracy-Highest: 0.99833 Training: 2022-04-11 17:48:12,972-[cfp_fp][276000]XNorm: 21.946349 Training: 2022-04-11 17:48:12,973-[cfp_fp][276000]Accuracy-Flip: 0.99000+-0.00373 Training: 2022-04-11 17:48:12,973-[cfp_fp][276000]Accuracy-Highest: 0.99000 Training: 2022-04-11 17:48:39,585-[agedb_30][276000]XNorm: 22.676410 Training: 2022-04-11 17:48:39,586-[agedb_30][276000]Accuracy-Flip: 0.98317+-0.00555 Training: 2022-04-11 17:48:39,586-[agedb_30][276000]Accuracy-Highest: 0.98333 Training: 2022-04-11 17:48:41,574-Speed 118.49 samples/sec Loss 0.6806 LearningRate 0.0030 Epoch: 16 Global Step: 276010 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:48:43,532-Speed 5231.88 samples/sec Loss 0.7167 LearningRate 0.0030 Epoch: 16 Global Step: 276020 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:48:45,490-Speed 5229.63 samples/sec Loss 0.6361 LearningRate 0.0030 Epoch: 16 Global Step: 276030 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:48:47,467-Speed 5182.05 samples/sec Loss 0.6904 LearningRate 0.0030 Epoch: 16 Global Step: 276040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:48:49,440-Speed 5194.63 samples/sec Loss 0.7069 LearningRate 0.0030 Epoch: 16 Global Step: 276050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:48:51,403-Speed 5216.59 samples/sec Loss 0.6867 LearningRate 0.0030 Epoch: 16 Global Step: 276060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:48:53,382-Speed 5177.10 samples/sec Loss 0.6851 LearningRate 0.0030 Epoch: 16 Global Step: 276070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:48:55,352-Speed 5200.03 samples/sec Loss 0.6708 LearningRate 0.0030 Epoch: 16 Global Step: 276080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:48:57,367-Speed 
5085.49 samples/sec Loss 0.6685 LearningRate 0.0030 Epoch: 16 Global Step: 276090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:48:59,333-Speed 5209.48 samples/sec Loss 0.6833 LearningRate 0.0030 Epoch: 16 Global Step: 276100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:01,318-Speed 5161.06 samples/sec Loss 0.7077 LearningRate 0.0030 Epoch: 16 Global Step: 276110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:03,324-Speed 5108.01 samples/sec Loss 0.6526 LearningRate 0.0030 Epoch: 16 Global Step: 276120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:05,286-Speed 5221.07 samples/sec Loss 0.6776 LearningRate 0.0030 Epoch: 16 Global Step: 276130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:07,266-Speed 5172.08 samples/sec Loss 0.6504 LearningRate 0.0030 Epoch: 16 Global Step: 276140 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:49:09,229-Speed 5218.05 samples/sec Loss 0.7235 LearningRate 0.0030 Epoch: 16 Global Step: 276150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:11,199-Speed 5199.89 samples/sec Loss 0.6916 LearningRate 0.0030 Epoch: 16 Global Step: 276160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:13,189-Speed 5149.59 samples/sec Loss 0.6844 LearningRate 0.0030 Epoch: 16 Global Step: 276170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:15,162-Speed 5190.61 samples/sec Loss 0.6977 LearningRate 0.0030 Epoch: 16 Global Step: 276180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:17,148-Speed 5158.49 samples/sec Loss 0.6771 LearningRate 0.0030 Epoch: 16 Global Step: 276190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:19,117-Speed 5203.02 samples/sec Loss 0.6693 LearningRate 0.0030 Epoch: 16 Global Step: 276200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:21,084-Speed 5207.43 samples/sec Loss 0.6765 
LearningRate 0.0030 Epoch: 16 Global Step: 276210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:23,075-Speed 5145.12 samples/sec Loss 0.6638 LearningRate 0.0030 Epoch: 16 Global Step: 276220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:25,052-Speed 5182.24 samples/sec Loss 0.6871 LearningRate 0.0030 Epoch: 16 Global Step: 276230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:27,024-Speed 5195.29 samples/sec Loss 0.6812 LearningRate 0.0030 Epoch: 16 Global Step: 276240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:28,993-Speed 5201.20 samples/sec Loss 0.6634 LearningRate 0.0030 Epoch: 16 Global Step: 276250 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:49:30,956-Speed 5220.79 samples/sec Loss 0.7072 LearningRate 0.0030 Epoch: 16 Global Step: 276260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:32,925-Speed 5201.72 samples/sec Loss 0.6729 LearningRate 0.0030 Epoch: 16 Global Step: 276270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:34,920-Speed 5135.08 samples/sec Loss 0.6901 LearningRate 0.0030 Epoch: 16 Global Step: 276280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:36,908-Speed 5153.33 samples/sec Loss 0.6875 LearningRate 0.0030 Epoch: 16 Global Step: 276290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:38,909-Speed 5118.25 samples/sec Loss 0.7129 LearningRate 0.0030 Epoch: 16 Global Step: 276300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:40,873-Speed 5217.21 samples/sec Loss 0.6643 LearningRate 0.0030 Epoch: 16 Global Step: 276310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:42,837-Speed 5214.78 samples/sec Loss 0.6952 LearningRate 0.0030 Epoch: 16 Global Step: 276320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:44,805-Speed 5204.54 samples/sec Loss 0.6689 LearningRate 0.0030 Epoch: 16 Global Step: 
276330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:46,773-Speed 5204.77 samples/sec Loss 0.6685 LearningRate 0.0030 Epoch: 16 Global Step: 276340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:48,742-Speed 5202.64 samples/sec Loss 0.6850 LearningRate 0.0030 Epoch: 16 Global Step: 276350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:49:50,708-Speed 5209.71 samples/sec Loss 0.6915 LearningRate 0.0030 Epoch: 16 Global Step: 276360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:49:52,670-Speed 5220.90 samples/sec Loss 0.6487 LearningRate 0.0030 Epoch: 16 Global Step: 276370 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:49:54,635-Speed 5213.67 samples/sec Loss 0.6863 LearningRate 0.0030 Epoch: 16 Global Step: 276380 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:49:56,608-Speed 5193.71 samples/sec Loss 0.6588 LearningRate 0.0030 Epoch: 16 Global Step: 276390 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:49:58,570-Speed 5221.07 samples/sec Loss 0.7106 LearningRate 0.0030 Epoch: 16 Global Step: 276400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:00,543-Speed 5192.27 samples/sec Loss 0.7051 LearningRate 0.0030 Epoch: 16 Global Step: 276410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:02,531-Speed 5151.75 samples/sec Loss 0.7003 LearningRate 0.0030 Epoch: 16 Global Step: 276420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:04,531-Speed 5124.37 samples/sec Loss 0.6888 LearningRate 0.0030 Epoch: 16 Global Step: 276430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:06,507-Speed 5183.79 samples/sec Loss 0.6647 LearningRate 0.0030 Epoch: 16 Global Step: 276440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:08,493-Speed 5158.54 samples/sec Loss 0.6537 LearningRate 0.0030 Epoch: 16 Global Step: 276450 Fp16 Grad Scale: 65536 Required: 
4 hours Training: 2022-04-11 17:50:10,492-Speed 5123.17 samples/sec Loss 0.6732 LearningRate 0.0030 Epoch: 16 Global Step: 276460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:12,492-Speed 5122.99 samples/sec Loss 0.6942 LearningRate 0.0030 Epoch: 16 Global Step: 276470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:14,457-Speed 5214.62 samples/sec Loss 0.6900 LearningRate 0.0030 Epoch: 16 Global Step: 276480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:16,424-Speed 5206.38 samples/sec Loss 0.6770 LearningRate 0.0029 Epoch: 16 Global Step: 276490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:18,380-Speed 5236.34 samples/sec Loss 0.6662 LearningRate 0.0029 Epoch: 16 Global Step: 276500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:20,361-Speed 5172.04 samples/sec Loss 0.6717 LearningRate 0.0029 Epoch: 16 Global Step: 276510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:22,330-Speed 5203.55 samples/sec Loss 0.6601 LearningRate 0.0029 Epoch: 16 Global Step: 276520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:24,299-Speed 5203.16 samples/sec Loss 0.6378 LearningRate 0.0029 Epoch: 16 Global Step: 276530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:26,324-Speed 5058.22 samples/sec Loss 0.6745 LearningRate 0.0029 Epoch: 16 Global Step: 276540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:28,302-Speed 5177.94 samples/sec Loss 0.6881 LearningRate 0.0029 Epoch: 16 Global Step: 276550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:30,269-Speed 5209.71 samples/sec Loss 0.7292 LearningRate 0.0029 Epoch: 16 Global Step: 276560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:32,247-Speed 5177.18 samples/sec Loss 0.6879 LearningRate 0.0029 Epoch: 16 Global Step: 276570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
17:50:34,214-Speed 5209.84 samples/sec Loss 0.6336 LearningRate 0.0029 Epoch: 16 Global Step: 276580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:36,187-Speed 5191.28 samples/sec Loss 0.7117 LearningRate 0.0029 Epoch: 16 Global Step: 276590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:38,163-Speed 5184.13 samples/sec Loss 0.6830 LearningRate 0.0029 Epoch: 16 Global Step: 276600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:50:40,127-Speed 5216.99 samples/sec Loss 0.6843 LearningRate 0.0029 Epoch: 16 Global Step: 276610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:42,100-Speed 5190.74 samples/sec Loss 0.7085 LearningRate 0.0029 Epoch: 16 Global Step: 276620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:44,101-Speed 5120.29 samples/sec Loss 0.7212 LearningRate 0.0029 Epoch: 16 Global Step: 276630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:46,104-Speed 5114.08 samples/sec Loss 0.6973 LearningRate 0.0029 Epoch: 16 Global Step: 276640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:48,087-Speed 5167.07 samples/sec Loss 0.7032 LearningRate 0.0029 Epoch: 16 Global Step: 276650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:50,077-Speed 5148.21 samples/sec Loss 0.7048 LearningRate 0.0029 Epoch: 16 Global Step: 276660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:52,042-Speed 5212.39 samples/sec Loss 0.7033 LearningRate 0.0029 Epoch: 16 Global Step: 276670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:54,025-Speed 5164.59 samples/sec Loss 0.6577 LearningRate 0.0029 Epoch: 16 Global Step: 276680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:55,998-Speed 5192.18 samples/sec Loss 0.7008 LearningRate 0.0029 Epoch: 16 Global Step: 276690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:57,960-Speed 5221.56 samples/sec 
Loss 0.6674 LearningRate 0.0029 Epoch: 16 Global Step: 276700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:50:59,932-Speed 5195.70 samples/sec Loss 0.6861 LearningRate 0.0029 Epoch: 16 Global Step: 276710 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:51:01,899-Speed 5207.67 samples/sec Loss 0.6928 LearningRate 0.0029 Epoch: 16 Global Step: 276720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:51:03,882-Speed 5165.02 samples/sec Loss 0.6742 LearningRate 0.0029 Epoch: 16 Global Step: 276730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:05,848-Speed 5211.65 samples/sec Loss 0.6809 LearningRate 0.0029 Epoch: 16 Global Step: 276740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:07,816-Speed 5205.76 samples/sec Loss 0.7307 LearningRate 0.0029 Epoch: 16 Global Step: 276750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:09,801-Speed 5159.33 samples/sec Loss 0.6967 LearningRate 0.0029 Epoch: 16 Global Step: 276760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:11,779-Speed 5179.38 samples/sec Loss 0.6913 LearningRate 0.0029 Epoch: 16 Global Step: 276770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:13,754-Speed 5186.49 samples/sec Loss 0.7247 LearningRate 0.0029 Epoch: 16 Global Step: 276780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:15,720-Speed 5209.53 samples/sec Loss 0.6854 LearningRate 0.0029 Epoch: 16 Global Step: 276790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:17,694-Speed 5189.73 samples/sec Loss 0.6704 LearningRate 0.0029 Epoch: 16 Global Step: 276800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:19,662-Speed 5204.85 samples/sec Loss 0.7154 LearningRate 0.0029 Epoch: 16 Global Step: 276810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:21,654-Speed 5143.08 samples/sec Loss 0.6991 LearningRate 0.0029 Epoch: 16 
Global Step: 276820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:23,635-Speed 5171.79 samples/sec Loss 0.6588 LearningRate 0.0029 Epoch: 16 Global Step: 276830 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:51:25,603-Speed 5203.91 samples/sec Loss 0.6615 LearningRate 0.0029 Epoch: 16 Global Step: 276840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:51:27,573-Speed 5200.20 samples/sec Loss 0.6881 LearningRate 0.0029 Epoch: 16 Global Step: 276850 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:51:29,544-Speed 5198.14 samples/sec Loss 0.6770 LearningRate 0.0029 Epoch: 16 Global Step: 276860 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:51:31,511-Speed 5207.88 samples/sec Loss 0.7125 LearningRate 0.0029 Epoch: 16 Global Step: 276870 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:51:33,485-Speed 5188.37 samples/sec Loss 0.7223 LearningRate 0.0029 Epoch: 16 Global Step: 276880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:35,462-Speed 5180.79 samples/sec Loss 0.6790 LearningRate 0.0029 Epoch: 16 Global Step: 276890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:37,432-Speed 5201.12 samples/sec Loss 0.6934 LearningRate 0.0029 Epoch: 16 Global Step: 276900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:39,419-Speed 5155.24 samples/sec Loss 0.7003 LearningRate 0.0029 Epoch: 16 Global Step: 276910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:41,399-Speed 5172.85 samples/sec Loss 0.6914 LearningRate 0.0029 Epoch: 16 Global Step: 276920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:43,379-Speed 5173.60 samples/sec Loss 0.6688 LearningRate 0.0029 Epoch: 16 Global Step: 276930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:45,349-Speed 5199.57 samples/sec Loss 0.6966 LearningRate 0.0029 Epoch: 16 Global Step: 276940 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-04-11 17:51:47,315-Speed 5210.94 samples/sec Loss 0.6564 LearningRate 0.0029 Epoch: 16 Global Step: 276950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:49,290-Speed 5186.98 samples/sec Loss 0.7054 LearningRate 0.0029 Epoch: 16 Global Step: 276960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:51,296-Speed 5106.26 samples/sec Loss 0.6773 LearningRate 0.0029 Epoch: 16 Global Step: 276970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:53,267-Speed 5198.31 samples/sec Loss 0.6452 LearningRate 0.0029 Epoch: 16 Global Step: 276980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:51:55,254-Speed 5155.52 samples/sec Loss 0.6587 LearningRate 0.0029 Epoch: 16 Global Step: 276990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:57,241-Speed 5155.09 samples/sec Loss 0.6763 LearningRate 0.0029 Epoch: 16 Global Step: 277000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:51:59,215-Speed 5189.10 samples/sec Loss 0.6770 LearningRate 0.0029 Epoch: 16 Global Step: 277010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:01,213-Speed 5128.25 samples/sec Loss 0.7125 LearningRate 0.0029 Epoch: 16 Global Step: 277020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:03,219-Speed 5105.48 samples/sec Loss 0.6638 LearningRate 0.0029 Epoch: 16 Global Step: 277030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:05,195-Speed 5186.35 samples/sec Loss 0.6974 LearningRate 0.0029 Epoch: 16 Global Step: 277040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:07,166-Speed 5198.35 samples/sec Loss 0.6965 LearningRate 0.0029 Epoch: 16 Global Step: 277050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:09,155-Speed 5149.31 samples/sec Loss 0.6889 LearningRate 0.0029 Epoch: 16 Global Step: 277060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 17:52:11,122-Speed 5208.64 samples/sec Loss 0.6571 LearningRate 0.0029 Epoch: 16 Global Step: 277070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:13,095-Speed 5191.34 samples/sec Loss 0.6765 LearningRate 0.0029 Epoch: 16 Global Step: 277080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:15,097-Speed 5117.49 samples/sec Loss 0.6804 LearningRate 0.0029 Epoch: 16 Global Step: 277090 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:52:17,062-Speed 5212.91 samples/sec Loss 0.6432 LearningRate 0.0029 Epoch: 16 Global Step: 277100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:19,035-Speed 5191.13 samples/sec Loss 0.6826 LearningRate 0.0029 Epoch: 16 Global Step: 277110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:21,010-Speed 5186.47 samples/sec Loss 0.6770 LearningRate 0.0029 Epoch: 16 Global Step: 277120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:22,991-Speed 5172.65 samples/sec Loss 0.6899 LearningRate 0.0029 Epoch: 16 Global Step: 277130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:24,966-Speed 5185.65 samples/sec Loss 0.6718 LearningRate 0.0029 Epoch: 16 Global Step: 277140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:26,975-Speed 5100.86 samples/sec Loss 0.6909 LearningRate 0.0029 Epoch: 16 Global Step: 277150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:28,952-Speed 5180.05 samples/sec Loss 0.6656 LearningRate 0.0029 Epoch: 16 Global Step: 277160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:30,942-Speed 5149.71 samples/sec Loss 0.6911 LearningRate 0.0029 Epoch: 16 Global Step: 277170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:32,915-Speed 5192.22 samples/sec Loss 0.7335 LearningRate 0.0029 Epoch: 16 Global Step: 277180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:34,902-Speed 5155.12 
samples/sec Loss 0.6646 LearningRate 0.0029 Epoch: 16 Global Step: 277190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:36,889-Speed 5155.46 samples/sec Loss 0.7075 LearningRate 0.0029 Epoch: 16 Global Step: 277200 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:52:38,871-Speed 5167.85 samples/sec Loss 0.6779 LearningRate 0.0029 Epoch: 16 Global Step: 277210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:40,894-Speed 5064.38 samples/sec Loss 0.7428 LearningRate 0.0029 Epoch: 16 Global Step: 277220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:42,861-Speed 5208.53 samples/sec Loss 0.6964 LearningRate 0.0029 Epoch: 16 Global Step: 277230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:44,856-Speed 5134.88 samples/sec Loss 0.6597 LearningRate 0.0029 Epoch: 16 Global Step: 277240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:46,838-Speed 5167.87 samples/sec Loss 0.6353 LearningRate 0.0029 Epoch: 16 Global Step: 277250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:48,829-Speed 5146.43 samples/sec Loss 0.6790 LearningRate 0.0029 Epoch: 16 Global Step: 277260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:50,815-Speed 5155.75 samples/sec Loss 0.6794 LearningRate 0.0029 Epoch: 16 Global Step: 277270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:52,825-Speed 5098.12 samples/sec Loss 0.7011 LearningRate 0.0029 Epoch: 16 Global Step: 277280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:54,798-Speed 5192.02 samples/sec Loss 0.6311 LearningRate 0.0029 Epoch: 16 Global Step: 277290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:56,766-Speed 5206.09 samples/sec Loss 0.7037 LearningRate 0.0029 Epoch: 16 Global Step: 277300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:52:58,747-Speed 5170.63 samples/sec Loss 0.6656 LearningRate 
0.0029 Epoch: 16 Global Step: 277310 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:53:00,747-Speed 5123.53 samples/sec Loss 0.6723 LearningRate 0.0029 Epoch: 16 Global Step: 277320 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:53:02,735-Speed 5153.27 samples/sec Loss 0.6955 LearningRate 0.0029 Epoch: 16 Global Step: 277330 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:53:04,740-Speed 5107.82 samples/sec Loss 0.6667 LearningRate 0.0029 Epoch: 16 Global Step: 277340 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:53:06,701-Speed 5223.70 samples/sec Loss 0.6831 LearningRate 0.0029 Epoch: 16 Global Step: 277350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:08,709-Speed 5103.04 samples/sec Loss 0.6716 LearningRate 0.0029 Epoch: 16 Global Step: 277360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:10,696-Speed 5155.77 samples/sec Loss 0.6606 LearningRate 0.0029 Epoch: 16 Global Step: 277370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:12,672-Speed 5184.18 samples/sec Loss 0.6940 LearningRate 0.0029 Epoch: 16 Global Step: 277380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:14,667-Speed 5136.20 samples/sec Loss 0.7337 LearningRate 0.0029 Epoch: 16 Global Step: 277390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:16,657-Speed 5146.48 samples/sec Loss 0.7018 LearningRate 0.0029 Epoch: 16 Global Step: 277400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:18,678-Speed 5069.19 samples/sec Loss 0.6654 LearningRate 0.0029 Epoch: 16 Global Step: 277410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:20,651-Speed 5192.08 samples/sec Loss 0.6796 LearningRate 0.0029 Epoch: 16 Global Step: 277420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:22,648-Speed 5129.73 samples/sec Loss 0.6843 LearningRate 0.0029 Epoch: 16 Global Step: 277430 
Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:24,631-Speed 5166.94 samples/sec Loss 0.7017 LearningRate 0.0029 Epoch: 16 Global Step: 277440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:26,613-Speed 5168.07 samples/sec Loss 0.6861 LearningRate 0.0029 Epoch: 16 Global Step: 277450 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:53:28,582-Speed 5202.67 samples/sec Loss 0.6698 LearningRate 0.0029 Epoch: 16 Global Step: 277460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:30,572-Speed 5148.81 samples/sec Loss 0.6720 LearningRate 0.0028 Epoch: 16 Global Step: 277470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:32,567-Speed 5134.63 samples/sec Loss 0.7248 LearningRate 0.0028 Epoch: 16 Global Step: 277480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:34,553-Speed 5158.52 samples/sec Loss 0.6700 LearningRate 0.0028 Epoch: 16 Global Step: 277490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:36,525-Speed 5194.94 samples/sec Loss 0.7080 LearningRate 0.0028 Epoch: 16 Global Step: 277500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:38,503-Speed 5179.34 samples/sec Loss 0.6972 LearningRate 0.0028 Epoch: 16 Global Step: 277510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:40,479-Speed 5183.29 samples/sec Loss 0.6982 LearningRate 0.0028 Epoch: 16 Global Step: 277520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:42,453-Speed 5190.68 samples/sec Loss 0.7105 LearningRate 0.0028 Epoch: 16 Global Step: 277530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:44,431-Speed 5177.13 samples/sec Loss 0.6736 LearningRate 0.0028 Epoch: 16 Global Step: 277540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:53:46,426-Speed 5135.88 samples/sec Loss 0.6974 LearningRate 0.0028 Epoch: 16 Global Step: 277550 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 17:53:48,410-Speed 5162.25 samples/sec Loss 0.6698 LearningRate 0.0028 Epoch: 16 Global Step: 277560 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:53:50,388-Speed 5178.52 samples/sec Loss 0.6791 LearningRate 0.0028 Epoch: 16 Global Step: 277570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:53:52,366-Speed 5180.82 samples/sec Loss 0.6984 LearningRate 0.0028 Epoch: 16 Global Step: 277580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:53:54,334-Speed 5203.84 samples/sec Loss 0.7010 LearningRate 0.0028 Epoch: 16 Global Step: 277590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:53:56,310-Speed 5184.65 samples/sec Loss 0.6654 LearningRate 0.0028 Epoch: 16 Global Step: 277600 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:53:58,283-Speed 5192.63 samples/sec Loss 0.6932 LearningRate 0.0028 Epoch: 16 Global Step: 277610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:00,253-Speed 5200.34 samples/sec Loss 0.6725 LearningRate 0.0028 Epoch: 16 Global Step: 277620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:02,236-Speed 5163.84 samples/sec Loss 0.7256 LearningRate 0.0028 Epoch: 16 Global Step: 277630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:04,204-Speed 5206.39 samples/sec Loss 0.6854 LearningRate 0.0028 Epoch: 16 Global Step: 277640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:06,185-Speed 5168.90 samples/sec Loss 0.7004 LearningRate 0.0028 Epoch: 16 Global Step: 277650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:08,154-Speed 5203.57 samples/sec Loss 0.6611 LearningRate 0.0028 Epoch: 16 Global Step: 277660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:10,145-Speed 5145.08 samples/sec Loss 0.6639 LearningRate 0.0028 Epoch: 16 Global Step: 277670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
17:54:12,129-Speed 5163.39 samples/sec Loss 0.7000 LearningRate 0.0028 Epoch: 16 Global Step: 277680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:14,115-Speed 5157.66 samples/sec Loss 0.6662 LearningRate 0.0028 Epoch: 16 Global Step: 277690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:16,092-Speed 5182.01 samples/sec Loss 0.7016 LearningRate 0.0028 Epoch: 16 Global Step: 277700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:18,062-Speed 5198.32 samples/sec Loss 0.6919 LearningRate 0.0028 Epoch: 16 Global Step: 277710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:20,034-Speed 5194.24 samples/sec Loss 0.6654 LearningRate 0.0028 Epoch: 16 Global Step: 277720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:22,032-Speed 5128.86 samples/sec Loss 0.6886 LearningRate 0.0028 Epoch: 16 Global Step: 277730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:24,016-Speed 5161.27 samples/sec Loss 0.7010 LearningRate 0.0028 Epoch: 16 Global Step: 277740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:26,000-Speed 5162.61 samples/sec Loss 0.6993 LearningRate 0.0028 Epoch: 16 Global Step: 277750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:27,990-Speed 5148.79 samples/sec Loss 0.7204 LearningRate 0.0028 Epoch: 16 Global Step: 277760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:29,983-Speed 5139.77 samples/sec Loss 0.6573 LearningRate 0.0028 Epoch: 16 Global Step: 277770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:31,951-Speed 5206.75 samples/sec Loss 0.6532 LearningRate 0.0028 Epoch: 16 Global Step: 277780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:33,918-Speed 5207.79 samples/sec Loss 0.6957 LearningRate 0.0028 Epoch: 16 Global Step: 277790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:35,897-Speed 5174.91 samples/sec Loss 
0.7098 LearningRate 0.0028 Epoch: 16 Global Step: 277800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:37,873-Speed 5184.15 samples/sec Loss 0.6515 LearningRate 0.0028 Epoch: 16 Global Step: 277810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:39,845-Speed 5193.87 samples/sec Loss 0.6804 LearningRate 0.0028 Epoch: 16 Global Step: 277820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:41,821-Speed 5185.99 samples/sec Loss 0.7221 LearningRate 0.0028 Epoch: 16 Global Step: 277830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:43,797-Speed 5181.88 samples/sec Loss 0.7001 LearningRate 0.0028 Epoch: 16 Global Step: 277840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:45,808-Speed 5095.22 samples/sec Loss 0.7114 LearningRate 0.0028 Epoch: 16 Global Step: 277850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:47,836-Speed 5048.94 samples/sec Loss 0.7088 LearningRate 0.0028 Epoch: 16 Global Step: 277860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:49,831-Speed 5137.34 samples/sec Loss 0.7003 LearningRate 0.0028 Epoch: 16 Global Step: 277870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:51,817-Speed 5157.13 samples/sec Loss 0.6728 LearningRate 0.0028 Epoch: 16 Global Step: 277880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:53,825-Speed 5100.91 samples/sec Loss 0.7352 LearningRate 0.0028 Epoch: 16 Global Step: 277890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:55,845-Speed 5071.34 samples/sec Loss 0.6992 LearningRate 0.0028 Epoch: 16 Global Step: 277900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:54:57,839-Speed 5137.56 samples/sec Loss 0.6756 LearningRate 0.0028 Epoch: 16 Global Step: 277910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:54:59,827-Speed 5153.28 samples/sec Loss 0.6844 LearningRate 0.0028 Epoch: 16 
Global Step: 277920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:55:01,825-Speed 5127.64 samples/sec Loss 0.6621 LearningRate 0.0028 Epoch: 16 Global Step: 277930 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:55:03,810-Speed 5161.01 samples/sec Loss 0.6674 LearningRate 0.0028 Epoch: 16 Global Step: 277940 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:55:05,794-Speed 5162.63 samples/sec Loss 0.6820 LearningRate 0.0028 Epoch: 16 Global Step: 277950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:55:07,774-Speed 5174.76 samples/sec Loss 0.7198 LearningRate 0.0028 Epoch: 16 Global Step: 277960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:55:09,760-Speed 5159.36 samples/sec Loss 0.6864 LearningRate 0.0028 Epoch: 16 Global Step: 277970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:55:11,736-Speed 5182.70 samples/sec Loss 0.6842 LearningRate 0.0028 Epoch: 16 Global Step: 277980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:55:13,714-Speed 5178.38 samples/sec Loss 0.7218 LearningRate 0.0028 Epoch: 16 Global Step: 277990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:55:15,693-Speed 5175.93 samples/sec Loss 0.6888 LearningRate 0.0028 Epoch: 16 Global Step: 278000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:55:42,858-[lfw][278000]XNorm: 21.970444 Training: 2022-04-11 17:55:42,859-[lfw][278000]Accuracy-Flip: 0.99783+-0.00289 Training: 2022-04-11 17:55:42,860-[lfw][278000]Accuracy-Highest: 0.99833 Training: 2022-04-11 17:56:14,075-[cfp_fp][278000]XNorm: 21.939130 Training: 2022-04-11 17:56:14,076-[cfp_fp][278000]Accuracy-Flip: 0.98729+-0.00493 Training: 2022-04-11 17:56:14,076-[cfp_fp][278000]Accuracy-Highest: 0.99000 Training: 2022-04-11 17:56:41,029-[agedb_30][278000]XNorm: 22.949261 Training: 2022-04-11 17:56:41,030-[agedb_30][278000]Accuracy-Flip: 0.98183+-0.00681 Training: 2022-04-11 
17:56:41,030-[agedb_30][278000]Accuracy-Highest: 0.98333 Training: 2022-04-11 17:56:43,017-Speed 117.27 samples/sec Loss 0.6422 LearningRate 0.0028 Epoch: 16 Global Step: 278010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:56:44,981-Speed 5215.52 samples/sec Loss 0.6965 LearningRate 0.0028 Epoch: 16 Global Step: 278020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:56:46,951-Speed 5200.01 samples/sec Loss 0.6909 LearningRate 0.0028 Epoch: 16 Global Step: 278030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:56:48,948-Speed 5128.48 samples/sec Loss 0.6980 LearningRate 0.0028 Epoch: 16 Global Step: 278040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:56:50,945-Speed 5130.95 samples/sec Loss 0.6949 LearningRate 0.0028 Epoch: 16 Global Step: 278050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:56:52,927-Speed 5169.61 samples/sec Loss 0.7081 LearningRate 0.0028 Epoch: 16 Global Step: 278060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:56:54,906-Speed 5176.16 samples/sec Loss 0.7226 LearningRate 0.0028 Epoch: 16 Global Step: 278070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:56:56,876-Speed 5199.97 samples/sec Loss 0.6944 LearningRate 0.0028 Epoch: 16 Global Step: 278080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:56:58,855-Speed 5174.40 samples/sec Loss 0.7008 LearningRate 0.0028 Epoch: 16 Global Step: 278090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:00,821-Speed 5210.15 samples/sec Loss 0.6809 LearningRate 0.0028 Epoch: 16 Global Step: 278100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:02,847-Speed 5056.95 samples/sec Loss 0.6930 LearningRate 0.0028 Epoch: 16 Global Step: 278110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:04,823-Speed 5184.83 samples/sec Loss 0.7146 LearningRate 0.0028 Epoch: 16 Global Step: 278120 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 17:57:06,790-Speed 5208.27 samples/sec Loss 0.7282 LearningRate 0.0028 Epoch: 16 Global Step: 278130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:08,782-Speed 5143.04 samples/sec Loss 0.7142 LearningRate 0.0028 Epoch: 16 Global Step: 278140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:10,773-Speed 5145.69 samples/sec Loss 0.6632 LearningRate 0.0028 Epoch: 16 Global Step: 278150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:57:12,774-Speed 5118.23 samples/sec Loss 0.7262 LearningRate 0.0028 Epoch: 16 Global Step: 278160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:14,760-Speed 5159.46 samples/sec Loss 0.6964 LearningRate 0.0028 Epoch: 16 Global Step: 278170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:16,756-Speed 5130.05 samples/sec Loss 0.6967 LearningRate 0.0028 Epoch: 16 Global Step: 278180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:18,766-Speed 5097.32 samples/sec Loss 0.6833 LearningRate 0.0028 Epoch: 16 Global Step: 278190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:20,732-Speed 5211.55 samples/sec Loss 0.6792 LearningRate 0.0028 Epoch: 16 Global Step: 278200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:22,698-Speed 5209.15 samples/sec Loss 0.6902 LearningRate 0.0028 Epoch: 16 Global Step: 278210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:24,670-Speed 5194.92 samples/sec Loss 0.6806 LearningRate 0.0028 Epoch: 16 Global Step: 278220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:26,638-Speed 5204.44 samples/sec Loss 0.6684 LearningRate 0.0028 Epoch: 16 Global Step: 278230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:28,623-Speed 5159.67 samples/sec Loss 0.6984 LearningRate 0.0028 Epoch: 16 Global Step: 278240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
17:57:30,604-Speed 5171.10 samples/sec Loss 0.6835 LearningRate 0.0028 Epoch: 16 Global Step: 278250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:32,577-Speed 5191.78 samples/sec Loss 0.6483 LearningRate 0.0028 Epoch: 16 Global Step: 278260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:34,563-Speed 5157.90 samples/sec Loss 0.6945 LearningRate 0.0028 Epoch: 16 Global Step: 278270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:36,564-Speed 5120.90 samples/sec Loss 0.6735 LearningRate 0.0028 Epoch: 16 Global Step: 278280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:38,558-Speed 5138.62 samples/sec Loss 0.6699 LearningRate 0.0028 Epoch: 16 Global Step: 278290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:40,542-Speed 5161.56 samples/sec Loss 0.6945 LearningRate 0.0028 Epoch: 16 Global Step: 278300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:42,511-Speed 5202.63 samples/sec Loss 0.6359 LearningRate 0.0028 Epoch: 16 Global Step: 278310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:44,481-Speed 5198.54 samples/sec Loss 0.6965 LearningRate 0.0028 Epoch: 16 Global Step: 278320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:46,458-Speed 5182.74 samples/sec Loss 0.6716 LearningRate 0.0028 Epoch: 16 Global Step: 278330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:48,450-Speed 5142.50 samples/sec Loss 0.6870 LearningRate 0.0028 Epoch: 16 Global Step: 278340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:50,433-Speed 5164.62 samples/sec Loss 0.6524 LearningRate 0.0028 Epoch: 16 Global Step: 278350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:52,431-Speed 5128.05 samples/sec Loss 0.7188 LearningRate 0.0028 Epoch: 16 Global Step: 278360 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:57:54,403-Speed 5194.04 samples/sec 
Loss 0.6769 LearningRate 0.0028 Epoch: 16 Global Step: 278370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:56,372-Speed 5202.67 samples/sec Loss 0.6785 LearningRate 0.0028 Epoch: 16 Global Step: 278380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:57:58,342-Speed 5200.58 samples/sec Loss 0.6810 LearningRate 0.0028 Epoch: 16 Global Step: 278390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:00,324-Speed 5167.81 samples/sec Loss 0.6802 LearningRate 0.0028 Epoch: 16 Global Step: 278400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:02,336-Speed 5093.05 samples/sec Loss 0.6748 LearningRate 0.0028 Epoch: 16 Global Step: 278410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:04,332-Speed 5130.94 samples/sec Loss 0.6967 LearningRate 0.0028 Epoch: 16 Global Step: 278420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:06,303-Speed 5196.76 samples/sec Loss 0.6818 LearningRate 0.0028 Epoch: 16 Global Step: 278430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:08,267-Speed 5216.92 samples/sec Loss 0.6965 LearningRate 0.0028 Epoch: 16 Global Step: 278440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:10,241-Speed 5190.42 samples/sec Loss 0.6950 LearningRate 0.0028 Epoch: 16 Global Step: 278450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:12,214-Speed 5190.42 samples/sec Loss 0.6818 LearningRate 0.0028 Epoch: 16 Global Step: 278460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:14,188-Speed 5189.92 samples/sec Loss 0.6526 LearningRate 0.0027 Epoch: 16 Global Step: 278470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:16,152-Speed 5215.78 samples/sec Loss 0.6859 LearningRate 0.0027 Epoch: 16 Global Step: 278480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:18,118-Speed 5209.05 samples/sec Loss 0.6971 LearningRate 0.0027 Epoch: 16 
Global Step: 278490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:20,109-Speed 5146.38 samples/sec Loss 0.6669 LearningRate 0.0027 Epoch: 16 Global Step: 278500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:22,127-Speed 5077.03 samples/sec Loss 0.7036 LearningRate 0.0027 Epoch: 16 Global Step: 278510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:24,109-Speed 5169.41 samples/sec Loss 0.7121 LearningRate 0.0027 Epoch: 16 Global Step: 278520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:26,077-Speed 5204.77 samples/sec Loss 0.6879 LearningRate 0.0027 Epoch: 16 Global Step: 278530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:28,054-Speed 5181.85 samples/sec Loss 0.7085 LearningRate 0.0027 Epoch: 16 Global Step: 278540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:30,021-Speed 5207.13 samples/sec Loss 0.6874 LearningRate 0.0027 Epoch: 16 Global Step: 278550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:31,983-Speed 5221.21 samples/sec Loss 0.6766 LearningRate 0.0027 Epoch: 16 Global Step: 278560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:33,950-Speed 5207.81 samples/sec Loss 0.6891 LearningRate 0.0027 Epoch: 16 Global Step: 278570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:58:35,956-Speed 5106.93 samples/sec Loss 0.7085 LearningRate 0.0027 Epoch: 16 Global Step: 278580 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:58:37,937-Speed 5171.21 samples/sec Loss 0.7245 LearningRate 0.0027 Epoch: 16 Global Step: 278590 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:58:39,910-Speed 5192.21 samples/sec Loss 0.6880 LearningRate 0.0027 Epoch: 16 Global Step: 278600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:41,901-Speed 5146.88 samples/sec Loss 0.7039 LearningRate 0.0027 Epoch: 16 Global Step: 278610 Fp16 Grad Scale: 
65536 Required: 4 hours Training: 2022-04-11 17:58:43,869-Speed 5205.15 samples/sec Loss 0.7097 LearningRate 0.0027 Epoch: 16 Global Step: 278620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:45,842-Speed 5191.55 samples/sec Loss 0.6816 LearningRate 0.0027 Epoch: 16 Global Step: 278630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:47,814-Speed 5193.22 samples/sec Loss 0.6598 LearningRate 0.0027 Epoch: 16 Global Step: 278640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:49,785-Speed 5198.65 samples/sec Loss 0.6880 LearningRate 0.0027 Epoch: 16 Global Step: 278650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:51,756-Speed 5195.92 samples/sec Loss 0.7098 LearningRate 0.0027 Epoch: 16 Global Step: 278660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:53,720-Speed 5216.05 samples/sec Loss 0.6757 LearningRate 0.0027 Epoch: 16 Global Step: 278670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:55,684-Speed 5215.47 samples/sec Loss 0.6995 LearningRate 0.0027 Epoch: 16 Global Step: 278680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:57,683-Speed 5125.79 samples/sec Loss 0.6697 LearningRate 0.0027 Epoch: 16 Global Step: 278690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:58:59,650-Speed 5205.63 samples/sec Loss 0.6720 LearningRate 0.0027 Epoch: 16 Global Step: 278700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:59:01,611-Speed 5223.53 samples/sec Loss 0.6838 LearningRate 0.0027 Epoch: 16 Global Step: 278710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:03,579-Speed 5205.17 samples/sec Loss 0.6545 LearningRate 0.0027 Epoch: 16 Global Step: 278720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:05,543-Speed 5217.30 samples/sec Loss 0.6953 LearningRate 0.0027 Epoch: 16 Global Step: 278730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 
2022-04-11 17:59:07,508-Speed 5212.33 samples/sec Loss 0.6942 LearningRate 0.0027 Epoch: 16 Global Step: 278740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:09,482-Speed 5188.30 samples/sec Loss 0.6626 LearningRate 0.0027 Epoch: 16 Global Step: 278750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:11,448-Speed 5212.12 samples/sec Loss 0.6732 LearningRate 0.0027 Epoch: 16 Global Step: 278760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:13,433-Speed 5159.33 samples/sec Loss 0.7022 LearningRate 0.0027 Epoch: 16 Global Step: 278770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:15,399-Speed 5211.82 samples/sec Loss 0.6833 LearningRate 0.0027 Epoch: 16 Global Step: 278780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:17,372-Speed 5191.72 samples/sec Loss 0.6764 LearningRate 0.0027 Epoch: 16 Global Step: 278790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:19,380-Speed 5101.24 samples/sec Loss 0.6873 LearningRate 0.0027 Epoch: 16 Global Step: 278800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:21,348-Speed 5206.37 samples/sec Loss 0.6686 LearningRate 0.0027 Epoch: 16 Global Step: 278810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:23,335-Speed 5154.13 samples/sec Loss 0.6918 LearningRate 0.0027 Epoch: 16 Global Step: 278820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:25,311-Speed 5185.49 samples/sec Loss 0.7090 LearningRate 0.0027 Epoch: 16 Global Step: 278830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:27,278-Speed 5207.59 samples/sec Loss 0.7051 LearningRate 0.0027 Epoch: 16 Global Step: 278840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:29,244-Speed 5211.03 samples/sec Loss 0.6733 LearningRate 0.0027 Epoch: 16 Global Step: 278850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:31,213-Speed 5203.50 
samples/sec Loss 0.6528 LearningRate 0.0027 Epoch: 16 Global Step: 278860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:33,196-Speed 5163.45 samples/sec Loss 0.6727 LearningRate 0.0027 Epoch: 16 Global Step: 278870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:35,198-Speed 5117.45 samples/sec Loss 0.6826 LearningRate 0.0027 Epoch: 16 Global Step: 278880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:37,173-Speed 5188.76 samples/sec Loss 0.6841 LearningRate 0.0027 Epoch: 16 Global Step: 278890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:39,144-Speed 5194.99 samples/sec Loss 0.6947 LearningRate 0.0027 Epoch: 16 Global Step: 278900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:41,161-Speed 5080.74 samples/sec Loss 0.6892 LearningRate 0.0027 Epoch: 16 Global Step: 278910 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:59:43,126-Speed 5213.27 samples/sec Loss 0.6846 LearningRate 0.0027 Epoch: 16 Global Step: 278920 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 17:59:45,114-Speed 5154.30 samples/sec Loss 0.7146 LearningRate 0.0027 Epoch: 16 Global Step: 278930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:47,092-Speed 5177.10 samples/sec Loss 0.6678 LearningRate 0.0027 Epoch: 16 Global Step: 278940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 17:59:49,082-Speed 5148.37 samples/sec Loss 0.6987 LearningRate 0.0027 Epoch: 16 Global Step: 278950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:59:51,079-Speed 5129.05 samples/sec Loss 0.6625 LearningRate 0.0027 Epoch: 16 Global Step: 278960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:59:53,081-Speed 5117.36 samples/sec Loss 0.6965 LearningRate 0.0027 Epoch: 16 Global Step: 278970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:59:55,060-Speed 5177.53 samples/sec Loss 0.7082 LearningRate 
0.0027 Epoch: 16 Global Step: 278980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:59:57,061-Speed 5120.25 samples/sec Loss 0.7058 LearningRate 0.0027 Epoch: 16 Global Step: 278990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 17:59:59,032-Speed 5198.51 samples/sec Loss 0.6830 LearningRate 0.0027 Epoch: 16 Global Step: 279000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:00:01,003-Speed 5196.76 samples/sec Loss 0.6847 LearningRate 0.0027 Epoch: 16 Global Step: 279010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:00:02,992-Speed 5150.05 samples/sec Loss 0.7102 LearningRate 0.0027 Epoch: 16 Global Step: 279020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:00:04,962-Speed 5201.58 samples/sec Loss 0.7081 LearningRate 0.0027 Epoch: 16 Global Step: 279030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:00:06,934-Speed 5194.57 samples/sec Loss 0.7107 LearningRate 0.0027 Epoch: 16 Global Step: 279040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:00:08,905-Speed 5197.44 samples/sec Loss 0.6944 LearningRate 0.0027 Epoch: 16 Global Step: 279050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:10,882-Speed 5180.04 samples/sec Loss 0.7009 LearningRate 0.0027 Epoch: 16 Global Step: 279060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:12,876-Speed 5138.65 samples/sec Loss 0.7065 LearningRate 0.0027 Epoch: 16 Global Step: 279070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:14,844-Speed 5204.14 samples/sec Loss 0.6845 LearningRate 0.0027 Epoch: 16 Global Step: 279080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:16,811-Speed 5208.11 samples/sec Loss 0.6782 LearningRate 0.0027 Epoch: 16 Global Step: 279090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:18,777-Speed 5210.24 samples/sec Loss 0.6928 LearningRate 0.0027 Epoch: 16 Global Step: 279100 Fp16 
Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:20,757-Speed 5172.60 samples/sec Loss 0.6755 LearningRate 0.0027 Epoch: 16 Global Step: 279110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:22,729-Speed 5196.19 samples/sec Loss 0.6826 LearningRate 0.0027 Epoch: 16 Global Step: 279120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:24,713-Speed 5161.10 samples/sec Loss 0.6795 LearningRate 0.0027 Epoch: 16 Global Step: 279130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:26,682-Speed 5203.53 samples/sec Loss 0.6812 LearningRate 0.0027 Epoch: 16 Global Step: 279140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:28,679-Speed 5128.90 samples/sec Loss 0.6793 LearningRate 0.0027 Epoch: 16 Global Step: 279150 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:00:30,667-Speed 5154.44 samples/sec Loss 0.6982 LearningRate 0.0027 Epoch: 16 Global Step: 279160 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:00:32,640-Speed 5192.48 samples/sec Loss 0.6621 LearningRate 0.0027 Epoch: 16 Global Step: 279170 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:00:34,634-Speed 5137.36 samples/sec Loss 0.6493 LearningRate 0.0027 Epoch: 16 Global Step: 279180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:36,601-Speed 5207.78 samples/sec Loss 0.6811 LearningRate 0.0027 Epoch: 16 Global Step: 279190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:38,585-Speed 5163.44 samples/sec Loss 0.7015 LearningRate 0.0027 Epoch: 16 Global Step: 279200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:40,587-Speed 5116.15 samples/sec Loss 0.6868 LearningRate 0.0027 Epoch: 16 Global Step: 279210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:42,561-Speed 5189.47 samples/sec Loss 0.6914 LearningRate 0.0027 Epoch: 16 Global Step: 279220 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 18:00:44,527-Speed 5212.80 samples/sec Loss 0.6734 LearningRate 0.0027 Epoch: 16 Global Step: 279230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:46,551-Speed 5061.10 samples/sec Loss 0.6679 LearningRate 0.0027 Epoch: 16 Global Step: 279240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:48,520-Speed 5202.16 samples/sec Loss 0.6788 LearningRate 0.0027 Epoch: 16 Global Step: 279250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:50,497-Speed 5182.21 samples/sec Loss 0.6848 LearningRate 0.0027 Epoch: 16 Global Step: 279260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:52,473-Speed 5184.36 samples/sec Loss 0.6798 LearningRate 0.0027 Epoch: 16 Global Step: 279270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:54,446-Speed 5191.67 samples/sec Loss 0.6893 LearningRate 0.0027 Epoch: 16 Global Step: 279280 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:00:56,432-Speed 5156.64 samples/sec Loss 0.7094 LearningRate 0.0027 Epoch: 16 Global Step: 279290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:00:58,398-Speed 5210.95 samples/sec Loss 0.6919 LearningRate 0.0027 Epoch: 16 Global Step: 279300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:00,371-Speed 5193.28 samples/sec Loss 0.6909 LearningRate 0.0027 Epoch: 16 Global Step: 279310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:02,379-Speed 5101.17 samples/sec Loss 0.6861 LearningRate 0.0027 Epoch: 16 Global Step: 279320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:04,371-Speed 5144.55 samples/sec Loss 0.6936 LearningRate 0.0027 Epoch: 16 Global Step: 279330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:06,370-Speed 5123.74 samples/sec Loss 0.6978 LearningRate 0.0027 Epoch: 16 Global Step: 279340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:08,338-Speed 
5205.55 samples/sec Loss 0.7149 LearningRate 0.0027 Epoch: 16 Global Step: 279350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:10,332-Speed 5137.77 samples/sec Loss 0.6932 LearningRate 0.0027 Epoch: 16 Global Step: 279360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:12,312-Speed 5171.81 samples/sec Loss 0.6791 LearningRate 0.0027 Epoch: 16 Global Step: 279370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:01:14,282-Speed 5203.12 samples/sec Loss 0.6980 LearningRate 0.0027 Epoch: 16 Global Step: 279380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:01:16,269-Speed 5153.36 samples/sec Loss 0.6762 LearningRate 0.0027 Epoch: 16 Global Step: 279390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:01:18,278-Speed 5101.96 samples/sec Loss 0.6855 LearningRate 0.0027 Epoch: 16 Global Step: 279400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:01:20,245-Speed 5205.32 samples/sec Loss 0.6809 LearningRate 0.0027 Epoch: 16 Global Step: 279410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:01:22,222-Speed 5181.33 samples/sec Loss 0.7348 LearningRate 0.0027 Epoch: 16 Global Step: 279420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:01:24,216-Speed 5138.48 samples/sec Loss 0.6737 LearningRate 0.0027 Epoch: 16 Global Step: 279430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:01:26,186-Speed 5199.71 samples/sec Loss 0.6756 LearningRate 0.0027 Epoch: 16 Global Step: 279440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:01:28,173-Speed 5155.07 samples/sec Loss 0.6589 LearningRate 0.0027 Epoch: 16 Global Step: 279450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:01:30,141-Speed 5206.42 samples/sec Loss 0.6655 LearningRate 0.0027 Epoch: 16 Global Step: 279460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:01:32,138-Speed 5129.26 samples/sec Loss 0.6903 
LearningRate 0.0027 Epoch: 16 Global Step: 279470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:34,113-Speed 5187.03 samples/sec Loss 0.6877 LearningRate 0.0026 Epoch: 16 Global Step: 279480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:36,090-Speed 5181.31 samples/sec Loss 0.6681 LearningRate 0.0026 Epoch: 16 Global Step: 279490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:38,075-Speed 5161.46 samples/sec Loss 0.7286 LearningRate 0.0026 Epoch: 16 Global Step: 279500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:40,069-Speed 5136.44 samples/sec Loss 0.7128 LearningRate 0.0026 Epoch: 16 Global Step: 279510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:42,054-Speed 5159.70 samples/sec Loss 0.7162 LearningRate 0.0026 Epoch: 16 Global Step: 279520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:44,029-Speed 5187.34 samples/sec Loss 0.6929 LearningRate 0.0026 Epoch: 16 Global Step: 279530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:46,017-Speed 5153.73 samples/sec Loss 0.6832 LearningRate 0.0026 Epoch: 16 Global Step: 279540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:48,027-Speed 5097.61 samples/sec Loss 0.6861 LearningRate 0.0026 Epoch: 16 Global Step: 279550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:49,994-Speed 5207.09 samples/sec Loss 0.6705 LearningRate 0.0026 Epoch: 16 Global Step: 279560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:51,973-Speed 5177.79 samples/sec Loss 0.6939 LearningRate 0.0026 Epoch: 16 Global Step: 279570 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:01:53,947-Speed 5188.52 samples/sec Loss 0.6759 LearningRate 0.0026 Epoch: 16 Global Step: 279580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:55,927-Speed 5173.65 samples/sec Loss 0.7138 LearningRate 0.0026 Epoch: 16 Global Step: 
279590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:57,912-Speed 5161.32 samples/sec Loss 0.6756 LearningRate 0.0026 Epoch: 16 Global Step: 279600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:01:59,891-Speed 5175.15 samples/sec Loss 0.6948 LearningRate 0.0026 Epoch: 16 Global Step: 279610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:01,886-Speed 5135.70 samples/sec Loss 0.6924 LearningRate 0.0026 Epoch: 16 Global Step: 279620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:03,918-Speed 5039.85 samples/sec Loss 0.6970 LearningRate 0.0026 Epoch: 16 Global Step: 279630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:05,895-Speed 5183.81 samples/sec Loss 0.7060 LearningRate 0.0026 Epoch: 16 Global Step: 279640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:07,870-Speed 5184.86 samples/sec Loss 0.6962 LearningRate 0.0026 Epoch: 16 Global Step: 279650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:09,868-Speed 5127.43 samples/sec Loss 0.6863 LearningRate 0.0026 Epoch: 16 Global Step: 279660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:11,859-Speed 5146.00 samples/sec Loss 0.6640 LearningRate 0.0026 Epoch: 16 Global Step: 279670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:13,833-Speed 5190.21 samples/sec Loss 0.6722 LearningRate 0.0026 Epoch: 16 Global Step: 279680 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:02:15,805-Speed 5193.74 samples/sec Loss 0.6818 LearningRate 0.0026 Epoch: 16 Global Step: 279690 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:02:17,786-Speed 5172.92 samples/sec Loss 0.6862 LearningRate 0.0026 Epoch: 16 Global Step: 279700 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:02:19,766-Speed 5173.06 samples/sec Loss 0.7292 LearningRate 0.0026 Epoch: 16 Global Step: 279710 Fp16 Grad Scale: 131072 Required: 
4 hours Training: 2022-04-11 18:02:21,740-Speed 5190.31 samples/sec Loss 0.7199 LearningRate 0.0026 Epoch: 16 Global Step: 279720 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:02:23,716-Speed 5182.74 samples/sec Loss 0.6703 LearningRate 0.0026 Epoch: 16 Global Step: 279730 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:02:25,703-Speed 5154.21 samples/sec Loss 0.6950 LearningRate 0.0026 Epoch: 16 Global Step: 279740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:27,683-Speed 5174.33 samples/sec Loss 0.6704 LearningRate 0.0026 Epoch: 16 Global Step: 279750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:29,654-Speed 5197.13 samples/sec Loss 0.7011 LearningRate 0.0026 Epoch: 16 Global Step: 279760 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:31,641-Speed 5154.56 samples/sec Loss 0.6940 LearningRate 0.0026 Epoch: 16 Global Step: 279770 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:33,638-Speed 5129.32 samples/sec Loss 0.6773 LearningRate 0.0026 Epoch: 16 Global Step: 279780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:35,651-Speed 5091.02 samples/sec Loss 0.6975 LearningRate 0.0026 Epoch: 16 Global Step: 279790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:37,627-Speed 5184.34 samples/sec Loss 0.6833 LearningRate 0.0026 Epoch: 16 Global Step: 279800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:39,600-Speed 5191.66 samples/sec Loss 0.6779 LearningRate 0.0026 Epoch: 16 Global Step: 279810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:41,603-Speed 5113.15 samples/sec Loss 0.7211 LearningRate 0.0026 Epoch: 16 Global Step: 279820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:43,576-Speed 5192.57 samples/sec Loss 0.7268 LearningRate 0.0026 Epoch: 16 Global Step: 279830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
18:02:45,545-Speed 5202.65 samples/sec Loss 0.7172 LearningRate 0.0026 Epoch: 16 Global Step: 279840 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:02:47,515-Speed 5201.36 samples/sec Loss 0.7036 LearningRate 0.0026 Epoch: 16 Global Step: 279850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:49,510-Speed 5133.58 samples/sec Loss 0.7251 LearningRate 0.0026 Epoch: 16 Global Step: 279860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:51,521-Speed 5093.23 samples/sec Loss 0.7026 LearningRate 0.0026 Epoch: 16 Global Step: 279870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:53,498-Speed 5182.70 samples/sec Loss 0.6804 LearningRate 0.0026 Epoch: 16 Global Step: 279880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:55,466-Speed 5204.40 samples/sec Loss 0.6826 LearningRate 0.0026 Epoch: 16 Global Step: 279890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:57,437-Speed 5197.19 samples/sec Loss 0.7027 LearningRate 0.0026 Epoch: 16 Global Step: 279900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:02:59,438-Speed 5120.13 samples/sec Loss 0.6790 LearningRate 0.0026 Epoch: 16 Global Step: 279910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:03:01,447-Speed 5098.12 samples/sec Loss 0.6893 LearningRate 0.0026 Epoch: 16 Global Step: 279920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:03:03,420-Speed 5193.60 samples/sec Loss 0.7243 LearningRate 0.0026 Epoch: 16 Global Step: 279930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:03:05,397-Speed 5181.04 samples/sec Loss 0.6982 LearningRate 0.0026 Epoch: 16 Global Step: 279940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:03:07,397-Speed 5122.29 samples/sec Loss 0.6921 LearningRate 0.0026 Epoch: 16 Global Step: 279950 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:03:09,379-Speed 5168.53 samples/sec 
Loss 0.6961 LearningRate 0.0026 Epoch: 16 Global Step: 279960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:03:11,359-Speed 5173.89 samples/sec Loss 0.6853 LearningRate 0.0026 Epoch: 16 Global Step: 279970 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:03:13,350-Speed 5145.07 samples/sec Loss 0.7084 LearningRate 0.0026 Epoch: 16 Global Step: 279980 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:03:15,327-Speed 5180.94 samples/sec Loss 0.6718 LearningRate 0.0026 Epoch: 16 Global Step: 279990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:03:17,327-Speed 5121.84 samples/sec Loss 0.6863 LearningRate 0.0026 Epoch: 16 Global Step: 280000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:03:44,337-[lfw][280000]XNorm: 22.081640 Training: 2022-04-11 18:03:44,337-[lfw][280000]Accuracy-Flip: 0.99800+-0.00296 Training: 2022-04-11 18:03:44,338-[lfw][280000]Accuracy-Highest: 0.99833 Training: 2022-04-11 18:04:15,362-[cfp_fp][280000]XNorm: 21.939377 Training: 2022-04-11 18:04:15,363-[cfp_fp][280000]Accuracy-Flip: 0.98971+-0.00464 Training: 2022-04-11 18:04:15,363-[cfp_fp][280000]Accuracy-Highest: 0.99000 Training: 2022-04-11 18:04:42,121-[agedb_30][280000]XNorm: 22.811542 Training: 2022-04-11 18:04:42,122-[agedb_30][280000]Accuracy-Flip: 0.98333+-0.00601 Training: 2022-04-11 18:04:42,122-[agedb_30][280000]Accuracy-Highest: 0.98333 Training: 2022-04-11 18:04:44,102-Speed 118.01 samples/sec Loss 0.6814 LearningRate 0.0026 Epoch: 16 Global Step: 280010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:04:46,065-Speed 5219.02 samples/sec Loss 0.7099 LearningRate 0.0026 Epoch: 16 Global Step: 280020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:04:48,035-Speed 5198.60 samples/sec Loss 0.6771 LearningRate 0.0026 Epoch: 16 Global Step: 280030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:04:50,009-Speed 5189.56 samples/sec Loss 0.7086 LearningRate 
0.0026 Epoch: 16 Global Step: 280040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:04:51,995-Speed 5158.44 samples/sec Loss 0.7196 LearningRate 0.0026 Epoch: 16 Global Step: 280050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:04:53,970-Speed 5185.93 samples/sec Loss 0.6666 LearningRate 0.0026 Epoch: 16 Global Step: 280060 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:04:55,939-Speed 5203.12 samples/sec Loss 0.7144 LearningRate 0.0026 Epoch: 16 Global Step: 280070 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:04:57,899-Speed 5227.54 samples/sec Loss 0.6761 LearningRate 0.0026 Epoch: 16 Global Step: 280080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:04:59,880-Speed 5171.30 samples/sec Loss 0.7040 LearningRate 0.0026 Epoch: 16 Global Step: 280090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:01,847-Speed 5207.81 samples/sec Loss 0.6626 LearningRate 0.0026 Epoch: 16 Global Step: 280100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:03,821-Speed 5188.83 samples/sec Loss 0.6928 LearningRate 0.0026 Epoch: 16 Global Step: 280110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:05,787-Speed 5210.94 samples/sec Loss 0.6985 LearningRate 0.0026 Epoch: 16 Global Step: 280120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:05:07,755-Speed 5204.23 samples/sec Loss 0.6759 LearningRate 0.0026 Epoch: 16 Global Step: 280130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:05:09,743-Speed 5152.13 samples/sec Loss 0.6901 LearningRate 0.0026 Epoch: 16 Global Step: 280140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:05:11,713-Speed 5199.94 samples/sec Loss 0.6523 LearningRate 0.0026 Epoch: 16 Global Step: 280150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:05:13,695-Speed 5168.50 samples/sec Loss 0.6824 LearningRate 0.0026 Epoch: 16 Global Step: 280160 Fp16 
Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:05:15,674-Speed 5175.52 samples/sec Loss 0.6996 LearningRate 0.0026 Epoch: 16 Global Step: 280170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:05:17,647-Speed 5194.64 samples/sec Loss 0.6821 LearningRate 0.0026 Epoch: 16 Global Step: 280180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:05:19,619-Speed 5194.76 samples/sec Loss 0.6951 LearningRate 0.0026 Epoch: 16 Global Step: 280190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:05:21,588-Speed 5201.18 samples/sec Loss 0.6812 LearningRate 0.0026 Epoch: 16 Global Step: 280200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:05:23,565-Speed 5180.66 samples/sec Loss 0.6547 LearningRate 0.0026 Epoch: 16 Global Step: 280210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-11 18:05:25,549-Speed 5163.63 samples/sec Loss 0.6994 LearningRate 0.0026 Epoch: 16 Global Step: 280220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:27,535-Speed 5158.63 samples/sec Loss 0.6856 LearningRate 0.0026 Epoch: 16 Global Step: 280230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:29,537-Speed 5116.09 samples/sec Loss 0.6894 LearningRate 0.0026 Epoch: 16 Global Step: 280240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:31,519-Speed 5170.49 samples/sec Loss 0.7027 LearningRate 0.0026 Epoch: 16 Global Step: 280250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:33,519-Speed 5122.29 samples/sec Loss 0.6872 LearningRate 0.0026 Epoch: 16 Global Step: 280260 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:35,524-Speed 5110.41 samples/sec Loss 0.6658 LearningRate 0.0026 Epoch: 16 Global Step: 280270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:37,559-Speed 5033.29 samples/sec Loss 0.7212 LearningRate 0.0026 Epoch: 16 Global Step: 280280 Fp16 Grad Scale: 65536 Required: 4 hours 
Training: 2022-04-11 18:05:39,535-Speed 5186.01 samples/sec Loss 0.6924 LearningRate 0.0026 Epoch: 16 Global Step: 280290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:41,508-Speed 5189.56 samples/sec Loss 0.6823 LearningRate 0.0026 Epoch: 16 Global Step: 280300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:43,502-Speed 5138.99 samples/sec Loss 0.6918 LearningRate 0.0026 Epoch: 16 Global Step: 280310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:45,478-Speed 5185.09 samples/sec Loss 0.6844 LearningRate 0.0026 Epoch: 16 Global Step: 280320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:47,459-Speed 5170.18 samples/sec Loss 0.6401 LearningRate 0.0026 Epoch: 16 Global Step: 280330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:49,477-Speed 5078.95 samples/sec Loss 0.7013 LearningRate 0.0026 Epoch: 16 Global Step: 280340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:51,469-Speed 5141.46 samples/sec Loss 0.6809 LearningRate 0.0026 Epoch: 16 Global Step: 280350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:53,465-Speed 5132.22 samples/sec Loss 0.7027 LearningRate 0.0026 Epoch: 16 Global Step: 280360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:55,449-Speed 5164.26 samples/sec Loss 0.7122 LearningRate 0.0026 Epoch: 16 Global Step: 280370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:57,420-Speed 5195.43 samples/sec Loss 0.6786 LearningRate 0.0026 Epoch: 16 Global Step: 280380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:05:59,402-Speed 5168.82 samples/sec Loss 0.7150 LearningRate 0.0026 Epoch: 16 Global Step: 280390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:01,385-Speed 5167.12 samples/sec Loss 0.7054 LearningRate 0.0026 Epoch: 16 Global Step: 280400 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:03,388-Speed 
5113.25 samples/sec Loss 0.7284 LearningRate 0.0026 Epoch: 16 Global Step: 280410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:05,361-Speed 5191.25 samples/sec Loss 0.6914 LearningRate 0.0026 Epoch: 16 Global Step: 280420 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:06:07,339-Speed 5180.28 samples/sec Loss 0.7012 LearningRate 0.0026 Epoch: 16 Global Step: 280430 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:06:09,351-Speed 5092.76 samples/sec Loss 0.6746 LearningRate 0.0026 Epoch: 16 Global Step: 280440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:11,323-Speed 5194.11 samples/sec Loss 0.6764 LearningRate 0.0026 Epoch: 16 Global Step: 280450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:13,314-Speed 5144.08 samples/sec Loss 0.6791 LearningRate 0.0026 Epoch: 16 Global Step: 280460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:15,299-Speed 5162.49 samples/sec Loss 0.6807 LearningRate 0.0026 Epoch: 16 Global Step: 280470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:17,264-Speed 5214.20 samples/sec Loss 0.6865 LearningRate 0.0026 Epoch: 16 Global Step: 280480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:19,232-Speed 5204.40 samples/sec Loss 0.7086 LearningRate 0.0026 Epoch: 16 Global Step: 280490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:21,212-Speed 5172.84 samples/sec Loss 0.6770 LearningRate 0.0026 Epoch: 16 Global Step: 280500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:23,197-Speed 5161.56 samples/sec Loss 0.6979 LearningRate 0.0026 Epoch: 16 Global Step: 280510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:25,195-Speed 5127.72 samples/sec Loss 0.6829 LearningRate 0.0025 Epoch: 16 Global Step: 280520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:27,167-Speed 5193.58 samples/sec Loss 0.7170 
LearningRate 0.0025 Epoch: 16 Global Step: 280530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:29,132-Speed 5214.62 samples/sec Loss 0.6750 LearningRate 0.0025 Epoch: 16 Global Step: 280540 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:06:31,090-Speed 5229.41 samples/sec Loss 0.7120 LearningRate 0.0025 Epoch: 16 Global Step: 280550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:33,066-Speed 5186.13 samples/sec Loss 0.6905 LearningRate 0.0025 Epoch: 16 Global Step: 280560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:35,051-Speed 5160.41 samples/sec Loss 0.6724 LearningRate 0.0025 Epoch: 16 Global Step: 280570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:37,024-Speed 5192.05 samples/sec Loss 0.6828 LearningRate 0.0025 Epoch: 16 Global Step: 280580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:39,028-Speed 5110.88 samples/sec Loss 0.7115 LearningRate 0.0025 Epoch: 16 Global Step: 280590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:41,024-Speed 5134.89 samples/sec Loss 0.6957 LearningRate 0.0025 Epoch: 16 Global Step: 280600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:42,998-Speed 5187.69 samples/sec Loss 0.6824 LearningRate 0.0025 Epoch: 16 Global Step: 280610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:44,969-Speed 5197.49 samples/sec Loss 0.6847 LearningRate 0.0025 Epoch: 16 Global Step: 280620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:46,944-Speed 5187.40 samples/sec Loss 0.6676 LearningRate 0.0025 Epoch: 16 Global Step: 280630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:48,917-Speed 5192.42 samples/sec Loss 0.6867 LearningRate 0.0025 Epoch: 16 Global Step: 280640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:50,893-Speed 5183.11 samples/sec Loss 0.6791 LearningRate 0.0025 Epoch: 16 Global Step: 
280650 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:06:52,898-Speed 5110.14 samples/sec Loss 0.6858 LearningRate 0.0025 Epoch: 16 Global Step: 280660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:54,899-Speed 5119.15 samples/sec Loss 0.6857 LearningRate 0.0025 Epoch: 16 Global Step: 280670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:56,865-Speed 5209.82 samples/sec Loss 0.6788 LearningRate 0.0025 Epoch: 16 Global Step: 280680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:06:58,848-Speed 5165.18 samples/sec Loss 0.6668 LearningRate 0.0025 Epoch: 16 Global Step: 280690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:00,859-Speed 5096.52 samples/sec Loss 0.7008 LearningRate 0.0025 Epoch: 16 Global Step: 280700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:02,838-Speed 5174.71 samples/sec Loss 0.6997 LearningRate 0.0025 Epoch: 16 Global Step: 280710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:04,801-Speed 5218.27 samples/sec Loss 0.6828 LearningRate 0.0025 Epoch: 16 Global Step: 280720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:06,769-Speed 5207.37 samples/sec Loss 0.6902 LearningRate 0.0025 Epoch: 16 Global Step: 280730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:08,755-Speed 5157.51 samples/sec Loss 0.6932 LearningRate 0.0025 Epoch: 16 Global Step: 280740 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:10,750-Speed 5134.61 samples/sec Loss 0.6790 LearningRate 0.0025 Epoch: 16 Global Step: 280750 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:12,745-Speed 5135.41 samples/sec Loss 0.6807 LearningRate 0.0025 Epoch: 16 Global Step: 280760 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:07:14,735-Speed 5146.22 samples/sec Loss 0.7130 LearningRate 0.0025 Epoch: 16 Global Step: 280770 Fp16 Grad Scale: 65536 Required: 4 
hours Training: 2022-04-11 18:07:16,711-Speed 5183.88 samples/sec Loss 0.7054 LearningRate 0.0025 Epoch: 16 Global Step: 280780 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:18,688-Speed 5183.25 samples/sec Loss 0.6912 LearningRate 0.0025 Epoch: 16 Global Step: 280790 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:20,658-Speed 5197.06 samples/sec Loss 0.6694 LearningRate 0.0025 Epoch: 16 Global Step: 280800 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:22,626-Speed 5204.91 samples/sec Loss 0.7027 LearningRate 0.0025 Epoch: 16 Global Step: 280810 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:24,597-Speed 5198.12 samples/sec Loss 0.6911 LearningRate 0.0025 Epoch: 16 Global Step: 280820 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:26,575-Speed 5177.43 samples/sec Loss 0.6835 LearningRate 0.0025 Epoch: 16 Global Step: 280830 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:28,555-Speed 5175.52 samples/sec Loss 0.6671 LearningRate 0.0025 Epoch: 16 Global Step: 280840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:30,537-Speed 5167.06 samples/sec Loss 0.6940 LearningRate 0.0025 Epoch: 16 Global Step: 280850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:32,506-Speed 5204.60 samples/sec Loss 0.6730 LearningRate 0.0025 Epoch: 16 Global Step: 280860 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:34,492-Speed 5156.84 samples/sec Loss 0.7001 LearningRate 0.0025 Epoch: 16 Global Step: 280870 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:36,472-Speed 5174.77 samples/sec Loss 0.6767 LearningRate 0.0025 Epoch: 16 Global Step: 280880 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:38,463-Speed 5144.04 samples/sec Loss 0.6762 LearningRate 0.0025 Epoch: 16 Global Step: 280890 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
18:07:40,456-Speed 5139.59 samples/sec Loss 0.7245 LearningRate 0.0025 Epoch: 16 Global Step: 280900 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:42,419-Speed 5216.72 samples/sec Loss 0.7088 LearningRate 0.0025 Epoch: 16 Global Step: 280910 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:44,393-Speed 5190.93 samples/sec Loss 0.6928 LearningRate 0.0025 Epoch: 16 Global Step: 280920 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:46,411-Speed 5075.89 samples/sec Loss 0.7089 LearningRate 0.0025 Epoch: 16 Global Step: 280930 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:48,396-Speed 5160.10 samples/sec Loss 0.6769 LearningRate 0.0025 Epoch: 16 Global Step: 280940 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:50,367-Speed 5199.24 samples/sec Loss 0.6787 LearningRate 0.0025 Epoch: 16 Global Step: 280950 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:52,337-Speed 5198.17 samples/sec Loss 0.6917 LearningRate 0.0025 Epoch: 16 Global Step: 280960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:07:54,306-Speed 5203.47 samples/sec Loss 0.6793 LearningRate 0.0025 Epoch: 16 Global Step: 280970 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:07:56,306-Speed 5119.33 samples/sec Loss 0.6854 LearningRate 0.0025 Epoch: 16 Global Step: 280980 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:07:58,265-Speed 5231.12 samples/sec Loss 0.6771 LearningRate 0.0025 Epoch: 16 Global Step: 280990 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:00,253-Speed 5151.35 samples/sec Loss 0.7174 LearningRate 0.0025 Epoch: 16 Global Step: 281000 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:02,269-Speed 5081.60 samples/sec Loss 0.6980 LearningRate 0.0025 Epoch: 16 Global Step: 281010 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:04,245-Speed 5185.78 samples/sec 
Loss 0.6755 LearningRate 0.0025 Epoch: 16 Global Step: 281020 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:06,225-Speed 5173.01 samples/sec Loss 0.6985 LearningRate 0.0025 Epoch: 16 Global Step: 281030 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:08,219-Speed 5138.82 samples/sec Loss 0.6507 LearningRate 0.0025 Epoch: 16 Global Step: 281040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:10,183-Speed 5214.55 samples/sec Loss 0.6959 LearningRate 0.0025 Epoch: 16 Global Step: 281050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:12,154-Speed 5199.14 samples/sec Loss 0.6829 LearningRate 0.0025 Epoch: 16 Global Step: 281060 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:14,128-Speed 5188.48 samples/sec Loss 0.7073 LearningRate 0.0025 Epoch: 16 Global Step: 281070 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:16,121-Speed 5140.12 samples/sec Loss 0.6895 LearningRate 0.0025 Epoch: 16 Global Step: 281080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:18,113-Speed 5142.55 samples/sec Loss 0.6754 LearningRate 0.0025 Epoch: 16 Global Step: 281090 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:20,096-Speed 5165.83 samples/sec Loss 0.6740 LearningRate 0.0025 Epoch: 16 Global Step: 281100 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:22,070-Speed 5187.54 samples/sec Loss 0.7176 LearningRate 0.0025 Epoch: 16 Global Step: 281110 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:24,064-Speed 5137.20 samples/sec Loss 0.6972 LearningRate 0.0025 Epoch: 16 Global Step: 281120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:26,080-Speed 5083.38 samples/sec Loss 0.6924 LearningRate 0.0025 Epoch: 16 Global Step: 281130 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:28,067-Speed 5154.40 samples/sec Loss 0.7260 LearningRate 0.0025 Epoch: 16 
Global Step: 281140 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:30,055-Speed 5154.80 samples/sec Loss 0.6998 LearningRate 0.0025 Epoch: 16 Global Step: 281150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:32,021-Speed 5208.97 samples/sec Loss 0.6862 LearningRate 0.0025 Epoch: 16 Global Step: 281160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:33,998-Speed 5183.47 samples/sec Loss 0.6819 LearningRate 0.0025 Epoch: 16 Global Step: 281170 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:35,992-Speed 5136.29 samples/sec Loss 0.6823 LearningRate 0.0025 Epoch: 16 Global Step: 281180 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:37,973-Speed 5170.29 samples/sec Loss 0.6655 LearningRate 0.0025 Epoch: 16 Global Step: 281190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:39,946-Speed 5194.13 samples/sec Loss 0.6601 LearningRate 0.0025 Epoch: 16 Global Step: 281200 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:41,917-Speed 5195.73 samples/sec Loss 0.6449 LearningRate 0.0025 Epoch: 16 Global Step: 281210 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:43,913-Speed 5133.84 samples/sec Loss 0.6940 LearningRate 0.0025 Epoch: 16 Global Step: 281220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:45,922-Speed 5097.75 samples/sec Loss 0.6950 LearningRate 0.0025 Epoch: 16 Global Step: 281230 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:47,892-Speed 5201.32 samples/sec Loss 0.6986 LearningRate 0.0025 Epoch: 16 Global Step: 281240 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:49,875-Speed 5165.77 samples/sec Loss 0.7017 LearningRate 0.0025 Epoch: 16 Global Step: 281250 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:51,845-Speed 5199.99 samples/sec Loss 0.6840 LearningRate 0.0025 Epoch: 16 Global Step: 281260 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-11 18:08:53,827-Speed 5167.24 samples/sec Loss 0.7029 LearningRate 0.0025 Epoch: 16 Global Step: 281270 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:55,798-Speed 5196.78 samples/sec Loss 0.6571 LearningRate 0.0025 Epoch: 16 Global Step: 281280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:08:57,789-Speed 5145.22 samples/sec Loss 0.7262 LearningRate 0.0025 Epoch: 16 Global Step: 281290 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:08:59,762-Speed 5190.66 samples/sec Loss 0.6912 LearningRate 0.0025 Epoch: 16 Global Step: 281300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:01,736-Speed 5188.49 samples/sec Loss 0.6592 LearningRate 0.0025 Epoch: 16 Global Step: 281310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:03,707-Speed 5199.17 samples/sec Loss 0.6986 LearningRate 0.0025 Epoch: 16 Global Step: 281320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:05,680-Speed 5192.68 samples/sec Loss 0.6999 LearningRate 0.0025 Epoch: 16 Global Step: 281330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:07,661-Speed 5170.50 samples/sec Loss 0.6658 LearningRate 0.0025 Epoch: 16 Global Step: 281340 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:09,648-Speed 5154.48 samples/sec Loss 0.7092 LearningRate 0.0025 Epoch: 16 Global Step: 281350 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:11,639-Speed 5145.11 samples/sec Loss 0.6807 LearningRate 0.0025 Epoch: 16 Global Step: 281360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:13,611-Speed 5195.98 samples/sec Loss 0.7041 LearningRate 0.0025 Epoch: 16 Global Step: 281370 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:15,593-Speed 5166.53 samples/sec Loss 0.6916 LearningRate 0.0025 Epoch: 16 Global Step: 281380 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 
18:09:17,578-Speed 5162.05 samples/sec Loss 0.6718 LearningRate 0.0025 Epoch: 16 Global Step: 281390 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:19,579-Speed 5118.51 samples/sec Loss 0.7206 LearningRate 0.0025 Epoch: 16 Global Step: 281400 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:09:21,564-Speed 5160.65 samples/sec Loss 0.6843 LearningRate 0.0025 Epoch: 16 Global Step: 281410 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:23,595-Speed 5043.36 samples/sec Loss 0.6905 LearningRate 0.0025 Epoch: 16 Global Step: 281420 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:25,571-Speed 5183.83 samples/sec Loss 0.7119 LearningRate 0.0025 Epoch: 16 Global Step: 281430 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:27,542-Speed 5199.05 samples/sec Loss 0.6527 LearningRate 0.0025 Epoch: 16 Global Step: 281440 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:29,553-Speed 5091.98 samples/sec Loss 0.6873 LearningRate 0.0025 Epoch: 16 Global Step: 281450 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:31,526-Speed 5192.63 samples/sec Loss 0.6746 LearningRate 0.0025 Epoch: 16 Global Step: 281460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:33,524-Speed 5128.23 samples/sec Loss 0.6848 LearningRate 0.0025 Epoch: 16 Global Step: 281470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:35,497-Speed 5192.02 samples/sec Loss 0.6918 LearningRate 0.0025 Epoch: 16 Global Step: 281480 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:37,471-Speed 5189.30 samples/sec Loss 0.6922 LearningRate 0.0025 Epoch: 16 Global Step: 281490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:39,479-Speed 5099.53 samples/sec Loss 0.6828 LearningRate 0.0025 Epoch: 16 Global Step: 281500 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:41,446-Speed 5209.79 samples/sec 
Loss 0.7028 LearningRate 0.0025 Epoch: 16 Global Step: 281510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:43,427-Speed 5171.22 samples/sec Loss 0.6964 LearningRate 0.0025 Epoch: 16 Global Step: 281520 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:45,408-Speed 5171.24 samples/sec Loss 0.6902 LearningRate 0.0025 Epoch: 16 Global Step: 281530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:47,399-Speed 5144.02 samples/sec Loss 0.6687 LearningRate 0.0025 Epoch: 16 Global Step: 281540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:49,405-Speed 5108.24 samples/sec Loss 0.6436 LearningRate 0.0025 Epoch: 16 Global Step: 281550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:51,382-Speed 5181.33 samples/sec Loss 0.7146 LearningRate 0.0025 Epoch: 16 Global Step: 281560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:53,387-Speed 5108.46 samples/sec Loss 0.6868 LearningRate 0.0024 Epoch: 16 Global Step: 281570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:55,367-Speed 5174.68 samples/sec Loss 0.7006 LearningRate 0.0024 Epoch: 16 Global Step: 281580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:57,357-Speed 5146.68 samples/sec Loss 0.6687 LearningRate 0.0024 Epoch: 16 Global Step: 281590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:09:59,352-Speed 5135.35 samples/sec Loss 0.6805 LearningRate 0.0024 Epoch: 16 Global Step: 281600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:10:01,324-Speed 5196.46 samples/sec Loss 0.6610 LearningRate 0.0024 Epoch: 16 Global Step: 281610 Fp16 Grad Scale: 131072 Required: 4 hours Training: 2022-04-11 18:10:03,296-Speed 5193.75 samples/sec Loss 0.7085 LearningRate 0.0024 Epoch: 16 Global Step: 281620 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:10:05,263-Speed 5207.47 samples/sec Loss 0.6878 LearningRate 0.0024 Epoch: 16 
Global Step: 281630 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:10:07,237-Speed 5190.63 samples/sec Loss 0.6903 LearningRate 0.0024 Epoch: 16 Global Step: 281640 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:10:09,232-Speed 5133.02 samples/sec Loss 0.6645 LearningRate 0.0024 Epoch: 16 Global Step: 281650 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:10:11,228-Speed 5135.56 samples/sec Loss 0.7086 LearningRate 0.0024 Epoch: 16 Global Step: 281660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:10:13,203-Speed 5186.11 samples/sec Loss 0.6807 LearningRate 0.0024 Epoch: 16 Global Step: 281670 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:10:15,187-Speed 5161.87 samples/sec Loss 0.7134 LearningRate 0.0024 Epoch: 16 Global Step: 281680 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:10:17,157-Speed 5199.24 samples/sec Loss 0.6885 LearningRate 0.0024 Epoch: 16 Global Step: 281690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:10:19,181-Speed 5062.11 samples/sec Loss 0.6974 LearningRate 0.0024 Epoch: 16 Global Step: 281700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-11 18:10:21,162-Speed 5172.75 samples/sec Loss 0.7115 LearningRate 0.0024 Epoch: 16 Global Step: 281710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:10:23,127-Speed 5212.00 samples/sec Loss 0.6780 LearningRate 0.0024 Epoch: 16 Global Step: 281720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:10:25,122-Speed 5134.25 samples/sec Loss 0.6862 LearningRate 0.0024 Epoch: 16 Global Step: 281730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:10:27,105-Speed 5166.36 samples/sec Loss 0.6635 LearningRate 0.0024 Epoch: 16 Global Step: 281740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:10:29,078-Speed 5193.04 samples/sec Loss 0.6949 LearningRate 0.0024 Epoch: 16 Global Step: 281750 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 18:10:31,064-Speed 5156.85 samples/sec Loss 0.6662 LearningRate 0.0024 Epoch: 16 Global Step: 281760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:10:33,037-Speed 5192.32 samples/sec Loss 0.6947 LearningRate 0.0024 Epoch: 16 Global Step: 281770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:10:35,011-Speed 5191.70 samples/sec Loss 0.6973 LearningRate 0.0024 Epoch: 16 Global Step: 281780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:10:36,996-Speed 5158.59 samples/sec Loss 0.6593 LearningRate 0.0024 Epoch: 16 Global Step: 281790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:10:38,983-Speed 5157.31 samples/sec Loss 0.7064 LearningRate 0.0024 Epoch: 16 Global Step: 281800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:10:40,967-Speed 5162.03 samples/sec Loss 0.6929 LearningRate 0.0024 Epoch: 16 Global Step: 281810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:10:42,941-Speed 5191.60 samples/sec Loss 0.6612 LearningRate 0.0024 Epoch: 16 Global Step: 281820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:10:44,914-Speed 5189.95 samples/sec Loss 0.6998 LearningRate 0.0024 Epoch: 16 Global Step: 281830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:10:46,910-Speed 5134.65 samples/sec Loss 0.6595 LearningRate 0.0024 Epoch: 16 Global Step: 281840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:10:48,887-Speed 5180.04 samples/sec Loss 0.7069 LearningRate 0.0024 Epoch: 16 Global Step: 281850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:10:50,912-Speed 5059.24 samples/sec Loss 0.6672 LearningRate 0.0024 Epoch: 16 Global Step: 281860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:10:52,918-Speed 5107.73 samples/sec Loss 0.7103 LearningRate 0.0024 Epoch: 16 Global Step: 281870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
18:10:54,894-Speed 5183.06 samples/sec Loss 0.7003 LearningRate 0.0024 Epoch: 16 Global Step: 281880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:10:56,870-Speed 5185.35 samples/sec Loss 0.6567 LearningRate 0.0024 Epoch: 16 Global Step: 281890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:10:58,862-Speed 5142.02 samples/sec Loss 0.7007 LearningRate 0.0024 Epoch: 16 Global Step: 281900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:11:00,871-Speed 5098.35 samples/sec Loss 0.7047 LearningRate 0.0024 Epoch: 16 Global Step: 281910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:11:02,849-Speed 5179.31 samples/sec Loss 0.6653 LearningRate 0.0024 Epoch: 16 Global Step: 281920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:11:04,840-Speed 5144.87 samples/sec Loss 0.7095 LearningRate 0.0024 Epoch: 16 Global Step: 281930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:11:06,806-Speed 5209.74 samples/sec Loss 0.6895 LearningRate 0.0024 Epoch: 16 Global Step: 281940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:11:08,777-Speed 5196.99 samples/sec Loss 0.6891 LearningRate 0.0024 Epoch: 16 Global Step: 281950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:11:10,763-Speed 5157.13 samples/sec Loss 0.6434 LearningRate 0.0024 Epoch: 16 Global Step: 281960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:11:12,769-Speed 5137.10 samples/sec Loss 0.7044 LearningRate 0.0024 Epoch: 16 Global Step: 281970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:11:14,747-Speed 5181.30 samples/sec Loss 0.6926 LearningRate 0.0024 Epoch: 16 Global Step: 281980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:11:16,734-Speed 5154.47 samples/sec Loss 0.6798 LearningRate 0.0024 Epoch: 16 Global Step: 281990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:11:18,725-Speed 5145.17 samples/sec 
Loss 0.6871 LearningRate 0.0024 Epoch: 16 Global Step: 282000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:11:45,844-[lfw][282000]XNorm: 21.705221 Training: 2022-04-11 18:11:45,845-[lfw][282000]Accuracy-Flip: 0.99800+-0.00287 Training: 2022-04-11 18:11:45,846-[lfw][282000]Accuracy-Highest: 0.99833 Training: 2022-04-11 18:12:16,879-[cfp_fp][282000]XNorm: 21.906669 Training: 2022-04-11 18:12:16,879-[cfp_fp][282000]Accuracy-Flip: 0.98857+-0.00429 Training: 2022-04-11 18:12:16,880-[cfp_fp][282000]Accuracy-Highest: 0.99000 Training: 2022-04-11 18:12:43,404-[agedb_30][282000]XNorm: 22.659515 Training: 2022-04-11 18:12:43,404-[agedb_30][282000]Accuracy-Flip: 0.98183+-0.00709 Training: 2022-04-11 18:12:43,405-[agedb_30][282000]Accuracy-Highest: 0.98333 Training: 2022-04-11 18:12:45,401-Speed 118.14 samples/sec Loss 0.6687 LearningRate 0.0024 Epoch: 16 Global Step: 282010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:12:47,358-Speed 5234.27 samples/sec Loss 0.6933 LearningRate 0.0024 Epoch: 16 Global Step: 282020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:12:49,336-Speed 5178.73 samples/sec Loss 0.6577 LearningRate 0.0024 Epoch: 16 Global Step: 282030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:12:51,305-Speed 5199.75 samples/sec Loss 0.6900 LearningRate 0.0024 Epoch: 16 Global Step: 282040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:12:53,274-Speed 5204.43 samples/sec Loss 0.6806 LearningRate 0.0024 Epoch: 16 Global Step: 282050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:12:55,249-Speed 5186.47 samples/sec Loss 0.7112 LearningRate 0.0024 Epoch: 16 Global Step: 282060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:12:57,211-Speed 5220.86 samples/sec Loss 0.7114 LearningRate 0.0024 Epoch: 16 Global Step: 282070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:12:59,183-Speed 5194.57 samples/sec Loss 0.6791 
LearningRate 0.0024 Epoch: 16 Global Step: 282080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:01,151-Speed 5205.20 samples/sec Loss 0.6795 LearningRate 0.0024 Epoch: 16 Global Step: 282090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:03,113-Speed 5220.43 samples/sec Loss 0.6667 LearningRate 0.0024 Epoch: 16 Global Step: 282100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:05,081-Speed 5205.45 samples/sec Loss 0.6765 LearningRate 0.0024 Epoch: 16 Global Step: 282110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:07,059-Speed 5178.93 samples/sec Loss 0.6538 LearningRate 0.0024 Epoch: 16 Global Step: 282120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:09,031-Speed 5195.42 samples/sec Loss 0.6681 LearningRate 0.0024 Epoch: 16 Global Step: 282130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:11,018-Speed 5155.08 samples/sec Loss 0.7051 LearningRate 0.0024 Epoch: 16 Global Step: 282140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:13,004-Speed 5159.51 samples/sec Loss 0.6663 LearningRate 0.0024 Epoch: 16 Global Step: 282150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:13:14,972-Speed 5205.21 samples/sec Loss 0.6984 LearningRate 0.0024 Epoch: 16 Global Step: 282160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:16,963-Speed 5145.42 samples/sec Loss 0.6942 LearningRate 0.0024 Epoch: 16 Global Step: 282170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:18,932-Speed 5200.86 samples/sec Loss 0.6835 LearningRate 0.0024 Epoch: 16 Global Step: 282180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:20,911-Speed 5187.13 samples/sec Loss 0.6645 LearningRate 0.0024 Epoch: 16 Global Step: 282190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:22,876-Speed 5212.08 samples/sec Loss 0.6976 LearningRate 0.0024 Epoch: 16 Global Step: 
282200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:24,862-Speed 5157.13 samples/sec Loss 0.7281 LearningRate 0.0024 Epoch: 16 Global Step: 282210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:26,848-Speed 5157.21 samples/sec Loss 0.6927 LearningRate 0.0024 Epoch: 16 Global Step: 282220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:28,827-Speed 5178.54 samples/sec Loss 0.6963 LearningRate 0.0024 Epoch: 16 Global Step: 282230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:30,819-Speed 5141.32 samples/sec Loss 0.7226 LearningRate 0.0024 Epoch: 16 Global Step: 282240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:32,804-Speed 5161.30 samples/sec Loss 0.6760 LearningRate 0.0024 Epoch: 16 Global Step: 282250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:34,825-Speed 5069.01 samples/sec Loss 0.6905 LearningRate 0.0024 Epoch: 16 Global Step: 282260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:13:36,831-Speed 5106.00 samples/sec Loss 0.6672 LearningRate 0.0024 Epoch: 16 Global Step: 282270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:13:38,832-Speed 5121.13 samples/sec Loss 0.6943 LearningRate 0.0024 Epoch: 16 Global Step: 282280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:13:40,835-Speed 5113.85 samples/sec Loss 0.6889 LearningRate 0.0024 Epoch: 16 Global Step: 282290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:13:42,870-Speed 5033.09 samples/sec Loss 0.7091 LearningRate 0.0024 Epoch: 16 Global Step: 282300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:13:44,845-Speed 5187.14 samples/sec Loss 0.7101 LearningRate 0.0024 Epoch: 16 Global Step: 282310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:13:46,868-Speed 5065.85 samples/sec Loss 0.6760 LearningRate 0.0024 Epoch: 16 Global Step: 282320 Fp16 Grad Scale: 131072 
Required: 3 hours Training: 2022-04-11 18:13:48,841-Speed 5192.79 samples/sec Loss 0.6935 LearningRate 0.0024 Epoch: 16 Global Step: 282330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:50,816-Speed 5184.71 samples/sec Loss 0.7057 LearningRate 0.0024 Epoch: 16 Global Step: 282340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:52,825-Speed 5100.47 samples/sec Loss 0.6855 LearningRate 0.0024 Epoch: 16 Global Step: 282350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:54,789-Speed 5214.47 samples/sec Loss 0.6686 LearningRate 0.0024 Epoch: 16 Global Step: 282360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:56,777-Speed 5153.64 samples/sec Loss 0.7093 LearningRate 0.0024 Epoch: 16 Global Step: 282370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:13:58,760-Speed 5166.16 samples/sec Loss 0.6462 LearningRate 0.0024 Epoch: 16 Global Step: 282380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:00,783-Speed 5063.83 samples/sec Loss 0.7039 LearningRate 0.0024 Epoch: 16 Global Step: 282390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:02,799-Speed 5082.71 samples/sec Loss 0.6955 LearningRate 0.0024 Epoch: 16 Global Step: 282400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:04,773-Speed 5190.80 samples/sec Loss 0.6838 LearningRate 0.0024 Epoch: 16 Global Step: 282410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:06,743-Speed 5198.68 samples/sec Loss 0.6968 LearningRate 0.0024 Epoch: 16 Global Step: 282420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:08,739-Speed 5133.36 samples/sec Loss 0.6637 LearningRate 0.0024 Epoch: 16 Global Step: 282430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:14:10,705-Speed 5209.74 samples/sec Loss 0.7016 LearningRate 0.0024 Epoch: 16 Global Step: 282440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
18:14:12,694-Speed 5152.31 samples/sec Loss 0.6947 LearningRate 0.0024 Epoch: 16 Global Step: 282450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:14,672-Speed 5176.47 samples/sec Loss 0.6625 LearningRate 0.0024 Epoch: 16 Global Step: 282460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:16,660-Speed 5153.06 samples/sec Loss 0.6898 LearningRate 0.0024 Epoch: 16 Global Step: 282470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:18,646-Speed 5158.69 samples/sec Loss 0.6811 LearningRate 0.0024 Epoch: 16 Global Step: 282480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:20,612-Speed 5210.16 samples/sec Loss 0.6463 LearningRate 0.0024 Epoch: 16 Global Step: 282490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:22,583-Speed 5198.61 samples/sec Loss 0.6891 LearningRate 0.0024 Epoch: 16 Global Step: 282500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:24,576-Speed 5138.85 samples/sec Loss 0.6854 LearningRate 0.0024 Epoch: 16 Global Step: 282510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:26,570-Speed 5136.72 samples/sec Loss 0.6985 LearningRate 0.0024 Epoch: 16 Global Step: 282520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:28,548-Speed 5179.35 samples/sec Loss 0.6806 LearningRate 0.0024 Epoch: 16 Global Step: 282530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:30,521-Speed 5191.37 samples/sec Loss 0.6585 LearningRate 0.0024 Epoch: 16 Global Step: 282540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:14:32,492-Speed 5196.40 samples/sec Loss 0.7041 LearningRate 0.0024 Epoch: 16 Global Step: 282550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:34,457-Speed 5212.66 samples/sec Loss 0.7127 LearningRate 0.0024 Epoch: 16 Global Step: 282560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:36,461-Speed 5112.94 samples/sec 
Loss 0.7099 LearningRate 0.0024 Epoch: 16 Global Step: 282570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:38,428-Speed 5206.66 samples/sec Loss 0.7022 LearningRate 0.0024 Epoch: 16 Global Step: 282580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:40,396-Speed 5205.74 samples/sec Loss 0.6909 LearningRate 0.0024 Epoch: 16 Global Step: 282590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:42,366-Speed 5200.48 samples/sec Loss 0.6990 LearningRate 0.0024 Epoch: 16 Global Step: 282600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:44,330-Speed 5214.49 samples/sec Loss 0.6868 LearningRate 0.0024 Epoch: 16 Global Step: 282610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:46,298-Speed 5205.08 samples/sec Loss 0.6914 LearningRate 0.0024 Epoch: 16 Global Step: 282620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:48,274-Speed 5183.53 samples/sec Loss 0.6834 LearningRate 0.0024 Epoch: 16 Global Step: 282630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:50,263-Speed 5152.41 samples/sec Loss 0.7010 LearningRate 0.0024 Epoch: 16 Global Step: 282640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:52,269-Speed 5105.37 samples/sec Loss 0.6922 LearningRate 0.0023 Epoch: 16 Global Step: 282650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:14:54,232-Speed 5218.47 samples/sec Loss 0.6692 LearningRate 0.0023 Epoch: 16 Global Step: 282660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:56,222-Speed 5149.09 samples/sec Loss 0.7325 LearningRate 0.0023 Epoch: 16 Global Step: 282670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:14:58,203-Speed 5169.56 samples/sec Loss 0.6767 LearningRate 0.0023 Epoch: 16 Global Step: 282680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:00,168-Speed 5215.17 samples/sec Loss 0.6878 LearningRate 0.0023 Epoch: 16 
Global Step: 282690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:02,145-Speed 5180.48 samples/sec Loss 0.6729 LearningRate 0.0023 Epoch: 16 Global Step: 282700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:04,112-Speed 5207.91 samples/sec Loss 0.6329 LearningRate 0.0023 Epoch: 16 Global Step: 282710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:06,086-Speed 5188.58 samples/sec Loss 0.7054 LearningRate 0.0023 Epoch: 16 Global Step: 282720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:08,049-Speed 5217.57 samples/sec Loss 0.6897 LearningRate 0.0023 Epoch: 16 Global Step: 282730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:10,044-Speed 5137.07 samples/sec Loss 0.6730 LearningRate 0.0023 Epoch: 16 Global Step: 282740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:12,013-Speed 5202.81 samples/sec Loss 0.6785 LearningRate 0.0023 Epoch: 16 Global Step: 282750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:13,987-Speed 5189.32 samples/sec Loss 0.7106 LearningRate 0.0023 Epoch: 16 Global Step: 282760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:15:15,945-Speed 5231.09 samples/sec Loss 0.6673 LearningRate 0.0023 Epoch: 16 Global Step: 282770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:17,923-Speed 5177.97 samples/sec Loss 0.6676 LearningRate 0.0023 Epoch: 16 Global Step: 282780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:19,921-Speed 5126.98 samples/sec Loss 0.6942 LearningRate 0.0023 Epoch: 16 Global Step: 282790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:21,904-Speed 5166.71 samples/sec Loss 0.7098 LearningRate 0.0023 Epoch: 16 Global Step: 282800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:23,881-Speed 5180.79 samples/sec Loss 0.7127 LearningRate 0.0023 Epoch: 16 Global Step: 282810 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 18:15:25,870-Speed 5151.23 samples/sec Loss 0.6751 LearningRate 0.0023 Epoch: 16 Global Step: 282820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:27,851-Speed 5171.64 samples/sec Loss 0.6939 LearningRate 0.0023 Epoch: 16 Global Step: 282830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:29,822-Speed 5194.99 samples/sec Loss 0.6884 LearningRate 0.0023 Epoch: 16 Global Step: 282840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:31,794-Speed 5194.09 samples/sec Loss 0.6998 LearningRate 0.0023 Epoch: 16 Global Step: 282850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:33,780-Speed 5160.07 samples/sec Loss 0.7204 LearningRate 0.0023 Epoch: 16 Global Step: 282860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:35,768-Speed 5151.63 samples/sec Loss 0.7232 LearningRate 0.0023 Epoch: 16 Global Step: 282870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:15:37,770-Speed 5117.95 samples/sec Loss 0.7223 LearningRate 0.0023 Epoch: 16 Global Step: 282880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:15:39,738-Speed 5205.19 samples/sec Loss 0.6710 LearningRate 0.0023 Epoch: 16 Global Step: 282890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:15:41,761-Speed 5065.06 samples/sec Loss 0.6756 LearningRate 0.0023 Epoch: 16 Global Step: 282900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:15:43,726-Speed 5214.42 samples/sec Loss 0.6775 LearningRate 0.0023 Epoch: 16 Global Step: 282910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:45,691-Speed 5212.27 samples/sec Loss 0.6702 LearningRate 0.0023 Epoch: 16 Global Step: 282920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:47,702-Speed 5093.02 samples/sec Loss 0.6970 LearningRate 0.0023 Epoch: 16 Global Step: 282930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
18:15:49,673-Speed 5198.50 samples/sec Loss 0.6897 LearningRate 0.0023 Epoch: 16 Global Step: 282940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:51,641-Speed 5204.72 samples/sec Loss 0.6891 LearningRate 0.0023 Epoch: 16 Global Step: 282950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:53,653-Speed 5092.28 samples/sec Loss 0.6711 LearningRate 0.0023 Epoch: 16 Global Step: 282960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:55,624-Speed 5197.84 samples/sec Loss 0.7121 LearningRate 0.0023 Epoch: 16 Global Step: 282970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:57,621-Speed 5128.92 samples/sec Loss 0.7243 LearningRate 0.0023 Epoch: 16 Global Step: 282980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:15:59,633-Speed 5092.06 samples/sec Loss 0.6804 LearningRate 0.0023 Epoch: 16 Global Step: 282990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:01,619-Speed 5158.53 samples/sec Loss 0.6525 LearningRate 0.0023 Epoch: 16 Global Step: 283000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:03,595-Speed 5184.03 samples/sec Loss 0.6738 LearningRate 0.0023 Epoch: 16 Global Step: 283010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:16:05,607-Speed 5092.26 samples/sec Loss 0.6675 LearningRate 0.0023 Epoch: 16 Global Step: 283020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:16:07,572-Speed 5211.77 samples/sec Loss 0.6996 LearningRate 0.0023 Epoch: 16 Global Step: 283030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:09,545-Speed 5193.43 samples/sec Loss 0.6946 LearningRate 0.0023 Epoch: 16 Global Step: 283040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:11,537-Speed 5142.30 samples/sec Loss 0.7032 LearningRate 0.0023 Epoch: 16 Global Step: 283050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:13,516-Speed 5176.21 samples/sec 
Loss 0.6818 LearningRate 0.0023 Epoch: 16 Global Step: 283060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:15,489-Speed 5191.49 samples/sec Loss 0.6993 LearningRate 0.0023 Epoch: 16 Global Step: 283070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:17,460-Speed 5197.24 samples/sec Loss 0.6770 LearningRate 0.0023 Epoch: 16 Global Step: 283080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:19,435-Speed 5187.10 samples/sec Loss 0.7013 LearningRate 0.0023 Epoch: 16 Global Step: 283090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:21,429-Speed 5136.35 samples/sec Loss 0.6839 LearningRate 0.0023 Epoch: 16 Global Step: 283100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:23,406-Speed 5182.36 samples/sec Loss 0.7015 LearningRate 0.0023 Epoch: 16 Global Step: 283110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:25,376-Speed 5200.83 samples/sec Loss 0.6854 LearningRate 0.0023 Epoch: 16 Global Step: 283120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:27,381-Speed 5109.20 samples/sec Loss 0.6642 LearningRate 0.0023 Epoch: 16 Global Step: 283130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:16:29,371-Speed 5147.73 samples/sec Loss 0.6921 LearningRate 0.0023 Epoch: 16 Global Step: 283140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:16:31,348-Speed 5179.80 samples/sec Loss 0.7146 LearningRate 0.0023 Epoch: 16 Global Step: 283150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:33,335-Speed 5156.03 samples/sec Loss 0.7120 LearningRate 0.0023 Epoch: 16 Global Step: 283160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:35,310-Speed 5187.70 samples/sec Loss 0.7156 LearningRate 0.0023 Epoch: 16 Global Step: 283170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:37,278-Speed 5203.01 samples/sec Loss 0.7337 LearningRate 0.0023 Epoch: 16 
Global Step: 283180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:39,255-Speed 5182.61 samples/sec Loss 0.6823 LearningRate 0.0023 Epoch: 16 Global Step: 283190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:41,238-Speed 5165.65 samples/sec Loss 0.6550 LearningRate 0.0023 Epoch: 16 Global Step: 283200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:43,211-Speed 5192.84 samples/sec Loss 0.6757 LearningRate 0.0023 Epoch: 16 Global Step: 283210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:45,227-Speed 5082.09 samples/sec Loss 0.6669 LearningRate 0.0023 Epoch: 16 Global Step: 283220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:47,239-Speed 5091.65 samples/sec Loss 0.6929 LearningRate 0.0023 Epoch: 16 Global Step: 283230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:49,245-Speed 5108.60 samples/sec Loss 0.6832 LearningRate 0.0023 Epoch: 16 Global Step: 283240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:51,246-Speed 5118.45 samples/sec Loss 0.7309 LearningRate 0.0023 Epoch: 16 Global Step: 283250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:16:53,219-Speed 5192.82 samples/sec Loss 0.6782 LearningRate 0.0023 Epoch: 16 Global Step: 283260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:55,199-Speed 5172.80 samples/sec Loss 0.6693 LearningRate 0.0023 Epoch: 16 Global Step: 283270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:57,189-Speed 5146.93 samples/sec Loss 0.6902 LearningRate 0.0023 Epoch: 16 Global Step: 283280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:16:59,191-Speed 5117.92 samples/sec Loss 0.6570 LearningRate 0.0023 Epoch: 16 Global Step: 283290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:17:01,171-Speed 5172.89 samples/sec Loss 0.7357 LearningRate 0.0023 Epoch: 16 Global Step: 283300 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 18:17:03,141-Speed 5201.08 samples/sec Loss 0.6746 LearningRate 0.0023 Epoch: 16 Global Step: 283310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:17:05,139-Speed 5125.96 samples/sec Loss 0.6901 LearningRate 0.0023 Epoch: 16 Global Step: 283320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:17:07,112-Speed 5193.62 samples/sec Loss 0.6968 LearningRate 0.0023 Epoch: 16 Global Step: 283330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:17:09,081-Speed 5201.23 samples/sec Loss 0.6815 LearningRate 0.0023 Epoch: 16 Global Step: 283340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:17:11,055-Speed 5188.51 samples/sec Loss 0.6820 LearningRate 0.0023 Epoch: 16 Global Step: 283350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:17:13,048-Speed 5141.07 samples/sec Loss 0.6638 LearningRate 0.0023 Epoch: 16 Global Step: 283360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:17:15,039-Speed 5144.93 samples/sec Loss 0.6667 LearningRate 0.0023 Epoch: 16 Global Step: 283370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:17:17,011-Speed 5193.63 samples/sec Loss 0.7128 LearningRate 0.0023 Epoch: 16 Global Step: 283380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:17:18,987-Speed 5185.31 samples/sec Loss 0.6989 LearningRate 0.0023 Epoch: 16 Global Step: 283390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:17:20,956-Speed 5202.55 samples/sec Loss 0.7083 LearningRate 0.0023 Epoch: 16 Global Step: 283400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:17:22,937-Speed 5170.97 samples/sec Loss 0.7059 LearningRate 0.0023 Epoch: 16 Global Step: 283410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:17:24,924-Speed 5154.91 samples/sec Loss 0.6865 LearningRate 0.0023 Epoch: 16 Global Step: 283420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
18:17:26,918-Speed 5139.33 samples/sec Loss 0.6999 LearningRate 0.0023 Epoch: 16 Global Step: 283430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:17:28,893-Speed 5184.37 samples/sec Loss 0.7016 LearningRate 0.0023 Epoch: 16 Global Step: 283440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:17:30,876-Speed 5167.06 samples/sec Loss 0.6747 LearningRate 0.0023 Epoch: 16 Global Step: 283450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:17:32,884-Speed 5101.70 samples/sec Loss 0.6996 LearningRate 0.0023 Epoch: 16 Global Step: 283460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:17:34,862-Speed 5180.14 samples/sec Loss 0.6662 LearningRate 0.0023 Epoch: 16 Global Step: 283470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:17:36,859-Speed 5129.82 samples/sec Loss 0.6793 LearningRate 0.0023 Epoch: 16 Global Step: 283480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:17:38,867-Speed 5101.37 samples/sec Loss 0.7019 LearningRate 0.0023 Epoch: 16 Global Step: 283490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:17:40,858-Speed 5145.18 samples/sec Loss 0.7351 LearningRate 0.0023 Epoch: 16 Global Step: 283500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:17:42,830-Speed 5196.01 samples/sec Loss 0.6887 LearningRate 0.0023 Epoch: 16 Global Step: 283510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:17:44,813-Speed 5164.63 samples/sec Loss 0.6970 LearningRate 0.0023 Epoch: 16 Global Step: 283520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:17:46,789-Speed 5183.74 samples/sec Loss 0.6911 LearningRate 0.0023 Epoch: 16 Global Step: 283530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:17:48,796-Speed 5106.48 samples/sec Loss 0.6837 LearningRate 0.0023 Epoch: 16 Global Step: 283540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:17:50,771-Speed 5186.55 samples/sec Loss 
0.7065 LearningRate 0.0023 Epoch: 16 Global Step: 283550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:17:52,792-Speed 5069.42 samples/sec Loss 0.6880 LearningRate 0.0023 Epoch: 16 Global Step: 283560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:17:54,765-Speed 5192.36 samples/sec Loss 0.7071 LearningRate 0.0023 Epoch: 16 Global Step: 283570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:17:56,745-Speed 5172.21 samples/sec Loss 0.6885 LearningRate 0.0023 Epoch: 16 Global Step: 283580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:17:58,764-Speed 5075.97 samples/sec Loss 0.6782 LearningRate 0.0023 Epoch: 16 Global Step: 283590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:18:00,737-Speed 5191.00 samples/sec Loss 0.6720 LearningRate 0.0023 Epoch: 16 Global Step: 283600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:18:02,722-Speed 5160.09 samples/sec Loss 0.6899 LearningRate 0.0023 Epoch: 16 Global Step: 283610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:18:04,713-Speed 5146.58 samples/sec Loss 0.6759 LearningRate 0.0023 Epoch: 16 Global Step: 283620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:18:06,698-Speed 5162.04 samples/sec Loss 0.6680 LearningRate 0.0023 Epoch: 16 Global Step: 283630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:18:08,681-Speed 5163.69 samples/sec Loss 0.6772 LearningRate 0.0023 Epoch: 16 Global Step: 283640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:18:10,661-Speed 5176.31 samples/sec Loss 0.6761 LearningRate 0.0023 Epoch: 16 Global Step: 283650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:18:12,637-Speed 5184.67 samples/sec Loss 0.6834 LearningRate 0.0023 Epoch: 16 Global Step: 283660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:18:14,637-Speed 5121.16 samples/sec Loss 0.7098 LearningRate 0.0023 Epoch: 16 
Global Step: 283670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:18:16,638-Speed 5120.70 samples/sec Loss 0.6689 LearningRate 0.0023 Epoch: 16 Global Step: 283680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:18:18,604-Speed 5208.99 samples/sec Loss 0.6888 LearningRate 0.0023 Epoch: 16 Global Step: 283690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:18:20,579-Speed 5186.06 samples/sec Loss 0.6792 LearningRate 0.0023 Epoch: 16 Global Step: 283700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:18:22,573-Speed 5137.39 samples/sec Loss 0.6700 LearningRate 0.0023 Epoch: 16 Global Step: 283710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:18:24,572-Speed 5124.23 samples/sec Loss 0.6960 LearningRate 0.0023 Epoch: 16 Global Step: 283720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:18:26,550-Speed 5181.76 samples/sec Loss 0.6910 LearningRate 0.0023 Epoch: 16 Global Step: 283730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:18:28,759-Speed 4636.01 samples/sec Loss 0.7081 LearningRate 0.0023 Epoch: 16 Global Step: 283740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:18:58,952-Speed 339.17 samples/sec Loss 0.6594 LearningRate 0.0022 Epoch: 17 Global Step: 283750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:19:00,994-Speed 5016.62 samples/sec Loss 0.5132 LearningRate 0.0022 Epoch: 17 Global Step: 283760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:19:03,042-Speed 5002.74 samples/sec Loss 0.5033 LearningRate 0.0022 Epoch: 17 Global Step: 283770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:19:05,023-Speed 5172.21 samples/sec Loss 0.4698 LearningRate 0.0022 Epoch: 17 Global Step: 283780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:19:07,784-Speed 3710.14 samples/sec Loss 0.4750 LearningRate 0.0022 Epoch: 17 Global Step: 283790 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 18:19:09,949-Speed 4731.80 samples/sec Loss 0.5004 LearningRate 0.0022 Epoch: 17 Global Step: 283800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:19:11,971-Speed 5067.77 samples/sec Loss 0.5000 LearningRate 0.0022 Epoch: 17 Global Step: 283810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:19:13,959-Speed 5152.97 samples/sec Loss 0.4840 LearningRate 0.0022 Epoch: 17 Global Step: 283820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:19:15,984-Speed 5062.21 samples/sec Loss 0.4929 LearningRate 0.0022 Epoch: 17 Global Step: 283830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:19:17,960-Speed 5182.43 samples/sec Loss 0.4779 LearningRate 0.0022 Epoch: 17 Global Step: 283840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:19:20,481-Speed 4065.56 samples/sec Loss 0.4815 LearningRate 0.0022 Epoch: 17 Global Step: 283850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:19:22,511-Speed 5046.67 samples/sec Loss 0.4848 LearningRate 0.0022 Epoch: 17 Global Step: 283860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:19:24,519-Speed 5102.36 samples/sec Loss 0.4790 LearningRate 0.0022 Epoch: 17 Global Step: 283870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:19:26,507-Speed 5153.48 samples/sec Loss 0.5011 LearningRate 0.0022 Epoch: 17 Global Step: 283880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:19:28,505-Speed 5127.67 samples/sec Loss 0.5105 LearningRate 0.0022 Epoch: 17 Global Step: 283890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:19:30,485-Speed 5174.04 samples/sec Loss 0.4927 LearningRate 0.0022 Epoch: 17 Global Step: 283900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:19:32,475-Speed 5147.65 samples/sec Loss 0.5186 LearningRate 0.0022 Epoch: 17 Global Step: 283910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 18:19:34,456-Speed 5172.54 samples/sec Loss 0.4970 LearningRate 0.0022 Epoch: 17 Global Step: 283920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:19:36,457-Speed 5124.64 samples/sec Loss 0.5056 LearningRate 0.0022 Epoch: 17 Global Step: 283930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:19:38,456-Speed 5125.41 samples/sec Loss 0.4920 LearningRate 0.0022 Epoch: 17 Global Step: 283940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:19:40,429-Speed 5192.24 samples/sec Loss 0.5032 LearningRate 0.0022 Epoch: 17 Global Step: 283950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:19:42,422-Speed 5140.78 samples/sec Loss 0.4931 LearningRate 0.0022 Epoch: 17 Global Step: 283960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:19:44,397-Speed 5186.02 samples/sec Loss 0.4981 LearningRate 0.0022 Epoch: 17 Global Step: 283970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:19:46,396-Speed 5125.89 samples/sec Loss 0.4989 LearningRate 0.0022 Epoch: 17 Global Step: 283980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:19:48,396-Speed 5123.64 samples/sec Loss 0.4676 LearningRate 0.0022 Epoch: 17 Global Step: 283990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:19:50,440-Speed 5012.16 samples/sec Loss 0.4857 LearningRate 0.0022 Epoch: 17 Global Step: 284000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:20:17,611-[lfw][284000]XNorm: 21.068084 Training: 2022-04-11 18:20:17,612-[lfw][284000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 18:20:17,613-[lfw][284000]Accuracy-Highest: 0.99833 Training: 2022-04-11 18:20:48,462-[cfp_fp][284000]XNorm: 21.187221 Training: 2022-04-11 18:20:48,463-[cfp_fp][284000]Accuracy-Flip: 0.98814+-0.00465 Training: 2022-04-11 18:20:48,464-[cfp_fp][284000]Accuracy-Highest: 0.99000 Training: 2022-04-11 18:21:14,965-[agedb_30][284000]XNorm: 21.897161 Training: 2022-04-11 
18:21:14,966-[agedb_30][284000]Accuracy-Flip: 0.98333+-0.00662 Training: 2022-04-11 18:21:14,966-[agedb_30][284000]Accuracy-Highest: 0.98333 Training: 2022-04-11 18:21:16,973-Speed 118.34 samples/sec Loss 0.5036 LearningRate 0.0022 Epoch: 17 Global Step: 284010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:21:18,944-Speed 5195.81 samples/sec Loss 0.4699 LearningRate 0.0022 Epoch: 17 Global Step: 284020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:21:20,913-Speed 5201.83 samples/sec Loss 0.4829 LearningRate 0.0022 Epoch: 17 Global Step: 284030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:21:22,913-Speed 5124.61 samples/sec Loss 0.5051 LearningRate 0.0022 Epoch: 17 Global Step: 284040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:21:24,899-Speed 5157.97 samples/sec Loss 0.4838 LearningRate 0.0022 Epoch: 17 Global Step: 284050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:21:26,929-Speed 5046.19 samples/sec Loss 0.5056 LearningRate 0.0022 Epoch: 17 Global Step: 284060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:21:28,904-Speed 5184.60 samples/sec Loss 0.4966 LearningRate 0.0022 Epoch: 17 Global Step: 284070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:21:30,879-Speed 5189.35 samples/sec Loss 0.5067 LearningRate 0.0022 Epoch: 17 Global Step: 284080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:21:32,851-Speed 5193.39 samples/sec Loss 0.4790 LearningRate 0.0022 Epoch: 17 Global Step: 284090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:21:34,845-Speed 5137.45 samples/sec Loss 0.5046 LearningRate 0.0022 Epoch: 17 Global Step: 284100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:21:36,825-Speed 5172.59 samples/sec Loss 0.4890 LearningRate 0.0022 Epoch: 17 Global Step: 284110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:21:38,815-Speed 5148.17 samples/sec Loss 
0.4761 LearningRate 0.0022 Epoch: 17 Global Step: 284120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:21:40,798-Speed 5167.12 samples/sec Loss 0.4941 LearningRate 0.0022 Epoch: 17 Global Step: 284130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:21:42,775-Speed 5179.11 samples/sec Loss 0.4750 LearningRate 0.0022 Epoch: 17 Global Step: 284140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:21:44,770-Speed 5135.56 samples/sec Loss 0.5020 LearningRate 0.0022 Epoch: 17 Global Step: 284150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:21:46,760-Speed 5147.18 samples/sec Loss 0.5050 LearningRate 0.0022 Epoch: 17 Global Step: 284160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:21:48,747-Speed 5154.69 samples/sec Loss 0.4846 LearningRate 0.0022 Epoch: 17 Global Step: 284170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:21:50,730-Speed 5167.28 samples/sec Loss 0.4941 LearningRate 0.0022 Epoch: 17 Global Step: 284180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:21:52,732-Speed 5115.53 samples/sec Loss 0.5081 LearningRate 0.0022 Epoch: 17 Global Step: 284190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:21:54,789-Speed 4981.58 samples/sec Loss 0.4909 LearningRate 0.0022 Epoch: 17 Global Step: 284200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:21:56,765-Speed 5184.89 samples/sec Loss 0.4941 LearningRate 0.0022 Epoch: 17 Global Step: 284210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:21:58,743-Speed 5177.61 samples/sec Loss 0.5049 LearningRate 0.0022 Epoch: 17 Global Step: 284220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:00,735-Speed 5142.68 samples/sec Loss 0.5139 LearningRate 0.0022 Epoch: 17 Global Step: 284230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:02,705-Speed 5198.75 samples/sec Loss 0.5113 LearningRate 0.0022 Epoch: 17 
Global Step: 284240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:04,677-Speed 5196.16 samples/sec Loss 0.5111 LearningRate 0.0022 Epoch: 17 Global Step: 284250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:22:06,651-Speed 5187.92 samples/sec Loss 0.4805 LearningRate 0.0022 Epoch: 17 Global Step: 284260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:22:08,634-Speed 5166.43 samples/sec Loss 0.5234 LearningRate 0.0022 Epoch: 17 Global Step: 284270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:10,608-Speed 5190.07 samples/sec Loss 0.4902 LearningRate 0.0022 Epoch: 17 Global Step: 284280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:12,581-Speed 5191.03 samples/sec Loss 0.4813 LearningRate 0.0022 Epoch: 17 Global Step: 284290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:14,583-Speed 5118.98 samples/sec Loss 0.5124 LearningRate 0.0022 Epoch: 17 Global Step: 284300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:16,568-Speed 5159.11 samples/sec Loss 0.5115 LearningRate 0.0022 Epoch: 17 Global Step: 284310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:18,540-Speed 5193.40 samples/sec Loss 0.5020 LearningRate 0.0022 Epoch: 17 Global Step: 284320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:20,524-Speed 5165.44 samples/sec Loss 0.4922 LearningRate 0.0022 Epoch: 17 Global Step: 284330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:22,493-Speed 5201.71 samples/sec Loss 0.4789 LearningRate 0.0022 Epoch: 17 Global Step: 284340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:24,470-Speed 5181.25 samples/sec Loss 0.5107 LearningRate 0.0022 Epoch: 17 Global Step: 284350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:26,450-Speed 5174.65 samples/sec Loss 0.4997 LearningRate 0.0022 Epoch: 17 Global Step: 284360 Fp16 Grad Scale: 
65536 Required: 3 hours Training: 2022-04-11 18:22:28,442-Speed 5141.26 samples/sec Loss 0.5053 LearningRate 0.0022 Epoch: 17 Global Step: 284370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:30,431-Speed 5151.85 samples/sec Loss 0.4880 LearningRate 0.0022 Epoch: 17 Global Step: 284380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:32,434-Speed 5114.36 samples/sec Loss 0.4658 LearningRate 0.0022 Epoch: 17 Global Step: 284390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:34,401-Speed 5207.10 samples/sec Loss 0.4787 LearningRate 0.0022 Epoch: 17 Global Step: 284400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:36,376-Speed 5187.63 samples/sec Loss 0.4809 LearningRate 0.0022 Epoch: 17 Global Step: 284410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:38,358-Speed 5167.91 samples/sec Loss 0.4886 LearningRate 0.0022 Epoch: 17 Global Step: 284420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:40,339-Speed 5170.98 samples/sec Loss 0.4711 LearningRate 0.0022 Epoch: 17 Global Step: 284430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:42,322-Speed 5165.48 samples/sec Loss 0.4864 LearningRate 0.0022 Epoch: 17 Global Step: 284440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:44,297-Speed 5187.71 samples/sec Loss 0.5501 LearningRate 0.0022 Epoch: 17 Global Step: 284450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:46,272-Speed 5186.95 samples/sec Loss 0.5013 LearningRate 0.0022 Epoch: 17 Global Step: 284460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:48,270-Speed 5125.87 samples/sec Loss 0.4987 LearningRate 0.0022 Epoch: 17 Global Step: 284470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:22:50,253-Speed 5165.19 samples/sec Loss 0.4744 LearningRate 0.0022 Epoch: 17 Global Step: 284480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-04-11 18:22:52,221-Speed 5205.28 samples/sec Loss 0.4697 LearningRate 0.0022 Epoch: 17 Global Step: 284490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:54,205-Speed 5162.72 samples/sec Loss 0.5068 LearningRate 0.0022 Epoch: 17 Global Step: 284500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:56,225-Speed 5072.19 samples/sec Loss 0.4873 LearningRate 0.0022 Epoch: 17 Global Step: 284510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:22:58,208-Speed 5166.61 samples/sec Loss 0.5021 LearningRate 0.0022 Epoch: 17 Global Step: 284520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:00,177-Speed 5201.77 samples/sec Loss 0.4971 LearningRate 0.0022 Epoch: 17 Global Step: 284530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:02,147-Speed 5200.61 samples/sec Loss 0.4931 LearningRate 0.0022 Epoch: 17 Global Step: 284540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:04,130-Speed 5165.17 samples/sec Loss 0.5067 LearningRate 0.0022 Epoch: 17 Global Step: 284550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:06,102-Speed 5194.08 samples/sec Loss 0.5004 LearningRate 0.0022 Epoch: 17 Global Step: 284560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:08,095-Speed 5138.62 samples/sec Loss 0.4922 LearningRate 0.0022 Epoch: 17 Global Step: 284570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:10,068-Speed 5193.50 samples/sec Loss 0.4829 LearningRate 0.0022 Epoch: 17 Global Step: 284580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:12,065-Speed 5129.05 samples/sec Loss 0.4931 LearningRate 0.0022 Epoch: 17 Global Step: 284590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:23:14,053-Speed 5152.07 samples/sec Loss 0.4928 LearningRate 0.0022 Epoch: 17 Global Step: 284600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:23:16,054-Speed 5118.98 
samples/sec Loss 0.5050 LearningRate 0.0022 Epoch: 17 Global Step: 284610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:18,029-Speed 5187.89 samples/sec Loss 0.5121 LearningRate 0.0022 Epoch: 17 Global Step: 284620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:19,997-Speed 5205.46 samples/sec Loss 0.5041 LearningRate 0.0022 Epoch: 17 Global Step: 284630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:21,993-Speed 5132.41 samples/sec Loss 0.4674 LearningRate 0.0022 Epoch: 17 Global Step: 284640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:23,997-Speed 5110.24 samples/sec Loss 0.4886 LearningRate 0.0022 Epoch: 17 Global Step: 284650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:25,996-Speed 5123.90 samples/sec Loss 0.5049 LearningRate 0.0022 Epoch: 17 Global Step: 284660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:27,974-Speed 5181.44 samples/sec Loss 0.4839 LearningRate 0.0022 Epoch: 17 Global Step: 284670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:29,946-Speed 5193.19 samples/sec Loss 0.5247 LearningRate 0.0022 Epoch: 17 Global Step: 284680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:31,925-Speed 5176.82 samples/sec Loss 0.4856 LearningRate 0.0022 Epoch: 17 Global Step: 284690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:33,908-Speed 5166.19 samples/sec Loss 0.4794 LearningRate 0.0022 Epoch: 17 Global Step: 284700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:35,885-Speed 5180.80 samples/sec Loss 0.5020 LearningRate 0.0022 Epoch: 17 Global Step: 284710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:23:37,863-Speed 5180.08 samples/sec Loss 0.5076 LearningRate 0.0022 Epoch: 17 Global Step: 284720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:23:39,843-Speed 5172.00 samples/sec Loss 0.4999 LearningRate 
0.0022 Epoch: 17 Global Step: 284730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:23:41,810-Speed 5207.77 samples/sec Loss 0.4823 LearningRate 0.0022 Epoch: 17 Global Step: 284740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:23:43,777-Speed 5206.86 samples/sec Loss 0.5270 LearningRate 0.0022 Epoch: 17 Global Step: 284750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:45,754-Speed 5181.14 samples/sec Loss 0.4910 LearningRate 0.0022 Epoch: 17 Global Step: 284760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:47,740-Speed 5159.94 samples/sec Loss 0.4984 LearningRate 0.0022 Epoch: 17 Global Step: 284770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:49,718-Speed 5176.92 samples/sec Loss 0.5088 LearningRate 0.0022 Epoch: 17 Global Step: 284780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:51,691-Speed 5194.33 samples/sec Loss 0.5063 LearningRate 0.0022 Epoch: 17 Global Step: 284790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:53,664-Speed 5191.27 samples/sec Loss 0.4700 LearningRate 0.0022 Epoch: 17 Global Step: 284800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:55,638-Speed 5188.60 samples/sec Loss 0.4647 LearningRate 0.0022 Epoch: 17 Global Step: 284810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:57,611-Speed 5190.69 samples/sec Loss 0.4884 LearningRate 0.0022 Epoch: 17 Global Step: 284820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:23:59,591-Speed 5175.23 samples/sec Loss 0.4972 LearningRate 0.0022 Epoch: 17 Global Step: 284830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:01,573-Speed 5168.91 samples/sec Loss 0.5141 LearningRate 0.0022 Epoch: 17 Global Step: 284840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:03,547-Speed 5186.73 samples/sec Loss 0.4748 LearningRate 0.0022 Epoch: 17 Global Step: 284850 Fp16 
Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:24:05,545-Speed 5128.40 samples/sec Loss 0.5003 LearningRate 0.0022 Epoch: 17 Global Step: 284860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:24:07,523-Speed 5177.81 samples/sec Loss 0.4862 LearningRate 0.0022 Epoch: 17 Global Step: 284870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:24:09,493-Speed 5202.00 samples/sec Loss 0.4822 LearningRate 0.0021 Epoch: 17 Global Step: 284880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:11,471-Speed 5176.53 samples/sec Loss 0.4910 LearningRate 0.0021 Epoch: 17 Global Step: 284890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:13,448-Speed 5182.58 samples/sec Loss 0.4728 LearningRate 0.0021 Epoch: 17 Global Step: 284900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:15,432-Speed 5162.83 samples/sec Loss 0.4855 LearningRate 0.0021 Epoch: 17 Global Step: 284910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:17,410-Speed 5178.47 samples/sec Loss 0.5010 LearningRate 0.0021 Epoch: 17 Global Step: 284920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:19,380-Speed 5200.31 samples/sec Loss 0.4978 LearningRate 0.0021 Epoch: 17 Global Step: 284930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:21,358-Speed 5178.07 samples/sec Loss 0.5031 LearningRate 0.0021 Epoch: 17 Global Step: 284940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:23,345-Speed 5155.40 samples/sec Loss 0.4811 LearningRate 0.0021 Epoch: 17 Global Step: 284950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:25,338-Speed 5141.76 samples/sec Loss 0.5124 LearningRate 0.0021 Epoch: 17 Global Step: 284960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:27,321-Speed 5164.73 samples/sec Loss 0.4947 LearningRate 0.0021 Epoch: 17 Global Step: 284970 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-11 18:24:29,286-Speed 5214.67 samples/sec Loss 0.4867 LearningRate 0.0021 Epoch: 17 Global Step: 284980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:31,261-Speed 5184.71 samples/sec Loss 0.5249 LearningRate 0.0021 Epoch: 17 Global Step: 284990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:33,259-Speed 5127.49 samples/sec Loss 0.5381 LearningRate 0.0021 Epoch: 17 Global Step: 285000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:35,240-Speed 5169.99 samples/sec Loss 0.5043 LearningRate 0.0021 Epoch: 17 Global Step: 285010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:37,222-Speed 5168.59 samples/sec Loss 0.4858 LearningRate 0.0021 Epoch: 17 Global Step: 285020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:39,194-Speed 5193.34 samples/sec Loss 0.5077 LearningRate 0.0021 Epoch: 17 Global Step: 285030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:41,168-Speed 5190.33 samples/sec Loss 0.4989 LearningRate 0.0021 Epoch: 17 Global Step: 285040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:43,162-Speed 5139.82 samples/sec Loss 0.4915 LearningRate 0.0021 Epoch: 17 Global Step: 285050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:45,156-Speed 5138.97 samples/sec Loss 0.4847 LearningRate 0.0021 Epoch: 17 Global Step: 285060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:24:47,127-Speed 5198.14 samples/sec Loss 0.5029 LearningRate 0.0021 Epoch: 17 Global Step: 285070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:24:49,136-Speed 5097.91 samples/sec Loss 0.4863 LearningRate 0.0021 Epoch: 17 Global Step: 285080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:24:51,114-Speed 5178.49 samples/sec Loss 0.5095 LearningRate 0.0021 Epoch: 17 Global Step: 285090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:24:53,100-Speed 
5157.04 samples/sec Loss 0.5062 LearningRate 0.0021 Epoch: 17 Global Step: 285100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:24:55,068-Speed 5205.88 samples/sec Loss 0.5209 LearningRate 0.0021 Epoch: 17 Global Step: 285110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:24:57,050-Speed 5167.81 samples/sec Loss 0.5109 LearningRate 0.0021 Epoch: 17 Global Step: 285120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:24:59,034-Speed 5163.38 samples/sec Loss 0.5133 LearningRate 0.0021 Epoch: 17 Global Step: 285130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:25:01,013-Speed 5175.93 samples/sec Loss 0.5014 LearningRate 0.0021 Epoch: 17 Global Step: 285140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:25:03,004-Speed 5145.30 samples/sec Loss 0.4863 LearningRate 0.0021 Epoch: 17 Global Step: 285150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:25:04,983-Speed 5175.24 samples/sec Loss 0.5224 LearningRate 0.0021 Epoch: 17 Global Step: 285160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:25:06,974-Speed 5145.54 samples/sec Loss 0.5131 LearningRate 0.0021 Epoch: 17 Global Step: 285170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:08,947-Speed 5192.28 samples/sec Loss 0.5027 LearningRate 0.0021 Epoch: 17 Global Step: 285180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:10,935-Speed 5152.13 samples/sec Loss 0.5101 LearningRate 0.0021 Epoch: 17 Global Step: 285190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:12,929-Speed 5137.12 samples/sec Loss 0.5257 LearningRate 0.0021 Epoch: 17 Global Step: 285200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:14,902-Speed 5191.24 samples/sec Loss 0.5170 LearningRate 0.0021 Epoch: 17 Global Step: 285210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:16,896-Speed 5137.45 samples/sec Loss 0.4876 
LearningRate 0.0021 Epoch: 17 Global Step: 285220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:18,871-Speed 5186.13 samples/sec Loss 0.4957 LearningRate 0.0021 Epoch: 17 Global Step: 285230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:20,863-Speed 5142.97 samples/sec Loss 0.5091 LearningRate 0.0021 Epoch: 17 Global Step: 285240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:22,835-Speed 5195.37 samples/sec Loss 0.4966 LearningRate 0.0021 Epoch: 17 Global Step: 285250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:24,812-Speed 5180.14 samples/sec Loss 0.4952 LearningRate 0.0021 Epoch: 17 Global Step: 285260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:26,795-Speed 5167.81 samples/sec Loss 0.5206 LearningRate 0.0021 Epoch: 17 Global Step: 285270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:25:28,771-Speed 5183.53 samples/sec Loss 0.4845 LearningRate 0.0021 Epoch: 17 Global Step: 285280 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:25:30,741-Speed 5199.68 samples/sec Loss 0.5248 LearningRate 0.0021 Epoch: 17 Global Step: 285290 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:25:32,713-Speed 5195.10 samples/sec Loss 0.5101 LearningRate 0.0021 Epoch: 17 Global Step: 285300 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:25:34,706-Speed 5138.69 samples/sec Loss 0.5067 LearningRate 0.0021 Epoch: 17 Global Step: 285310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:36,692-Speed 5156.94 samples/sec Loss 0.4958 LearningRate 0.0021 Epoch: 17 Global Step: 285320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:38,671-Speed 5175.82 samples/sec Loss 0.5118 LearningRate 0.0021 Epoch: 17 Global Step: 285330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:40,661-Speed 5148.58 samples/sec Loss 0.5363 LearningRate 0.0021 Epoch: 17 Global 
Step: 285340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:42,649-Speed 5151.56 samples/sec Loss 0.5136 LearningRate 0.0021 Epoch: 17 Global Step: 285350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:44,643-Speed 5137.97 samples/sec Loss 0.4925 LearningRate 0.0021 Epoch: 17 Global Step: 285360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:46,625-Speed 5167.52 samples/sec Loss 0.5089 LearningRate 0.0021 Epoch: 17 Global Step: 285370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:48,596-Speed 5199.31 samples/sec Loss 0.4912 LearningRate 0.0021 Epoch: 17 Global Step: 285380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:50,574-Speed 5178.25 samples/sec Loss 0.4982 LearningRate 0.0021 Epoch: 17 Global Step: 285390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:52,553-Speed 5175.23 samples/sec Loss 0.4772 LearningRate 0.0021 Epoch: 17 Global Step: 285400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:54,540-Speed 5155.93 samples/sec Loss 0.4915 LearningRate 0.0021 Epoch: 17 Global Step: 285410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:25:56,510-Speed 5200.18 samples/sec Loss 0.4898 LearningRate 0.0021 Epoch: 17 Global Step: 285420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:25:58,498-Speed 5152.97 samples/sec Loss 0.5155 LearningRate 0.0021 Epoch: 17 Global Step: 285430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:00,478-Speed 5172.00 samples/sec Loss 0.5101 LearningRate 0.0021 Epoch: 17 Global Step: 285440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:02,477-Speed 5123.80 samples/sec Loss 0.4789 LearningRate 0.0021 Epoch: 17 Global Step: 285450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:04,479-Speed 5117.53 samples/sec Loss 0.4971 LearningRate 0.0021 Epoch: 17 Global Step: 285460 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 18:26:06,472-Speed 5141.18 samples/sec Loss 0.4898 LearningRate 0.0021 Epoch: 17 Global Step: 285470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:08,461-Speed 5149.12 samples/sec Loss 0.5077 LearningRate 0.0021 Epoch: 17 Global Step: 285480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:10,459-Speed 5128.19 samples/sec Loss 0.5255 LearningRate 0.0021 Epoch: 17 Global Step: 285490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:12,449-Speed 5146.79 samples/sec Loss 0.4945 LearningRate 0.0021 Epoch: 17 Global Step: 285500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:14,441-Speed 5141.54 samples/sec Loss 0.5024 LearningRate 0.0021 Epoch: 17 Global Step: 285510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:16,417-Speed 5183.95 samples/sec Loss 0.4925 LearningRate 0.0021 Epoch: 17 Global Step: 285520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:26:18,394-Speed 5182.20 samples/sec Loss 0.4963 LearningRate 0.0021 Epoch: 17 Global Step: 285530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:26:20,366-Speed 5194.71 samples/sec Loss 0.5149 LearningRate 0.0021 Epoch: 17 Global Step: 285540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:22,356-Speed 5149.17 samples/sec Loss 0.4949 LearningRate 0.0021 Epoch: 17 Global Step: 285550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:24,369-Speed 5087.24 samples/sec Loss 0.5040 LearningRate 0.0021 Epoch: 17 Global Step: 285560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:26,356-Speed 5157.21 samples/sec Loss 0.5174 LearningRate 0.0021 Epoch: 17 Global Step: 285570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:28,365-Speed 5099.22 samples/sec Loss 0.5028 LearningRate 0.0021 Epoch: 17 Global Step: 285580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
18:26:30,353-Speed 5152.89 samples/sec Loss 0.4944 LearningRate 0.0021 Epoch: 17 Global Step: 285590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:32,324-Speed 5195.80 samples/sec Loss 0.4874 LearningRate 0.0021 Epoch: 17 Global Step: 285600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:34,308-Speed 5165.37 samples/sec Loss 0.4799 LearningRate 0.0021 Epoch: 17 Global Step: 285610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:36,281-Speed 5189.56 samples/sec Loss 0.4954 LearningRate 0.0021 Epoch: 17 Global Step: 285620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:38,284-Speed 5114.13 samples/sec Loss 0.4767 LearningRate 0.0021 Epoch: 17 Global Step: 285630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:40,274-Speed 5147.37 samples/sec Loss 0.5123 LearningRate 0.0021 Epoch: 17 Global Step: 285640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:42,256-Speed 5173.58 samples/sec Loss 0.4977 LearningRate 0.0021 Epoch: 17 Global Step: 285650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:44,230-Speed 5190.97 samples/sec Loss 0.4902 LearningRate 0.0021 Epoch: 17 Global Step: 285660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:46,246-Speed 5079.97 samples/sec Loss 0.4880 LearningRate 0.0021 Epoch: 17 Global Step: 285670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:48,228-Speed 5169.01 samples/sec Loss 0.5125 LearningRate 0.0021 Epoch: 17 Global Step: 285680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:50,209-Speed 5172.26 samples/sec Loss 0.4946 LearningRate 0.0021 Epoch: 17 Global Step: 285690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:52,197-Speed 5152.42 samples/sec Loss 0.4946 LearningRate 0.0021 Epoch: 17 Global Step: 285700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:54,182-Speed 5159.34 samples/sec Loss 
0.5026 LearningRate 0.0021 Epoch: 17 Global Step: 285710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:56,176-Speed 5139.72 samples/sec Loss 0.5125 LearningRate 0.0021 Epoch: 17 Global Step: 285720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:26:58,176-Speed 5123.76 samples/sec Loss 0.5127 LearningRate 0.0021 Epoch: 17 Global Step: 285730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:00,171-Speed 5134.95 samples/sec Loss 0.5085 LearningRate 0.0021 Epoch: 17 Global Step: 285740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:27:02,156-Speed 5158.28 samples/sec Loss 0.4782 LearningRate 0.0021 Epoch: 17 Global Step: 285750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:04,143-Speed 5157.74 samples/sec Loss 0.5141 LearningRate 0.0021 Epoch: 17 Global Step: 285760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:06,139-Speed 5131.72 samples/sec Loss 0.5157 LearningRate 0.0021 Epoch: 17 Global Step: 285770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:08,115-Speed 5181.80 samples/sec Loss 0.5124 LearningRate 0.0021 Epoch: 17 Global Step: 285780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:10,101-Speed 5158.53 samples/sec Loss 0.4797 LearningRate 0.0021 Epoch: 17 Global Step: 285790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:12,107-Speed 5105.33 samples/sec Loss 0.5194 LearningRate 0.0021 Epoch: 17 Global Step: 285800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:14,110-Speed 5115.91 samples/sec Loss 0.5064 LearningRate 0.0021 Epoch: 17 Global Step: 285810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:16,096-Speed 5156.58 samples/sec Loss 0.4895 LearningRate 0.0021 Epoch: 17 Global Step: 285820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:18,096-Speed 5124.02 samples/sec Loss 0.4949 LearningRate 0.0021 Epoch: 17 
Global Step: 285830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:20,120-Speed 5061.62 samples/sec Loss 0.4810 LearningRate 0.0021 Epoch: 17 Global Step: 285840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:22,096-Speed 5184.22 samples/sec Loss 0.5358 LearningRate 0.0021 Epoch: 17 Global Step: 285850 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:27:24,073-Speed 5180.88 samples/sec Loss 0.5168 LearningRate 0.0021 Epoch: 17 Global Step: 285860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:27:26,048-Speed 5186.80 samples/sec Loss 0.5361 LearningRate 0.0021 Epoch: 17 Global Step: 285870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:27:28,030-Speed 5167.31 samples/sec Loss 0.5270 LearningRate 0.0021 Epoch: 17 Global Step: 285880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:30,036-Speed 5107.29 samples/sec Loss 0.5026 LearningRate 0.0021 Epoch: 17 Global Step: 285890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:32,009-Speed 5191.89 samples/sec Loss 0.4972 LearningRate 0.0021 Epoch: 17 Global Step: 285900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:33,989-Speed 5174.60 samples/sec Loss 0.5173 LearningRate 0.0021 Epoch: 17 Global Step: 285910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:35,986-Speed 5131.63 samples/sec Loss 0.4791 LearningRate 0.0021 Epoch: 17 Global Step: 285920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:37,964-Speed 5177.15 samples/sec Loss 0.5400 LearningRate 0.0021 Epoch: 17 Global Step: 285930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:39,962-Speed 5127.84 samples/sec Loss 0.5375 LearningRate 0.0021 Epoch: 17 Global Step: 285940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:41,946-Speed 5163.39 samples/sec Loss 0.4978 LearningRate 0.0021 Epoch: 17 Global Step: 285950 Fp16 Grad Scale: 
65536 Required: 3 hours Training: 2022-04-11 18:27:43,921-Speed 5186.51 samples/sec Loss 0.5083 LearningRate 0.0021 Epoch: 17 Global Step: 285960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:45,909-Speed 5153.10 samples/sec Loss 0.4789 LearningRate 0.0021 Epoch: 17 Global Step: 285970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:27:47,891-Speed 5169.33 samples/sec Loss 0.4917 LearningRate 0.0021 Epoch: 17 Global Step: 285980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:27:49,898-Speed 5101.37 samples/sec Loss 0.5058 LearningRate 0.0021 Epoch: 17 Global Step: 285990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:27:51,911-Speed 5091.22 samples/sec Loss 0.4936 LearningRate 0.0021 Epoch: 17 Global Step: 286000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:28:18,713-[lfw][286000]XNorm: 21.529807 Training: 2022-04-11 18:28:18,714-[lfw][286000]Accuracy-Flip: 0.99783+-0.00259 Training: 2022-04-11 18:28:18,714-[lfw][286000]Accuracy-Highest: 0.99833 Training: 2022-04-11 18:28:49,485-[cfp_fp][286000]XNorm: 21.877538 Training: 2022-04-11 18:28:49,486-[cfp_fp][286000]Accuracy-Flip: 0.98871+-0.00391 Training: 2022-04-11 18:28:49,486-[cfp_fp][286000]Accuracy-Highest: 0.99000 Training: 2022-04-11 18:29:16,074-[agedb_30][286000]XNorm: 22.699843 Training: 2022-04-11 18:29:16,075-[agedb_30][286000]Accuracy-Flip: 0.98300+-0.00653 Training: 2022-04-11 18:29:16,075-[agedb_30][286000]Accuracy-Highest: 0.98333 Training: 2022-04-11 18:29:18,061-Speed 118.86 samples/sec Loss 0.5406 LearningRate 0.0021 Epoch: 17 Global Step: 286010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:29:20,021-Speed 5224.37 samples/sec Loss 0.5091 LearningRate 0.0021 Epoch: 17 Global Step: 286020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:29:21,998-Speed 5181.55 samples/sec Loss 0.5206 LearningRate 0.0020 Epoch: 17 Global Step: 286030 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 18:29:23,959-Speed 5223.67 samples/sec Loss 0.5160 LearningRate 0.0020 Epoch: 17 Global Step: 286040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:29:25,951-Speed 5144.40 samples/sec Loss 0.5177 LearningRate 0.0020 Epoch: 17 Global Step: 286050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:29:27,918-Speed 5207.67 samples/sec Loss 0.4839 LearningRate 0.0020 Epoch: 17 Global Step: 286060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:29:29,893-Speed 5187.32 samples/sec Loss 0.5118 LearningRate 0.0020 Epoch: 17 Global Step: 286070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:29:31,857-Speed 5215.20 samples/sec Loss 0.5117 LearningRate 0.0020 Epoch: 17 Global Step: 286080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:29:33,838-Speed 5169.71 samples/sec Loss 0.5053 LearningRate 0.0020 Epoch: 17 Global Step: 286090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:29:35,802-Speed 5217.00 samples/sec Loss 0.5060 LearningRate 0.0020 Epoch: 17 Global Step: 286100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:29:37,765-Speed 5217.53 samples/sec Loss 0.5386 LearningRate 0.0020 Epoch: 17 Global Step: 286110 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:29:39,747-Speed 5170.96 samples/sec Loss 0.4879 LearningRate 0.0020 Epoch: 17 Global Step: 286120 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:29:41,712-Speed 5213.04 samples/sec Loss 0.5176 LearningRate 0.0020 Epoch: 17 Global Step: 286130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:29:43,677-Speed 5212.27 samples/sec Loss 0.4912 LearningRate 0.0020 Epoch: 17 Global Step: 286140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:29:45,654-Speed 5179.94 samples/sec Loss 0.4944 LearningRate 0.0020 Epoch: 17 Global Step: 286150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 
18:29:47,629-Speed 5187.46 samples/sec Loss 0.4913 LearningRate 0.0020 Epoch: 17 Global Step: 286160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:29:49,615-Speed 5157.20 samples/sec Loss 0.5213 LearningRate 0.0020 Epoch: 17 Global Step: 286170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:29:51,579-Speed 5215.82 samples/sec Loss 0.5056 LearningRate 0.0020 Epoch: 17 Global Step: 286180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:29:53,558-Speed 5176.11 samples/sec Loss 0.5110 LearningRate 0.0020 Epoch: 17 Global Step: 286190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:29:55,552-Speed 5138.10 samples/sec Loss 0.4579 LearningRate 0.0020 Epoch: 17 Global Step: 286200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:29:57,526-Speed 5187.63 samples/sec Loss 0.4957 LearningRate 0.0020 Epoch: 17 Global Step: 286210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:29:59,497-Speed 5199.04 samples/sec Loss 0.5367 LearningRate 0.0020 Epoch: 17 Global Step: 286220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:30:01,485-Speed 5151.46 samples/sec Loss 0.5124 LearningRate 0.0020 Epoch: 17 Global Step: 286230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:30:03,456-Speed 5198.91 samples/sec Loss 0.5048 LearningRate 0.0020 Epoch: 17 Global Step: 286240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:30:05,429-Speed 5189.39 samples/sec Loss 0.4773 LearningRate 0.0020 Epoch: 17 Global Step: 286250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:30:07,406-Speed 5183.00 samples/sec Loss 0.5308 LearningRate 0.0020 Epoch: 17 Global Step: 286260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:30:09,375-Speed 5202.53 samples/sec Loss 0.5226 LearningRate 0.0020 Epoch: 17 Global Step: 286270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:30:11,354-Speed 5176.14 samples/sec 
Loss 0.5106 LearningRate 0.0020 Epoch: 17 Global Step: 286280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:13,371-Speed 5078.34 samples/sec Loss 0.4823 LearningRate 0.0020 Epoch: 17 Global Step: 286290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:15,359-Speed 5150.98 samples/sec Loss 0.4991 LearningRate 0.0020 Epoch: 17 Global Step: 286300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:17,341-Speed 5168.04 samples/sec Loss 0.4831 LearningRate 0.0020 Epoch: 17 Global Step: 286310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:19,346-Speed 5109.78 samples/sec Loss 0.5024 LearningRate 0.0020 Epoch: 17 Global Step: 286320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:21,314-Speed 5208.11 samples/sec Loss 0.5361 LearningRate 0.0020 Epoch: 17 Global Step: 286330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:23,287-Speed 5191.02 samples/sec Loss 0.5122 LearningRate 0.0020 Epoch: 17 Global Step: 286340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:25,262-Speed 5186.43 samples/sec Loss 0.5162 LearningRate 0.0020 Epoch: 17 Global Step: 286350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:27,227-Speed 5210.93 samples/sec Loss 0.5219 LearningRate 0.0020 Epoch: 17 Global Step: 286360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:29,197-Speed 5200.65 samples/sec Loss 0.5102 LearningRate 0.0020 Epoch: 17 Global Step: 286370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:31,176-Speed 5177.53 samples/sec Loss 0.4886 LearningRate 0.0020 Epoch: 17 Global Step: 286380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:30:33,154-Speed 5178.56 samples/sec Loss 0.5195 LearningRate 0.0020 Epoch: 17 Global Step: 286390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:30:35,131-Speed 5181.74 samples/sec Loss 0.4987 LearningRate 0.0020 Epoch: 17 
Global Step: 286400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:30:37,094-Speed 5217.01 samples/sec Loss 0.5132 LearningRate 0.0020 Epoch: 17 Global Step: 286410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:39,080-Speed 5158.85 samples/sec Loss 0.5038 LearningRate 0.0020 Epoch: 17 Global Step: 286420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:41,065-Speed 5160.48 samples/sec Loss 0.5341 LearningRate 0.0020 Epoch: 17 Global Step: 286430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:43,033-Speed 5204.56 samples/sec Loss 0.4949 LearningRate 0.0020 Epoch: 17 Global Step: 286440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:45,017-Speed 5162.10 samples/sec Loss 0.5207 LearningRate 0.0020 Epoch: 17 Global Step: 286450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:47,009-Speed 5141.49 samples/sec Loss 0.5305 LearningRate 0.0020 Epoch: 17 Global Step: 286460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:48,993-Speed 5165.04 samples/sec Loss 0.5112 LearningRate 0.0020 Epoch: 17 Global Step: 286470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:50,969-Speed 5181.22 samples/sec Loss 0.4972 LearningRate 0.0020 Epoch: 17 Global Step: 286480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:52,940-Speed 5198.95 samples/sec Loss 0.4988 LearningRate 0.0020 Epoch: 17 Global Step: 286490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:54,923-Speed 5165.87 samples/sec Loss 0.5205 LearningRate 0.0020 Epoch: 17 Global Step: 286500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:30:56,896-Speed 5192.98 samples/sec Loss 0.4918 LearningRate 0.0020 Epoch: 17 Global Step: 286510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:30:58,889-Speed 5139.01 samples/sec Loss 0.5090 LearningRate 0.0020 Epoch: 17 Global Step: 286520 Fp16 Grad Scale: 
131072 Required: 3 hours Training: 2022-04-11 18:31:00,885-Speed 5133.41 samples/sec Loss 0.4888 LearningRate 0.0020 Epoch: 17 Global Step: 286530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:31:02,884-Speed 5122.50 samples/sec Loss 0.5053 LearningRate 0.0020 Epoch: 17 Global Step: 286540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:31:04,852-Speed 5205.50 samples/sec Loss 0.4908 LearningRate 0.0020 Epoch: 17 Global Step: 286550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:31:06,835-Speed 5164.59 samples/sec Loss 0.5155 LearningRate 0.0020 Epoch: 17 Global Step: 286560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:31:08,814-Speed 5176.01 samples/sec Loss 0.4900 LearningRate 0.0020 Epoch: 17 Global Step: 286570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:31:10,801-Speed 5154.58 samples/sec Loss 0.4900 LearningRate 0.0020 Epoch: 17 Global Step: 286580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:31:12,769-Speed 5207.75 samples/sec Loss 0.5190 LearningRate 0.0020 Epoch: 17 Global Step: 286590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:14,743-Speed 5189.42 samples/sec Loss 0.5095 LearningRate 0.0020 Epoch: 17 Global Step: 286600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:16,725-Speed 5169.97 samples/sec Loss 0.5147 LearningRate 0.0020 Epoch: 17 Global Step: 286610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:18,689-Speed 5214.63 samples/sec Loss 0.4830 LearningRate 0.0020 Epoch: 17 Global Step: 286620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:20,654-Speed 5212.76 samples/sec Loss 0.5274 LearningRate 0.0020 Epoch: 17 Global Step: 286630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:22,628-Speed 5190.83 samples/sec Loss 0.4825 LearningRate 0.0020 Epoch: 17 Global Step: 286640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 18:31:24,608-Speed 5173.12 samples/sec Loss 0.5333 LearningRate 0.0020 Epoch: 17 Global Step: 286650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:26,593-Speed 5159.31 samples/sec Loss 0.5176 LearningRate 0.0020 Epoch: 17 Global Step: 286660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:28,580-Speed 5156.33 samples/sec Loss 0.4862 LearningRate 0.0020 Epoch: 17 Global Step: 286670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:30,547-Speed 5208.65 samples/sec Loss 0.5020 LearningRate 0.0020 Epoch: 17 Global Step: 286680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:32,525-Speed 5178.11 samples/sec Loss 0.4842 LearningRate 0.0020 Epoch: 17 Global Step: 286690 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:31:34,495-Speed 5199.29 samples/sec Loss 0.5319 LearningRate 0.0020 Epoch: 17 Global Step: 286700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:36,478-Speed 5165.31 samples/sec Loss 0.5396 LearningRate 0.0020 Epoch: 17 Global Step: 286710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:38,449-Speed 5197.36 samples/sec Loss 0.4983 LearningRate 0.0020 Epoch: 17 Global Step: 286720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:40,434-Speed 5160.35 samples/sec Loss 0.4946 LearningRate 0.0020 Epoch: 17 Global Step: 286730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:42,429-Speed 5135.11 samples/sec Loss 0.5034 LearningRate 0.0020 Epoch: 17 Global Step: 286740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:44,403-Speed 5189.16 samples/sec Loss 0.5153 LearningRate 0.0020 Epoch: 17 Global Step: 286750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:46,383-Speed 5173.27 samples/sec Loss 0.5236 LearningRate 0.0020 Epoch: 17 Global Step: 286760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:48,383-Speed 5123.53 
samples/sec Loss 0.5074 LearningRate 0.0020 Epoch: 17 Global Step: 286770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:50,359-Speed 5183.69 samples/sec Loss 0.5086 LearningRate 0.0020 Epoch: 17 Global Step: 286780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:52,335-Speed 5185.24 samples/sec Loss 0.5444 LearningRate 0.0020 Epoch: 17 Global Step: 286790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:54,312-Speed 5180.86 samples/sec Loss 0.5448 LearningRate 0.0020 Epoch: 17 Global Step: 286800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:56,292-Speed 5174.12 samples/sec Loss 0.5004 LearningRate 0.0020 Epoch: 17 Global Step: 286810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:31:58,275-Speed 5164.63 samples/sec Loss 0.5148 LearningRate 0.0020 Epoch: 17 Global Step: 286820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:00,255-Speed 5175.59 samples/sec Loss 0.5022 LearningRate 0.0020 Epoch: 17 Global Step: 286830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:02,226-Speed 5196.07 samples/sec Loss 0.5207 LearningRate 0.0020 Epoch: 17 Global Step: 286840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:04,204-Speed 5179.31 samples/sec Loss 0.5182 LearningRate 0.0020 Epoch: 17 Global Step: 286850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:06,173-Speed 5201.80 samples/sec Loss 0.4908 LearningRate 0.0020 Epoch: 17 Global Step: 286860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:08,146-Speed 5192.01 samples/sec Loss 0.5159 LearningRate 0.0020 Epoch: 17 Global Step: 286870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:10,168-Speed 5065.72 samples/sec Loss 0.5046 LearningRate 0.0020 Epoch: 17 Global Step: 286880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:12,165-Speed 5131.99 samples/sec Loss 0.4992 LearningRate 0.0020 
Epoch: 17 Global Step: 286890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:14,148-Speed 5163.68 samples/sec Loss 0.4807 LearningRate 0.0020 Epoch: 17 Global Step: 286900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:32:16,160-Speed 5093.33 samples/sec Loss 0.5105 LearningRate 0.0020 Epoch: 17 Global Step: 286910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:32:18,127-Speed 5207.58 samples/sec Loss 0.5058 LearningRate 0.0020 Epoch: 17 Global Step: 286920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:20,094-Speed 5207.88 samples/sec Loss 0.5087 LearningRate 0.0020 Epoch: 17 Global Step: 286930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:22,094-Speed 5122.46 samples/sec Loss 0.4807 LearningRate 0.0020 Epoch: 17 Global Step: 286940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:24,065-Speed 5198.87 samples/sec Loss 0.4938 LearningRate 0.0020 Epoch: 17 Global Step: 286950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:26,034-Speed 5201.35 samples/sec Loss 0.4986 LearningRate 0.0020 Epoch: 17 Global Step: 286960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:28,027-Speed 5139.59 samples/sec Loss 0.5337 LearningRate 0.0020 Epoch: 17 Global Step: 286970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:30,008-Speed 5170.87 samples/sec Loss 0.4965 LearningRate 0.0020 Epoch: 17 Global Step: 286980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:31,989-Speed 5171.66 samples/sec Loss 0.5147 LearningRate 0.0020 Epoch: 17 Global Step: 286990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:33,971-Speed 5168.12 samples/sec Loss 0.5238 LearningRate 0.0020 Epoch: 17 Global Step: 287000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:35,959-Speed 5153.30 samples/sec Loss 0.5080 LearningRate 0.0020 Epoch: 17 Global Step: 287010 Fp16 Grad 
Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:37,949-Speed 5147.73 samples/sec Loss 0.4967 LearningRate 0.0020 Epoch: 17 Global Step: 287020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:32:39,925-Speed 5184.70 samples/sec Loss 0.4926 LearningRate 0.0020 Epoch: 17 Global Step: 287030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:32:41,913-Speed 5153.06 samples/sec Loss 0.5189 LearningRate 0.0020 Epoch: 17 Global Step: 287040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:43,889-Speed 5183.91 samples/sec Loss 0.4995 LearningRate 0.0020 Epoch: 17 Global Step: 287050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:45,878-Speed 5150.16 samples/sec Loss 0.5247 LearningRate 0.0020 Epoch: 17 Global Step: 287060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:47,885-Speed 5105.49 samples/sec Loss 0.5262 LearningRate 0.0020 Epoch: 17 Global Step: 287070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:49,875-Speed 5146.65 samples/sec Loss 0.5233 LearningRate 0.0020 Epoch: 17 Global Step: 287080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:51,850-Speed 5186.03 samples/sec Loss 0.5131 LearningRate 0.0020 Epoch: 17 Global Step: 287090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:53,839-Speed 5151.82 samples/sec Loss 0.5361 LearningRate 0.0020 Epoch: 17 Global Step: 287100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:55,808-Speed 5200.29 samples/sec Loss 0.4584 LearningRate 0.0020 Epoch: 17 Global Step: 287110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:57,798-Speed 5150.33 samples/sec Loss 0.5210 LearningRate 0.0020 Epoch: 17 Global Step: 287120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:32:59,790-Speed 5141.52 samples/sec Loss 0.5090 LearningRate 0.0020 Epoch: 17 Global Step: 287130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 18:33:01,806-Speed 5081.52 samples/sec Loss 0.5318 LearningRate 0.0020 Epoch: 17 Global Step: 287140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:33:03,781-Speed 5187.91 samples/sec Loss 0.5058 LearningRate 0.0020 Epoch: 17 Global Step: 287150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:33:05,759-Speed 5179.36 samples/sec Loss 0.4938 LearningRate 0.0020 Epoch: 17 Global Step: 287160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:07,747-Speed 5153.26 samples/sec Loss 0.5298 LearningRate 0.0020 Epoch: 17 Global Step: 287170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:09,716-Speed 5200.43 samples/sec Loss 0.5075 LearningRate 0.0020 Epoch: 17 Global Step: 287180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:11,699-Speed 5167.44 samples/sec Loss 0.5323 LearningRate 0.0020 Epoch: 17 Global Step: 287190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:13,669-Speed 5198.36 samples/sec Loss 0.5349 LearningRate 0.0020 Epoch: 17 Global Step: 287200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:15,679-Speed 5096.16 samples/sec Loss 0.5550 LearningRate 0.0019 Epoch: 17 Global Step: 287210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:17,644-Speed 5213.40 samples/sec Loss 0.4979 LearningRate 0.0019 Epoch: 17 Global Step: 287220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:19,616-Speed 5195.00 samples/sec Loss 0.5214 LearningRate 0.0019 Epoch: 17 Global Step: 287230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:21,602-Speed 5158.79 samples/sec Loss 0.4979 LearningRate 0.0019 Epoch: 17 Global Step: 287240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:23,585-Speed 5166.35 samples/sec Loss 0.5232 LearningRate 0.0019 Epoch: 17 Global Step: 287250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:25,586-Speed 5118.61 
samples/sec Loss 0.5384 LearningRate 0.0019 Epoch: 17 Global Step: 287260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:33:27,567-Speed 5170.53 samples/sec Loss 0.5351 LearningRate 0.0019 Epoch: 17 Global Step: 287270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:29,537-Speed 5198.49 samples/sec Loss 0.5357 LearningRate 0.0019 Epoch: 17 Global Step: 287280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:31,523-Speed 5158.83 samples/sec Loss 0.5094 LearningRate 0.0019 Epoch: 17 Global Step: 287290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:33,494-Speed 5196.38 samples/sec Loss 0.5085 LearningRate 0.0019 Epoch: 17 Global Step: 287300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:35,474-Speed 5173.40 samples/sec Loss 0.4888 LearningRate 0.0019 Epoch: 17 Global Step: 287310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:37,463-Speed 5149.94 samples/sec Loss 0.5172 LearningRate 0.0019 Epoch: 17 Global Step: 287320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:39,461-Speed 5125.93 samples/sec Loss 0.4950 LearningRate 0.0019 Epoch: 17 Global Step: 287330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:41,466-Speed 5109.18 samples/sec Loss 0.5143 LearningRate 0.0019 Epoch: 17 Global Step: 287340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:43,435-Speed 5205.47 samples/sec Loss 0.5044 LearningRate 0.0019 Epoch: 17 Global Step: 287350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:45,408-Speed 5191.40 samples/sec Loss 0.5002 LearningRate 0.0019 Epoch: 17 Global Step: 287360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:33:47,391-Speed 5164.17 samples/sec Loss 0.5120 LearningRate 0.0019 Epoch: 17 Global Step: 287370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:33:49,367-Speed 5185.07 samples/sec Loss 0.5194 LearningRate 
0.0019 Epoch: 17 Global Step: 287380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:33:51,343-Speed 5182.19 samples/sec Loss 0.4984 LearningRate 0.0019 Epoch: 17 Global Step: 287390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:33:53,329-Speed 5158.28 samples/sec Loss 0.5184 LearningRate 0.0019 Epoch: 17 Global Step: 287400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:33:55,299-Speed 5199.97 samples/sec Loss 0.5099 LearningRate 0.0019 Epoch: 17 Global Step: 287410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:33:57,282-Speed 5165.74 samples/sec Loss 0.5317 LearningRate 0.0019 Epoch: 17 Global Step: 287420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:33:59,254-Speed 5194.17 samples/sec Loss 0.5215 LearningRate 0.0019 Epoch: 17 Global Step: 287430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:34:01,232-Speed 5180.18 samples/sec Loss 0.5111 LearningRate 0.0019 Epoch: 17 Global Step: 287440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:34:03,220-Speed 5152.04 samples/sec Loss 0.4930 LearningRate 0.0019 Epoch: 17 Global Step: 287450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:34:05,192-Speed 5196.03 samples/sec Loss 0.4874 LearningRate 0.0019 Epoch: 17 Global Step: 287460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:07,170-Speed 5178.36 samples/sec Loss 0.4808 LearningRate 0.0019 Epoch: 17 Global Step: 287470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:09,145-Speed 5186.89 samples/sec Loss 0.5344 LearningRate 0.0019 Epoch: 17 Global Step: 287480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:11,129-Speed 5162.21 samples/sec Loss 0.5199 LearningRate 0.0019 Epoch: 17 Global Step: 287490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:13,105-Speed 5184.25 samples/sec Loss 0.5312 LearningRate 0.0019 Epoch: 17 Global Step: 
287500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:15,103-Speed 5127.88 samples/sec Loss 0.5321 LearningRate 0.0019 Epoch: 17 Global Step: 287510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:17,086-Speed 5166.01 samples/sec Loss 0.5117 LearningRate 0.0019 Epoch: 17 Global Step: 287520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:19,060-Speed 5188.96 samples/sec Loss 0.5074 LearningRate 0.0019 Epoch: 17 Global Step: 287530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:21,036-Speed 5185.27 samples/sec Loss 0.5345 LearningRate 0.0019 Epoch: 17 Global Step: 287540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:23,015-Speed 5176.64 samples/sec Loss 0.4984 LearningRate 0.0019 Epoch: 17 Global Step: 287550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:25,000-Speed 5160.58 samples/sec Loss 0.4922 LearningRate 0.0019 Epoch: 17 Global Step: 287560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:26,971-Speed 5198.19 samples/sec Loss 0.5222 LearningRate 0.0019 Epoch: 17 Global Step: 287570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:28,942-Speed 5195.52 samples/sec Loss 0.4783 LearningRate 0.0019 Epoch: 17 Global Step: 287580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:30,923-Speed 5174.08 samples/sec Loss 0.4997 LearningRate 0.0019 Epoch: 17 Global Step: 287590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:32,895-Speed 5194.17 samples/sec Loss 0.4982 LearningRate 0.0019 Epoch: 17 Global Step: 287600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:34,868-Speed 5190.92 samples/sec Loss 0.5058 LearningRate 0.0019 Epoch: 17 Global Step: 287610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:36,865-Speed 5128.60 samples/sec Loss 0.5060 LearningRate 0.0019 Epoch: 17 Global Step: 287620 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 18:34:38,855-Speed 5148.19 samples/sec Loss 0.4977 LearningRate 0.0019 Epoch: 17 Global Step: 287630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:40,849-Speed 5138.15 samples/sec Loss 0.5017 LearningRate 0.0019 Epoch: 17 Global Step: 287640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:42,840-Speed 5143.75 samples/sec Loss 0.5177 LearningRate 0.0019 Epoch: 17 Global Step: 287650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:44,813-Speed 5193.35 samples/sec Loss 0.5318 LearningRate 0.0019 Epoch: 17 Global Step: 287660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:34:46,784-Speed 5196.59 samples/sec Loss 0.5332 LearningRate 0.0019 Epoch: 17 Global Step: 287670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:34:48,762-Speed 5179.19 samples/sec Loss 0.5088 LearningRate 0.0019 Epoch: 17 Global Step: 287680 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:34:50,727-Speed 5211.42 samples/sec Loss 0.5040 LearningRate 0.0019 Epoch: 17 Global Step: 287690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:52,699-Speed 5195.26 samples/sec Loss 0.5265 LearningRate 0.0019 Epoch: 17 Global Step: 287700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:54,672-Speed 5192.88 samples/sec Loss 0.5274 LearningRate 0.0019 Epoch: 17 Global Step: 287710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:56,657-Speed 5160.35 samples/sec Loss 0.5243 LearningRate 0.0019 Epoch: 17 Global Step: 287720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:34:58,663-Speed 5105.47 samples/sec Loss 0.5172 LearningRate 0.0019 Epoch: 17 Global Step: 287730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:00,652-Speed 5150.92 samples/sec Loss 0.5129 LearningRate 0.0019 Epoch: 17 Global Step: 287740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
18:35:02,650-Speed 5126.19 samples/sec Loss 0.4958 LearningRate 0.0019 Epoch: 17 Global Step: 287750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:04,635-Speed 5162.21 samples/sec Loss 0.5106 LearningRate 0.0019 Epoch: 17 Global Step: 287760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:06,613-Speed 5178.18 samples/sec Loss 0.5433 LearningRate 0.0019 Epoch: 17 Global Step: 287770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:08,586-Speed 5191.89 samples/sec Loss 0.5165 LearningRate 0.0019 Epoch: 17 Global Step: 287780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:10,570-Speed 5163.05 samples/sec Loss 0.5135 LearningRate 0.0019 Epoch: 17 Global Step: 287790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:12,544-Speed 5188.68 samples/sec Loss 0.5167 LearningRate 0.0019 Epoch: 17 Global Step: 287800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:14,559-Speed 5083.92 samples/sec Loss 0.5050 LearningRate 0.0019 Epoch: 17 Global Step: 287810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:16,546-Speed 5155.65 samples/sec Loss 0.5315 LearningRate 0.0019 Epoch: 17 Global Step: 287820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:18,515-Speed 5202.56 samples/sec Loss 0.4999 LearningRate 0.0019 Epoch: 17 Global Step: 287830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:20,522-Speed 5103.33 samples/sec Loss 0.5343 LearningRate 0.0019 Epoch: 17 Global Step: 287840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:22,495-Speed 5192.33 samples/sec Loss 0.5292 LearningRate 0.0019 Epoch: 17 Global Step: 287850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:24,466-Speed 5197.18 samples/sec Loss 0.5330 LearningRate 0.0019 Epoch: 17 Global Step: 287860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:26,451-Speed 5161.07 samples/sec Loss 
0.5089 LearningRate 0.0019 Epoch: 17 Global Step: 287870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:28,459-Speed 5100.53 samples/sec Loss 0.5026 LearningRate 0.0019 Epoch: 17 Global Step: 287880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:30,437-Speed 5181.54 samples/sec Loss 0.4874 LearningRate 0.0019 Epoch: 17 Global Step: 287890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:35:32,407-Speed 5198.76 samples/sec Loss 0.5128 LearningRate 0.0019 Epoch: 17 Global Step: 287900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:35:34,402-Speed 5135.13 samples/sec Loss 0.5288 LearningRate 0.0019 Epoch: 17 Global Step: 287910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:35:36,426-Speed 5061.74 samples/sec Loss 0.5067 LearningRate 0.0019 Epoch: 17 Global Step: 287920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:35:38,419-Speed 5139.00 samples/sec Loss 0.5214 LearningRate 0.0019 Epoch: 17 Global Step: 287930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:35:40,416-Speed 5128.89 samples/sec Loss 0.5009 LearningRate 0.0019 Epoch: 17 Global Step: 287940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:42,405-Speed 5149.40 samples/sec Loss 0.5011 LearningRate 0.0019 Epoch: 17 Global Step: 287950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:44,378-Speed 5192.75 samples/sec Loss 0.5181 LearningRate 0.0019 Epoch: 17 Global Step: 287960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:46,353-Speed 5186.61 samples/sec Loss 0.5002 LearningRate 0.0019 Epoch: 17 Global Step: 287970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:48,339-Speed 5157.76 samples/sec Loss 0.5325 LearningRate 0.0019 Epoch: 17 Global Step: 287980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:35:50,331-Speed 5142.76 samples/sec Loss 0.5178 LearningRate 0.0019 Epoch: 17 
Global Step: 287990 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-11 18:35:52,315-Speed 5162.17 samples/sec Loss 0.5211 LearningRate 0.0019 Epoch: 17 Global Step: 288000 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-11 18:36:18,879-[lfw][288000]XNorm: 21.816920
Training: 2022-04-11 18:36:18,879-[lfw][288000]Accuracy-Flip: 0.99817+-0.00273
Training: 2022-04-11 18:36:18,880-[lfw][288000]Accuracy-Highest: 0.99833
Training: 2022-04-11 18:36:49,658-[cfp_fp][288000]XNorm: 22.256762
Training: 2022-04-11 18:36:49,658-[cfp_fp][288000]Accuracy-Flip: 0.98814+-0.00456
Training: 2022-04-11 18:36:49,659-[cfp_fp][288000]Accuracy-Highest: 0.99000
Training: 2022-04-11 18:37:16,313-[agedb_30][288000]XNorm: 22.965675
Training: 2022-04-11 18:37:16,314-[agedb_30][288000]Accuracy-Flip: 0.98267+-0.00633
Training: 2022-04-11 18:37:16,314-[agedb_30][288000]Accuracy-Highest: 0.98333
Training: 2022-04-11 18:37:18,308-Speed 119.08 samples/sec Loss 0.5203 LearningRate 0.0019 Epoch: 17 Global Step: 288010 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-11 18:37:20,292-Speed 5164.81 samples/sec Loss 0.5146 LearningRate 0.0019 Epoch: 17 Global Step: 288020 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-11 18:37:22,288-Speed 5131.77 samples/sec Loss 0.4820 LearningRate 0.0019 Epoch: 17 Global Step: 288030 Fp16 Grad Scale: 65536 Required: 3 hours
Training: 2022-04-11 18:37:24,262-Speed 5187.64 samples/sec Loss 0.5094 LearningRate 0.0019 Epoch: 17 Global Step: 288040 Fp16 Grad Scale: 131072 Required: 3 hours
Training: 2022-04-11 18:37:26,252-Speed 5148.19 samples/sec Loss 0.5213 LearningRate 0.0019 Epoch: 17 Global Step: 288050 Fp16 Grad Scale: 131072 Required: 3 hours
Training: 2022-04-11 18:37:28,218-Speed 5209.77 samples/sec Loss 0.5120 LearningRate 0.0019 Epoch: 17 Global Step: 288060 Fp16 Grad Scale: 131072 Required: 3 hours
Training: 2022-04-11 18:37:30,185-Speed 5210.08 samples/sec Loss 0.5128 LearningRate 0.0019 Epoch: 17 Global Step: 288070 
Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:37:32,160-Speed 5184.40 samples/sec Loss 0.4975 LearningRate 0.0019 Epoch: 17 Global Step: 288080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:37:34,139-Speed 5177.74 samples/sec Loss 0.5355 LearningRate 0.0019 Epoch: 17 Global Step: 288090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:37:36,119-Speed 5172.21 samples/sec Loss 0.5020 LearningRate 0.0019 Epoch: 17 Global Step: 288100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:37:38,085-Speed 5209.92 samples/sec Loss 0.5226 LearningRate 0.0019 Epoch: 17 Global Step: 288110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:37:40,063-Speed 5177.84 samples/sec Loss 0.4906 LearningRate 0.0019 Epoch: 17 Global Step: 288120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:37:42,033-Speed 5199.66 samples/sec Loss 0.5009 LearningRate 0.0019 Epoch: 17 Global Step: 288130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:37:43,999-Speed 5212.91 samples/sec Loss 0.5206 LearningRate 0.0019 Epoch: 17 Global Step: 288140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:37:45,971-Speed 5193.51 samples/sec Loss 0.4940 LearningRate 0.0019 Epoch: 17 Global Step: 288150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:37:47,976-Speed 5109.73 samples/sec Loss 0.5135 LearningRate 0.0019 Epoch: 17 Global Step: 288160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:37:49,967-Speed 5143.42 samples/sec Loss 0.5482 LearningRate 0.0019 Epoch: 17 Global Step: 288170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:37:51,957-Speed 5147.87 samples/sec Loss 0.4858 LearningRate 0.0019 Epoch: 17 Global Step: 288180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:37:53,953-Speed 5132.12 samples/sec Loss 0.4876 LearningRate 0.0019 Epoch: 17 Global Step: 288190 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 18:37:55,932-Speed 5175.63 samples/sec Loss 0.5251 LearningRate 0.0019 Epoch: 17 Global Step: 288200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:37:57,926-Speed 5138.93 samples/sec Loss 0.5075 LearningRate 0.0019 Epoch: 17 Global Step: 288210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:37:59,925-Speed 5123.94 samples/sec Loss 0.5315 LearningRate 0.0019 Epoch: 17 Global Step: 288220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:38:01,915-Speed 5148.07 samples/sec Loss 0.4988 LearningRate 0.0019 Epoch: 17 Global Step: 288230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:38:03,889-Speed 5187.55 samples/sec Loss 0.5296 LearningRate 0.0019 Epoch: 17 Global Step: 288240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:05,860-Speed 5196.27 samples/sec Loss 0.5020 LearningRate 0.0019 Epoch: 17 Global Step: 288250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:07,853-Speed 5140.81 samples/sec Loss 0.5290 LearningRate 0.0019 Epoch: 17 Global Step: 288260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:09,833-Speed 5173.87 samples/sec Loss 0.5271 LearningRate 0.0019 Epoch: 17 Global Step: 288270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:11,805-Speed 5192.97 samples/sec Loss 0.4982 LearningRate 0.0019 Epoch: 17 Global Step: 288280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:13,803-Speed 5126.48 samples/sec Loss 0.5263 LearningRate 0.0019 Epoch: 17 Global Step: 288290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:15,777-Speed 5190.25 samples/sec Loss 0.5203 LearningRate 0.0019 Epoch: 17 Global Step: 288300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:17,751-Speed 5189.10 samples/sec Loss 0.4800 LearningRate 0.0019 Epoch: 17 Global Step: 288310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
18:38:19,755-Speed 5111.82 samples/sec Loss 0.4900 LearningRate 0.0019 Epoch: 17 Global Step: 288320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:21,724-Speed 5203.70 samples/sec Loss 0.4940 LearningRate 0.0019 Epoch: 17 Global Step: 288330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:23,704-Speed 5173.42 samples/sec Loss 0.5086 LearningRate 0.0019 Epoch: 17 Global Step: 288340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:38:25,696-Speed 5142.14 samples/sec Loss 0.5049 LearningRate 0.0019 Epoch: 17 Global Step: 288350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:38:27,670-Speed 5188.26 samples/sec Loss 0.5161 LearningRate 0.0019 Epoch: 17 Global Step: 288360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:38:29,649-Speed 5174.92 samples/sec Loss 0.5347 LearningRate 0.0019 Epoch: 17 Global Step: 288370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:31,633-Speed 5164.33 samples/sec Loss 0.5024 LearningRate 0.0019 Epoch: 17 Global Step: 288380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:33,614-Speed 5169.60 samples/sec Loss 0.5367 LearningRate 0.0019 Epoch: 17 Global Step: 288390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:35,600-Speed 5157.65 samples/sec Loss 0.5097 LearningRate 0.0019 Epoch: 17 Global Step: 288400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:37,575-Speed 5188.26 samples/sec Loss 0.5071 LearningRate 0.0019 Epoch: 17 Global Step: 288410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:39,544-Speed 5202.43 samples/sec Loss 0.5474 LearningRate 0.0018 Epoch: 17 Global Step: 288420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:41,515-Speed 5196.46 samples/sec Loss 0.5032 LearningRate 0.0018 Epoch: 17 Global Step: 288430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:43,483-Speed 5205.21 samples/sec 
Loss 0.5352 LearningRate 0.0018 Epoch: 17 Global Step: 288440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:45,470-Speed 5154.45 samples/sec Loss 0.5048 LearningRate 0.0018 Epoch: 17 Global Step: 288450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:47,452-Speed 5168.48 samples/sec Loss 0.5097 LearningRate 0.0018 Epoch: 17 Global Step: 288460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:38:49,423-Speed 5196.28 samples/sec Loss 0.5092 LearningRate 0.0018 Epoch: 17 Global Step: 288470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:38:51,395-Speed 5195.82 samples/sec Loss 0.5407 LearningRate 0.0018 Epoch: 17 Global Step: 288480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:38:53,362-Speed 5206.71 samples/sec Loss 0.5130 LearningRate 0.0018 Epoch: 17 Global Step: 288490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:38:55,334-Speed 5194.59 samples/sec Loss 0.5020 LearningRate 0.0018 Epoch: 17 Global Step: 288500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:38:57,306-Speed 5194.04 samples/sec Loss 0.5404 LearningRate 0.0018 Epoch: 17 Global Step: 288510 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:38:59,272-Speed 5210.50 samples/sec Loss 0.5170 LearningRate 0.0018 Epoch: 17 Global Step: 288520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:01,259-Speed 5155.64 samples/sec Loss 0.5044 LearningRate 0.0018 Epoch: 17 Global Step: 288530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:03,244-Speed 5161.27 samples/sec Loss 0.5032 LearningRate 0.0018 Epoch: 17 Global Step: 288540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:05,231-Speed 5156.33 samples/sec Loss 0.5146 LearningRate 0.0018 Epoch: 17 Global Step: 288550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:07,236-Speed 5107.89 samples/sec Loss 0.5350 LearningRate 0.0018 Epoch: 
17 Global Step: 288560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:09,208-Speed 5195.53 samples/sec Loss 0.5122 LearningRate 0.0018 Epoch: 17 Global Step: 288570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:11,184-Speed 5181.94 samples/sec Loss 0.5552 LearningRate 0.0018 Epoch: 17 Global Step: 288580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:13,177-Speed 5141.01 samples/sec Loss 0.5219 LearningRate 0.0018 Epoch: 17 Global Step: 288590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:15,147-Speed 5199.41 samples/sec Loss 0.5058 LearningRate 0.0018 Epoch: 17 Global Step: 288600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:17,141-Speed 5138.94 samples/sec Loss 0.5395 LearningRate 0.0018 Epoch: 17 Global Step: 288610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:19,130-Speed 5148.48 samples/sec Loss 0.5006 LearningRate 0.0018 Epoch: 17 Global Step: 288620 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:39:21,132-Speed 5116.67 samples/sec Loss 0.5483 LearningRate 0.0018 Epoch: 17 Global Step: 288630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:39:23,122-Speed 5150.45 samples/sec Loss 0.5347 LearningRate 0.0018 Epoch: 17 Global Step: 288640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:39:25,131-Speed 5098.65 samples/sec Loss 0.5177 LearningRate 0.0018 Epoch: 17 Global Step: 288650 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:39:27,108-Speed 5182.87 samples/sec Loss 0.5264 LearningRate 0.0018 Epoch: 17 Global Step: 288660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:39:29,102-Speed 5135.69 samples/sec Loss 0.5203 LearningRate 0.0018 Epoch: 17 Global Step: 288670 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:39:31,079-Speed 5183.36 samples/sec Loss 0.5216 LearningRate 0.0018 Epoch: 17 Global Step: 288680 Fp16 Grad 
Scale: 131072 Required: 3 hours Training: 2022-04-11 18:39:33,048-Speed 5203.18 samples/sec Loss 0.5040 LearningRate 0.0018 Epoch: 17 Global Step: 288690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:35,017-Speed 5201.53 samples/sec Loss 0.5211 LearningRate 0.0018 Epoch: 17 Global Step: 288700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:37,000-Speed 5167.86 samples/sec Loss 0.5023 LearningRate 0.0018 Epoch: 17 Global Step: 288710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:38,966-Speed 5208.15 samples/sec Loss 0.5198 LearningRate 0.0018 Epoch: 17 Global Step: 288720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:40,937-Speed 5199.03 samples/sec Loss 0.5144 LearningRate 0.0018 Epoch: 17 Global Step: 288730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:42,913-Speed 5183.61 samples/sec Loss 0.5328 LearningRate 0.0018 Epoch: 17 Global Step: 288740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:44,886-Speed 5190.63 samples/sec Loss 0.5271 LearningRate 0.0018 Epoch: 17 Global Step: 288750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:46,881-Speed 5135.78 samples/sec Loss 0.5218 LearningRate 0.0018 Epoch: 17 Global Step: 288760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:48,857-Speed 5183.49 samples/sec Loss 0.4940 LearningRate 0.0018 Epoch: 17 Global Step: 288770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:50,828-Speed 5197.93 samples/sec Loss 0.5203 LearningRate 0.0018 Epoch: 17 Global Step: 288780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:52,791-Speed 5219.13 samples/sec Loss 0.5016 LearningRate 0.0018 Epoch: 17 Global Step: 288790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:54,769-Speed 5177.95 samples/sec Loss 0.5420 LearningRate 0.0018 Epoch: 17 Global Step: 288800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 18:39:56,743-Speed 5189.28 samples/sec Loss 0.4898 LearningRate 0.0018 Epoch: 17 Global Step: 288810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:39:58,719-Speed 5184.21 samples/sec Loss 0.4918 LearningRate 0.0018 Epoch: 17 Global Step: 288820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:00,700-Speed 5170.19 samples/sec Loss 0.5116 LearningRate 0.0018 Epoch: 17 Global Step: 288830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:02,675-Speed 5185.26 samples/sec Loss 0.5143 LearningRate 0.0018 Epoch: 17 Global Step: 288840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:04,665-Speed 5148.19 samples/sec Loss 0.5148 LearningRate 0.0018 Epoch: 17 Global Step: 288850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:06,657-Speed 5143.77 samples/sec Loss 0.5167 LearningRate 0.0018 Epoch: 17 Global Step: 288860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:08,640-Speed 5166.12 samples/sec Loss 0.5060 LearningRate 0.0018 Epoch: 17 Global Step: 288870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:10,625-Speed 5158.84 samples/sec Loss 0.4967 LearningRate 0.0018 Epoch: 17 Global Step: 288880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:12,599-Speed 5189.71 samples/sec Loss 0.5373 LearningRate 0.0018 Epoch: 17 Global Step: 288890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:40:14,574-Speed 5185.61 samples/sec Loss 0.4915 LearningRate 0.0018 Epoch: 17 Global Step: 288900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:16,558-Speed 5162.61 samples/sec Loss 0.5087 LearningRate 0.0018 Epoch: 17 Global Step: 288910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:18,532-Speed 5190.22 samples/sec Loss 0.5003 LearningRate 0.0018 Epoch: 17 Global Step: 288920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:20,499-Speed 5207.37 
samples/sec Loss 0.5081 LearningRate 0.0018 Epoch: 17 Global Step: 288930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:22,478-Speed 5176.72 samples/sec Loss 0.5426 LearningRate 0.0018 Epoch: 17 Global Step: 288940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:24,469-Speed 5145.25 samples/sec Loss 0.4996 LearningRate 0.0018 Epoch: 17 Global Step: 288950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:26,445-Speed 5184.84 samples/sec Loss 0.5522 LearningRate 0.0018 Epoch: 17 Global Step: 288960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:28,418-Speed 5192.15 samples/sec Loss 0.5236 LearningRate 0.0018 Epoch: 17 Global Step: 288970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:30,392-Speed 5189.97 samples/sec Loss 0.5537 LearningRate 0.0018 Epoch: 17 Global Step: 288980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:32,365-Speed 5191.89 samples/sec Loss 0.5159 LearningRate 0.0018 Epoch: 17 Global Step: 288990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:34,342-Speed 5180.70 samples/sec Loss 0.4817 LearningRate 0.0018 Epoch: 17 Global Step: 289000 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:40:36,317-Speed 5184.22 samples/sec Loss 0.5499 LearningRate 0.0018 Epoch: 17 Global Step: 289010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:40:38,296-Speed 5175.76 samples/sec Loss 0.5297 LearningRate 0.0018 Epoch: 17 Global Step: 289020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:40:40,276-Speed 5173.29 samples/sec Loss 0.4940 LearningRate 0.0018 Epoch: 17 Global Step: 289030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:40:42,247-Speed 5198.55 samples/sec Loss 0.4747 LearningRate 0.0018 Epoch: 17 Global Step: 289040 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:40:44,218-Speed 5197.93 samples/sec Loss 0.5302 LearningRate 
0.0018 Epoch: 17 Global Step: 289050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:40:46,189-Speed 5196.60 samples/sec Loss 0.5339 LearningRate 0.0018 Epoch: 17 Global Step: 289060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:40:48,182-Speed 5138.47 samples/sec Loss 0.5343 LearningRate 0.0018 Epoch: 17 Global Step: 289070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:40:50,173-Speed 5146.25 samples/sec Loss 0.5072 LearningRate 0.0018 Epoch: 17 Global Step: 289080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:52,142-Speed 5202.77 samples/sec Loss 0.5288 LearningRate 0.0018 Epoch: 17 Global Step: 289090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:54,110-Speed 5203.37 samples/sec Loss 0.5243 LearningRate 0.0018 Epoch: 17 Global Step: 289100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:56,078-Speed 5206.06 samples/sec Loss 0.5090 LearningRate 0.0018 Epoch: 17 Global Step: 289110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:40:58,057-Speed 5175.85 samples/sec Loss 0.5296 LearningRate 0.0018 Epoch: 17 Global Step: 289120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:00,031-Speed 5187.76 samples/sec Loss 0.5100 LearningRate 0.0018 Epoch: 17 Global Step: 289130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:02,029-Speed 5127.52 samples/sec Loss 0.5195 LearningRate 0.0018 Epoch: 17 Global Step: 289140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:04,009-Speed 5173.25 samples/sec Loss 0.5514 LearningRate 0.0018 Epoch: 17 Global Step: 289150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:05,984-Speed 5186.52 samples/sec Loss 0.5100 LearningRate 0.0018 Epoch: 17 Global Step: 289160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:07,987-Speed 5115.56 samples/sec Loss 0.5291 LearningRate 0.0018 Epoch: 17 Global Step: 289170 
Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:09,971-Speed 5163.16 samples/sec Loss 0.5145 LearningRate 0.0018 Epoch: 17 Global Step: 289180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:41:11,969-Speed 5125.73 samples/sec Loss 0.5017 LearningRate 0.0018 Epoch: 17 Global Step: 289190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:41:13,953-Speed 5163.86 samples/sec Loss 0.4949 LearningRate 0.0018 Epoch: 17 Global Step: 289200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:41:15,935-Speed 5168.30 samples/sec Loss 0.5204 LearningRate 0.0018 Epoch: 17 Global Step: 289210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:41:17,906-Speed 5197.85 samples/sec Loss 0.5203 LearningRate 0.0018 Epoch: 17 Global Step: 289220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:41:19,890-Speed 5162.73 samples/sec Loss 0.5106 LearningRate 0.0018 Epoch: 17 Global Step: 289230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:41:21,862-Speed 5193.90 samples/sec Loss 0.5231 LearningRate 0.0018 Epoch: 17 Global Step: 289240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:23,838-Speed 5183.02 samples/sec Loss 0.5172 LearningRate 0.0018 Epoch: 17 Global Step: 289250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:25,853-Speed 5085.07 samples/sec Loss 0.4994 LearningRate 0.0018 Epoch: 17 Global Step: 289260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:27,829-Speed 5183.30 samples/sec Loss 0.5246 LearningRate 0.0018 Epoch: 17 Global Step: 289270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:29,810-Speed 5169.31 samples/sec Loss 0.5194 LearningRate 0.0018 Epoch: 17 Global Step: 289280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:31,782-Speed 5196.98 samples/sec Loss 0.5125 LearningRate 0.0018 Epoch: 17 Global Step: 289290 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 18:41:33,753-Speed 5197.11 samples/sec Loss 0.4926 LearningRate 0.0018 Epoch: 17 Global Step: 289300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:35,750-Speed 5128.14 samples/sec Loss 0.5291 LearningRate 0.0018 Epoch: 17 Global Step: 289310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:37,774-Speed 5060.67 samples/sec Loss 0.4887 LearningRate 0.0018 Epoch: 17 Global Step: 289320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:39,776-Speed 5118.65 samples/sec Loss 0.5270 LearningRate 0.0018 Epoch: 17 Global Step: 289330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:41,750-Speed 5188.28 samples/sec Loss 0.5240 LearningRate 0.0018 Epoch: 17 Global Step: 289340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:41:43,714-Speed 5214.75 samples/sec Loss 0.5489 LearningRate 0.0018 Epoch: 17 Global Step: 289350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:45,691-Speed 5182.90 samples/sec Loss 0.5131 LearningRate 0.0018 Epoch: 17 Global Step: 289360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:47,682-Speed 5143.57 samples/sec Loss 0.5105 LearningRate 0.0018 Epoch: 17 Global Step: 289370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:49,655-Speed 5191.79 samples/sec Loss 0.5020 LearningRate 0.0018 Epoch: 17 Global Step: 289380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:51,631-Speed 5184.99 samples/sec Loss 0.5308 LearningRate 0.0018 Epoch: 17 Global Step: 289390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:53,601-Speed 5197.80 samples/sec Loss 0.5338 LearningRate 0.0018 Epoch: 17 Global Step: 289400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:55,588-Speed 5155.79 samples/sec Loss 0.5393 LearningRate 0.0018 Epoch: 17 Global Step: 289410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
18:41:57,589-Speed 5121.14 samples/sec Loss 0.5230 LearningRate 0.0018 Epoch: 17 Global Step: 289420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:41:59,561-Speed 5194.62 samples/sec Loss 0.5318 LearningRate 0.0018 Epoch: 17 Global Step: 289430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:01,548-Speed 5153.19 samples/sec Loss 0.5352 LearningRate 0.0018 Epoch: 17 Global Step: 289440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:03,525-Speed 5182.51 samples/sec Loss 0.5528 LearningRate 0.0018 Epoch: 17 Global Step: 289450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:42:05,513-Speed 5150.93 samples/sec Loss 0.4959 LearningRate 0.0018 Epoch: 17 Global Step: 289460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:42:07,483-Speed 5200.90 samples/sec Loss 0.5043 LearningRate 0.0018 Epoch: 17 Global Step: 289470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:42:09,459-Speed 5182.79 samples/sec Loss 0.5137 LearningRate 0.0018 Epoch: 17 Global Step: 289480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:42:11,430-Speed 5198.04 samples/sec Loss 0.5257 LearningRate 0.0018 Epoch: 17 Global Step: 289490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:42:13,406-Speed 5182.46 samples/sec Loss 0.5186 LearningRate 0.0018 Epoch: 17 Global Step: 289500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:15,393-Speed 5156.21 samples/sec Loss 0.5362 LearningRate 0.0018 Epoch: 17 Global Step: 289510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:17,369-Speed 5183.91 samples/sec Loss 0.5051 LearningRate 0.0018 Epoch: 17 Global Step: 289520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:19,354-Speed 5160.00 samples/sec Loss 0.5151 LearningRate 0.0018 Epoch: 17 Global Step: 289530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:21,329-Speed 5188.30 samples/sec 
Loss 0.5399 LearningRate 0.0018 Epoch: 17 Global Step: 289540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:23,308-Speed 5176.89 samples/sec Loss 0.5340 LearningRate 0.0018 Epoch: 17 Global Step: 289550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:25,306-Speed 5126.11 samples/sec Loss 0.5192 LearningRate 0.0018 Epoch: 17 Global Step: 289560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:27,282-Speed 5183.30 samples/sec Loss 0.5127 LearningRate 0.0018 Epoch: 17 Global Step: 289570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:29,257-Speed 5186.71 samples/sec Loss 0.5126 LearningRate 0.0018 Epoch: 17 Global Step: 289580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:31,228-Speed 5195.98 samples/sec Loss 0.5298 LearningRate 0.0018 Epoch: 17 Global Step: 289590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:33,200-Speed 5196.35 samples/sec Loss 0.4973 LearningRate 0.0018 Epoch: 17 Global Step: 289600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:42:35,168-Speed 5203.57 samples/sec Loss 0.5355 LearningRate 0.0018 Epoch: 17 Global Step: 289610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:37,165-Speed 5130.41 samples/sec Loss 0.5257 LearningRate 0.0018 Epoch: 17 Global Step: 289620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:39,154-Speed 5150.00 samples/sec Loss 0.5005 LearningRate 0.0018 Epoch: 17 Global Step: 289630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:41,140-Speed 5155.90 samples/sec Loss 0.5120 LearningRate 0.0018 Epoch: 17 Global Step: 289640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:43,115-Speed 5188.53 samples/sec Loss 0.5285 LearningRate 0.0018 Epoch: 17 Global Step: 289650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:45,091-Speed 5184.80 samples/sec Loss 0.5421 LearningRate 0.0017 Epoch: 17 
Global Step: 289660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:47,061-Speed 5198.65 samples/sec Loss 0.5017 LearningRate 0.0017 Epoch: 17 Global Step: 289670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:49,044-Speed 5164.68 samples/sec Loss 0.5318 LearningRate 0.0017 Epoch: 17 Global Step: 289680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:51,014-Speed 5201.36 samples/sec Loss 0.5142 LearningRate 0.0017 Epoch: 17 Global Step: 289690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:52,985-Speed 5196.75 samples/sec Loss 0.4774 LearningRate 0.0017 Epoch: 17 Global Step: 289700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:42:54,957-Speed 5193.63 samples/sec Loss 0.5164 LearningRate 0.0017 Epoch: 17 Global Step: 289710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:42:56,942-Speed 5161.66 samples/sec Loss 0.5246 LearningRate 0.0017 Epoch: 17 Global Step: 289720 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:42:58,926-Speed 5160.18 samples/sec Loss 0.5384 LearningRate 0.0017 Epoch: 17 Global Step: 289730 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:43:00,905-Speed 5178.76 samples/sec Loss 0.5033 LearningRate 0.0017 Epoch: 17 Global Step: 289740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:43:02,893-Speed 5152.13 samples/sec Loss 0.5454 LearningRate 0.0017 Epoch: 17 Global Step: 289750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:43:04,887-Speed 5138.11 samples/sec Loss 0.5497 LearningRate 0.0017 Epoch: 17 Global Step: 289760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:43:06,878-Speed 5143.27 samples/sec Loss 0.5298 LearningRate 0.0017 Epoch: 17 Global Step: 289770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:43:08,864-Speed 5158.13 samples/sec Loss 0.5120 LearningRate 0.0017 Epoch: 17 Global Step: 289780 Fp16 Grad Scale: 
65536 Required: 3 hours Training: 2022-04-11 18:43:10,834-Speed 5198.80 samples/sec Loss 0.5153 LearningRate 0.0017 Epoch: 17 Global Step: 289790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:43:12,827-Speed 5139.46 samples/sec Loss 0.5094 LearningRate 0.0017 Epoch: 17 Global Step: 289800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:43:14,807-Speed 5173.97 samples/sec Loss 0.5470 LearningRate 0.0017 Epoch: 17 Global Step: 289810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:43:16,785-Speed 5178.93 samples/sec Loss 0.5324 LearningRate 0.0017 Epoch: 17 Global Step: 289820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:43:18,758-Speed 5191.88 samples/sec Loss 0.5107 LearningRate 0.0017 Epoch: 17 Global Step: 289830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:43:20,747-Speed 5151.32 samples/sec Loss 0.5165 LearningRate 0.0017 Epoch: 17 Global Step: 289840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:43:22,749-Speed 5116.36 samples/sec Loss 0.5078 LearningRate 0.0017 Epoch: 17 Global Step: 289850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:43:24,736-Speed 5156.41 samples/sec Loss 0.5016 LearningRate 0.0017 Epoch: 17 Global Step: 289860 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:43:26,731-Speed 5134.76 samples/sec Loss 0.5311 LearningRate 0.0017 Epoch: 17 Global Step: 289870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:43:28,726-Speed 5133.37 samples/sec Loss 0.4960 LearningRate 0.0017 Epoch: 17 Global Step: 289880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:43:30,710-Speed 5163.18 samples/sec Loss 0.5070 LearningRate 0.0017 Epoch: 17 Global Step: 289890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:43:32,694-Speed 5162.47 samples/sec Loss 0.5383 LearningRate 0.0017 Epoch: 17 Global Step: 289900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-04-11 18:43:34,677-Speed 5166.65 samples/sec Loss 0.5301 LearningRate 0.0017 Epoch: 17 Global Step: 289910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:43:36,652-Speed 5188.04 samples/sec Loss 0.5048 LearningRate 0.0017 Epoch: 17 Global Step: 289920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:43:38,694-Speed 5016.56 samples/sec Loss 0.5566 LearningRate 0.0017 Epoch: 17 Global Step: 289930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:43:40,684-Speed 5146.47 samples/sec Loss 0.5280 LearningRate 0.0017 Epoch: 17 Global Step: 289940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:43:42,662-Speed 5179.15 samples/sec Loss 0.5275 LearningRate 0.0017 Epoch: 17 Global Step: 289950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:43:44,647-Speed 5161.54 samples/sec Loss 0.5243 LearningRate 0.0017 Epoch: 17 Global Step: 289960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:43:46,641-Speed 5135.81 samples/sec Loss 0.5422 LearningRate 0.0017 Epoch: 17 Global Step: 289970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:43:48,633-Speed 5143.56 samples/sec Loss 0.5062 LearningRate 0.0017 Epoch: 17 Global Step: 289980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:43:50,605-Speed 5194.18 samples/sec Loss 0.5263 LearningRate 0.0017 Epoch: 17 Global Step: 289990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:43:52,583-Speed 5178.78 samples/sec Loss 0.5493 LearningRate 0.0017 Epoch: 17 Global Step: 290000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:44:19,361-[lfw][290000]XNorm: 21.523837 Training: 2022-04-11 18:44:19,362-[lfw][290000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 18:44:19,362-[lfw][290000]Accuracy-Highest: 0.99833 Training: 2022-04-11 18:44:50,183-[cfp_fp][290000]XNorm: 21.839081 Training: 2022-04-11 18:44:50,184-[cfp_fp][290000]Accuracy-Flip: 0.98886+-0.00494 Training: 
2022-04-11 18:44:50,184-[cfp_fp][290000]Accuracy-Highest: 0.99000 Training: 2022-04-11 18:45:16,884-[agedb_30][290000]XNorm: 22.671801 Training: 2022-04-11 18:45:16,884-[agedb_30][290000]Accuracy-Flip: 0.98383+-0.00695 Training: 2022-04-11 18:45:16,885-[agedb_30][290000]Accuracy-Highest: 0.98383 Training: 2022-04-11 18:45:18,863-Speed 118.68 samples/sec Loss 0.5263 LearningRate 0.0017 Epoch: 17 Global Step: 290010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:45:20,832-Speed 5201.60 samples/sec Loss 0.5114 LearningRate 0.0017 Epoch: 17 Global Step: 290020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:45:22,790-Speed 5231.64 samples/sec Loss 0.5314 LearningRate 0.0017 Epoch: 17 Global Step: 290030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:45:24,757-Speed 5209.32 samples/sec Loss 0.5180 LearningRate 0.0017 Epoch: 17 Global Step: 290040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:45:26,741-Speed 5161.11 samples/sec Loss 0.5280 LearningRate 0.0017 Epoch: 17 Global Step: 290050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:45:28,711-Speed 5200.00 samples/sec Loss 0.5324 LearningRate 0.0017 Epoch: 17 Global Step: 290060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:45:30,698-Speed 5155.95 samples/sec Loss 0.5198 LearningRate 0.0017 Epoch: 17 Global Step: 290070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:45:32,674-Speed 5185.16 samples/sec Loss 0.5177 LearningRate 0.0017 Epoch: 17 Global Step: 290080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:45:34,643-Speed 5200.33 samples/sec Loss 0.5364 LearningRate 0.0017 Epoch: 17 Global Step: 290090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:45:36,613-Speed 5201.51 samples/sec Loss 0.5283 LearningRate 0.0017 Epoch: 17 Global Step: 290100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:45:38,591-Speed 5178.59 samples/sec Loss 0.5258 
LearningRate 0.0017 Epoch: 17 Global Step: 290110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:45:40,578-Speed 5154.43 samples/sec Loss 0.5442 LearningRate 0.0017 Epoch: 17 Global Step: 290120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:45:42,547-Speed 5200.56 samples/sec Loss 0.5136 LearningRate 0.0017 Epoch: 17 Global Step: 290130 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:45:44,509-Speed 5222.49 samples/sec Loss 0.5428 LearningRate 0.0017 Epoch: 17 Global Step: 290140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:45:46,481-Speed 5195.04 samples/sec Loss 0.5156 LearningRate 0.0017 Epoch: 17 Global Step: 290150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:45:48,462-Speed 5169.83 samples/sec Loss 0.5389 LearningRate 0.0017 Epoch: 17 Global Step: 290160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:45:50,442-Speed 5173.06 samples/sec Loss 0.5375 LearningRate 0.0017 Epoch: 17 Global Step: 290170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:45:52,412-Speed 5200.00 samples/sec Loss 0.5291 LearningRate 0.0017 Epoch: 17 Global Step: 290180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:45:54,382-Speed 5202.05 samples/sec Loss 0.5362 LearningRate 0.0017 Epoch: 17 Global Step: 290190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:45:56,367-Speed 5159.40 samples/sec Loss 0.4982 LearningRate 0.0017 Epoch: 17 Global Step: 290200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:45:58,336-Speed 5202.18 samples/sec Loss 0.5376 LearningRate 0.0017 Epoch: 17 Global Step: 290210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:00,311-Speed 5185.94 samples/sec Loss 0.5529 LearningRate 0.0017 Epoch: 17 Global Step: 290220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:02,288-Speed 5183.03 samples/sec Loss 0.5129 LearningRate 0.0017 Epoch: 17 Global Step: 
290230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:04,281-Speed 5137.03 samples/sec Loss 0.5678 LearningRate 0.0017 Epoch: 17 Global Step: 290240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:46:06,255-Speed 5191.80 samples/sec Loss 0.5477 LearningRate 0.0017 Epoch: 17 Global Step: 290250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:46:08,221-Speed 5208.99 samples/sec Loss 0.5270 LearningRate 0.0017 Epoch: 17 Global Step: 290260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:10,198-Speed 5182.22 samples/sec Loss 0.5097 LearningRate 0.0017 Epoch: 17 Global Step: 290270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:12,174-Speed 5184.58 samples/sec Loss 0.5095 LearningRate 0.0017 Epoch: 17 Global Step: 290280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:14,159-Speed 5159.30 samples/sec Loss 0.5102 LearningRate 0.0017 Epoch: 17 Global Step: 290290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:16,141-Speed 5168.33 samples/sec Loss 0.5393 LearningRate 0.0017 Epoch: 17 Global Step: 290300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:18,134-Speed 5139.77 samples/sec Loss 0.5220 LearningRate 0.0017 Epoch: 17 Global Step: 290310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:20,117-Speed 5164.75 samples/sec Loss 0.5281 LearningRate 0.0017 Epoch: 17 Global Step: 290320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:22,090-Speed 5191.25 samples/sec Loss 0.5223 LearningRate 0.0017 Epoch: 17 Global Step: 290330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:24,075-Speed 5161.94 samples/sec Loss 0.5037 LearningRate 0.0017 Epoch: 17 Global Step: 290340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:26,049-Speed 5190.35 samples/sec Loss 0.5238 LearningRate 0.0017 Epoch: 17 Global Step: 290350 Fp16 Grad Scale: 65536 Required: 3 
hours Training: 2022-04-11 18:46:28,025-Speed 5183.44 samples/sec Loss 0.5229 LearningRate 0.0017 Epoch: 17 Global Step: 290360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:30,011-Speed 5158.23 samples/sec Loss 0.5531 LearningRate 0.0017 Epoch: 17 Global Step: 290370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:31,982-Speed 5197.07 samples/sec Loss 0.5473 LearningRate 0.0017 Epoch: 17 Global Step: 290380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:33,954-Speed 5195.29 samples/sec Loss 0.5026 LearningRate 0.0017 Epoch: 17 Global Step: 290390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:35,937-Speed 5163.86 samples/sec Loss 0.5136 LearningRate 0.0017 Epoch: 17 Global Step: 290400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:37,912-Speed 5186.76 samples/sec Loss 0.5169 LearningRate 0.0017 Epoch: 17 Global Step: 290410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:39,894-Speed 5168.58 samples/sec Loss 0.5146 LearningRate 0.0017 Epoch: 17 Global Step: 290420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:41,873-Speed 5175.76 samples/sec Loss 0.5403 LearningRate 0.0017 Epoch: 17 Global Step: 290430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:43,860-Speed 5156.19 samples/sec Loss 0.5218 LearningRate 0.0017 Epoch: 17 Global Step: 290440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:45,836-Speed 5184.02 samples/sec Loss 0.5177 LearningRate 0.0017 Epoch: 17 Global Step: 290450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:47,827-Speed 5146.37 samples/sec Loss 0.4816 LearningRate 0.0017 Epoch: 17 Global Step: 290460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:46:49,798-Speed 5197.98 samples/sec Loss 0.5038 LearningRate 0.0017 Epoch: 17 Global Step: 290470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 
18:46:51,773-Speed 5186.99 samples/sec Loss 0.5322 LearningRate 0.0017 Epoch: 17 Global Step: 290480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:53,741-Speed 5204.89 samples/sec Loss 0.5546 LearningRate 0.0017 Epoch: 17 Global Step: 290490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:55,710-Speed 5201.01 samples/sec Loss 0.5671 LearningRate 0.0017 Epoch: 17 Global Step: 290500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:57,678-Speed 5204.32 samples/sec Loss 0.5217 LearningRate 0.0017 Epoch: 17 Global Step: 290510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:46:59,660-Speed 5168.45 samples/sec Loss 0.5648 LearningRate 0.0017 Epoch: 17 Global Step: 290520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:01,638-Speed 5178.89 samples/sec Loss 0.5127 LearningRate 0.0017 Epoch: 17 Global Step: 290530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:03,630-Speed 5143.92 samples/sec Loss 0.5319 LearningRate 0.0017 Epoch: 17 Global Step: 290540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:05,599-Speed 5202.20 samples/sec Loss 0.5516 LearningRate 0.0017 Epoch: 17 Global Step: 290550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:07,565-Speed 5209.14 samples/sec Loss 0.5263 LearningRate 0.0017 Epoch: 17 Global Step: 290560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:09,538-Speed 5193.48 samples/sec Loss 0.5102 LearningRate 0.0017 Epoch: 17 Global Step: 290570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:11,508-Speed 5202.93 samples/sec Loss 0.5299 LearningRate 0.0017 Epoch: 17 Global Step: 290580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:47:13,476-Speed 5204.67 samples/sec Loss 0.5179 LearningRate 0.0017 Epoch: 17 Global Step: 290590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:47:15,437-Speed 5221.67 samples/sec 
Loss 0.5497 LearningRate 0.0017 Epoch: 17 Global Step: 290600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:17,404-Speed 5208.36 samples/sec Loss 0.4925 LearningRate 0.0017 Epoch: 17 Global Step: 290610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:19,378-Speed 5190.15 samples/sec Loss 0.5238 LearningRate 0.0017 Epoch: 17 Global Step: 290620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:21,342-Speed 5214.02 samples/sec Loss 0.5391 LearningRate 0.0017 Epoch: 17 Global Step: 290630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:23,312-Speed 5199.48 samples/sec Loss 0.5058 LearningRate 0.0017 Epoch: 17 Global Step: 290640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:25,320-Speed 5101.66 samples/sec Loss 0.5265 LearningRate 0.0017 Epoch: 17 Global Step: 290650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:27,291-Speed 5198.07 samples/sec Loss 0.4984 LearningRate 0.0017 Epoch: 17 Global Step: 290660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:29,256-Speed 5214.07 samples/sec Loss 0.4799 LearningRate 0.0017 Epoch: 17 Global Step: 290670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:31,237-Speed 5171.01 samples/sec Loss 0.5315 LearningRate 0.0017 Epoch: 17 Global Step: 290680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:33,217-Speed 5171.59 samples/sec Loss 0.5022 LearningRate 0.0017 Epoch: 17 Global Step: 290690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:35,191-Speed 5190.43 samples/sec Loss 0.5259 LearningRate 0.0017 Epoch: 17 Global Step: 290700 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:47:37,153-Speed 5219.39 samples/sec Loss 0.4972 LearningRate 0.0017 Epoch: 17 Global Step: 290710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:39,140-Speed 5156.73 samples/sec Loss 0.5058 LearningRate 0.0017 Epoch: 17 
Global Step: 290720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:41,110-Speed 5201.23 samples/sec Loss 0.5439 LearningRate 0.0017 Epoch: 17 Global Step: 290730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:43,075-Speed 5213.01 samples/sec Loss 0.5470 LearningRate 0.0017 Epoch: 17 Global Step: 290740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:45,059-Speed 5162.04 samples/sec Loss 0.5322 LearningRate 0.0017 Epoch: 17 Global Step: 290750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:47,046-Speed 5156.15 samples/sec Loss 0.5081 LearningRate 0.0017 Epoch: 17 Global Step: 290760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:49,010-Speed 5214.60 samples/sec Loss 0.5201 LearningRate 0.0017 Epoch: 17 Global Step: 290770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:50,980-Speed 5200.29 samples/sec Loss 0.5242 LearningRate 0.0017 Epoch: 17 Global Step: 290780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:52,946-Speed 5210.86 samples/sec Loss 0.5389 LearningRate 0.0017 Epoch: 17 Global Step: 290790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:54,921-Speed 5188.02 samples/sec Loss 0.5194 LearningRate 0.0017 Epoch: 17 Global Step: 290800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:56,894-Speed 5191.28 samples/sec Loss 0.5157 LearningRate 0.0017 Epoch: 17 Global Step: 290810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:47:58,862-Speed 5203.83 samples/sec Loss 0.5281 LearningRate 0.0017 Epoch: 17 Global Step: 290820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:00,844-Speed 5169.60 samples/sec Loss 0.5207 LearningRate 0.0017 Epoch: 17 Global Step: 290830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:02,815-Speed 5195.69 samples/sec Loss 0.5324 LearningRate 0.0017 Epoch: 17 Global Step: 290840 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 18:48:04,786-Speed 5197.75 samples/sec Loss 0.5424 LearningRate 0.0017 Epoch: 17 Global Step: 290850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:06,761-Speed 5187.54 samples/sec Loss 0.5048 LearningRate 0.0017 Epoch: 17 Global Step: 290860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:08,754-Speed 5137.86 samples/sec Loss 0.4960 LearningRate 0.0017 Epoch: 17 Global Step: 290870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:10,737-Speed 5166.95 samples/sec Loss 0.5149 LearningRate 0.0017 Epoch: 17 Global Step: 290880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:12,710-Speed 5190.99 samples/sec Loss 0.5317 LearningRate 0.0017 Epoch: 17 Global Step: 290890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:14,691-Speed 5171.30 samples/sec Loss 0.5308 LearningRate 0.0017 Epoch: 17 Global Step: 290900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:16,679-Speed 5152.35 samples/sec Loss 0.5423 LearningRate 0.0017 Epoch: 17 Global Step: 290910 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:48:18,659-Speed 5173.70 samples/sec Loss 0.5180 LearningRate 0.0017 Epoch: 17 Global Step: 290920 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:48:20,625-Speed 5211.07 samples/sec Loss 0.5459 LearningRate 0.0017 Epoch: 17 Global Step: 290930 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:48:22,607-Speed 5166.91 samples/sec Loss 0.5058 LearningRate 0.0017 Epoch: 17 Global Step: 290940 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:48:24,574-Speed 5209.05 samples/sec Loss 0.5343 LearningRate 0.0016 Epoch: 17 Global Step: 290950 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:48:26,584-Speed 5096.55 samples/sec Loss 0.5334 LearningRate 0.0016 Epoch: 17 Global Step: 290960 Fp16 Grad Scale: 131072 Required: 3 hours Training: 
2022-04-11 18:48:28,561-Speed 5181.48 samples/sec Loss 0.5265 LearningRate 0.0016 Epoch: 17 Global Step: 290970 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:48:30,567-Speed 5103.96 samples/sec Loss 0.5212 LearningRate 0.0016 Epoch: 17 Global Step: 290980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:48:32,538-Speed 5198.76 samples/sec Loss 0.5158 LearningRate 0.0016 Epoch: 17 Global Step: 290990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:34,535-Speed 5129.32 samples/sec Loss 0.5203 LearningRate 0.0016 Epoch: 17 Global Step: 291000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:36,565-Speed 5044.69 samples/sec Loss 0.5335 LearningRate 0.0016 Epoch: 17 Global Step: 291010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:38,561-Speed 5132.55 samples/sec Loss 0.5172 LearningRate 0.0016 Epoch: 17 Global Step: 291020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:40,533-Speed 5197.08 samples/sec Loss 0.5113 LearningRate 0.0016 Epoch: 17 Global Step: 291030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:42,503-Speed 5198.76 samples/sec Loss 0.5332 LearningRate 0.0016 Epoch: 17 Global Step: 291040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:44,474-Speed 5196.40 samples/sec Loss 0.5277 LearningRate 0.0016 Epoch: 17 Global Step: 291050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:46,470-Speed 5132.55 samples/sec Loss 0.5266 LearningRate 0.0016 Epoch: 17 Global Step: 291060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:48,454-Speed 5162.34 samples/sec Loss 0.5290 LearningRate 0.0016 Epoch: 17 Global Step: 291070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:50,430-Speed 5184.36 samples/sec Loss 0.5562 LearningRate 0.0016 Epoch: 17 Global Step: 291080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:52,405-Speed 5185.65 
samples/sec Loss 0.5104 LearningRate 0.0016 Epoch: 17 Global Step: 291090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:48:54,371-Speed 5211.17 samples/sec Loss 0.5209 LearningRate 0.0016 Epoch: 17 Global Step: 291100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:56,356-Speed 5159.55 samples/sec Loss 0.4943 LearningRate 0.0016 Epoch: 17 Global Step: 291110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:48:58,331-Speed 5186.06 samples/sec Loss 0.4929 LearningRate 0.0016 Epoch: 17 Global Step: 291120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:00,322-Speed 5145.87 samples/sec Loss 0.5219 LearningRate 0.0016 Epoch: 17 Global Step: 291130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:02,305-Speed 5165.83 samples/sec Loss 0.5414 LearningRate 0.0016 Epoch: 17 Global Step: 291140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:04,279-Speed 5190.17 samples/sec Loss 0.5115 LearningRate 0.0016 Epoch: 17 Global Step: 291150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:06,250-Speed 5196.01 samples/sec Loss 0.5099 LearningRate 0.0016 Epoch: 17 Global Step: 291160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:08,254-Speed 5112.94 samples/sec Loss 0.5214 LearningRate 0.0016 Epoch: 17 Global Step: 291170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:10,246-Speed 5140.93 samples/sec Loss 0.5297 LearningRate 0.0016 Epoch: 17 Global Step: 291180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:12,246-Speed 5123.29 samples/sec Loss 0.5319 LearningRate 0.0016 Epoch: 17 Global Step: 291190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:14,216-Speed 5200.15 samples/sec Loss 0.5201 LearningRate 0.0016 Epoch: 17 Global Step: 291200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:49:16,193-Speed 5179.69 samples/sec Loss 0.5534 LearningRate 
0.0016 Epoch: 17 Global Step: 291210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:49:18,161-Speed 5205.01 samples/sec Loss 0.5163 LearningRate 0.0016 Epoch: 17 Global Step: 291220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:49:20,139-Speed 5179.66 samples/sec Loss 0.5174 LearningRate 0.0016 Epoch: 17 Global Step: 291230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:22,116-Speed 5180.14 samples/sec Loss 0.5149 LearningRate 0.0016 Epoch: 17 Global Step: 291240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:24,111-Speed 5133.76 samples/sec Loss 0.5203 LearningRate 0.0016 Epoch: 17 Global Step: 291250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:26,085-Speed 5189.28 samples/sec Loss 0.5396 LearningRate 0.0016 Epoch: 17 Global Step: 291260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:28,068-Speed 5165.91 samples/sec Loss 0.5092 LearningRate 0.0016 Epoch: 17 Global Step: 291270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:30,056-Speed 5152.72 samples/sec Loss 0.5166 LearningRate 0.0016 Epoch: 17 Global Step: 291280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:32,042-Speed 5159.96 samples/sec Loss 0.5545 LearningRate 0.0016 Epoch: 17 Global Step: 291290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:34,009-Speed 5209.36 samples/sec Loss 0.5425 LearningRate 0.0016 Epoch: 17 Global Step: 291300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:35,984-Speed 5186.19 samples/sec Loss 0.5128 LearningRate 0.0016 Epoch: 17 Global Step: 291310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:37,956-Speed 5193.46 samples/sec Loss 0.5289 LearningRate 0.0016 Epoch: 17 Global Step: 291320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:39,922-Speed 5211.55 samples/sec Loss 0.5175 LearningRate 0.0016 Epoch: 17 Global Step: 291330 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:41,905-Speed 5164.00 samples/sec Loss 0.5277 LearningRate 0.0016 Epoch: 17 Global Step: 291340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:43,913-Speed 5101.40 samples/sec Loss 0.5272 LearningRate 0.0016 Epoch: 17 Global Step: 291350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:45,918-Speed 5109.59 samples/sec Loss 0.5184 LearningRate 0.0016 Epoch: 17 Global Step: 291360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:47,900-Speed 5168.93 samples/sec Loss 0.5151 LearningRate 0.0016 Epoch: 17 Global Step: 291370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:49,879-Speed 5175.84 samples/sec Loss 0.5335 LearningRate 0.0016 Epoch: 17 Global Step: 291380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:51,873-Speed 5135.59 samples/sec Loss 0.5353 LearningRate 0.0016 Epoch: 17 Global Step: 291390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:53,841-Speed 5205.72 samples/sec Loss 0.5267 LearningRate 0.0016 Epoch: 17 Global Step: 291400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:55,809-Speed 5207.25 samples/sec Loss 0.5294 LearningRate 0.0016 Epoch: 17 Global Step: 291410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:57,778-Speed 5201.53 samples/sec Loss 0.5071 LearningRate 0.0016 Epoch: 17 Global Step: 291420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:49:59,746-Speed 5204.23 samples/sec Loss 0.5068 LearningRate 0.0016 Epoch: 17 Global Step: 291430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:50:01,732-Speed 5157.36 samples/sec Loss 0.5294 LearningRate 0.0016 Epoch: 17 Global Step: 291440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:03,725-Speed 5140.10 samples/sec Loss 0.5235 LearningRate 0.0016 Epoch: 17 Global Step: 291450 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-11 18:50:05,706-Speed 5171.45 samples/sec Loss 0.5232 LearningRate 0.0016 Epoch: 17 Global Step: 291460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:07,704-Speed 5127.92 samples/sec Loss 0.5364 LearningRate 0.0016 Epoch: 17 Global Step: 291470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:09,683-Speed 5175.53 samples/sec Loss 0.5192 LearningRate 0.0016 Epoch: 17 Global Step: 291480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:11,672-Speed 5151.40 samples/sec Loss 0.5571 LearningRate 0.0016 Epoch: 17 Global Step: 291490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:13,663-Speed 5143.59 samples/sec Loss 0.5271 LearningRate 0.0016 Epoch: 17 Global Step: 291500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:15,682-Speed 5075.37 samples/sec Loss 0.5289 LearningRate 0.0016 Epoch: 17 Global Step: 291510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:17,675-Speed 5139.67 samples/sec Loss 0.5161 LearningRate 0.0016 Epoch: 17 Global Step: 291520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:19,648-Speed 5191.46 samples/sec Loss 0.5233 LearningRate 0.0016 Epoch: 17 Global Step: 291530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:21,613-Speed 5211.88 samples/sec Loss 0.5461 LearningRate 0.0016 Epoch: 17 Global Step: 291540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:23,583-Speed 5198.75 samples/sec Loss 0.5242 LearningRate 0.0016 Epoch: 17 Global Step: 291550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:25,579-Speed 5133.72 samples/sec Loss 0.5623 LearningRate 0.0016 Epoch: 17 Global Step: 291560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:27,549-Speed 5199.44 samples/sec Loss 0.5187 LearningRate 0.0016 Epoch: 17 Global Step: 291570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:29,515-Speed 
5210.57 samples/sec Loss 0.5647 LearningRate 0.0016 Epoch: 17 Global Step: 291580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:31,494-Speed 5176.33 samples/sec Loss 0.5292 LearningRate 0.0016 Epoch: 17 Global Step: 291590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:33,462-Speed 5205.26 samples/sec Loss 0.5111 LearningRate 0.0016 Epoch: 17 Global Step: 291600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:35,450-Speed 5152.80 samples/sec Loss 0.5351 LearningRate 0.0016 Epoch: 17 Global Step: 291610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:37,424-Speed 5188.00 samples/sec Loss 0.5054 LearningRate 0.0016 Epoch: 17 Global Step: 291620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:39,406-Speed 5169.86 samples/sec Loss 0.5222 LearningRate 0.0016 Epoch: 17 Global Step: 291630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:41,394-Speed 5152.10 samples/sec Loss 0.5172 LearningRate 0.0016 Epoch: 17 Global Step: 291640 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:50:43,357-Speed 5218.53 samples/sec Loss 0.5244 LearningRate 0.0016 Epoch: 17 Global Step: 291650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:45,327-Speed 5200.31 samples/sec Loss 0.5213 LearningRate 0.0016 Epoch: 17 Global Step: 291660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:47,298-Speed 5195.99 samples/sec Loss 0.5132 LearningRate 0.0016 Epoch: 17 Global Step: 291670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:49,287-Speed 5149.41 samples/sec Loss 0.5222 LearningRate 0.0016 Epoch: 17 Global Step: 291680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:51,260-Speed 5191.92 samples/sec Loss 0.5282 LearningRate 0.0016 Epoch: 17 Global Step: 291690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:53,244-Speed 5163.65 samples/sec Loss 0.5222 
LearningRate 0.0016 Epoch: 17 Global Step: 291700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:55,215-Speed 5198.87 samples/sec Loss 0.5658 LearningRate 0.0016 Epoch: 17 Global Step: 291710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:57,198-Speed 5163.92 samples/sec Loss 0.5378 LearningRate 0.0016 Epoch: 17 Global Step: 291720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:50:59,166-Speed 5205.61 samples/sec Loss 0.5354 LearningRate 0.0016 Epoch: 17 Global Step: 291730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:01,141-Speed 5185.00 samples/sec Loss 0.4894 LearningRate 0.0016 Epoch: 17 Global Step: 291740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:03,128-Speed 5155.86 samples/sec Loss 0.5451 LearningRate 0.0016 Epoch: 17 Global Step: 291750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:51:05,104-Speed 5186.06 samples/sec Loss 0.5038 LearningRate 0.0016 Epoch: 17 Global Step: 291760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:51:07,085-Speed 5168.17 samples/sec Loss 0.5005 LearningRate 0.0016 Epoch: 17 Global Step: 291770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:09,075-Speed 5147.43 samples/sec Loss 0.5058 LearningRate 0.0016 Epoch: 17 Global Step: 291780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:11,067-Speed 5143.89 samples/sec Loss 0.5104 LearningRate 0.0016 Epoch: 17 Global Step: 291790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:13,049-Speed 5167.32 samples/sec Loss 0.5460 LearningRate 0.0016 Epoch: 17 Global Step: 291800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:15,033-Speed 5163.25 samples/sec Loss 0.5138 LearningRate 0.0016 Epoch: 17 Global Step: 291810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:17,032-Speed 5125.90 samples/sec Loss 0.5329 LearningRate 0.0016 Epoch: 17 Global 
Step: 291820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:19,007-Speed 5185.09 samples/sec Loss 0.5203 LearningRate 0.0016 Epoch: 17 Global Step: 291830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:20,977-Speed 5198.82 samples/sec Loss 0.5430 LearningRate 0.0016 Epoch: 17 Global Step: 291840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:22,949-Speed 5194.81 samples/sec Loss 0.5418 LearningRate 0.0016 Epoch: 17 Global Step: 291850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:24,925-Speed 5184.44 samples/sec Loss 0.5611 LearningRate 0.0016 Epoch: 17 Global Step: 291860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:26,897-Speed 5195.96 samples/sec Loss 0.5410 LearningRate 0.0016 Epoch: 17 Global Step: 291870 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:51:28,863-Speed 5210.09 samples/sec Loss 0.5511 LearningRate 0.0016 Epoch: 17 Global Step: 291880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:30,835-Speed 5193.44 samples/sec Loss 0.5411 LearningRate 0.0016 Epoch: 17 Global Step: 291890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:32,814-Speed 5176.30 samples/sec Loss 0.5366 LearningRate 0.0016 Epoch: 17 Global Step: 291900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:34,787-Speed 5191.78 samples/sec Loss 0.4986 LearningRate 0.0016 Epoch: 17 Global Step: 291910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:36,760-Speed 5191.52 samples/sec Loss 0.5235 LearningRate 0.0016 Epoch: 17 Global Step: 291920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:38,763-Speed 5114.25 samples/sec Loss 0.5175 LearningRate 0.0016 Epoch: 17 Global Step: 291930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:40,736-Speed 5191.18 samples/sec Loss 0.5216 LearningRate 0.0016 Epoch: 17 Global Step: 291940 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 18:51:42,704-Speed 5205.87 samples/sec Loss 0.5084 LearningRate 0.0016 Epoch: 17 Global Step: 291950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:44,696-Speed 5142.25 samples/sec Loss 0.5198 LearningRate 0.0016 Epoch: 17 Global Step: 291960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:46,695-Speed 5124.00 samples/sec Loss 0.5142 LearningRate 0.0016 Epoch: 17 Global Step: 291970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:51:48,666-Speed 5196.33 samples/sec Loss 0.5272 LearningRate 0.0016 Epoch: 17 Global Step: 291980 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:51:50,666-Speed 5122.22 samples/sec Loss 0.5357 LearningRate 0.0016 Epoch: 17 Global Step: 291990 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:51:52,648-Speed 5168.29 samples/sec Loss 0.5030 LearningRate 0.0016 Epoch: 17 Global Step: 292000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:52:19,322-[lfw][292000]XNorm: 21.550610 Training: 2022-04-11 18:52:19,322-[lfw][292000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 18:52:19,323-[lfw][292000]Accuracy-Highest: 0.99833 Training: 2022-04-11 18:52:50,087-[cfp_fp][292000]XNorm: 21.879637 Training: 2022-04-11 18:52:50,087-[cfp_fp][292000]Accuracy-Flip: 0.98871+-0.00421 Training: 2022-04-11 18:52:50,088-[cfp_fp][292000]Accuracy-Highest: 0.99000 Training: 2022-04-11 18:53:16,591-[agedb_30][292000]XNorm: 22.614329 Training: 2022-04-11 18:53:16,592-[agedb_30][292000]Accuracy-Flip: 0.98267+-0.00676 Training: 2022-04-11 18:53:16,592-[agedb_30][292000]Accuracy-Highest: 0.98383 Training: 2022-04-11 18:53:18,575-Speed 119.17 samples/sec Loss 0.5251 LearningRate 0.0016 Epoch: 17 Global Step: 292010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:53:20,537-Speed 5222.25 samples/sec Loss 0.4987 LearningRate 0.0016 Epoch: 17 Global Step: 292020 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-11 18:53:22,504-Speed 5208.30 samples/sec Loss 0.5405 LearningRate 0.0016 Epoch: 17 Global Step: 292030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:53:24,483-Speed 5174.27 samples/sec Loss 0.5471 LearningRate 0.0016 Epoch: 17 Global Step: 292040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:53:26,462-Speed 5177.07 samples/sec Loss 0.5127 LearningRate 0.0016 Epoch: 17 Global Step: 292050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:53:28,432-Speed 5200.37 samples/sec Loss 0.5388 LearningRate 0.0016 Epoch: 17 Global Step: 292060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:53:30,398-Speed 5209.58 samples/sec Loss 0.5511 LearningRate 0.0016 Epoch: 17 Global Step: 292070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:53:32,372-Speed 5190.14 samples/sec Loss 0.5677 LearningRate 0.0016 Epoch: 17 Global Step: 292080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:53:34,351-Speed 5175.13 samples/sec Loss 0.5354 LearningRate 0.0016 Epoch: 17 Global Step: 292090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:53:36,332-Speed 5171.97 samples/sec Loss 0.5037 LearningRate 0.0016 Epoch: 17 Global Step: 292100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:53:38,310-Speed 5178.56 samples/sec Loss 0.5084 LearningRate 0.0016 Epoch: 17 Global Step: 292110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:53:40,282-Speed 5193.43 samples/sec Loss 0.5243 LearningRate 0.0016 Epoch: 17 Global Step: 292120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:53:42,256-Speed 5191.06 samples/sec Loss 0.5472 LearningRate 0.0016 Epoch: 17 Global Step: 292130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:53:44,226-Speed 5198.17 samples/sec Loss 0.5591 LearningRate 0.0016 Epoch: 17 Global Step: 292140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:53:46,197-Speed 
5197.23 samples/sec Loss 0.5330 LearningRate 0.0016 Epoch: 17 Global Step: 292150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:53:48,196-Speed 5122.78 samples/sec Loss 0.5487 LearningRate 0.0016 Epoch: 17 Global Step: 292160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:53:50,171-Speed 5188.27 samples/sec Loss 0.5287 LearningRate 0.0016 Epoch: 17 Global Step: 292170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:53:52,140-Speed 5200.13 samples/sec Loss 0.5185 LearningRate 0.0016 Epoch: 17 Global Step: 292180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:53:54,122-Speed 5168.13 samples/sec Loss 0.5283 LearningRate 0.0016 Epoch: 17 Global Step: 292190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:53:56,100-Speed 5180.00 samples/sec Loss 0.5138 LearningRate 0.0016 Epoch: 17 Global Step: 292200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:53:58,075-Speed 5186.63 samples/sec Loss 0.5123 LearningRate 0.0016 Epoch: 17 Global Step: 292210 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:54:00,071-Speed 5131.82 samples/sec Loss 0.5450 LearningRate 0.0016 Epoch: 17 Global Step: 292220 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:54:02,084-Speed 5088.29 samples/sec Loss 0.5420 LearningRate 0.0016 Epoch: 17 Global Step: 292230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:54:04,066-Speed 5168.98 samples/sec Loss 0.5440 LearningRate 0.0016 Epoch: 17 Global Step: 292240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:06,055-Speed 5151.45 samples/sec Loss 0.5354 LearningRate 0.0016 Epoch: 17 Global Step: 292250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:08,050-Speed 5134.74 samples/sec Loss 0.5205 LearningRate 0.0015 Epoch: 17 Global Step: 292260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:10,046-Speed 5130.85 samples/sec Loss 0.5167 
LearningRate 0.0015 Epoch: 17 Global Step: 292270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:12,024-Speed 5178.73 samples/sec Loss 0.5540 LearningRate 0.0015 Epoch: 17 Global Step: 292280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:14,002-Speed 5178.46 samples/sec Loss 0.5537 LearningRate 0.0015 Epoch: 17 Global Step: 292290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:15,985-Speed 5164.98 samples/sec Loss 0.5524 LearningRate 0.0015 Epoch: 17 Global Step: 292300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:17,970-Speed 5160.28 samples/sec Loss 0.5291 LearningRate 0.0015 Epoch: 17 Global Step: 292310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:19,968-Speed 5127.52 samples/sec Loss 0.5357 LearningRate 0.0015 Epoch: 17 Global Step: 292320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:21,967-Speed 5124.02 samples/sec Loss 0.5326 LearningRate 0.0015 Epoch: 17 Global Step: 292330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:23,959-Speed 5143.75 samples/sec Loss 0.5074 LearningRate 0.0015 Epoch: 17 Global Step: 292340 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:54:25,955-Speed 5132.40 samples/sec Loss 0.5183 LearningRate 0.0015 Epoch: 17 Global Step: 292350 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:54:27,952-Speed 5127.62 samples/sec Loss 0.5355 LearningRate 0.0015 Epoch: 17 Global Step: 292360 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:54:29,929-Speed 5180.72 samples/sec Loss 0.5412 LearningRate 0.0015 Epoch: 17 Global Step: 292370 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:54:31,900-Speed 5198.36 samples/sec Loss 0.5252 LearningRate 0.0015 Epoch: 17 Global Step: 292380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:33,879-Speed 5176.44 samples/sec Loss 0.5227 LearningRate 0.0015 Epoch: 17 Global 
Step: 292390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:35,878-Speed 5124.16 samples/sec Loss 0.5358 LearningRate 0.0015 Epoch: 17 Global Step: 292400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:37,867-Speed 5148.60 samples/sec Loss 0.5108 LearningRate 0.0015 Epoch: 17 Global Step: 292410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:39,881-Speed 5087.69 samples/sec Loss 0.5305 LearningRate 0.0015 Epoch: 17 Global Step: 292420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:41,853-Speed 5193.63 samples/sec Loss 0.5224 LearningRate 0.0015 Epoch: 17 Global Step: 292430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:43,834-Speed 5172.29 samples/sec Loss 0.5220 LearningRate 0.0015 Epoch: 17 Global Step: 292440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:45,808-Speed 5189.01 samples/sec Loss 0.4892 LearningRate 0.0015 Epoch: 17 Global Step: 292450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:47,781-Speed 5190.76 samples/sec Loss 0.5221 LearningRate 0.0015 Epoch: 17 Global Step: 292460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:49,790-Speed 5098.34 samples/sec Loss 0.5628 LearningRate 0.0015 Epoch: 17 Global Step: 292470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:54:51,768-Speed 5178.96 samples/sec Loss 0.5136 LearningRate 0.0015 Epoch: 17 Global Step: 292480 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:54:53,747-Speed 5175.66 samples/sec Loss 0.5408 LearningRate 0.0015 Epoch: 17 Global Step: 292490 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:54:55,726-Speed 5177.29 samples/sec Loss 0.5461 LearningRate 0.0015 Epoch: 17 Global Step: 292500 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:54:57,708-Speed 5168.58 samples/sec Loss 0.5291 LearningRate 0.0015 Epoch: 17 Global Step: 292510 Fp16 Grad Scale: 131072 
Required: 3 hours Training: 2022-04-11 18:54:59,677-Speed 5202.80 samples/sec Loss 0.5238 LearningRate 0.0015 Epoch: 17 Global Step: 292520 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:55:01,668-Speed 5144.41 samples/sec Loss 0.5095 LearningRate 0.0015 Epoch: 17 Global Step: 292530 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:55:03,659-Speed 5145.08 samples/sec Loss 0.5187 LearningRate 0.0015 Epoch: 17 Global Step: 292540 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:55:05,643-Speed 5164.04 samples/sec Loss 0.5295 LearningRate 0.0015 Epoch: 17 Global Step: 292550 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:55:07,613-Speed 5199.03 samples/sec Loss 0.5354 LearningRate 0.0015 Epoch: 17 Global Step: 292560 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:55:09,597-Speed 5162.86 samples/sec Loss 0.5509 LearningRate 0.0015 Epoch: 17 Global Step: 292570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:55:11,594-Speed 5129.65 samples/sec Loss 0.5642 LearningRate 0.0015 Epoch: 17 Global Step: 292580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:13,562-Speed 5203.60 samples/sec Loss 0.5332 LearningRate 0.0015 Epoch: 17 Global Step: 292590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:15,534-Speed 5194.55 samples/sec Loss 0.5401 LearningRate 0.0015 Epoch: 17 Global Step: 292600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:17,504-Speed 5201.37 samples/sec Loss 0.5359 LearningRate 0.0015 Epoch: 17 Global Step: 292610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:19,498-Speed 5135.79 samples/sec Loss 0.5448 LearningRate 0.0015 Epoch: 17 Global Step: 292620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:21,466-Speed 5204.43 samples/sec Loss 0.5536 LearningRate 0.0015 Epoch: 17 Global Step: 292630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-11 18:55:23,446-Speed 5175.06 samples/sec Loss 0.5090 LearningRate 0.0015 Epoch: 17 Global Step: 292640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:25,433-Speed 5153.02 samples/sec Loss 0.5299 LearningRate 0.0015 Epoch: 17 Global Step: 292650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:27,424-Speed 5145.95 samples/sec Loss 0.5304 LearningRate 0.0015 Epoch: 17 Global Step: 292660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:29,393-Speed 5202.15 samples/sec Loss 0.5294 LearningRate 0.0015 Epoch: 17 Global Step: 292670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:31,364-Speed 5197.46 samples/sec Loss 0.5291 LearningRate 0.0015 Epoch: 17 Global Step: 292680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:33,343-Speed 5177.10 samples/sec Loss 0.5288 LearningRate 0.0015 Epoch: 17 Global Step: 292690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:35,333-Speed 5147.63 samples/sec Loss 0.5128 LearningRate 0.0015 Epoch: 17 Global Step: 292700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:37,297-Speed 5214.11 samples/sec Loss 0.5452 LearningRate 0.0015 Epoch: 17 Global Step: 292710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:39,285-Speed 5154.20 samples/sec Loss 0.5186 LearningRate 0.0015 Epoch: 17 Global Step: 292720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:41,250-Speed 5211.79 samples/sec Loss 0.5421 LearningRate 0.0015 Epoch: 17 Global Step: 292730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:43,216-Speed 5210.93 samples/sec Loss 0.5327 LearningRate 0.0015 Epoch: 17 Global Step: 292740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:45,183-Speed 5207.22 samples/sec Loss 0.5427 LearningRate 0.0015 Epoch: 17 Global Step: 292750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:47,152-Speed 5203.15 
samples/sec Loss 0.5185 LearningRate 0.0015 Epoch: 17 Global Step: 292760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:49,146-Speed 5135.35 samples/sec Loss 0.5182 LearningRate 0.0015 Epoch: 17 Global Step: 292770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:55:51,123-Speed 5184.18 samples/sec Loss 0.5247 LearningRate 0.0015 Epoch: 17 Global Step: 292780 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:55:53,095-Speed 5193.43 samples/sec Loss 0.4975 LearningRate 0.0015 Epoch: 17 Global Step: 292790 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:55:55,067-Speed 5195.19 samples/sec Loss 0.5359 LearningRate 0.0015 Epoch: 17 Global Step: 292800 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:55:57,045-Speed 5177.56 samples/sec Loss 0.5233 LearningRate 0.0015 Epoch: 17 Global Step: 292810 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:55:59,008-Speed 5217.14 samples/sec Loss 0.5346 LearningRate 0.0015 Epoch: 17 Global Step: 292820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:00,981-Speed 5193.33 samples/sec Loss 0.5163 LearningRate 0.0015 Epoch: 17 Global Step: 292830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:02,977-Speed 5130.67 samples/sec Loss 0.5266 LearningRate 0.0015 Epoch: 17 Global Step: 292840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:04,950-Speed 5193.32 samples/sec Loss 0.5563 LearningRate 0.0015 Epoch: 17 Global Step: 292850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:06,932-Speed 5167.86 samples/sec Loss 0.5011 LearningRate 0.0015 Epoch: 17 Global Step: 292860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:08,929-Speed 5129.77 samples/sec Loss 0.5191 LearningRate 0.0015 Epoch: 17 Global Step: 292870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:10,919-Speed 5146.79 samples/sec Loss 0.5316 LearningRate 
0.0015 Epoch: 17 Global Step: 292880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:12,889-Speed 5201.35 samples/sec Loss 0.5365 LearningRate 0.0015 Epoch: 17 Global Step: 292890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:14,886-Speed 5127.16 samples/sec Loss 0.5445 LearningRate 0.0015 Epoch: 17 Global Step: 292900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:16,870-Speed 5165.33 samples/sec Loss 0.5189 LearningRate 0.0015 Epoch: 17 Global Step: 292910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:18,841-Speed 5196.81 samples/sec Loss 0.5404 LearningRate 0.0015 Epoch: 17 Global Step: 292920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:20,874-Speed 5039.88 samples/sec Loss 0.5039 LearningRate 0.0015 Epoch: 17 Global Step: 292930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:22,843-Speed 5202.26 samples/sec Loss 0.5082 LearningRate 0.0015 Epoch: 17 Global Step: 292940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:24,833-Speed 5146.31 samples/sec Loss 0.5298 LearningRate 0.0015 Epoch: 17 Global Step: 292950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:26,806-Speed 5193.36 samples/sec Loss 0.5494 LearningRate 0.0015 Epoch: 17 Global Step: 292960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:28,796-Speed 5145.99 samples/sec Loss 0.5586 LearningRate 0.0015 Epoch: 17 Global Step: 292970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:30,785-Speed 5152.53 samples/sec Loss 0.4844 LearningRate 0.0015 Epoch: 17 Global Step: 292980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:32,762-Speed 5180.82 samples/sec Loss 0.5270 LearningRate 0.0015 Epoch: 17 Global Step: 292990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:34,736-Speed 5189.37 samples/sec Loss 0.5229 LearningRate 0.0015 Epoch: 17 Global Step: 293000 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:36,727-Speed 5143.23 samples/sec Loss 0.5169 LearningRate 0.0015 Epoch: 17 Global Step: 293010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:38,700-Speed 5193.04 samples/sec Loss 0.4997 LearningRate 0.0015 Epoch: 17 Global Step: 293020 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:56:40,681-Speed 5169.40 samples/sec Loss 0.5355 LearningRate 0.0015 Epoch: 17 Global Step: 293030 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:56:42,663-Speed 5171.00 samples/sec Loss 0.5191 LearningRate 0.0015 Epoch: 17 Global Step: 293040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:44,631-Speed 5204.20 samples/sec Loss 0.5037 LearningRate 0.0015 Epoch: 17 Global Step: 293050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:46,598-Speed 5208.67 samples/sec Loss 0.5305 LearningRate 0.0015 Epoch: 17 Global Step: 293060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:48,570-Speed 5193.04 samples/sec Loss 0.5407 LearningRate 0.0015 Epoch: 17 Global Step: 293070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:50,589-Speed 5074.48 samples/sec Loss 0.5048 LearningRate 0.0015 Epoch: 17 Global Step: 293080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:52,573-Speed 5162.71 samples/sec Loss 0.5198 LearningRate 0.0015 Epoch: 17 Global Step: 293090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:54,550-Speed 5180.06 samples/sec Loss 0.5506 LearningRate 0.0015 Epoch: 17 Global Step: 293100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:56,536-Speed 5159.14 samples/sec Loss 0.5321 LearningRate 0.0015 Epoch: 17 Global Step: 293110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:56:58,537-Speed 5118.59 samples/sec Loss 0.5307 LearningRate 0.0015 Epoch: 17 Global Step: 293120 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-11 18:57:00,518-Speed 5171.27 samples/sec Loss 0.5389 LearningRate 0.0015 Epoch: 17 Global Step: 293130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:57:02,490-Speed 5193.76 samples/sec Loss 0.5134 LearningRate 0.0015 Epoch: 17 Global Step: 293140 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:57:04,468-Speed 5178.82 samples/sec Loss 0.5631 LearningRate 0.0015 Epoch: 17 Global Step: 293150 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:57:06,438-Speed 5201.07 samples/sec Loss 0.5290 LearningRate 0.0015 Epoch: 17 Global Step: 293160 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:57:08,409-Speed 5197.47 samples/sec Loss 0.5234 LearningRate 0.0015 Epoch: 17 Global Step: 293170 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:57:10,392-Speed 5164.44 samples/sec Loss 0.5227 LearningRate 0.0015 Epoch: 17 Global Step: 293180 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:57:12,361-Speed 5203.40 samples/sec Loss 0.5355 LearningRate 0.0015 Epoch: 17 Global Step: 293190 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:57:14,329-Speed 5204.74 samples/sec Loss 0.5393 LearningRate 0.0015 Epoch: 17 Global Step: 293200 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:57:16,298-Speed 5200.85 samples/sec Loss 0.5587 LearningRate 0.0015 Epoch: 17 Global Step: 293210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:57:18,269-Speed 5199.13 samples/sec Loss 0.5335 LearningRate 0.0015 Epoch: 17 Global Step: 293220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:57:20,236-Speed 5205.78 samples/sec Loss 0.4789 LearningRate 0.0015 Epoch: 17 Global Step: 293230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:57:22,222-Speed 5157.78 samples/sec Loss 0.5277 LearningRate 0.0015 Epoch: 17 Global Step: 293240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 
18:57:24,217-Speed 5135.41 samples/sec Loss 0.5063 LearningRate 0.0015 Epoch: 17 Global Step: 293250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:57:26,216-Speed 5124.33 samples/sec Loss 0.5256 LearningRate 0.0015 Epoch: 17 Global Step: 293260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:57:28,205-Speed 5148.74 samples/sec Loss 0.5409 LearningRate 0.0015 Epoch: 17 Global Step: 293270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:57:30,176-Speed 5198.00 samples/sec Loss 0.5229 LearningRate 0.0015 Epoch: 17 Global Step: 293280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:57:32,148-Speed 5195.24 samples/sec Loss 0.5200 LearningRate 0.0015 Epoch: 17 Global Step: 293290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:57:34,137-Speed 5149.64 samples/sec Loss 0.5183 LearningRate 0.0015 Epoch: 17 Global Step: 293300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:57:36,159-Speed 5065.93 samples/sec Loss 0.5313 LearningRate 0.0015 Epoch: 17 Global Step: 293310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:57:38,148-Speed 5148.50 samples/sec Loss 0.5256 LearningRate 0.0015 Epoch: 17 Global Step: 293320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:57:40,150-Speed 5119.54 samples/sec Loss 0.5164 LearningRate 0.0015 Epoch: 17 Global Step: 293330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:57:42,134-Speed 5164.29 samples/sec Loss 0.5634 LearningRate 0.0015 Epoch: 17 Global Step: 293340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:57:44,104-Speed 5199.04 samples/sec Loss 0.5192 LearningRate 0.0015 Epoch: 17 Global Step: 293350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:57:46,114-Speed 5095.00 samples/sec Loss 0.5365 LearningRate 0.0015 Epoch: 17 Global Step: 293360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:57:48,124-Speed 5097.51 samples/sec Loss 
0.5209 LearningRate 0.0015 Epoch: 17 Global Step: 293370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:57:50,131-Speed 5104.95 samples/sec Loss 0.5304 LearningRate 0.0015 Epoch: 17 Global Step: 293380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:57:52,111-Speed 5172.03 samples/sec Loss 0.5292 LearningRate 0.0015 Epoch: 17 Global Step: 293390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:57:54,082-Speed 5197.24 samples/sec Loss 0.5281 LearningRate 0.0015 Epoch: 17 Global Step: 293400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:57:56,055-Speed 5191.81 samples/sec Loss 0.5371 LearningRate 0.0015 Epoch: 17 Global Step: 293410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:57:58,044-Speed 5149.97 samples/sec Loss 0.5066 LearningRate 0.0015 Epoch: 17 Global Step: 293420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:58:00,018-Speed 5187.60 samples/sec Loss 0.5287 LearningRate 0.0015 Epoch: 17 Global Step: 293430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:01,990-Speed 5196.69 samples/sec Loss 0.5279 LearningRate 0.0015 Epoch: 17 Global Step: 293440 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:03,965-Speed 5185.71 samples/sec Loss 0.5434 LearningRate 0.0015 Epoch: 17 Global Step: 293450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:05,937-Speed 5193.84 samples/sec Loss 0.5488 LearningRate 0.0015 Epoch: 17 Global Step: 293460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:07,908-Speed 5198.38 samples/sec Loss 0.5481 LearningRate 0.0015 Epoch: 17 Global Step: 293470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:09,914-Speed 5106.72 samples/sec Loss 0.5313 LearningRate 0.0015 Epoch: 17 Global Step: 293480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:11,895-Speed 5170.77 samples/sec Loss 0.5244 LearningRate 0.0015 Epoch: 17 
Global Step: 293490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:13,874-Speed 5175.58 samples/sec Loss 0.5077 LearningRate 0.0015 Epoch: 17 Global Step: 293500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:15,864-Speed 5147.37 samples/sec Loss 0.5496 LearningRate 0.0015 Epoch: 17 Global Step: 293510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:17,837-Speed 5192.25 samples/sec Loss 0.5470 LearningRate 0.0015 Epoch: 17 Global Step: 293520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:19,818-Speed 5171.48 samples/sec Loss 0.5315 LearningRate 0.0015 Epoch: 17 Global Step: 293530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:21,800-Speed 5168.40 samples/sec Loss 0.5451 LearningRate 0.0015 Epoch: 17 Global Step: 293540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:23,775-Speed 5186.56 samples/sec Loss 0.5386 LearningRate 0.0015 Epoch: 17 Global Step: 293550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:25,759-Speed 5161.03 samples/sec Loss 0.5567 LearningRate 0.0015 Epoch: 17 Global Step: 293560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:27,729-Speed 5201.26 samples/sec Loss 0.5445 LearningRate 0.0015 Epoch: 17 Global Step: 293570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:29,731-Speed 5114.40 samples/sec Loss 0.5305 LearningRate 0.0015 Epoch: 17 Global Step: 293580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:31,708-Speed 5182.43 samples/sec Loss 0.5257 LearningRate 0.0015 Epoch: 17 Global Step: 293590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:33,698-Speed 5148.60 samples/sec Loss 0.5226 LearningRate 0.0015 Epoch: 17 Global Step: 293600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:35,677-Speed 5177.43 samples/sec Loss 0.5116 LearningRate 0.0015 Epoch: 17 Global Step: 293610 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 18:58:37,644-Speed 5205.98 samples/sec Loss 0.5580 LearningRate 0.0015 Epoch: 17 Global Step: 293620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:39,617-Speed 5191.55 samples/sec Loss 0.5354 LearningRate 0.0014 Epoch: 17 Global Step: 293630 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:58:41,598-Speed 5171.32 samples/sec Loss 0.5438 LearningRate 0.0014 Epoch: 17 Global Step: 293640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:43,566-Speed 5204.43 samples/sec Loss 0.5484 LearningRate 0.0014 Epoch: 17 Global Step: 293650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:45,559-Speed 5139.61 samples/sec Loss 0.5230 LearningRate 0.0014 Epoch: 17 Global Step: 293660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:47,557-Speed 5128.11 samples/sec Loss 0.5131 LearningRate 0.0014 Epoch: 17 Global Step: 293670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:49,532-Speed 5185.52 samples/sec Loss 0.5091 LearningRate 0.0014 Epoch: 17 Global Step: 293680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:51,525-Speed 5141.22 samples/sec Loss 0.5140 LearningRate 0.0014 Epoch: 17 Global Step: 293690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:53,535-Speed 5094.60 samples/sec Loss 0.5240 LearningRate 0.0014 Epoch: 17 Global Step: 293700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:55,521-Speed 5157.97 samples/sec Loss 0.5392 LearningRate 0.0014 Epoch: 17 Global Step: 293710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:57,498-Speed 5181.77 samples/sec Loss 0.5354 LearningRate 0.0014 Epoch: 17 Global Step: 293720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:58:59,470-Speed 5194.84 samples/sec Loss 0.5454 LearningRate 0.0014 Epoch: 17 Global Step: 293730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
18:59:01,441-Speed 5198.05 samples/sec Loss 0.5232 LearningRate 0.0014 Epoch: 17 Global Step: 293740 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:59:03,430-Speed 5150.27 samples/sec Loss 0.5336 LearningRate 0.0014 Epoch: 17 Global Step: 293750 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:59:05,426-Speed 5131.37 samples/sec Loss 0.5262 LearningRate 0.0014 Epoch: 17 Global Step: 293760 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 18:59:07,391-Speed 5211.55 samples/sec Loss 0.5126 LearningRate 0.0014 Epoch: 17 Global Step: 293770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:59:09,367-Speed 5184.58 samples/sec Loss 0.5243 LearningRate 0.0014 Epoch: 17 Global Step: 293780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:59:11,343-Speed 5183.41 samples/sec Loss 0.5701 LearningRate 0.0014 Epoch: 17 Global Step: 293790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:59:13,363-Speed 5073.58 samples/sec Loss 0.5119 LearningRate 0.0014 Epoch: 17 Global Step: 293800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:59:15,340-Speed 5181.44 samples/sec Loss 0.5312 LearningRate 0.0014 Epoch: 17 Global Step: 293810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:59:17,340-Speed 5121.00 samples/sec Loss 0.5186 LearningRate 0.0014 Epoch: 17 Global Step: 293820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:59:19,318-Speed 5177.92 samples/sec Loss 0.5386 LearningRate 0.0014 Epoch: 17 Global Step: 293830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:59:21,305-Speed 5156.63 samples/sec Loss 0.4948 LearningRate 0.0014 Epoch: 17 Global Step: 293840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:59:23,276-Speed 5197.44 samples/sec Loss 0.5324 LearningRate 0.0014 Epoch: 17 Global Step: 293850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:59:25,277-Speed 5117.96 samples/sec 
Loss 0.5438 LearningRate 0.0014 Epoch: 17 Global Step: 293860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:59:27,285-Speed 5102.84 samples/sec Loss 0.5408 LearningRate 0.0014 Epoch: 17 Global Step: 293870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 18:59:29,258-Speed 5190.16 samples/sec Loss 0.5316 LearningRate 0.0014 Epoch: 17 Global Step: 293880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:59:31,234-Speed 5186.94 samples/sec Loss 0.5470 LearningRate 0.0014 Epoch: 17 Global Step: 293890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:59:33,217-Speed 5164.31 samples/sec Loss 0.5347 LearningRate 0.0014 Epoch: 17 Global Step: 293900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:59:35,200-Speed 5166.08 samples/sec Loss 0.5290 LearningRate 0.0014 Epoch: 17 Global Step: 293910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:59:37,180-Speed 5173.97 samples/sec Loss 0.5255 LearningRate 0.0014 Epoch: 17 Global Step: 293920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:59:39,176-Speed 5131.64 samples/sec Loss 0.5209 LearningRate 0.0014 Epoch: 17 Global Step: 293930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:59:41,158-Speed 5167.39 samples/sec Loss 0.5196 LearningRate 0.0014 Epoch: 17 Global Step: 293940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:59:43,134-Speed 5184.43 samples/sec Loss 0.5211 LearningRate 0.0014 Epoch: 17 Global Step: 293950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:59:45,112-Speed 5178.44 samples/sec Loss 0.5158 LearningRate 0.0014 Epoch: 17 Global Step: 293960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:59:47,092-Speed 5174.48 samples/sec Loss 0.5379 LearningRate 0.0014 Epoch: 17 Global Step: 293970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:59:49,071-Speed 5175.90 samples/sec Loss 0.5306 LearningRate 0.0014 Epoch: 17 
Global Step: 293980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:59:51,068-Speed 5130.06 samples/sec Loss 0.5175 LearningRate 0.0014 Epoch: 17 Global Step: 293990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 18:59:53,047-Speed 5175.68 samples/sec Loss 0.5331 LearningRate 0.0014 Epoch: 17 Global Step: 294000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:00:19,735-[lfw][294000]XNorm: 21.563823 Training: 2022-04-11 19:00:19,735-[lfw][294000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 19:00:19,736-[lfw][294000]Accuracy-Highest: 0.99833 Training: 2022-04-11 19:00:50,491-[cfp_fp][294000]XNorm: 21.862950 Training: 2022-04-11 19:00:50,492-[cfp_fp][294000]Accuracy-Flip: 0.99000+-0.00438 Training: 2022-04-11 19:00:50,492-[cfp_fp][294000]Accuracy-Highest: 0.99000 Training: 2022-04-11 19:01:16,996-[agedb_30][294000]XNorm: 22.595988 Training: 2022-04-11 19:01:16,997-[agedb_30][294000]Accuracy-Flip: 0.98383+-0.00691 Training: 2022-04-11 19:01:16,997-[agedb_30][294000]Accuracy-Highest: 0.98383 Training: 2022-04-11 19:01:18,983-Speed 119.16 samples/sec Loss 0.4948 LearningRate 0.0014 Epoch: 17 Global Step: 294010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:01:20,949-Speed 5209.29 samples/sec Loss 0.5157 LearningRate 0.0014 Epoch: 17 Global Step: 294020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:01:22,928-Speed 5177.51 samples/sec Loss 0.5395 LearningRate 0.0014 Epoch: 17 Global Step: 294030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:01:24,901-Speed 5191.67 samples/sec Loss 0.5226 LearningRate 0.0014 Epoch: 17 Global Step: 294040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:01:26,892-Speed 5144.97 samples/sec Loss 0.5294 LearningRate 0.0014 Epoch: 17 Global Step: 294050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:01:28,883-Speed 5143.88 samples/sec Loss 0.5194 LearningRate 0.0014 Epoch: 17 Global Step: 294060 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:01:30,860-Speed 5181.98 samples/sec Loss 0.5431 LearningRate 0.0014 Epoch: 17 Global Step: 294070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:01:32,827-Speed 5206.28 samples/sec Loss 0.5435 LearningRate 0.0014 Epoch: 17 Global Step: 294080 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:01:34,810-Speed 5167.43 samples/sec Loss 0.5170 LearningRate 0.0014 Epoch: 17 Global Step: 294090 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:01:36,779-Speed 5201.69 samples/sec Loss 0.5185 LearningRate 0.0014 Epoch: 17 Global Step: 294100 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:01:38,743-Speed 5214.45 samples/sec Loss 0.5116 LearningRate 0.0014 Epoch: 17 Global Step: 294110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:01:40,739-Speed 5133.96 samples/sec Loss 0.5384 LearningRate 0.0014 Epoch: 17 Global Step: 294120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:01:42,712-Speed 5189.55 samples/sec Loss 0.5303 LearningRate 0.0014 Epoch: 17 Global Step: 294130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:01:44,685-Speed 5193.03 samples/sec Loss 0.5143 LearningRate 0.0014 Epoch: 17 Global Step: 294140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:01:46,657-Speed 5192.90 samples/sec Loss 0.5226 LearningRate 0.0014 Epoch: 17 Global Step: 294150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:01:48,651-Speed 5136.97 samples/sec Loss 0.5440 LearningRate 0.0014 Epoch: 17 Global Step: 294160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:01:50,632-Speed 5171.91 samples/sec Loss 0.5432 LearningRate 0.0014 Epoch: 17 Global Step: 294170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:01:52,616-Speed 5163.64 samples/sec Loss 0.5126 LearningRate 0.0014 Epoch: 17 Global Step: 294180 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-11 19:01:54,619-Speed 5113.39 samples/sec Loss 0.5517 LearningRate 0.0014 Epoch: 17 Global Step: 294190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:01:56,629-Speed 5095.13 samples/sec Loss 0.5079 LearningRate 0.0014 Epoch: 17 Global Step: 294200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:01:58,624-Speed 5136.76 samples/sec Loss 0.5588 LearningRate 0.0014 Epoch: 17 Global Step: 294210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:00,602-Speed 5178.12 samples/sec Loss 0.5684 LearningRate 0.0014 Epoch: 17 Global Step: 294220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:02,585-Speed 5165.99 samples/sec Loss 0.5519 LearningRate 0.0014 Epoch: 17 Global Step: 294230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:04,561-Speed 5185.27 samples/sec Loss 0.5701 LearningRate 0.0014 Epoch: 17 Global Step: 294240 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:06,549-Speed 5150.64 samples/sec Loss 0.5301 LearningRate 0.0014 Epoch: 17 Global Step: 294250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:08,528-Speed 5177.57 samples/sec Loss 0.5384 LearningRate 0.0014 Epoch: 17 Global Step: 294260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:10,547-Speed 5073.80 samples/sec Loss 0.5540 LearningRate 0.0014 Epoch: 17 Global Step: 294270 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:12,544-Speed 5127.36 samples/sec Loss 0.5121 LearningRate 0.0014 Epoch: 17 Global Step: 294280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:14,536-Speed 5143.22 samples/sec Loss 0.5261 LearningRate 0.0014 Epoch: 17 Global Step: 294290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:16,523-Speed 5154.10 samples/sec Loss 0.5074 LearningRate 0.0014 Epoch: 17 Global Step: 294300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:18,514-Speed 
5144.93 samples/sec Loss 0.5333 LearningRate 0.0014 Epoch: 17 Global Step: 294310 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:02:20,511-Speed 5129.75 samples/sec Loss 0.5161 LearningRate 0.0014 Epoch: 17 Global Step: 294320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:22,492-Speed 5172.26 samples/sec Loss 0.5038 LearningRate 0.0014 Epoch: 17 Global Step: 294330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:24,502-Speed 5096.73 samples/sec Loss 0.5371 LearningRate 0.0014 Epoch: 17 Global Step: 294340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:26,494-Speed 5141.97 samples/sec Loss 0.5111 LearningRate 0.0014 Epoch: 17 Global Step: 294350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:28,500-Speed 5104.52 samples/sec Loss 0.5341 LearningRate 0.0014 Epoch: 17 Global Step: 294360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:30,485-Speed 5161.74 samples/sec Loss 0.5479 LearningRate 0.0014 Epoch: 17 Global Step: 294370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:32,462-Speed 5180.91 samples/sec Loss 0.5365 LearningRate 0.0014 Epoch: 17 Global Step: 294380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:34,460-Speed 5125.80 samples/sec Loss 0.5630 LearningRate 0.0014 Epoch: 17 Global Step: 294390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:36,445-Speed 5161.49 samples/sec Loss 0.5400 LearningRate 0.0014 Epoch: 17 Global Step: 294400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:38,444-Speed 5123.26 samples/sec Loss 0.5220 LearningRate 0.0014 Epoch: 17 Global Step: 294410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:40,443-Speed 5125.76 samples/sec Loss 0.5246 LearningRate 0.0014 Epoch: 17 Global Step: 294420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:02:42,463-Speed 5072.51 samples/sec Loss 0.5078 
LearningRate 0.0014 Epoch: 17 Global Step: 294430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:02:44,458-Speed 5134.30 samples/sec Loss 0.5170 LearningRate 0.0014 Epoch: 17 Global Step: 294440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:02:46,450-Speed 5143.55 samples/sec Loss 0.5318 LearningRate 0.0014 Epoch: 17 Global Step: 294450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:02:48,449-Speed 5124.06 samples/sec Loss 0.5356 LearningRate 0.0014 Epoch: 17 Global Step: 294460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:02:50,439-Speed 5147.94 samples/sec Loss 0.5251 LearningRate 0.0014 Epoch: 17 Global Step: 294470 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:02:52,409-Speed 5199.09 samples/sec Loss 0.5341 LearningRate 0.0014 Epoch: 17 Global Step: 294480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:54,394-Speed 5159.44 samples/sec Loss 0.5373 LearningRate 0.0014 Epoch: 17 Global Step: 294490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:56,374-Speed 5173.48 samples/sec Loss 0.5325 LearningRate 0.0014 Epoch: 17 Global Step: 294500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:02:58,350-Speed 5184.93 samples/sec Loss 0.5478 LearningRate 0.0014 Epoch: 17 Global Step: 294510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:00,364-Speed 5085.05 samples/sec Loss 0.5137 LearningRate 0.0014 Epoch: 17 Global Step: 294520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:02,343-Speed 5175.01 samples/sec Loss 0.5513 LearningRate 0.0014 Epoch: 17 Global Step: 294530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:04,324-Speed 5171.04 samples/sec Loss 0.5265 LearningRate 0.0014 Epoch: 17 Global Step: 294540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:06,297-Speed 5192.34 samples/sec Loss 0.5312 LearningRate 0.0014 Epoch: 17 Global 
Step: 294550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:08,279-Speed 5169.41 samples/sec Loss 0.4943 LearningRate 0.0014 Epoch: 17 Global Step: 294560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:10,289-Speed 5102.55 samples/sec Loss 0.5346 LearningRate 0.0014 Epoch: 17 Global Step: 294570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:12,288-Speed 5124.26 samples/sec Loss 0.5268 LearningRate 0.0014 Epoch: 17 Global Step: 294580 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:03:14,269-Speed 5170.91 samples/sec Loss 0.5295 LearningRate 0.0014 Epoch: 17 Global Step: 294590 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:03:16,241-Speed 5194.34 samples/sec Loss 0.5376 LearningRate 0.0014 Epoch: 17 Global Step: 294600 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:03:18,217-Speed 5185.24 samples/sec Loss 0.5240 LearningRate 0.0014 Epoch: 17 Global Step: 294610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:20,210-Speed 5139.29 samples/sec Loss 0.5426 LearningRate 0.0014 Epoch: 17 Global Step: 294620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:22,190-Speed 5174.26 samples/sec Loss 0.5203 LearningRate 0.0014 Epoch: 17 Global Step: 294630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:24,181-Speed 5144.68 samples/sec Loss 0.5060 LearningRate 0.0014 Epoch: 17 Global Step: 294640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:26,189-Speed 5100.83 samples/sec Loss 0.5233 LearningRate 0.0014 Epoch: 17 Global Step: 294650 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:28,184-Speed 5135.80 samples/sec Loss 0.5258 LearningRate 0.0014 Epoch: 17 Global Step: 294660 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:30,151-Speed 5205.86 samples/sec Loss 0.5219 LearningRate 0.0014 Epoch: 17 Global Step: 294670 Fp16 Grad Scale: 65536 
Required: 3 hours Training: 2022-04-11 19:03:32,116-Speed 5215.20 samples/sec Loss 0.5269 LearningRate 0.0014 Epoch: 17 Global Step: 294680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:34,082-Speed 5210.83 samples/sec Loss 0.5305 LearningRate 0.0014 Epoch: 17 Global Step: 294690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:36,052-Speed 5199.63 samples/sec Loss 0.5080 LearningRate 0.0014 Epoch: 17 Global Step: 294700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:38,028-Speed 5183.73 samples/sec Loss 0.5442 LearningRate 0.0014 Epoch: 17 Global Step: 294710 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:03:40,003-Speed 5186.13 samples/sec Loss 0.5126 LearningRate 0.0014 Epoch: 17 Global Step: 294720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:41,970-Speed 5206.13 samples/sec Loss 0.5159 LearningRate 0.0014 Epoch: 17 Global Step: 294730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:43,962-Speed 5142.92 samples/sec Loss 0.5388 LearningRate 0.0014 Epoch: 17 Global Step: 294740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:45,929-Speed 5208.30 samples/sec Loss 0.5475 LearningRate 0.0014 Epoch: 17 Global Step: 294750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:47,921-Speed 5141.34 samples/sec Loss 0.5329 LearningRate 0.0014 Epoch: 17 Global Step: 294760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:49,897-Speed 5184.35 samples/sec Loss 0.5516 LearningRate 0.0014 Epoch: 17 Global Step: 294770 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:51,861-Speed 5216.64 samples/sec Loss 0.5215 LearningRate 0.0014 Epoch: 17 Global Step: 294780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:53,827-Speed 5208.90 samples/sec Loss 0.5491 LearningRate 0.0014 Epoch: 17 Global Step: 294790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 
19:03:55,807-Speed 5173.98 samples/sec Loss 0.5401 LearningRate 0.0014 Epoch: 17 Global Step: 294800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:57,789-Speed 5169.81 samples/sec Loss 0.5198 LearningRate 0.0014 Epoch: 17 Global Step: 294810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:03:59,763-Speed 5188.64 samples/sec Loss 0.5218 LearningRate 0.0014 Epoch: 17 Global Step: 294820 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:04:01,742-Speed 5175.17 samples/sec Loss 0.5226 LearningRate 0.0014 Epoch: 17 Global Step: 294830 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:04:03,707-Speed 5212.16 samples/sec Loss 0.5052 LearningRate 0.0014 Epoch: 17 Global Step: 294840 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:04:05,672-Speed 5213.55 samples/sec Loss 0.5372 LearningRate 0.0014 Epoch: 17 Global Step: 294850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:07,653-Speed 5169.90 samples/sec Loss 0.5397 LearningRate 0.0014 Epoch: 17 Global Step: 294860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:09,639-Speed 5157.97 samples/sec Loss 0.5347 LearningRate 0.0014 Epoch: 17 Global Step: 294870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:11,608-Speed 5202.40 samples/sec Loss 0.5409 LearningRate 0.0014 Epoch: 17 Global Step: 294880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:13,584-Speed 5184.99 samples/sec Loss 0.5283 LearningRate 0.0014 Epoch: 17 Global Step: 294890 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:15,550-Speed 5211.52 samples/sec Loss 0.4998 LearningRate 0.0014 Epoch: 17 Global Step: 294900 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:17,522-Speed 5193.55 samples/sec Loss 0.5328 LearningRate 0.0014 Epoch: 17 Global Step: 294910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:19,490-Speed 5206.34 samples/sec 
Loss 0.5407 LearningRate 0.0014 Epoch: 17 Global Step: 294920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:21,464-Speed 5188.06 samples/sec Loss 0.5194 LearningRate 0.0014 Epoch: 17 Global Step: 294930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:23,462-Speed 5125.59 samples/sec Loss 0.5212 LearningRate 0.0014 Epoch: 17 Global Step: 294940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:25,482-Speed 5070.90 samples/sec Loss 0.5575 LearningRate 0.0014 Epoch: 17 Global Step: 294950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:27,454-Speed 5197.11 samples/sec Loss 0.5330 LearningRate 0.0014 Epoch: 17 Global Step: 294960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:29,428-Speed 5186.62 samples/sec Loss 0.5103 LearningRate 0.0014 Epoch: 17 Global Step: 294970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:31,407-Speed 5177.06 samples/sec Loss 0.5241 LearningRate 0.0014 Epoch: 17 Global Step: 294980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:33,392-Speed 5161.72 samples/sec Loss 0.5058 LearningRate 0.0014 Epoch: 17 Global Step: 294990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:35,371-Speed 5175.99 samples/sec Loss 0.5150 LearningRate 0.0014 Epoch: 17 Global Step: 295000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:37,341-Speed 5201.31 samples/sec Loss 0.5218 LearningRate 0.0014 Epoch: 17 Global Step: 295010 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:39,336-Speed 5131.81 samples/sec Loss 0.5533 LearningRate 0.0014 Epoch: 17 Global Step: 295020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:41,320-Speed 5163.41 samples/sec Loss 0.5280 LearningRate 0.0014 Epoch: 17 Global Step: 295030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:43,287-Speed 5209.02 samples/sec Loss 0.5333 LearningRate 0.0013 Epoch: 17 
Global Step: 295040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:45,258-Speed 5195.44 samples/sec Loss 0.5498 LearningRate 0.0013 Epoch: 17 Global Step: 295050 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:04:47,260-Speed 5117.26 samples/sec Loss 0.5043 LearningRate 0.0013 Epoch: 17 Global Step: 295060 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:04:49,247-Speed 5157.61 samples/sec Loss 0.5419 LearningRate 0.0013 Epoch: 17 Global Step: 295070 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:04:51,213-Speed 5210.55 samples/sec Loss 0.5233 LearningRate 0.0013 Epoch: 17 Global Step: 295080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:53,206-Speed 5139.04 samples/sec Loss 0.5452 LearningRate 0.0013 Epoch: 17 Global Step: 295090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:55,193-Speed 5153.64 samples/sec Loss 0.5678 LearningRate 0.0013 Epoch: 17 Global Step: 295100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:57,166-Speed 5192.09 samples/sec Loss 0.5271 LearningRate 0.0013 Epoch: 17 Global Step: 295110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:04:59,145-Speed 5176.29 samples/sec Loss 0.5053 LearningRate 0.0013 Epoch: 17 Global Step: 295120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:01,131-Speed 5159.17 samples/sec Loss 0.5236 LearningRate 0.0013 Epoch: 17 Global Step: 295130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:03,111-Speed 5174.05 samples/sec Loss 0.5174 LearningRate 0.0013 Epoch: 17 Global Step: 295140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:05,099-Speed 5153.30 samples/sec Loss 0.5257 LearningRate 0.0013 Epoch: 17 Global Step: 295150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:07,068-Speed 5200.93 samples/sec Loss 0.5191 LearningRate 0.0013 Epoch: 17 Global Step: 295160 Fp16 Grad Scale: 
65536 Required: 3 hours Training: 2022-04-11 19:05:09,036-Speed 5206.69 samples/sec Loss 0.5269 LearningRate 0.0013 Epoch: 17 Global Step: 295170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:11,007-Speed 5197.39 samples/sec Loss 0.5190 LearningRate 0.0013 Epoch: 17 Global Step: 295180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:12,992-Speed 5160.39 samples/sec Loss 0.5218 LearningRate 0.0013 Epoch: 17 Global Step: 295190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:14,977-Speed 5158.88 samples/sec Loss 0.5397 LearningRate 0.0013 Epoch: 17 Global Step: 295200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:16,970-Speed 5140.12 samples/sec Loss 0.5261 LearningRate 0.0013 Epoch: 17 Global Step: 295210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:18,944-Speed 5190.95 samples/sec Loss 0.5277 LearningRate 0.0013 Epoch: 17 Global Step: 295220 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:20,923-Speed 5175.94 samples/sec Loss 0.5029 LearningRate 0.0013 Epoch: 17 Global Step: 295230 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:22,885-Speed 5220.02 samples/sec Loss 0.5226 LearningRate 0.0013 Epoch: 17 Global Step: 295240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 19:05:24,892-Speed 5103.02 samples/sec Loss 0.5344 LearningRate 0.0013 Epoch: 17 Global Step: 295250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 19:05:26,859-Speed 5207.84 samples/sec Loss 0.5333 LearningRate 0.0013 Epoch: 17 Global Step: 295260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 19:05:28,833-Speed 5189.02 samples/sec Loss 0.5335 LearningRate 0.0013 Epoch: 17 Global Step: 295270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 19:05:30,805-Speed 5195.30 samples/sec Loss 0.5341 LearningRate 0.0013 Epoch: 17 Global Step: 295280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-11 19:05:32,783-Speed 5178.96 samples/sec Loss 0.5207 LearningRate 0.0013 Epoch: 17 Global Step: 295290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 19:05:34,752-Speed 5201.68 samples/sec Loss 0.5282 LearningRate 0.0013 Epoch: 17 Global Step: 295300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 19:05:36,724-Speed 5195.88 samples/sec Loss 0.5577 LearningRate 0.0013 Epoch: 17 Global Step: 295310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 19:05:38,698-Speed 5188.01 samples/sec Loss 0.5297 LearningRate 0.0013 Epoch: 17 Global Step: 295320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 19:05:40,671-Speed 5192.21 samples/sec Loss 0.5270 LearningRate 0.0013 Epoch: 17 Global Step: 295330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-11 19:05:42,639-Speed 5204.99 samples/sec Loss 0.5516 LearningRate 0.0013 Epoch: 17 Global Step: 295340 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:44,608-Speed 5203.28 samples/sec Loss 0.5163 LearningRate 0.0013 Epoch: 17 Global Step: 295350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:46,602-Speed 5137.70 samples/sec Loss 0.5126 LearningRate 0.0013 Epoch: 17 Global Step: 295360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:48,585-Speed 5165.87 samples/sec Loss 0.5569 LearningRate 0.0013 Epoch: 17 Global Step: 295370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:50,582-Speed 5127.60 samples/sec Loss 0.5305 LearningRate 0.0013 Epoch: 17 Global Step: 295380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:52,560-Speed 5180.36 samples/sec Loss 0.5209 LearningRate 0.0013 Epoch: 17 Global Step: 295390 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:54,539-Speed 5175.58 samples/sec Loss 0.5164 LearningRate 0.0013 Epoch: 17 Global Step: 295400 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:56,523-Speed 5162.98 
samples/sec Loss 0.5601 LearningRate 0.0013 Epoch: 17 Global Step: 295410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:05:58,496-Speed 5191.16 samples/sec Loss 0.5584 LearningRate 0.0013 Epoch: 17 Global Step: 295420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:00,467-Speed 5198.72 samples/sec Loss 0.5024 LearningRate 0.0013 Epoch: 17 Global Step: 295430 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:02,446-Speed 5175.85 samples/sec Loss 0.5264 LearningRate 0.0013 Epoch: 17 Global Step: 295440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:06:04,421-Speed 5185.06 samples/sec Loss 0.5079 LearningRate 0.0013 Epoch: 17 Global Step: 295450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:06:06,413-Speed 5142.93 samples/sec Loss 0.5305 LearningRate 0.0013 Epoch: 17 Global Step: 295460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:08,380-Speed 5206.36 samples/sec Loss 0.5387 LearningRate 0.0013 Epoch: 17 Global Step: 295470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:10,354-Speed 5191.73 samples/sec Loss 0.5093 LearningRate 0.0013 Epoch: 17 Global Step: 295480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:12,342-Speed 5150.57 samples/sec Loss 0.5373 LearningRate 0.0013 Epoch: 17 Global Step: 295490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:14,329-Speed 5155.68 samples/sec Loss 0.5150 LearningRate 0.0013 Epoch: 17 Global Step: 295500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:16,322-Speed 5140.44 samples/sec Loss 0.5204 LearningRate 0.0013 Epoch: 17 Global Step: 295510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:18,291-Speed 5201.96 samples/sec Loss 0.4897 LearningRate 0.0013 Epoch: 17 Global Step: 295520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:20,274-Speed 5165.87 samples/sec Loss 0.5456 LearningRate 
0.0013 Epoch: 17 Global Step: 295530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:22,258-Speed 5162.76 samples/sec Loss 0.5445 LearningRate 0.0013 Epoch: 17 Global Step: 295540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:24,246-Speed 5155.02 samples/sec Loss 0.5037 LearningRate 0.0013 Epoch: 17 Global Step: 295550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:26,211-Speed 5210.80 samples/sec Loss 0.5121 LearningRate 0.0013 Epoch: 17 Global Step: 295560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:28,186-Speed 5186.79 samples/sec Loss 0.5319 LearningRate 0.0013 Epoch: 17 Global Step: 295570 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:30,157-Speed 5198.02 samples/sec Loss 0.5289 LearningRate 0.0013 Epoch: 17 Global Step: 295580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:32,138-Speed 5171.47 samples/sec Loss 0.5293 LearningRate 0.0013 Epoch: 17 Global Step: 295590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:34,125-Speed 5154.41 samples/sec Loss 0.5145 LearningRate 0.0013 Epoch: 17 Global Step: 295600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:36,130-Speed 5107.73 samples/sec Loss 0.5401 LearningRate 0.0013 Epoch: 17 Global Step: 295610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:38,116-Speed 5158.65 samples/sec Loss 0.5498 LearningRate 0.0013 Epoch: 17 Global Step: 295620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:40,095-Speed 5177.25 samples/sec Loss 0.5468 LearningRate 0.0013 Epoch: 17 Global Step: 295630 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:42,063-Speed 5206.18 samples/sec Loss 0.5494 LearningRate 0.0013 Epoch: 17 Global Step: 295640 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:44,034-Speed 5196.99 samples/sec Loss 0.5257 LearningRate 0.0013 Epoch: 17 Global Step: 295650 Fp16 
Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:46,017-Speed 5164.34 samples/sec Loss 0.5217 LearningRate 0.0013 Epoch: 17 Global Step: 295660 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:06:47,983-Speed 5213.94 samples/sec Loss 0.5134 LearningRate 0.0013 Epoch: 17 Global Step: 295670 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:49,968-Speed 5158.42 samples/sec Loss 0.5327 LearningRate 0.0013 Epoch: 17 Global Step: 295680 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:51,944-Speed 5184.41 samples/sec Loss 0.5716 LearningRate 0.0013 Epoch: 17 Global Step: 295690 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:53,913-Speed 5201.85 samples/sec Loss 0.5306 LearningRate 0.0013 Epoch: 17 Global Step: 295700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:55,882-Speed 5203.55 samples/sec Loss 0.5199 LearningRate 0.0013 Epoch: 17 Global Step: 295710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:57,855-Speed 5190.97 samples/sec Loss 0.5107 LearningRate 0.0013 Epoch: 17 Global Step: 295720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:06:59,869-Speed 5086.77 samples/sec Loss 0.5418 LearningRate 0.0013 Epoch: 17 Global Step: 295730 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:01,863-Speed 5135.42 samples/sec Loss 0.5281 LearningRate 0.0013 Epoch: 17 Global Step: 295740 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:03,838-Speed 5188.08 samples/sec Loss 0.5232 LearningRate 0.0013 Epoch: 17 Global Step: 295750 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:05,832-Speed 5136.61 samples/sec Loss 0.5290 LearningRate 0.0013 Epoch: 17 Global Step: 295760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:07,814-Speed 5170.18 samples/sec Loss 0.5216 LearningRate 0.0013 Epoch: 17 Global Step: 295770 Fp16 Grad Scale: 131072 Required: 3 hours 
Training: 2022-04-11 19:07:09,803-Speed 5149.51 samples/sec Loss 0.5421 LearningRate 0.0013 Epoch: 17 Global Step: 295780 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:11,786-Speed 5165.60 samples/sec Loss 0.5247 LearningRate 0.0013 Epoch: 17 Global Step: 295790 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:13,763-Speed 5180.46 samples/sec Loss 0.5426 LearningRate 0.0013 Epoch: 17 Global Step: 295800 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:15,739-Speed 5183.86 samples/sec Loss 0.5151 LearningRate 0.0013 Epoch: 17 Global Step: 295810 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:17,711-Speed 5194.20 samples/sec Loss 0.5238 LearningRate 0.0013 Epoch: 17 Global Step: 295820 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:19,681-Speed 5200.29 samples/sec Loss 0.5230 LearningRate 0.0013 Epoch: 17 Global Step: 295830 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:21,707-Speed 5056.27 samples/sec Loss 0.5359 LearningRate 0.0013 Epoch: 17 Global Step: 295840 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:23,683-Speed 5185.20 samples/sec Loss 0.5341 LearningRate 0.0013 Epoch: 17 Global Step: 295850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:25,654-Speed 5196.22 samples/sec Loss 0.5244 LearningRate 0.0013 Epoch: 17 Global Step: 295860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:27,629-Speed 5187.72 samples/sec Loss 0.5275 LearningRate 0.0013 Epoch: 17 Global Step: 295870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:29,616-Speed 5155.51 samples/sec Loss 0.5319 LearningRate 0.0013 Epoch: 17 Global Step: 295880 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:07:31,593-Speed 5181.87 samples/sec Loss 0.5182 LearningRate 0.0013 Epoch: 17 Global Step: 295890 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:07:33,579-Speed 
5156.15 samples/sec Loss 0.5787 LearningRate 0.0013 Epoch: 17 Global Step: 295900 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:07:35,571-Speed 5143.93 samples/sec Loss 0.5026 LearningRate 0.0013 Epoch: 17 Global Step: 295910 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:37,577-Speed 5105.91 samples/sec Loss 0.5309 LearningRate 0.0013 Epoch: 17 Global Step: 295920 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:39,589-Speed 5090.22 samples/sec Loss 0.5154 LearningRate 0.0013 Epoch: 17 Global Step: 295930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:41,557-Speed 5207.11 samples/sec Loss 0.5375 LearningRate 0.0013 Epoch: 17 Global Step: 295940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:43,525-Speed 5204.25 samples/sec Loss 0.5432 LearningRate 0.0013 Epoch: 17 Global Step: 295950 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:45,513-Speed 5154.09 samples/sec Loss 0.5242 LearningRate 0.0013 Epoch: 17 Global Step: 295960 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:47,491-Speed 5177.46 samples/sec Loss 0.5153 LearningRate 0.0013 Epoch: 17 Global Step: 295970 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:49,462-Speed 5197.00 samples/sec Loss 0.5295 LearningRate 0.0013 Epoch: 17 Global Step: 295980 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:51,444-Speed 5167.97 samples/sec Loss 0.5202 LearningRate 0.0013 Epoch: 17 Global Step: 295990 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:07:53,429-Speed 5162.09 samples/sec Loss 0.5616 LearningRate 0.0013 Epoch: 17 Global Step: 296000 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:08:20,247-[lfw][296000]XNorm: 21.703069 Training: 2022-04-11 19:08:20,248-[lfw][296000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 19:08:20,248-[lfw][296000]Accuracy-Highest: 0.99833 Training: 2022-04-11 
19:08:51,324-[cfp_fp][296000]XNorm: 21.958289 Training: 2022-04-11 19:08:51,324-[cfp_fp][296000]Accuracy-Flip: 0.98900+-0.00419 Training: 2022-04-11 19:08:51,325-[cfp_fp][296000]Accuracy-Highest: 0.99000 Training: 2022-04-11 19:09:18,164-[agedb_30][296000]XNorm: 22.742175 Training: 2022-04-11 19:09:18,164-[agedb_30][296000]Accuracy-Flip: 0.98283+-0.00715 Training: 2022-04-11 19:09:18,165-[agedb_30][296000]Accuracy-Highest: 0.98383 Training: 2022-04-11 19:09:20,155-Speed 118.07 samples/sec Loss 0.5408 LearningRate 0.0013 Epoch: 17 Global Step: 296010 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:09:22,131-Speed 5185.07 samples/sec Loss 0.5517 LearningRate 0.0013 Epoch: 17 Global Step: 296020 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:24,115-Speed 5162.63 samples/sec Loss 0.5266 LearningRate 0.0013 Epoch: 17 Global Step: 296030 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:26,095-Speed 5171.90 samples/sec Loss 0.5357 LearningRate 0.0013 Epoch: 17 Global Step: 296040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:28,063-Speed 5216.66 samples/sec Loss 0.5246 LearningRate 0.0013 Epoch: 17 Global Step: 296050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:30,025-Speed 5218.67 samples/sec Loss 0.5147 LearningRate 0.0013 Epoch: 17 Global Step: 296060 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:31,990-Speed 5213.43 samples/sec Loss 0.5008 LearningRate 0.0013 Epoch: 17 Global Step: 296070 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:33,970-Speed 5174.32 samples/sec Loss 0.5143 LearningRate 0.0013 Epoch: 17 Global Step: 296080 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:35,935-Speed 5212.92 samples/sec Loss 0.5098 LearningRate 0.0013 Epoch: 17 Global Step: 296090 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:37,908-Speed 5191.58 samples/sec Loss 0.5518 LearningRate 0.0013 
Epoch: 17 Global Step: 296100 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:39,896-Speed 5153.28 samples/sec Loss 0.5115 LearningRate 0.0013 Epoch: 17 Global Step: 296110 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:41,863-Speed 5206.79 samples/sec Loss 0.5613 LearningRate 0.0013 Epoch: 17 Global Step: 296120 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:43,846-Speed 5164.55 samples/sec Loss 0.5229 LearningRate 0.0013 Epoch: 17 Global Step: 296130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:45,818-Speed 5196.32 samples/sec Loss 0.5195 LearningRate 0.0013 Epoch: 17 Global Step: 296140 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:47,788-Speed 5198.32 samples/sec Loss 0.5512 LearningRate 0.0013 Epoch: 17 Global Step: 296150 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:49,757-Speed 5201.42 samples/sec Loss 0.5111 LearningRate 0.0013 Epoch: 17 Global Step: 296160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:51,760-Speed 5113.42 samples/sec Loss 0.5380 LearningRate 0.0013 Epoch: 17 Global Step: 296170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:53,749-Speed 5151.73 samples/sec Loss 0.4976 LearningRate 0.0013 Epoch: 17 Global Step: 296180 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:55,715-Speed 5211.00 samples/sec Loss 0.5097 LearningRate 0.0013 Epoch: 17 Global Step: 296190 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:57,701-Speed 5158.48 samples/sec Loss 0.5180 LearningRate 0.0013 Epoch: 17 Global Step: 296200 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:09:59,698-Speed 5128.75 samples/sec Loss 0.5173 LearningRate 0.0013 Epoch: 17 Global Step: 296210 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:10:01,678-Speed 5172.95 samples/sec Loss 0.5113 LearningRate 0.0013 Epoch: 17 Global Step: 296220 Fp16 Grad 
Scale: 131072 Required: 3 hours Training: 2022-04-11 19:10:03,661-Speed 5167.18 samples/sec Loss 0.5223 LearningRate 0.0013 Epoch: 17 Global Step: 296230 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:10:05,631-Speed 5199.65 samples/sec Loss 0.5299 LearningRate 0.0013 Epoch: 17 Global Step: 296240 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:10:07,605-Speed 5187.25 samples/sec Loss 0.5296 LearningRate 0.0013 Epoch: 17 Global Step: 296250 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:10:09,592-Speed 5155.13 samples/sec Loss 0.5371 LearningRate 0.0013 Epoch: 17 Global Step: 296260 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:10:11,598-Speed 5107.41 samples/sec Loss 0.5506 LearningRate 0.0013 Epoch: 17 Global Step: 296270 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:10:13,596-Speed 5126.53 samples/sec Loss 0.5328 LearningRate 0.0013 Epoch: 17 Global Step: 296280 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:10:15,589-Speed 5141.08 samples/sec Loss 0.5496 LearningRate 0.0013 Epoch: 17 Global Step: 296290 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:10:17,576-Speed 5154.91 samples/sec Loss 0.5441 LearningRate 0.0013 Epoch: 17 Global Step: 296300 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:10:19,546-Speed 5200.28 samples/sec Loss 0.5386 LearningRate 0.0013 Epoch: 17 Global Step: 296310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:10:21,529-Speed 5163.62 samples/sec Loss 0.5409 LearningRate 0.0013 Epoch: 17 Global Step: 296320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:10:23,513-Speed 5164.99 samples/sec Loss 0.5164 LearningRate 0.0013 Epoch: 17 Global Step: 296330 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:10:25,483-Speed 5199.79 samples/sec Loss 0.5219 LearningRate 0.0013 Epoch: 17 Global Step: 296340 Fp16 Grad Scale: 65536 Required: 3 hours 
Training: 2022-04-11 19:10:27,453-Speed 5197.33 samples/sec Loss 0.5338 LearningRate 0.0013 Epoch: 17 Global Step: 296350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:10:29,420-Speed 5207.24 samples/sec Loss 0.5216 LearningRate 0.0013 Epoch: 17 Global Step: 296360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:10:31,405-Speed 5162.98 samples/sec Loss 0.5214 LearningRate 0.0013 Epoch: 17 Global Step: 296370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:10:33,409-Speed 5111.15 samples/sec Loss 0.5137 LearningRate 0.0013 Epoch: 17 Global Step: 296380 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:10:35,377-Speed 5202.69 samples/sec Loss 0.5515 LearningRate 0.0013 Epoch: 17 Global Step: 296390 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:10:37,428-Speed 4994.63 samples/sec Loss 0.5334 LearningRate 0.0013 Epoch: 17 Global Step: 296400 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:10:39,443-Speed 5085.90 samples/sec Loss 0.5303 LearningRate 0.0013 Epoch: 17 Global Step: 296410 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:10:41,422-Speed 5175.23 samples/sec Loss 0.5258 LearningRate 0.0013 Epoch: 17 Global Step: 296420 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:10:43,393-Speed 5197.11 samples/sec Loss 0.5073 LearningRate 0.0013 Epoch: 17 Global Step: 296430 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:10:45,367-Speed 5187.95 samples/sec Loss 0.5306 LearningRate 0.0013 Epoch: 17 Global Step: 296440 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:10:47,339-Speed 5194.19 samples/sec Loss 0.5397 LearningRate 0.0013 Epoch: 17 Global Step: 296450 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:10:49,330-Speed 5146.67 samples/sec Loss 0.5240 LearningRate 0.0013 Epoch: 17 Global Step: 296460 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 
19:10:51,326-Speed 5132.02 samples/sec Loss 0.5381 LearningRate 0.0013 Epoch: 17 Global Step: 296470 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:10:53,308-Speed 5166.88 samples/sec Loss 0.5333 LearningRate 0.0013 Epoch: 17 Global Step: 296480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:10:55,281-Speed 5192.63 samples/sec Loss 0.5566 LearningRate 0.0013 Epoch: 17 Global Step: 296490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:10:57,253-Speed 5193.86 samples/sec Loss 0.5141 LearningRate 0.0012 Epoch: 17 Global Step: 296500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:10:59,223-Speed 5199.30 samples/sec Loss 0.5349 LearningRate 0.0012 Epoch: 17 Global Step: 296510 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:11:01,212-Speed 5152.12 samples/sec Loss 0.5180 LearningRate 0.0012 Epoch: 17 Global Step: 296520 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:11:03,198-Speed 5157.02 samples/sec Loss 0.5433 LearningRate 0.0012 Epoch: 17 Global Step: 296530 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:11:05,174-Speed 5184.63 samples/sec Loss 0.5162 LearningRate 0.0012 Epoch: 17 Global Step: 296540 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:11:07,161-Speed 5153.64 samples/sec Loss 0.5466 LearningRate 0.0012 Epoch: 17 Global Step: 296550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:11:09,133-Speed 5194.97 samples/sec Loss 0.5140 LearningRate 0.0012 Epoch: 17 Global Step: 296560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:11:11,160-Speed 5053.41 samples/sec Loss 0.5064 LearningRate 0.0012 Epoch: 17 Global Step: 296570 Fp16 Grad Scale: 131072 Required: 3 hours Training: 2022-04-11 19:11:13,153-Speed 5140.20 samples/sec Loss 0.5333 LearningRate 0.0012 Epoch: 17 Global Step: 296580 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:11:15,152-Speed 5125.35 samples/sec 
Loss 0.5316 LearningRate 0.0012 Epoch: 17 Global Step: 296590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:11:17,127-Speed 5185.82 samples/sec Loss 0.5262 LearningRate 0.0012 Epoch: 17 Global Step: 296600 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:11:19,118-Speed 5145.18 samples/sec Loss 0.5112 LearningRate 0.0012 Epoch: 17 Global Step: 296610 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:11:21,090-Speed 5193.99 samples/sec Loss 0.5412 LearningRate 0.0012 Epoch: 17 Global Step: 296620 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-11 19:11:23,070-Speed 5172.81 samples/sec Loss 0.5536 LearningRate 0.0012 Epoch: 17 Global Step: 296630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:11:25,046-Speed 5185.03 samples/sec Loss 0.5483 LearningRate 0.0012 Epoch: 17 Global Step: 296640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:11:27,021-Speed 5185.70 samples/sec Loss 0.5556 LearningRate 0.0012 Epoch: 17 Global Step: 296650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:11:28,992-Speed 5197.09 samples/sec Loss 0.5454 LearningRate 0.0012 Epoch: 17 Global Step: 296660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:11:30,968-Speed 5183.99 samples/sec Loss 0.5181 LearningRate 0.0012 Epoch: 17 Global Step: 296670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:11:32,946-Speed 5178.79 samples/sec Loss 0.5288 LearningRate 0.0012 Epoch: 17 Global Step: 296680 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:11:34,923-Speed 5180.67 samples/sec Loss 0.5089 LearningRate 0.0012 Epoch: 17 Global Step: 296690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:11:36,921-Speed 5128.58 samples/sec Loss 0.5466 LearningRate 0.0012 Epoch: 17 Global Step: 296700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:11:38,905-Speed 5162.45 samples/sec Loss 0.5611 LearningRate 0.0012 Epoch: 17 
Global Step: 296710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:11:40,878-Speed 5189.87 samples/sec Loss 0.5356 LearningRate 0.0012 Epoch: 17 Global Step: 296720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:11:42,854-Speed 5186.06 samples/sec Loss 0.5459 LearningRate 0.0012 Epoch: 17 Global Step: 296730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:11:44,828-Speed 5188.52 samples/sec Loss 0.5275 LearningRate 0.0012 Epoch: 17 Global Step: 296740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:11:46,801-Speed 5193.42 samples/sec Loss 0.5418 LearningRate 0.0012 Epoch: 17 Global Step: 296750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:11:48,785-Speed 5161.78 samples/sec Loss 0.5317 LearningRate 0.0012 Epoch: 17 Global Step: 296760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:11:50,759-Speed 5190.02 samples/sec Loss 0.5488 LearningRate 0.0012 Epoch: 17 Global Step: 296770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:11:52,735-Speed 5183.54 samples/sec Loss 0.5477 LearningRate 0.0012 Epoch: 17 Global Step: 296780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:11:54,706-Speed 5196.51 samples/sec Loss 0.5141 LearningRate 0.0012 Epoch: 17 Global Step: 296790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:11:56,684-Speed 5179.19 samples/sec Loss 0.5597 LearningRate 0.0012 Epoch: 17 Global Step: 296800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:11:58,659-Speed 5185.73 samples/sec Loss 0.5347 LearningRate 0.0012 Epoch: 17 Global Step: 296810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:00,631-Speed 5194.15 samples/sec Loss 0.5267 LearningRate 0.0012 Epoch: 17 Global Step: 296820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:02,640-Speed 5100.93 samples/sec Loss 0.5177 LearningRate 0.0012 Epoch: 17 Global Step: 296830 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 19:12:04,613-Speed 5191.34 samples/sec Loss 0.5381 LearningRate 0.0012 Epoch: 17 Global Step: 296840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:06,587-Speed 5189.41 samples/sec Loss 0.4987 LearningRate 0.0012 Epoch: 17 Global Step: 296850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:08,565-Speed 5178.15 samples/sec Loss 0.5253 LearningRate 0.0012 Epoch: 17 Global Step: 296860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:10,550-Speed 5161.57 samples/sec Loss 0.5220 LearningRate 0.0012 Epoch: 17 Global Step: 296870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:12,561-Speed 5092.38 samples/sec Loss 0.5455 LearningRate 0.0012 Epoch: 17 Global Step: 296880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:14,537-Speed 5183.47 samples/sec Loss 0.5403 LearningRate 0.0012 Epoch: 17 Global Step: 296890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:16,527-Speed 5148.48 samples/sec Loss 0.5438 LearningRate 0.0012 Epoch: 17 Global Step: 296900 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:12:18,500-Speed 5191.63 samples/sec Loss 0.5477 LearningRate 0.0012 Epoch: 17 Global Step: 296910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:20,484-Speed 5161.28 samples/sec Loss 0.5382 LearningRate 0.0012 Epoch: 17 Global Step: 296920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:22,459-Speed 5187.91 samples/sec Loss 0.5253 LearningRate 0.0012 Epoch: 17 Global Step: 296930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:24,445-Speed 5159.04 samples/sec Loss 0.5119 LearningRate 0.0012 Epoch: 17 Global Step: 296940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:26,425-Speed 5170.92 samples/sec Loss 0.5405 LearningRate 0.0012 Epoch: 17 Global Step: 296950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
19:12:28,406-Speed 5170.77 samples/sec Loss 0.5184 LearningRate 0.0012 Epoch: 17 Global Step: 296960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:30,374-Speed 5207.31 samples/sec Loss 0.5296 LearningRate 0.0012 Epoch: 17 Global Step: 296970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:32,342-Speed 5205.80 samples/sec Loss 0.5071 LearningRate 0.0012 Epoch: 17 Global Step: 296980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:34,312-Speed 5199.45 samples/sec Loss 0.5328 LearningRate 0.0012 Epoch: 17 Global Step: 296990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:36,298-Speed 5157.97 samples/sec Loss 0.5155 LearningRate 0.0012 Epoch: 17 Global Step: 297000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:38,287-Speed 5148.98 samples/sec Loss 0.5237 LearningRate 0.0012 Epoch: 17 Global Step: 297010 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:12:40,256-Speed 5201.34 samples/sec Loss 0.5299 LearningRate 0.0012 Epoch: 17 Global Step: 297020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:42,226-Speed 5201.29 samples/sec Loss 0.5408 LearningRate 0.0012 Epoch: 17 Global Step: 297030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:44,200-Speed 5188.52 samples/sec Loss 0.5508 LearningRate 0.0012 Epoch: 17 Global Step: 297040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:46,170-Speed 5199.42 samples/sec Loss 0.5219 LearningRate 0.0012 Epoch: 17 Global Step: 297050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:48,169-Speed 5123.37 samples/sec Loss 0.5226 LearningRate 0.0012 Epoch: 17 Global Step: 297060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:50,152-Speed 5165.45 samples/sec Loss 0.5276 LearningRate 0.0012 Epoch: 17 Global Step: 297070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:52,139-Speed 5154.34 samples/sec 
Loss 0.5390 LearningRate 0.0012 Epoch: 17 Global Step: 297080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:54,124-Speed 5163.21 samples/sec Loss 0.5224 LearningRate 0.0012 Epoch: 17 Global Step: 297090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:56,091-Speed 5208.48 samples/sec Loss 0.5282 LearningRate 0.0012 Epoch: 17 Global Step: 297100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:12:58,065-Speed 5187.58 samples/sec Loss 0.5289 LearningRate 0.0012 Epoch: 17 Global Step: 297110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:00,032-Speed 5206.68 samples/sec Loss 0.5337 LearningRate 0.0012 Epoch: 17 Global Step: 297120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:02,013-Speed 5170.99 samples/sec Loss 0.5227 LearningRate 0.0012 Epoch: 17 Global Step: 297130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:03,996-Speed 5167.40 samples/sec Loss 0.5356 LearningRate 0.0012 Epoch: 17 Global Step: 297140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:05,972-Speed 5183.61 samples/sec Loss 0.5287 LearningRate 0.0012 Epoch: 17 Global Step: 297150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:07,942-Speed 5197.51 samples/sec Loss 0.5070 LearningRate 0.0012 Epoch: 17 Global Step: 297160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:09,927-Speed 5161.07 samples/sec Loss 0.5698 LearningRate 0.0012 Epoch: 17 Global Step: 297170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:11,918-Speed 5144.83 samples/sec Loss 0.5298 LearningRate 0.0012 Epoch: 17 Global Step: 297180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:13,911-Speed 5139.66 samples/sec Loss 0.5510 LearningRate 0.0012 Epoch: 17 Global Step: 297190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:15,902-Speed 5146.10 samples/sec Loss 0.5375 LearningRate 0.0012 Epoch: 17 
Global Step: 297200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:17,874-Speed 5193.93 samples/sec Loss 0.5197 LearningRate 0.0012 Epoch: 17 Global Step: 297210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:19,847-Speed 5192.76 samples/sec Loss 0.5522 LearningRate 0.0012 Epoch: 17 Global Step: 297220 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:13:21,809-Speed 5219.70 samples/sec Loss 0.5407 LearningRate 0.0012 Epoch: 17 Global Step: 297230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:13:23,789-Speed 5174.19 samples/sec Loss 0.5481 LearningRate 0.0012 Epoch: 17 Global Step: 297240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:13:25,772-Speed 5165.27 samples/sec Loss 0.5159 LearningRate 0.0012 Epoch: 17 Global Step: 297250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:13:27,746-Speed 5190.49 samples/sec Loss 0.5722 LearningRate 0.0012 Epoch: 17 Global Step: 297260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:13:29,746-Speed 5120.45 samples/sec Loss 0.5529 LearningRate 0.0012 Epoch: 17 Global Step: 297270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:13:31,721-Speed 5186.83 samples/sec Loss 0.5384 LearningRate 0.0012 Epoch: 17 Global Step: 297280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:13:33,710-Speed 5150.57 samples/sec Loss 0.5021 LearningRate 0.0012 Epoch: 17 Global Step: 297290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:13:35,695-Speed 5158.75 samples/sec Loss 0.5516 LearningRate 0.0012 Epoch: 17 Global Step: 297300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:13:37,678-Speed 5165.42 samples/sec Loss 0.5184 LearningRate 0.0012 Epoch: 17 Global Step: 297310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:13:39,651-Speed 5193.09 samples/sec Loss 0.5167 LearningRate 0.0012 Epoch: 17 Global Step: 297320 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-04-11 19:13:41,661-Speed 5095.82 samples/sec Loss 0.5448 LearningRate 0.0012 Epoch: 17 Global Step: 297330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:43,631-Speed 5201.39 samples/sec Loss 0.5566 LearningRate 0.0012 Epoch: 17 Global Step: 297340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:45,605-Speed 5189.52 samples/sec Loss 0.5434 LearningRate 0.0012 Epoch: 17 Global Step: 297350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:47,576-Speed 5196.47 samples/sec Loss 0.5464 LearningRate 0.0012 Epoch: 17 Global Step: 297360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:49,567-Speed 5143.89 samples/sec Loss 0.5260 LearningRate 0.0012 Epoch: 17 Global Step: 297370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:51,541-Speed 5188.25 samples/sec Loss 0.5269 LearningRate 0.0012 Epoch: 17 Global Step: 297380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:53,527-Speed 5158.00 samples/sec Loss 0.5529 LearningRate 0.0012 Epoch: 17 Global Step: 297390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:55,510-Speed 5166.47 samples/sec Loss 0.5334 LearningRate 0.0012 Epoch: 17 Global Step: 297400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:57,507-Speed 5128.59 samples/sec Loss 0.5182 LearningRate 0.0012 Epoch: 17 Global Step: 297410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:13:59,488-Speed 5172.04 samples/sec Loss 0.5271 LearningRate 0.0012 Epoch: 17 Global Step: 297420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:01,481-Speed 5138.11 samples/sec Loss 0.5562 LearningRate 0.0012 Epoch: 17 Global Step: 297430 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:14:03,448-Speed 5211.32 samples/sec Loss 0.5367 LearningRate 0.0012 Epoch: 17 Global Step: 297440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
19:14:05,421-Speed 5191.74 samples/sec Loss 0.5401 LearningRate 0.0012 Epoch: 17 Global Step: 297450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:07,406-Speed 5158.41 samples/sec Loss 0.5435 LearningRate 0.0012 Epoch: 17 Global Step: 297460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:09,380-Speed 5189.57 samples/sec Loss 0.5025 LearningRate 0.0012 Epoch: 17 Global Step: 297470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:11,355-Speed 5186.79 samples/sec Loss 0.5318 LearningRate 0.0012 Epoch: 17 Global Step: 297480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:13,356-Speed 5118.88 samples/sec Loss 0.5502 LearningRate 0.0012 Epoch: 17 Global Step: 297490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:15,333-Speed 5180.65 samples/sec Loss 0.5097 LearningRate 0.0012 Epoch: 17 Global Step: 297500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:17,302-Speed 5202.33 samples/sec Loss 0.5487 LearningRate 0.0012 Epoch: 17 Global Step: 297510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:19,280-Speed 5180.94 samples/sec Loss 0.5108 LearningRate 0.0012 Epoch: 17 Global Step: 297520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:21,265-Speed 5159.44 samples/sec Loss 0.5323 LearningRate 0.0012 Epoch: 17 Global Step: 297530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:23,234-Speed 5201.62 samples/sec Loss 0.5386 LearningRate 0.0012 Epoch: 17 Global Step: 297540 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:14:25,217-Speed 5167.61 samples/sec Loss 0.5438 LearningRate 0.0012 Epoch: 17 Global Step: 297550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:27,205-Speed 5151.50 samples/sec Loss 0.5194 LearningRate 0.0012 Epoch: 17 Global Step: 297560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:29,177-Speed 5193.57 samples/sec 
Loss 0.5126 LearningRate 0.0012 Epoch: 17 Global Step: 297570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:31,152-Speed 5186.85 samples/sec Loss 0.5118 LearningRate 0.0012 Epoch: 17 Global Step: 297580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:33,130-Speed 5179.32 samples/sec Loss 0.5578 LearningRate 0.0012 Epoch: 17 Global Step: 297590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:35,127-Speed 5129.35 samples/sec Loss 0.5510 LearningRate 0.0012 Epoch: 17 Global Step: 297600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:37,112-Speed 5160.30 samples/sec Loss 0.5385 LearningRate 0.0012 Epoch: 17 Global Step: 297610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:39,082-Speed 5199.54 samples/sec Loss 0.5428 LearningRate 0.0012 Epoch: 17 Global Step: 297620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:41,068-Speed 5156.38 samples/sec Loss 0.5522 LearningRate 0.0012 Epoch: 17 Global Step: 297630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:43,053-Speed 5162.73 samples/sec Loss 0.5521 LearningRate 0.0012 Epoch: 17 Global Step: 297640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:45,050-Speed 5127.44 samples/sec Loss 0.5382 LearningRate 0.0012 Epoch: 17 Global Step: 297650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:14:47,027-Speed 5183.91 samples/sec Loss 0.5619 LearningRate 0.0012 Epoch: 17 Global Step: 297660 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:14:49,008-Speed 5168.84 samples/sec Loss 0.5404 LearningRate 0.0012 Epoch: 17 Global Step: 297670 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:14:50,989-Speed 5172.69 samples/sec Loss 0.5289 LearningRate 0.0012 Epoch: 17 Global Step: 297680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:52,993-Speed 5109.67 samples/sec Loss 0.5420 LearningRate 0.0012 Epoch: 
17 Global Step: 297690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:54,972-Speed 5176.30 samples/sec Loss 0.5311 LearningRate 0.0012 Epoch: 17 Global Step: 297700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:56,942-Speed 5200.31 samples/sec Loss 0.5203 LearningRate 0.0012 Epoch: 17 Global Step: 297710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:14:58,933-Speed 5144.64 samples/sec Loss 0.4719 LearningRate 0.0012 Epoch: 17 Global Step: 297720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:00,923-Speed 5147.55 samples/sec Loss 0.4944 LearningRate 0.0012 Epoch: 17 Global Step: 297730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:02,901-Speed 5179.94 samples/sec Loss 0.5258 LearningRate 0.0012 Epoch: 17 Global Step: 297740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:04,880-Speed 5176.29 samples/sec Loss 0.5388 LearningRate 0.0012 Epoch: 17 Global Step: 297750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:06,848-Speed 5202.56 samples/sec Loss 0.5460 LearningRate 0.0012 Epoch: 17 Global Step: 297760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:08,822-Speed 5190.30 samples/sec Loss 0.5488 LearningRate 0.0012 Epoch: 17 Global Step: 297770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:10,806-Speed 5163.82 samples/sec Loss 0.5552 LearningRate 0.0012 Epoch: 17 Global Step: 297780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:15:12,786-Speed 5173.71 samples/sec Loss 0.5381 LearningRate 0.0012 Epoch: 17 Global Step: 297790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:14,758-Speed 5193.68 samples/sec Loss 0.5343 LearningRate 0.0012 Epoch: 17 Global Step: 297800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:16,751-Speed 5139.72 samples/sec Loss 0.5284 LearningRate 0.0012 Epoch: 17 Global Step: 297810 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-04-11 19:15:18,723-Speed 5195.29 samples/sec Loss 0.5430 LearningRate 0.0012 Epoch: 17 Global Step: 297820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:20,695-Speed 5192.79 samples/sec Loss 0.5408 LearningRate 0.0012 Epoch: 17 Global Step: 297830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:22,682-Speed 5155.23 samples/sec Loss 0.5251 LearningRate 0.0012 Epoch: 17 Global Step: 297840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:24,659-Speed 5185.25 samples/sec Loss 0.5272 LearningRate 0.0012 Epoch: 17 Global Step: 297850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:26,644-Speed 5161.57 samples/sec Loss 0.5273 LearningRate 0.0012 Epoch: 17 Global Step: 297860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:28,616-Speed 5192.97 samples/sec Loss 0.5088 LearningRate 0.0012 Epoch: 17 Global Step: 297870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:30,594-Speed 5178.91 samples/sec Loss 0.5564 LearningRate 0.0012 Epoch: 17 Global Step: 297880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:32,590-Speed 5133.46 samples/sec Loss 0.5361 LearningRate 0.0012 Epoch: 17 Global Step: 297890 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:15:34,577-Speed 5155.41 samples/sec Loss 0.5533 LearningRate 0.0012 Epoch: 17 Global Step: 297900 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:15:36,542-Speed 5211.47 samples/sec Loss 0.5086 LearningRate 0.0012 Epoch: 17 Global Step: 297910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:38,530-Speed 5153.92 samples/sec Loss 0.5345 LearningRate 0.0012 Epoch: 17 Global Step: 297920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:40,502-Speed 5193.80 samples/sec Loss 0.5180 LearningRate 0.0012 Epoch: 17 Global Step: 297930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 19:15:42,477-Speed 5187.19 samples/sec Loss 0.5489 LearningRate 0.0012 Epoch: 17 Global Step: 297940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:44,451-Speed 5186.82 samples/sec Loss 0.5184 LearningRate 0.0012 Epoch: 17 Global Step: 297950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:46,430-Speed 5178.25 samples/sec Loss 0.5569 LearningRate 0.0012 Epoch: 17 Global Step: 297960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:48,404-Speed 5188.92 samples/sec Loss 0.5672 LearningRate 0.0012 Epoch: 17 Global Step: 297970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:50,395-Speed 5144.59 samples/sec Loss 0.5581 LearningRate 0.0012 Epoch: 17 Global Step: 297980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:52,376-Speed 5169.40 samples/sec Loss 0.5300 LearningRate 0.0012 Epoch: 17 Global Step: 297990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:15:54,359-Speed 5168.65 samples/sec Loss 0.5593 LearningRate 0.0012 Epoch: 17 Global Step: 298000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:16:20,981-[lfw][298000]XNorm: 21.326507 Training: 2022-04-11 19:16:20,981-[lfw][298000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 19:16:20,981-[lfw][298000]Accuracy-Highest: 0.99833 Training: 2022-04-11 19:16:51,811-[cfp_fp][298000]XNorm: 21.768871 Training: 2022-04-11 19:16:51,811-[cfp_fp][298000]Accuracy-Flip: 0.98900+-0.00378 Training: 2022-04-11 19:16:51,812-[cfp_fp][298000]Accuracy-Highest: 0.99000 Training: 2022-04-11 19:17:18,289-[agedb_30][298000]XNorm: 22.486242 Training: 2022-04-11 19:17:18,289-[agedb_30][298000]Accuracy-Flip: 0.98333+-0.00601 Training: 2022-04-11 19:17:18,289-[agedb_30][298000]Accuracy-Highest: 0.98383 Training: 2022-04-11 19:17:20,278-Speed 119.18 samples/sec Loss 0.5261 LearningRate 0.0012 Epoch: 17 Global Step: 298010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
19:17:22,274-Speed 5129.71 samples/sec Loss 0.5432 LearningRate 0.0012 Epoch: 17 Global Step: 298020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:17:24,250-Speed 5185.93 samples/sec Loss 0.5400 LearningRate 0.0011 Epoch: 17 Global Step: 298030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:17:26,213-Speed 5220.27 samples/sec Loss 0.5177 LearningRate 0.0011 Epoch: 17 Global Step: 298040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:17:28,177-Speed 5214.46 samples/sec Loss 0.5505 LearningRate 0.0011 Epoch: 17 Global Step: 298050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:17:30,154-Speed 5180.66 samples/sec Loss 0.5608 LearningRate 0.0011 Epoch: 17 Global Step: 298060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:17:32,137-Speed 5167.28 samples/sec Loss 0.4773 LearningRate 0.0011 Epoch: 17 Global Step: 298070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:17:34,108-Speed 5195.80 samples/sec Loss 0.5295 LearningRate 0.0011 Epoch: 17 Global Step: 298080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:17:36,086-Speed 5179.34 samples/sec Loss 0.5525 LearningRate 0.0011 Epoch: 17 Global Step: 298090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:17:38,053-Speed 5207.28 samples/sec Loss 0.5385 LearningRate 0.0011 Epoch: 17 Global Step: 298100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:17:40,022-Speed 5201.78 samples/sec Loss 0.5177 LearningRate 0.0011 Epoch: 17 Global Step: 298110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:17:42,005-Speed 5163.96 samples/sec Loss 0.5307 LearningRate 0.0011 Epoch: 17 Global Step: 298120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:17:43,981-Speed 5185.55 samples/sec Loss 0.5353 LearningRate 0.0011 Epoch: 17 Global Step: 298130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:17:45,969-Speed 5153.28 samples/sec 
Loss 0.5398 LearningRate 0.0011 Epoch: 17 Global Step: 298140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:17:47,967-Speed 5125.88 samples/sec Loss 0.5196 LearningRate 0.0011 Epoch: 17 Global Step: 298150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:17:49,950-Speed 5163.97 samples/sec Loss 0.5478 LearningRate 0.0011 Epoch: 17 Global Step: 298160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:17:51,954-Speed 5112.91 samples/sec Loss 0.5186 LearningRate 0.0011 Epoch: 17 Global Step: 298170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:17:53,923-Speed 5202.83 samples/sec Loss 0.5351 LearningRate 0.0011 Epoch: 17 Global Step: 298180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:17:55,896-Speed 5193.28 samples/sec Loss 0.5464 LearningRate 0.0011 Epoch: 17 Global Step: 298190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:17:57,880-Speed 5161.08 samples/sec Loss 0.5022 LearningRate 0.0011 Epoch: 17 Global Step: 298200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:17:59,854-Speed 5189.19 samples/sec Loss 0.5205 LearningRate 0.0011 Epoch: 17 Global Step: 298210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:01,834-Speed 5174.64 samples/sec Loss 0.5213 LearningRate 0.0011 Epoch: 17 Global Step: 298220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:03,822-Speed 5150.69 samples/sec Loss 0.5307 LearningRate 0.0011 Epoch: 17 Global Step: 298230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:05,808-Speed 5160.07 samples/sec Loss 0.5408 LearningRate 0.0011 Epoch: 17 Global Step: 298240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:18:07,772-Speed 5215.64 samples/sec Loss 0.5265 LearningRate 0.0011 Epoch: 17 Global Step: 298250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:09,763-Speed 5144.36 samples/sec Loss 0.5020 LearningRate 0.0011 Epoch: 17 
Global Step: 298260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:11,732-Speed 5201.56 samples/sec Loss 0.5216 LearningRate 0.0011 Epoch: 17 Global Step: 298270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:13,719-Speed 5154.79 samples/sec Loss 0.5580 LearningRate 0.0011 Epoch: 17 Global Step: 298280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:15,708-Speed 5151.98 samples/sec Loss 0.5281 LearningRate 0.0011 Epoch: 17 Global Step: 298290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:17,681-Speed 5191.98 samples/sec Loss 0.5301 LearningRate 0.0011 Epoch: 17 Global Step: 298300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:19,651-Speed 5197.93 samples/sec Loss 0.5104 LearningRate 0.0011 Epoch: 17 Global Step: 298310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:21,626-Speed 5188.06 samples/sec Loss 0.5372 LearningRate 0.0011 Epoch: 17 Global Step: 298320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:23,595-Speed 5200.90 samples/sec Loss 0.5383 LearningRate 0.0011 Epoch: 17 Global Step: 298330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:25,583-Speed 5152.24 samples/sec Loss 0.5593 LearningRate 0.0011 Epoch: 17 Global Step: 298340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:27,577-Speed 5137.70 samples/sec Loss 0.5388 LearningRate 0.0011 Epoch: 17 Global Step: 298350 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:18:29,559-Speed 5168.38 samples/sec Loss 0.5502 LearningRate 0.0011 Epoch: 17 Global Step: 298360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:18:31,524-Speed 5213.57 samples/sec Loss 0.5472 LearningRate 0.0011 Epoch: 17 Global Step: 298370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:33,510-Speed 5158.56 samples/sec Loss 0.5428 LearningRate 0.0011 Epoch: 17 Global Step: 298380 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-04-11 19:18:35,494-Speed 5162.47 samples/sec Loss 0.5458 LearningRate 0.0011 Epoch: 17 Global Step: 298390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:37,480-Speed 5159.40 samples/sec Loss 0.5249 LearningRate 0.0011 Epoch: 17 Global Step: 298400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:39,455-Speed 5186.16 samples/sec Loss 0.5188 LearningRate 0.0011 Epoch: 17 Global Step: 298410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:41,426-Speed 5196.20 samples/sec Loss 0.5180 LearningRate 0.0011 Epoch: 17 Global Step: 298420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:43,409-Speed 5167.05 samples/sec Loss 0.5211 LearningRate 0.0011 Epoch: 17 Global Step: 298430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:45,379-Speed 5197.43 samples/sec Loss 0.5459 LearningRate 0.0011 Epoch: 17 Global Step: 298440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:47,373-Speed 5137.24 samples/sec Loss 0.5362 LearningRate 0.0011 Epoch: 17 Global Step: 298450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:49,369-Speed 5133.26 samples/sec Loss 0.5124 LearningRate 0.0011 Epoch: 17 Global Step: 298460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:18:51,340-Speed 5194.97 samples/sec Loss 0.4920 LearningRate 0.0011 Epoch: 17 Global Step: 298470 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:18:53,340-Speed 5123.59 samples/sec Loss 0.5310 LearningRate 0.0011 Epoch: 17 Global Step: 298480 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:18:55,319-Speed 5175.31 samples/sec Loss 0.5216 LearningRate 0.0011 Epoch: 17 Global Step: 298490 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:18:57,292-Speed 5192.31 samples/sec Loss 0.5386 LearningRate 0.0011 Epoch: 17 Global Step: 298500 Fp16 Grad Scale: 131072 Required: 2 hours Training: 
2022-04-11 19:18:59,259-Speed 5207.98 samples/sec Loss 0.5392 LearningRate 0.0011 Epoch: 17 Global Step: 298510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:01,237-Speed 5178.92 samples/sec Loss 0.5463 LearningRate 0.0011 Epoch: 17 Global Step: 298520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:03,203-Speed 5210.73 samples/sec Loss 0.5381 LearningRate 0.0011 Epoch: 17 Global Step: 298530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:05,190-Speed 5154.44 samples/sec Loss 0.5278 LearningRate 0.0011 Epoch: 17 Global Step: 298540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:07,157-Speed 5207.70 samples/sec Loss 0.5107 LearningRate 0.0011 Epoch: 17 Global Step: 298550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:09,127-Speed 5198.82 samples/sec Loss 0.5231 LearningRate 0.0011 Epoch: 17 Global Step: 298560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:11,095-Speed 5205.52 samples/sec Loss 0.5102 LearningRate 0.0011 Epoch: 17 Global Step: 298570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:13,066-Speed 5197.37 samples/sec Loss 0.5113 LearningRate 0.0011 Epoch: 17 Global Step: 298580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:15,038-Speed 5195.66 samples/sec Loss 0.5356 LearningRate 0.0011 Epoch: 17 Global Step: 298590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:17,020-Speed 5168.13 samples/sec Loss 0.5199 LearningRate 0.0011 Epoch: 17 Global Step: 298600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:19,001-Speed 5170.42 samples/sec Loss 0.5455 LearningRate 0.0011 Epoch: 17 Global Step: 298610 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:19:20,985-Speed 5161.72 samples/sec Loss 0.5711 LearningRate 0.0011 Epoch: 17 Global Step: 298620 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:19:22,961-Speed 5185.46 
samples/sec Loss 0.5330 LearningRate 0.0011 Epoch: 17 Global Step: 298630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:19:24,941-Speed 5172.48 samples/sec Loss 0.5346 LearningRate 0.0011 Epoch: 17 Global Step: 298640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:19:26,921-Speed 5173.46 samples/sec Loss 0.5076 LearningRate 0.0011 Epoch: 17 Global Step: 298650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:19:28,906-Speed 5161.93 samples/sec Loss 0.5047 LearningRate 0.0011 Epoch: 17 Global Step: 298660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:30,876-Speed 5197.98 samples/sec Loss 0.5320 LearningRate 0.0011 Epoch: 17 Global Step: 298670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:32,848-Speed 5195.94 samples/sec Loss 0.5297 LearningRate 0.0011 Epoch: 17 Global Step: 298680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:34,834-Speed 5156.02 samples/sec Loss 0.5466 LearningRate 0.0011 Epoch: 17 Global Step: 298690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:36,806-Speed 5196.44 samples/sec Loss 0.5449 LearningRate 0.0011 Epoch: 17 Global Step: 298700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:38,779-Speed 5189.55 samples/sec Loss 0.4965 LearningRate 0.0011 Epoch: 17 Global Step: 298710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:40,748-Speed 5202.57 samples/sec Loss 0.4990 LearningRate 0.0011 Epoch: 17 Global Step: 298720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:42,732-Speed 5164.87 samples/sec Loss 0.5612 LearningRate 0.0011 Epoch: 17 Global Step: 298730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:44,715-Speed 5164.97 samples/sec Loss 0.5062 LearningRate 0.0011 Epoch: 17 Global Step: 298740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:46,715-Speed 5121.66 samples/sec Loss 0.5337 LearningRate 
0.0011 Epoch: 17 Global Step: 298750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:19:48,708-Speed 5141.06 samples/sec Loss 0.5475 LearningRate 0.0011 Epoch: 17 Global Step: 298760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:19:50,676-Speed 5202.84 samples/sec Loss 0.5486 LearningRate 0.0011 Epoch: 17 Global Step: 298770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:19:52,661-Speed 5162.39 samples/sec Loss 0.5480 LearningRate 0.0011 Epoch: 17 Global Step: 298780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:19:54,630-Speed 5201.40 samples/sec Loss 0.5395 LearningRate 0.0011 Epoch: 17 Global Step: 298790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:19:56,604-Speed 5189.08 samples/sec Loss 0.5468 LearningRate 0.0011 Epoch: 17 Global Step: 298800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:19:58,597-Speed 5139.36 samples/sec Loss 0.5535 LearningRate 0.0011 Epoch: 17 Global Step: 298810 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:20:00,573-Speed 5182.79 samples/sec Loss 0.5446 LearningRate 0.0011 Epoch: 17 Global Step: 298820 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:20:02,574-Speed 5119.20 samples/sec Loss 0.5380 LearningRate 0.0011 Epoch: 17 Global Step: 298830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:04,572-Speed 5127.71 samples/sec Loss 0.5007 LearningRate 0.0011 Epoch: 17 Global Step: 298840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:06,555-Speed 5164.87 samples/sec Loss 0.5334 LearningRate 0.0011 Epoch: 17 Global Step: 298850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:08,525-Speed 5200.78 samples/sec Loss 0.5094 LearningRate 0.0011 Epoch: 17 Global Step: 298860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:10,521-Speed 5133.74 samples/sec Loss 0.5326 LearningRate 0.0011 Epoch: 17 Global Step: 298870 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:12,523-Speed 5114.78 samples/sec Loss 0.5371 LearningRate 0.0011 Epoch: 17 Global Step: 298880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:14,492-Speed 5204.49 samples/sec Loss 0.5250 LearningRate 0.0011 Epoch: 17 Global Step: 298890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:16,462-Speed 5199.40 samples/sec Loss 0.5405 LearningRate 0.0011 Epoch: 17 Global Step: 298900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:18,430-Speed 5202.45 samples/sec Loss 0.5307 LearningRate 0.0011 Epoch: 17 Global Step: 298910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:20,420-Speed 5148.70 samples/sec Loss 0.5327 LearningRate 0.0011 Epoch: 17 Global Step: 298920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:22,423-Speed 5113.74 samples/sec Loss 0.5696 LearningRate 0.0011 Epoch: 17 Global Step: 298930 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:20:24,398-Speed 5185.95 samples/sec Loss 0.5295 LearningRate 0.0011 Epoch: 17 Global Step: 298940 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:20:26,386-Speed 5154.43 samples/sec Loss 0.5333 LearningRate 0.0011 Epoch: 17 Global Step: 298950 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:20:28,380-Speed 5136.89 samples/sec Loss 0.5334 LearningRate 0.0011 Epoch: 17 Global Step: 298960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:30,353-Speed 5191.63 samples/sec Loss 0.5229 LearningRate 0.0011 Epoch: 17 Global Step: 298970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:32,336-Speed 5163.48 samples/sec Loss 0.5447 LearningRate 0.0011 Epoch: 17 Global Step: 298980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:34,305-Speed 5203.84 samples/sec Loss 0.5498 LearningRate 0.0011 Epoch: 17 Global Step: 298990 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 19:20:36,273-Speed 5205.26 samples/sec Loss 0.5385 LearningRate 0.0011 Epoch: 17 Global Step: 299000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:38,270-Speed 5130.55 samples/sec Loss 0.5522 LearningRate 0.0011 Epoch: 17 Global Step: 299010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:40,254-Speed 5162.99 samples/sec Loss 0.5295 LearningRate 0.0011 Epoch: 17 Global Step: 299020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:42,223-Speed 5202.71 samples/sec Loss 0.5528 LearningRate 0.0011 Epoch: 17 Global Step: 299030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:44,196-Speed 5191.95 samples/sec Loss 0.5268 LearningRate 0.0011 Epoch: 17 Global Step: 299040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:46,182-Speed 5156.07 samples/sec Loss 0.5490 LearningRate 0.0011 Epoch: 17 Global Step: 299050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:48,168-Speed 5157.53 samples/sec Loss 0.5112 LearningRate 0.0011 Epoch: 17 Global Step: 299060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:50,139-Speed 5196.69 samples/sec Loss 0.5207 LearningRate 0.0011 Epoch: 17 Global Step: 299070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:52,110-Speed 5197.34 samples/sec Loss 0.5148 LearningRate 0.0011 Epoch: 17 Global Step: 299080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:54,094-Speed 5163.99 samples/sec Loss 0.5149 LearningRate 0.0011 Epoch: 17 Global Step: 299090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:56,079-Speed 5160.01 samples/sec Loss 0.5275 LearningRate 0.0011 Epoch: 17 Global Step: 299100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:20:58,110-Speed 5044.92 samples/sec Loss 0.5227 LearningRate 0.0011 Epoch: 17 Global Step: 299110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:00,103-Speed 
5139.87 samples/sec Loss 0.5302 LearningRate 0.0011 Epoch: 17 Global Step: 299120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:02,082-Speed 5175.36 samples/sec Loss 0.5011 LearningRate 0.0011 Epoch: 17 Global Step: 299130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:04,055-Speed 5191.56 samples/sec Loss 0.5358 LearningRate 0.0011 Epoch: 17 Global Step: 299140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:06,052-Speed 5130.69 samples/sec Loss 0.5449 LearningRate 0.0011 Epoch: 17 Global Step: 299150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:08,038-Speed 5156.96 samples/sec Loss 0.5445 LearningRate 0.0011 Epoch: 17 Global Step: 299160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:21:10,018-Speed 5172.81 samples/sec Loss 0.5239 LearningRate 0.0011 Epoch: 17 Global Step: 299170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:21:11,993-Speed 5187.97 samples/sec Loss 0.5124 LearningRate 0.0011 Epoch: 17 Global Step: 299180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:21:13,974-Speed 5169.19 samples/sec Loss 0.5170 LearningRate 0.0011 Epoch: 17 Global Step: 299190 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:21:15,967-Speed 5140.87 samples/sec Loss 0.5249 LearningRate 0.0011 Epoch: 17 Global Step: 299200 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:21:17,931-Speed 5215.90 samples/sec Loss 0.5158 LearningRate 0.0011 Epoch: 17 Global Step: 299210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:19,919-Speed 5152.01 samples/sec Loss 0.5159 LearningRate 0.0011 Epoch: 17 Global Step: 299220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:21,895-Speed 5185.15 samples/sec Loss 0.5182 LearningRate 0.0011 Epoch: 17 Global Step: 299230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:23,878-Speed 5165.63 samples/sec Loss 0.5427 
LearningRate 0.0011 Epoch: 17 Global Step: 299240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:25,873-Speed 5133.23 samples/sec Loss 0.5373 LearningRate 0.0011 Epoch: 17 Global Step: 299250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:27,876-Speed 5115.36 samples/sec Loss 0.5164 LearningRate 0.0011 Epoch: 17 Global Step: 299260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:29,847-Speed 5194.92 samples/sec Loss 0.5235 LearningRate 0.0011 Epoch: 17 Global Step: 299270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:31,832-Speed 5160.21 samples/sec Loss 0.5232 LearningRate 0.0011 Epoch: 17 Global Step: 299280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:33,802-Speed 5199.80 samples/sec Loss 0.5312 LearningRate 0.0011 Epoch: 17 Global Step: 299290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:35,804-Speed 5117.84 samples/sec Loss 0.5230 LearningRate 0.0011 Epoch: 17 Global Step: 299300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:37,798-Speed 5137.17 samples/sec Loss 0.5473 LearningRate 0.0011 Epoch: 17 Global Step: 299310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:21:39,790-Speed 5141.88 samples/sec Loss 0.5292 LearningRate 0.0011 Epoch: 17 Global Step: 299320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:21:41,774-Speed 5162.24 samples/sec Loss 0.5467 LearningRate 0.0011 Epoch: 17 Global Step: 299330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:21:43,745-Speed 5197.29 samples/sec Loss 0.5439 LearningRate 0.0011 Epoch: 17 Global Step: 299340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:21:45,713-Speed 5206.04 samples/sec Loss 0.5162 LearningRate 0.0011 Epoch: 17 Global Step: 299350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:47,687-Speed 5190.46 samples/sec Loss 0.5046 LearningRate 0.0011 Epoch: 17 Global 
Step: 299360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:49,679-Speed 5139.96 samples/sec Loss 0.5756 LearningRate 0.0011 Epoch: 17 Global Step: 299370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:51,670-Speed 5144.54 samples/sec Loss 0.5108 LearningRate 0.0011 Epoch: 17 Global Step: 299380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:53,669-Speed 5125.82 samples/sec Loss 0.5569 LearningRate 0.0011 Epoch: 17 Global Step: 299390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:55,648-Speed 5175.40 samples/sec Loss 0.5288 LearningRate 0.0011 Epoch: 17 Global Step: 299400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:57,628-Speed 5174.78 samples/sec Loss 0.5517 LearningRate 0.0011 Epoch: 17 Global Step: 299410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:21:59,606-Speed 5178.39 samples/sec Loss 0.5310 LearningRate 0.0011 Epoch: 17 Global Step: 299420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:01,588-Speed 5165.75 samples/sec Loss 0.5243 LearningRate 0.0011 Epoch: 17 Global Step: 299430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:03,574-Speed 5159.24 samples/sec Loss 0.5514 LearningRate 0.0011 Epoch: 17 Global Step: 299440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:05,555-Speed 5170.29 samples/sec Loss 0.5360 LearningRate 0.0011 Epoch: 17 Global Step: 299450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:22:07,527-Speed 5196.10 samples/sec Loss 0.5026 LearningRate 0.0011 Epoch: 17 Global Step: 299460 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:22:09,496-Speed 5201.08 samples/sec Loss 0.5843 LearningRate 0.0011 Epoch: 17 Global Step: 299470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:11,483-Speed 5155.18 samples/sec Loss 0.5532 LearningRate 0.0011 Epoch: 17 Global Step: 299480 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 19:22:13,476-Speed 5139.30 samples/sec Loss 0.5354 LearningRate 0.0011 Epoch: 17 Global Step: 299490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:15,448-Speed 5194.38 samples/sec Loss 0.5367 LearningRate 0.0011 Epoch: 17 Global Step: 299500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:17,426-Speed 5180.35 samples/sec Loss 0.5590 LearningRate 0.0011 Epoch: 17 Global Step: 299510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:19,400-Speed 5188.94 samples/sec Loss 0.5641 LearningRate 0.0011 Epoch: 17 Global Step: 299520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:21,379-Speed 5174.01 samples/sec Loss 0.5291 LearningRate 0.0011 Epoch: 17 Global Step: 299530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:23,375-Speed 5133.29 samples/sec Loss 0.5225 LearningRate 0.0011 Epoch: 17 Global Step: 299540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:25,383-Speed 5101.65 samples/sec Loss 0.5394 LearningRate 0.0011 Epoch: 17 Global Step: 299550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:27,382-Speed 5124.94 samples/sec Loss 0.5309 LearningRate 0.0011 Epoch: 17 Global Step: 299560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:29,366-Speed 5161.75 samples/sec Loss 0.5253 LearningRate 0.0011 Epoch: 17 Global Step: 299570 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:22:31,351-Speed 5161.25 samples/sec Loss 0.5447 LearningRate 0.0011 Epoch: 17 Global Step: 299580 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:22:33,325-Speed 5188.43 samples/sec Loss 0.5083 LearningRate 0.0011 Epoch: 17 Global Step: 299590 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:22:35,309-Speed 5163.98 samples/sec Loss 0.5118 LearningRate 0.0011 Epoch: 17 Global Step: 299600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 
19:22:37,321-Speed 5090.06 samples/sec Loss 0.5192 LearningRate 0.0011 Epoch: 17 Global Step: 299610 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:22:39,307-Speed 5158.71 samples/sec Loss 0.5341 LearningRate 0.0010 Epoch: 17 Global Step: 299620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:41,280-Speed 5192.04 samples/sec Loss 0.5424 LearningRate 0.0010 Epoch: 17 Global Step: 299630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:43,256-Speed 5182.87 samples/sec Loss 0.5268 LearningRate 0.0010 Epoch: 17 Global Step: 299640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:45,254-Speed 5127.05 samples/sec Loss 0.5409 LearningRate 0.0010 Epoch: 17 Global Step: 299650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:47,231-Speed 5182.92 samples/sec Loss 0.5427 LearningRate 0.0010 Epoch: 17 Global Step: 299660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:49,233-Speed 5114.76 samples/sec Loss 0.5485 LearningRate 0.0010 Epoch: 17 Global Step: 299670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:51,226-Speed 5140.02 samples/sec Loss 0.5497 LearningRate 0.0010 Epoch: 17 Global Step: 299680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:53,224-Speed 5129.03 samples/sec Loss 0.5334 LearningRate 0.0010 Epoch: 17 Global Step: 299690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:55,209-Speed 5158.30 samples/sec Loss 0.5152 LearningRate 0.0010 Epoch: 17 Global Step: 299700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:57,183-Speed 5190.12 samples/sec Loss 0.5486 LearningRate 0.0010 Epoch: 17 Global Step: 299710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:22:59,156-Speed 5193.79 samples/sec Loss 0.5427 LearningRate 0.0010 Epoch: 17 Global Step: 299720 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:23:01,162-Speed 5104.30 samples/sec 
Loss 0.5153 LearningRate 0.0010 Epoch: 17 Global Step: 299730 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:23:03,149-Speed 5156.14 samples/sec Loss 0.5214 LearningRate 0.0010 Epoch: 17 Global Step: 299740 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:23:05,155-Speed 5105.28 samples/sec Loss 0.5190 LearningRate 0.0010 Epoch: 17 Global Step: 299750 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:23:07,126-Speed 5199.28 samples/sec Loss 0.5256 LearningRate 0.0010 Epoch: 17 Global Step: 299760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:23:09,102-Speed 5183.51 samples/sec Loss 0.5165 LearningRate 0.0010 Epoch: 17 Global Step: 299770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:23:11,098-Speed 5129.42 samples/sec Loss 0.5376 LearningRate 0.0010 Epoch: 17 Global Step: 299780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:23:13,090-Speed 5143.81 samples/sec Loss 0.5453 LearningRate 0.0010 Epoch: 17 Global Step: 299790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:23:15,061-Speed 5197.00 samples/sec Loss 0.4929 LearningRate 0.0010 Epoch: 17 Global Step: 299800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:23:17,032-Speed 5197.56 samples/sec Loss 0.5328 LearningRate 0.0010 Epoch: 17 Global Step: 299810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:23:19,006-Speed 5188.27 samples/sec Loss 0.5203 LearningRate 0.0010 Epoch: 17 Global Step: 299820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:23:20,989-Speed 5165.83 samples/sec Loss 0.5205 LearningRate 0.0010 Epoch: 17 Global Step: 299830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:23:22,960-Speed 5198.30 samples/sec Loss 0.5483 LearningRate 0.0010 Epoch: 17 Global Step: 299840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:23:24,949-Speed 5148.57 samples/sec Loss 0.5193 LearningRate 0.0010 Epoch: 
17 Global Step: 299850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:23:26,918-Speed 5203.08 samples/sec Loss 0.5359 LearningRate 0.0010 Epoch: 17 Global Step: 299860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:23:28,890-Speed 5194.46 samples/sec Loss 0.5181 LearningRate 0.0010 Epoch: 17 Global Step: 299870 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:23:30,861-Speed 5195.77 samples/sec Loss 0.5419 LearningRate 0.0010 Epoch: 17 Global Step: 299880 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:23:32,832-Speed 5198.61 samples/sec Loss 0.5307 LearningRate 0.0010 Epoch: 17 Global Step: 299890 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:23:34,815-Speed 5164.78 samples/sec Loss 0.5505 LearningRate 0.0010 Epoch: 17 Global Step: 299900 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:23:36,821-Speed 5106.32 samples/sec Loss 0.5391 LearningRate 0.0010 Epoch: 17 Global Step: 299910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:23:38,785-Speed 5215.47 samples/sec Loss 0.5477 LearningRate 0.0010 Epoch: 17 Global Step: 299920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:23:40,782-Speed 5131.59 samples/sec Loss 0.5214 LearningRate 0.0010 Epoch: 17 Global Step: 299930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:23:42,750-Speed 5205.18 samples/sec Loss 0.5299 LearningRate 0.0010 Epoch: 17 Global Step: 299940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:23:44,724-Speed 5187.19 samples/sec Loss 0.5234 LearningRate 0.0010 Epoch: 17 Global Step: 299950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:23:46,699-Speed 5187.61 samples/sec Loss 0.5362 LearningRate 0.0010 Epoch: 17 Global Step: 299960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:23:48,687-Speed 5152.43 samples/sec Loss 0.5129 LearningRate 0.0010 Epoch: 17 Global Step: 299970 Fp16 Grad 
Scale: 65536 Required: 2 hours Training: 2022-04-11 19:23:50,658-Speed 5197.32 samples/sec Loss 0.5201 LearningRate 0.0010 Epoch: 17 Global Step: 299980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:23:52,636-Speed 5178.27 samples/sec Loss 0.5097 LearningRate 0.0010 Epoch: 17 Global Step: 299990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:23:54,621-Speed 5160.69 samples/sec Loss 0.5199 LearningRate 0.0010 Epoch: 17 Global Step: 300000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:24:21,298-[lfw][300000]XNorm: 21.195638 Training: 2022-04-11 19:24:21,298-[lfw][300000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 19:24:21,299-[lfw][300000]Accuracy-Highest: 0.99833 Training: 2022-04-11 19:24:52,112-[cfp_fp][300000]XNorm: 21.757474 Training: 2022-04-11 19:24:52,112-[cfp_fp][300000]Accuracy-Flip: 0.99014+-0.00386 Training: 2022-04-11 19:24:52,113-[cfp_fp][300000]Accuracy-Highest: 0.99014 Training: 2022-04-11 19:25:18,651-[agedb_30][300000]XNorm: 22.420546 Training: 2022-04-11 19:25:18,651-[agedb_30][300000]Accuracy-Flip: 0.98183+-0.00681 Training: 2022-04-11 19:25:18,652-[agedb_30][300000]Accuracy-Highest: 0.98383 Training: 2022-04-11 19:25:20,630-Speed 119.06 samples/sec Loss 0.5446 LearningRate 0.0010 Epoch: 17 Global Step: 300010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:25:22,604-Speed 5189.27 samples/sec Loss 0.5169 LearningRate 0.0010 Epoch: 17 Global Step: 300020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:25:24,578-Speed 5187.67 samples/sec Loss 0.5537 LearningRate 0.0010 Epoch: 17 Global Step: 300030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:25:26,548-Speed 5199.42 samples/sec Loss 0.5274 LearningRate 0.0010 Epoch: 17 Global Step: 300040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:25:28,512-Speed 5218.74 samples/sec Loss 0.5292 LearningRate 0.0010 Epoch: 17 Global Step: 300050 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-11 19:25:30,474-Speed 5220.55 samples/sec Loss 0.5265 LearningRate 0.0010 Epoch: 17 Global Step: 300060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:25:32,440-Speed 5209.23 samples/sec Loss 0.5685 LearningRate 0.0010 Epoch: 17 Global Step: 300070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:25:34,406-Speed 5210.68 samples/sec Loss 0.5194 LearningRate 0.0010 Epoch: 17 Global Step: 300080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:25:36,397-Speed 5146.24 samples/sec Loss 0.5534 LearningRate 0.0010 Epoch: 17 Global Step: 300090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:25:38,362-Speed 5211.26 samples/sec Loss 0.5743 LearningRate 0.0010 Epoch: 17 Global Step: 300100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:25:40,335-Speed 5192.41 samples/sec Loss 0.5369 LearningRate 0.0010 Epoch: 17 Global Step: 300110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:25:42,301-Speed 5209.75 samples/sec Loss 0.5252 LearningRate 0.0010 Epoch: 17 Global Step: 300120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:25:44,270-Speed 5203.44 samples/sec Loss 0.5150 LearningRate 0.0010 Epoch: 17 Global Step: 300130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:25:46,247-Speed 5180.49 samples/sec Loss 0.5229 LearningRate 0.0010 Epoch: 17 Global Step: 300140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:25:48,216-Speed 5203.08 samples/sec Loss 0.5542 LearningRate 0.0010 Epoch: 17 Global Step: 300150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:25:50,222-Speed 5105.99 samples/sec Loss 0.4901 LearningRate 0.0010 Epoch: 17 Global Step: 300160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:25:52,210-Speed 5152.09 samples/sec Loss 0.5195 LearningRate 0.0010 Epoch: 17 Global Step: 300170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
19:25:54,187-Speed 5181.64 samples/sec Loss 0.5427 LearningRate 0.0010 Epoch: 17 Global Step: 300180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:25:56,149-Speed 5219.60 samples/sec Loss 0.5923 LearningRate 0.0010 Epoch: 17 Global Step: 300190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:25:58,141-Speed 5143.28 samples/sec Loss 0.5159 LearningRate 0.0010 Epoch: 17 Global Step: 300200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:00,132-Speed 5146.49 samples/sec Loss 0.5429 LearningRate 0.0010 Epoch: 17 Global Step: 300210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:02,125-Speed 5139.06 samples/sec Loss 0.5209 LearningRate 0.0010 Epoch: 17 Global Step: 300220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:04,097-Speed 5192.93 samples/sec Loss 0.5476 LearningRate 0.0010 Epoch: 17 Global Step: 300230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:06,076-Speed 5178.95 samples/sec Loss 0.5399 LearningRate 0.0010 Epoch: 17 Global Step: 300240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:08,044-Speed 5203.92 samples/sec Loss 0.5166 LearningRate 0.0010 Epoch: 17 Global Step: 300250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:10,013-Speed 5202.35 samples/sec Loss 0.5397 LearningRate 0.0010 Epoch: 17 Global Step: 300260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:11,989-Speed 5182.02 samples/sec Loss 0.5684 LearningRate 0.0010 Epoch: 17 Global Step: 300270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:14,009-Speed 5073.67 samples/sec Loss 0.4951 LearningRate 0.0010 Epoch: 17 Global Step: 300280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:16,035-Speed 5054.66 samples/sec Loss 0.5386 LearningRate 0.0010 Epoch: 17 Global Step: 300290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:26:18,020-Speed 5159.53 samples/sec 
Loss 0.5306 LearningRate 0.0010 Epoch: 17 Global Step: 300300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:26:19,988-Speed 5205.94 samples/sec Loss 0.5620 LearningRate 0.0010 Epoch: 17 Global Step: 300310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:26:21,957-Speed 5203.83 samples/sec Loss 0.5519 LearningRate 0.0010 Epoch: 17 Global Step: 300320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:23,950-Speed 5137.30 samples/sec Loss 0.5280 LearningRate 0.0010 Epoch: 17 Global Step: 300330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:25,928-Speed 5180.79 samples/sec Loss 0.5175 LearningRate 0.0010 Epoch: 17 Global Step: 300340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:27,907-Speed 5176.41 samples/sec Loss 0.5326 LearningRate 0.0010 Epoch: 17 Global Step: 300350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:29,890-Speed 5165.39 samples/sec Loss 0.4894 LearningRate 0.0010 Epoch: 17 Global Step: 300360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:31,867-Speed 5179.23 samples/sec Loss 0.5389 LearningRate 0.0010 Epoch: 17 Global Step: 300370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:33,890-Speed 5064.38 samples/sec Loss 0.5218 LearningRate 0.0010 Epoch: 17 Global Step: 300380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:35,880-Speed 5146.77 samples/sec Loss 0.5228 LearningRate 0.0010 Epoch: 17 Global Step: 300390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:37,877-Speed 5130.60 samples/sec Loss 0.4978 LearningRate 0.0010 Epoch: 17 Global Step: 300400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:39,847-Speed 5199.64 samples/sec Loss 0.5484 LearningRate 0.0010 Epoch: 17 Global Step: 300410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:26:41,838-Speed 5143.44 samples/sec Loss 0.5549 LearningRate 0.0010 Epoch: 17 
Global Step: 300420 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:26:44,049-Speed 4634.01 samples/sec Loss 0.5288 LearningRate 0.0010 Epoch: 17 Global Step: 300430 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:27:15,412-Speed 326.51 samples/sec Loss 0.5136 LearningRate 0.0010 Epoch: 18 Global Step: 300440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:27:17,406-Speed 5138.82 samples/sec Loss 0.4245 LearningRate 0.0010 Epoch: 18 Global Step: 300450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:27:19,373-Speed 5207.88 samples/sec Loss 0.4246 LearningRate 0.0010 Epoch: 18 Global Step: 300460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:27:21,335-Speed 5222.01 samples/sec Loss 0.4325 LearningRate 0.0010 Epoch: 18 Global Step: 300470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:27:23,308-Speed 5192.94 samples/sec Loss 0.4072 LearningRate 0.0010 Epoch: 18 Global Step: 300480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:27:25,461-Speed 4757.99 samples/sec Loss 0.4256 LearningRate 0.0010 Epoch: 18 Global Step: 300490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:27:27,470-Speed 5096.66 samples/sec Loss 0.4266 LearningRate 0.0010 Epoch: 18 Global Step: 300500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:27:29,441-Speed 5197.28 samples/sec Loss 0.4473 LearningRate 0.0010 Epoch: 18 Global Step: 300510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:27:31,406-Speed 5214.00 samples/sec Loss 0.4217 LearningRate 0.0010 Epoch: 18 Global Step: 300520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:27:33,408-Speed 5116.20 samples/sec Loss 0.4500 LearningRate 0.0010 Epoch: 18 Global Step: 300530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:27:35,430-Speed 5067.34 samples/sec Loss 0.4508 LearningRate 0.0010 Epoch: 18 Global Step: 300540 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-04-11 19:27:37,412-Speed 5169.61 samples/sec Loss 0.4215 LearningRate 0.0010 Epoch: 18 Global Step: 300550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:27:39,395-Speed 5163.90 samples/sec Loss 0.4089 LearningRate 0.0010 Epoch: 18 Global Step: 300560 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:27:41,408-Speed 5088.31 samples/sec Loss 0.4400 LearningRate 0.0010 Epoch: 18 Global Step: 300570 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:27:43,374-Speed 5212.62 samples/sec Loss 0.4173 LearningRate 0.0010 Epoch: 18 Global Step: 300580 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:27:45,337-Speed 5218.47 samples/sec Loss 0.4040 LearningRate 0.0010 Epoch: 18 Global Step: 300590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:27:47,301-Speed 5215.44 samples/sec Loss 0.4328 LearningRate 0.0010 Epoch: 18 Global Step: 300600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:27:49,281-Speed 5171.76 samples/sec Loss 0.4081 LearningRate 0.0010 Epoch: 18 Global Step: 300610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:27:51,249-Speed 5204.71 samples/sec Loss 0.4128 LearningRate 0.0010 Epoch: 18 Global Step: 300620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:27:53,248-Speed 5125.80 samples/sec Loss 0.4132 LearningRate 0.0010 Epoch: 18 Global Step: 300630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:27:55,216-Speed 5206.27 samples/sec Loss 0.4282 LearningRate 0.0010 Epoch: 18 Global Step: 300640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:27:57,192-Speed 5184.21 samples/sec Loss 0.4314 LearningRate 0.0010 Epoch: 18 Global Step: 300650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:27:59,171-Speed 5175.27 samples/sec Loss 0.4296 LearningRate 0.0010 Epoch: 18 Global Step: 300660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 19:28:01,141-Speed 5200.25 samples/sec Loss 0.4104 LearningRate 0.0010 Epoch: 18 Global Step: 300670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:28:03,124-Speed 5165.46 samples/sec Loss 0.4221 LearningRate 0.0010 Epoch: 18 Global Step: 300680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:28:05,117-Speed 5138.99 samples/sec Loss 0.4003 LearningRate 0.0010 Epoch: 18 Global Step: 300690 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:28:07,116-Speed 5124.87 samples/sec Loss 0.4108 LearningRate 0.0010 Epoch: 18 Global Step: 300700 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:28:09,087-Speed 5197.36 samples/sec Loss 0.4178 LearningRate 0.0010 Epoch: 18 Global Step: 300710 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:28:11,069-Speed 5168.98 samples/sec Loss 0.4243 LearningRate 0.0010 Epoch: 18 Global Step: 300720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:28:13,039-Speed 5197.19 samples/sec Loss 0.4200 LearningRate 0.0010 Epoch: 18 Global Step: 300730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:28:15,037-Speed 5128.84 samples/sec Loss 0.4207 LearningRate 0.0010 Epoch: 18 Global Step: 300740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:28:17,015-Speed 5180.62 samples/sec Loss 0.4337 LearningRate 0.0010 Epoch: 18 Global Step: 300750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:28:18,988-Speed 5193.14 samples/sec Loss 0.4005 LearningRate 0.0010 Epoch: 18 Global Step: 300760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:28:20,974-Speed 5156.84 samples/sec Loss 0.4227 LearningRate 0.0010 Epoch: 18 Global Step: 300770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:28:23,086-Speed 4849.21 samples/sec Loss 0.4314 LearningRate 0.0010 Epoch: 18 Global Step: 300780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:28:25,054-Speed 5205.39 
samples/sec Loss 0.4405 LearningRate 0.0010 Epoch: 18 Global Step: 300790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:28:27,023-Speed 5203.27 samples/sec Loss 0.4241 LearningRate 0.0010 Epoch: 18 Global Step: 300800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:28:29,015-Speed 5143.18 samples/sec Loss 0.4282 LearningRate 0.0010 Epoch: 18 Global Step: 300810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:28:30,982-Speed 5206.77 samples/sec Loss 0.4091 LearningRate 0.0010 Epoch: 18 Global Step: 300820 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:28:32,956-Speed 5189.36 samples/sec Loss 0.4377 LearningRate 0.0010 Epoch: 18 Global Step: 300830 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:28:34,938-Speed 5169.58 samples/sec Loss 0.4382 LearningRate 0.0010 Epoch: 18 Global Step: 300840 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:28:36,931-Speed 5139.68 samples/sec Loss 0.4330 LearningRate 0.0010 Epoch: 18 Global Step: 300850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:28:38,924-Speed 5137.68 samples/sec Loss 0.4128 LearningRate 0.0010 Epoch: 18 Global Step: 300860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:28:40,903-Speed 5176.80 samples/sec Loss 0.4285 LearningRate 0.0010 Epoch: 18 Global Step: 300870 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:28:42,878-Speed 5186.43 samples/sec Loss 0.4072 LearningRate 0.0010 Epoch: 18 Global Step: 300880 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:28:44,860-Speed 5166.57 samples/sec Loss 0.4302 LearningRate 0.0010 Epoch: 18 Global Step: 300890 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:28:46,858-Speed 5128.72 samples/sec Loss 0.4201 LearningRate 0.0010 Epoch: 18 Global Step: 300900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:28:48,849-Speed 5145.57 samples/sec Loss 0.4479 
LearningRate 0.0010 Epoch: 18 Global Step: 300910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:28:50,845-Speed 5131.92 samples/sec Loss 0.4393 LearningRate 0.0010 Epoch: 18 Global Step: 300920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:28:52,837-Speed 5142.29 samples/sec Loss 0.4050 LearningRate 0.0010 Epoch: 18 Global Step: 300930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:28:54,811-Speed 5189.36 samples/sec Loss 0.4036 LearningRate 0.0010 Epoch: 18 Global Step: 300940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:28:56,801-Speed 5147.39 samples/sec Loss 0.4074 LearningRate 0.0010 Epoch: 18 Global Step: 300950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:28:59,264-Speed 4158.98 samples/sec Loss 0.4056 LearningRate 0.0010 Epoch: 18 Global Step: 300960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:29:01,262-Speed 5124.27 samples/sec Loss 0.4108 LearningRate 0.0010 Epoch: 18 Global Step: 300970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:29:03,230-Speed 5209.00 samples/sec Loss 0.3974 LearningRate 0.0010 Epoch: 18 Global Step: 300980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:29:05,194-Speed 5215.20 samples/sec Loss 0.4148 LearningRate 0.0010 Epoch: 18 Global Step: 300990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:29:07,162-Speed 5203.80 samples/sec Loss 0.4086 LearningRate 0.0010 Epoch: 18 Global Step: 301000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:29:09,141-Speed 5175.87 samples/sec Loss 0.4340 LearningRate 0.0010 Epoch: 18 Global Step: 301010 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:29:11,151-Speed 5096.08 samples/sec Loss 0.4136 LearningRate 0.0010 Epoch: 18 Global Step: 301020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:29:13,130-Speed 5176.26 samples/sec Loss 0.4083 LearningRate 0.0010 Epoch: 18 Global 
Step: 301030 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:29:15,099-Speed 5202.34 samples/sec Loss 0.4032 LearningRate 0.0010 Epoch: 18 Global Step: 301040 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:29:17,066-Speed 5207.88 samples/sec Loss 0.3993 LearningRate 0.0010 Epoch: 18 Global Step: 301050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:29:19,041-Speed 5186.47 samples/sec Loss 0.4169 LearningRate 0.0010 Epoch: 18 Global Step: 301060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:29:21,015-Speed 5190.32 samples/sec Loss 0.4061 LearningRate 0.0010 Epoch: 18 Global Step: 301070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:29:23,007-Speed 5142.28 samples/sec Loss 0.4454 LearningRate 0.0010 Epoch: 18 Global Step: 301080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:29:24,991-Speed 5161.57 samples/sec Loss 0.4120 LearningRate 0.0010 Epoch: 18 Global Step: 301090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:29:26,962-Speed 5199.62 samples/sec Loss 0.4098 LearningRate 0.0010 Epoch: 18 Global Step: 301100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:29:28,944-Speed 5166.88 samples/sec Loss 0.4262 LearningRate 0.0010 Epoch: 18 Global Step: 301110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:29:30,926-Speed 5168.82 samples/sec Loss 0.4372 LearningRate 0.0010 Epoch: 18 Global Step: 301120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:29:32,895-Speed 5202.62 samples/sec Loss 0.4063 LearningRate 0.0010 Epoch: 18 Global Step: 301130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:29:34,862-Speed 5207.03 samples/sec Loss 0.3842 LearningRate 0.0010 Epoch: 18 Global Step: 301140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:29:36,835-Speed 5191.34 samples/sec Loss 0.4193 LearningRate 0.0010 Epoch: 18 Global Step: 301150 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 19:29:38,801-Speed 5211.36 samples/sec Loss 0.4236 LearningRate 0.0010 Epoch: 18 Global Step: 301160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:29:40,779-Speed 5178.10 samples/sec Loss 0.4274 LearningRate 0.0010 Epoch: 18 Global Step: 301170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:29:42,757-Speed 5177.63 samples/sec Loss 0.4020 LearningRate 0.0010 Epoch: 18 Global Step: 301180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:29:44,738-Speed 5170.87 samples/sec Loss 0.4283 LearningRate 0.0010 Epoch: 18 Global Step: 301190 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:29:46,713-Speed 5187.41 samples/sec Loss 0.4133 LearningRate 0.0010 Epoch: 18 Global Step: 301200 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:29:48,685-Speed 5195.15 samples/sec Loss 0.4195 LearningRate 0.0010 Epoch: 18 Global Step: 301210 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:29:50,671-Speed 5159.97 samples/sec Loss 0.4159 LearningRate 0.0010 Epoch: 18 Global Step: 301220 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:29:52,642-Speed 5196.02 samples/sec Loss 0.4181 LearningRate 0.0010 Epoch: 18 Global Step: 301230 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:29:54,632-Speed 5146.58 samples/sec Loss 0.3886 LearningRate 0.0010 Epoch: 18 Global Step: 301240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:29:56,604-Speed 5196.19 samples/sec Loss 0.4399 LearningRate 0.0010 Epoch: 18 Global Step: 301250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:29:58,579-Speed 5185.59 samples/sec Loss 0.4086 LearningRate 0.0010 Epoch: 18 Global Step: 301260 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:30:00,565-Speed 5158.54 samples/sec Loss 0.4360 LearningRate 0.0010 Epoch: 18 Global Step: 301270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 
2022-04-11 19:30:02,573-Speed 5101.17 samples/sec Loss 0.4329 LearningRate 0.0010 Epoch: 18 Global Step: 301280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:30:04,532-Speed 5227.50 samples/sec Loss 0.4190 LearningRate 0.0009 Epoch: 18 Global Step: 301290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:06,503-Speed 5197.98 samples/sec Loss 0.4284 LearningRate 0.0009 Epoch: 18 Global Step: 301300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:08,475-Speed 5194.65 samples/sec Loss 0.4353 LearningRate 0.0009 Epoch: 18 Global Step: 301310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:10,444-Speed 5201.75 samples/sec Loss 0.4472 LearningRate 0.0009 Epoch: 18 Global Step: 301320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:12,417-Speed 5193.30 samples/sec Loss 0.4173 LearningRate 0.0009 Epoch: 18 Global Step: 301330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:14,392-Speed 5187.10 samples/sec Loss 0.4281 LearningRate 0.0009 Epoch: 18 Global Step: 301340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:16,388-Speed 5129.91 samples/sec Loss 0.4352 LearningRate 0.0009 Epoch: 18 Global Step: 301350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:18,374-Speed 5158.92 samples/sec Loss 0.4162 LearningRate 0.0009 Epoch: 18 Global Step: 301360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:20,344-Speed 5199.40 samples/sec Loss 0.4072 LearningRate 0.0009 Epoch: 18 Global Step: 301370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:22,314-Speed 5198.41 samples/sec Loss 0.4235 LearningRate 0.0009 Epoch: 18 Global Step: 301380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:24,316-Speed 5118.46 samples/sec Loss 0.4311 LearningRate 0.0009 Epoch: 18 Global Step: 301390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:30:26,311-Speed 5134.99 
samples/sec Loss 0.4296 LearningRate 0.0009 Epoch: 18 Global Step: 301400 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:30:28,309-Speed 5126.99 samples/sec Loss 0.4013 LearningRate 0.0009 Epoch: 18 Global Step: 301410 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:30:30,283-Speed 5188.28 samples/sec Loss 0.3981 LearningRate 0.0009 Epoch: 18 Global Step: 301420 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:30:32,252-Speed 5203.43 samples/sec Loss 0.4341 LearningRate 0.0009 Epoch: 18 Global Step: 301430 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:30:34,223-Speed 5196.57 samples/sec Loss 0.4460 LearningRate 0.0009 Epoch: 18 Global Step: 301440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:30:36,201-Speed 5180.58 samples/sec Loss 0.4242 LearningRate 0.0009 Epoch: 18 Global Step: 301450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:30:38,173-Speed 5195.15 samples/sec Loss 0.4155 LearningRate 0.0009 Epoch: 18 Global Step: 301460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:40,146-Speed 5190.90 samples/sec Loss 0.4372 LearningRate 0.0009 Epoch: 18 Global Step: 301470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:42,116-Speed 5198.67 samples/sec Loss 0.4430 LearningRate 0.0009 Epoch: 18 Global Step: 301480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:44,086-Speed 5198.83 samples/sec Loss 0.4114 LearningRate 0.0009 Epoch: 18 Global Step: 301490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:46,057-Speed 5199.40 samples/sec Loss 0.3981 LearningRate 0.0009 Epoch: 18 Global Step: 301500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:48,057-Speed 5119.81 samples/sec Loss 0.4435 LearningRate 0.0009 Epoch: 18 Global Step: 301510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:50,060-Speed 5116.06 samples/sec Loss 0.4448 LearningRate 
0.0009 Epoch: 18 Global Step: 301520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:52,039-Speed 5175.47 samples/sec Loss 0.3896 LearningRate 0.0009 Epoch: 18 Global Step: 301530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:54,007-Speed 5204.52 samples/sec Loss 0.4222 LearningRate 0.0009 Epoch: 18 Global Step: 301540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:56,025-Speed 5078.30 samples/sec Loss 0.4236 LearningRate 0.0009 Epoch: 18 Global Step: 301550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:30:58,014-Speed 5150.79 samples/sec Loss 0.4176 LearningRate 0.0009 Epoch: 18 Global Step: 301560 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:30:59,984-Speed 5198.94 samples/sec Loss 0.4241 LearningRate 0.0009 Epoch: 18 Global Step: 301570 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:31:01,980-Speed 5131.50 samples/sec Loss 0.4418 LearningRate 0.0009 Epoch: 18 Global Step: 301580 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:31:03,949-Speed 5204.78 samples/sec Loss 0.3976 LearningRate 0.0009 Epoch: 18 Global Step: 301590 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:31:05,929-Speed 5173.90 samples/sec Loss 0.4242 LearningRate 0.0009 Epoch: 18 Global Step: 301600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:31:07,908-Speed 5174.20 samples/sec Loss 0.4419 LearningRate 0.0009 Epoch: 18 Global Step: 301610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:31:09,889-Speed 5172.69 samples/sec Loss 0.4424 LearningRate 0.0009 Epoch: 18 Global Step: 301620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:31:11,884-Speed 5134.43 samples/sec Loss 0.4171 LearningRate 0.0009 Epoch: 18 Global Step: 301630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:31:13,865-Speed 5169.11 samples/sec Loss 0.4381 LearningRate 0.0009 Epoch: 18 Global Step: 301640 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:31:15,838-Speed 5193.59 samples/sec Loss 0.3965 LearningRate 0.0009 Epoch: 18 Global Step: 301650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:31:17,810-Speed 5197.42 samples/sec Loss 0.4210 LearningRate 0.0009 Epoch: 18 Global Step: 301660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:31:19,781-Speed 5196.02 samples/sec Loss 0.4080 LearningRate 0.0009 Epoch: 18 Global Step: 301670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:31:21,767-Speed 5157.41 samples/sec Loss 0.4279 LearningRate 0.0009 Epoch: 18 Global Step: 301680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:31:23,746-Speed 5174.56 samples/sec Loss 0.4242 LearningRate 0.0009 Epoch: 18 Global Step: 301690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:31:25,735-Speed 5150.38 samples/sec Loss 0.4136 LearningRate 0.0009 Epoch: 18 Global Step: 301700 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:31:27,715-Speed 5173.62 samples/sec Loss 0.4051 LearningRate 0.0009 Epoch: 18 Global Step: 301710 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:31:29,741-Speed 5056.07 samples/sec Loss 0.4215 LearningRate 0.0009 Epoch: 18 Global Step: 301720 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:31:31,712-Speed 5196.47 samples/sec Loss 0.4102 LearningRate 0.0009 Epoch: 18 Global Step: 301730 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:31:33,696-Speed 5164.09 samples/sec Loss 0.4062 LearningRate 0.0009 Epoch: 18 Global Step: 301740 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:31:35,667-Speed 5198.50 samples/sec Loss 0.4237 LearningRate 0.0009 Epoch: 18 Global Step: 301750 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:31:37,664-Speed 5128.79 samples/sec Loss 0.4273 LearningRate 0.0009 Epoch: 18 Global Step: 301760 Fp16 Grad Scale: 131072 Required: 2 
hours Training: 2022-04-11 19:31:39,641-Speed 5181.94 samples/sec Loss 0.4348 LearningRate 0.0009 Epoch: 18 Global Step: 301770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:31:41,624-Speed 5164.95 samples/sec Loss 0.4047 LearningRate 0.0009 Epoch: 18 Global Step: 301780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:31:43,597-Speed 5191.15 samples/sec Loss 0.4332 LearningRate 0.0009 Epoch: 18 Global Step: 301790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:31:45,577-Speed 5173.03 samples/sec Loss 0.4165 LearningRate 0.0009 Epoch: 18 Global Step: 301800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:31:47,561-Speed 5165.69 samples/sec Loss 0.4211 LearningRate 0.0009 Epoch: 18 Global Step: 301810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:31:49,543-Speed 5168.15 samples/sec Loss 0.4297 LearningRate 0.0009 Epoch: 18 Global Step: 301820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:31:51,521-Speed 5177.70 samples/sec Loss 0.4156 LearningRate 0.0009 Epoch: 18 Global Step: 301830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:31:53,513-Speed 5142.29 samples/sec Loss 0.4286 LearningRate 0.0009 Epoch: 18 Global Step: 301840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:31:55,480-Speed 5207.10 samples/sec Loss 0.4544 LearningRate 0.0009 Epoch: 18 Global Step: 301850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:31:57,452-Speed 5196.00 samples/sec Loss 0.4337 LearningRate 0.0009 Epoch: 18 Global Step: 301860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:31:59,464-Speed 5089.89 samples/sec Loss 0.4215 LearningRate 0.0009 Epoch: 18 Global Step: 301870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:32:01,464-Speed 5121.65 samples/sec Loss 0.4178 LearningRate 0.0009 Epoch: 18 Global Step: 301880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
19:32:03,467-Speed 5114.34 samples/sec Loss 0.4133 LearningRate 0.0009 Epoch: 18 Global Step: 301890 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:32:05,444-Speed 5182.73 samples/sec Loss 0.4047 LearningRate 0.0009 Epoch: 18 Global Step: 301900 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:32:07,418-Speed 5188.88 samples/sec Loss 0.4425 LearningRate 0.0009 Epoch: 18 Global Step: 301910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:32:09,391-Speed 5190.13 samples/sec Loss 0.4281 LearningRate 0.0009 Epoch: 18 Global Step: 301920 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:32:11,379-Speed 5154.32 samples/sec Loss 0.4094 LearningRate 0.0009 Epoch: 18 Global Step: 301930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:32:13,388-Speed 5096.80 samples/sec Loss 0.3987 LearningRate 0.0009 Epoch: 18 Global Step: 301940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:32:15,400-Speed 5092.21 samples/sec Loss 0.4315 LearningRate 0.0009 Epoch: 18 Global Step: 301950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:32:17,402-Speed 5115.36 samples/sec Loss 0.4328 LearningRate 0.0009 Epoch: 18 Global Step: 301960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:32:19,393-Speed 5146.85 samples/sec Loss 0.4072 LearningRate 0.0009 Epoch: 18 Global Step: 301970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:32:21,384-Speed 5144.80 samples/sec Loss 0.4215 LearningRate 0.0009 Epoch: 18 Global Step: 301980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:32:23,357-Speed 5191.26 samples/sec Loss 0.3923 LearningRate 0.0009 Epoch: 18 Global Step: 301990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:32:25,329-Speed 5192.36 samples/sec Loss 0.4408 LearningRate 0.0009 Epoch: 18 Global Step: 302000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:32:51,966-[lfw][302000]XNorm: 
21.723595 Training: 2022-04-11 19:32:51,967-[lfw][302000]Accuracy-Flip: 0.99833+-0.00236 Training: 2022-04-11 19:32:51,967-[lfw][302000]Accuracy-Highest: 0.99833 Training: 2022-04-11 19:33:22,775-[cfp_fp][302000]XNorm: 22.211034 Training: 2022-04-11 19:33:22,775-[cfp_fp][302000]Accuracy-Flip: 0.98857+-0.00383 Training: 2022-04-11 19:33:22,776-[cfp_fp][302000]Accuracy-Highest: 0.99014 Training: 2022-04-11 19:33:49,345-[agedb_30][302000]XNorm: 22.798142 Training: 2022-04-11 19:33:49,345-[agedb_30][302000]Accuracy-Flip: 0.98317+-0.00626 Training: 2022-04-11 19:33:49,346-[agedb_30][302000]Accuracy-Highest: 0.98383 Training: 2022-04-11 19:33:51,328-Speed 119.07 samples/sec Loss 0.4416 LearningRate 0.0009 Epoch: 18 Global Step: 302010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:33:53,287-Speed 5229.42 samples/sec Loss 0.4049 LearningRate 0.0009 Epoch: 18 Global Step: 302020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:33:55,245-Speed 5230.66 samples/sec Loss 0.3984 LearningRate 0.0009 Epoch: 18 Global Step: 302030 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:33:57,204-Speed 5229.43 samples/sec Loss 0.4453 LearningRate 0.0009 Epoch: 18 Global Step: 302040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:33:59,175-Speed 5197.08 samples/sec Loss 0.4343 LearningRate 0.0009 Epoch: 18 Global Step: 302050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:01,142-Speed 5205.29 samples/sec Loss 0.4305 LearningRate 0.0009 Epoch: 18 Global Step: 302060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:03,120-Speed 5178.60 samples/sec Loss 0.4222 LearningRate 0.0009 Epoch: 18 Global Step: 302070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:05,084-Speed 5218.08 samples/sec Loss 0.4288 LearningRate 0.0009 Epoch: 18 Global Step: 302080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:07,064-Speed 5173.49 samples/sec Loss 0.4056 
LearningRate 0.0009 Epoch: 18 Global Step: 302090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:09,044-Speed 5172.26 samples/sec Loss 0.4172 LearningRate 0.0009 Epoch: 18 Global Step: 302100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:11,020-Speed 5184.77 samples/sec Loss 0.4171 LearningRate 0.0009 Epoch: 18 Global Step: 302110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:12,989-Speed 5202.21 samples/sec Loss 0.4280 LearningRate 0.0009 Epoch: 18 Global Step: 302120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:14,955-Speed 5209.48 samples/sec Loss 0.4253 LearningRate 0.0009 Epoch: 18 Global Step: 302130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:16,938-Speed 5165.84 samples/sec Loss 0.4209 LearningRate 0.0009 Epoch: 18 Global Step: 302140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:34:18,923-Speed 5160.56 samples/sec Loss 0.4258 LearningRate 0.0009 Epoch: 18 Global Step: 302150 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:34:20,899-Speed 5184.21 samples/sec Loss 0.4150 LearningRate 0.0009 Epoch: 18 Global Step: 302160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:34:22,860-Speed 5224.14 samples/sec Loss 0.4400 LearningRate 0.0009 Epoch: 18 Global Step: 302170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:24,833-Speed 5191.28 samples/sec Loss 0.4140 LearningRate 0.0009 Epoch: 18 Global Step: 302180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:26,805-Speed 5194.69 samples/sec Loss 0.4192 LearningRate 0.0009 Epoch: 18 Global Step: 302190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:28,773-Speed 5204.91 samples/sec Loss 0.4386 LearningRate 0.0009 Epoch: 18 Global Step: 302200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:30,762-Speed 5150.04 samples/sec Loss 0.4212 LearningRate 0.0009 Epoch: 18 Global 
Step: 302210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:32,761-Speed 5122.72 samples/sec Loss 0.4357 LearningRate 0.0009 Epoch: 18 Global Step: 302220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:34,761-Speed 5123.36 samples/sec Loss 0.4109 LearningRate 0.0009 Epoch: 18 Global Step: 302230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:36,782-Speed 5068.41 samples/sec Loss 0.4410 LearningRate 0.0009 Epoch: 18 Global Step: 302240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:38,756-Speed 5189.06 samples/sec Loss 0.3839 LearningRate 0.0009 Epoch: 18 Global Step: 302250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:40,751-Speed 5133.15 samples/sec Loss 0.3965 LearningRate 0.0009 Epoch: 18 Global Step: 302260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:42,725-Speed 5188.53 samples/sec Loss 0.4502 LearningRate 0.0009 Epoch: 18 Global Step: 302270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:34:44,695-Speed 5200.07 samples/sec Loss 0.3967 LearningRate 0.0009 Epoch: 18 Global Step: 302280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:46,677-Speed 5170.45 samples/sec Loss 0.4349 LearningRate 0.0009 Epoch: 18 Global Step: 302290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:48,753-Speed 4932.75 samples/sec Loss 0.4198 LearningRate 0.0009 Epoch: 18 Global Step: 302300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:50,764-Speed 5095.23 samples/sec Loss 0.4227 LearningRate 0.0009 Epoch: 18 Global Step: 302310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:52,751-Speed 5152.82 samples/sec Loss 0.4409 LearningRate 0.0009 Epoch: 18 Global Step: 302320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:54,737-Speed 5159.64 samples/sec Loss 0.4244 LearningRate 0.0009 Epoch: 18 Global Step: 302330 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 19:34:56,717-Speed 5172.30 samples/sec Loss 0.4064 LearningRate 0.0009 Epoch: 18 Global Step: 302340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:34:58,716-Speed 5125.53 samples/sec Loss 0.4112 LearningRate 0.0009 Epoch: 18 Global Step: 302350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:35:00,708-Speed 5142.22 samples/sec Loss 0.4165 LearningRate 0.0009 Epoch: 18 Global Step: 302360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:35:02,743-Speed 5032.70 samples/sec Loss 0.4032 LearningRate 0.0009 Epoch: 18 Global Step: 302370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:35:04,737-Speed 5138.13 samples/sec Loss 0.3744 LearningRate 0.0009 Epoch: 18 Global Step: 302380 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:35:06,712-Speed 5185.76 samples/sec Loss 0.3975 LearningRate 0.0009 Epoch: 18 Global Step: 302390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:35:08,691-Speed 5176.17 samples/sec Loss 0.4617 LearningRate 0.0009 Epoch: 18 Global Step: 302400 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:35:10,670-Speed 5176.10 samples/sec Loss 0.4411 LearningRate 0.0009 Epoch: 18 Global Step: 302410 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:35:12,643-Speed 5191.01 samples/sec Loss 0.4569 LearningRate 0.0009 Epoch: 18 Global Step: 302420 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:35:14,618-Speed 5188.51 samples/sec Loss 0.4298 LearningRate 0.0009 Epoch: 18 Global Step: 302430 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:35:16,600-Speed 5165.52 samples/sec Loss 0.4219 LearningRate 0.0009 Epoch: 18 Global Step: 302440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:35:18,590-Speed 5149.91 samples/sec Loss 0.4310 LearningRate 0.0009 Epoch: 18 Global Step: 302450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 
2022-04-11 19:35:20,573-Speed 5164.37 samples/sec Loss 0.4314 LearningRate 0.0009 Epoch: 18 Global Step: 302460 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:35:22,569-Speed 5131.87 samples/sec Loss 0.4286 LearningRate 0.0009 Epoch: 18 Global Step: 302470 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:35:24,552-Speed 5164.67 samples/sec Loss 0.4397 LearningRate 0.0009 Epoch: 18 Global Step: 302480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:35:26,532-Speed 5175.27 samples/sec Loss 0.4190 LearningRate 0.0009 Epoch: 18 Global Step: 302490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:35:28,509-Speed 5180.46 samples/sec Loss 0.4418 LearningRate 0.0009 Epoch: 18 Global Step: 302500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:35:30,496-Speed 5155.79 samples/sec Loss 0.4196 LearningRate 0.0009 Epoch: 18 Global Step: 302510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:35:32,470-Speed 5189.78 samples/sec Loss 0.4371 LearningRate 0.0009 Epoch: 18 Global Step: 302520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:35:34,444-Speed 5190.41 samples/sec Loss 0.4445 LearningRate 0.0009 Epoch: 18 Global Step: 302530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:35:36,452-Speed 5100.10 samples/sec Loss 0.4080 LearningRate 0.0009 Epoch: 18 Global Step: 302540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:35:38,437-Speed 5160.02 samples/sec Loss 0.4235 LearningRate 0.0009 Epoch: 18 Global Step: 302550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:35:40,415-Speed 5179.30 samples/sec Loss 0.4360 LearningRate 0.0009 Epoch: 18 Global Step: 302560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:35:42,387-Speed 5193.99 samples/sec Loss 0.4054 LearningRate 0.0009 Epoch: 18 Global Step: 302570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:35:44,380-Speed 5140.98 
samples/sec Loss 0.4321 LearningRate 0.0009 Epoch: 18 Global Step: 302580 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:35:46,367-Speed 5153.10 samples/sec Loss 0.4299 LearningRate 0.0009 Epoch: 18 Global Step: 302590 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:35:48,343-Speed 5185.16 samples/sec Loss 0.4089 LearningRate 0.0009 Epoch: 18 Global Step: 302600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:35:50,319-Speed 5184.04 samples/sec Loss 0.4149 LearningRate 0.0009 Epoch: 18 Global Step: 302610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:35:52,294-Speed 5188.76 samples/sec Loss 0.3983 LearningRate 0.0009 Epoch: 18 Global Step: 302620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:35:54,280-Speed 5156.69 samples/sec Loss 0.4198 LearningRate 0.0009 Epoch: 18 Global Step: 302630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:35:56,251-Speed 5196.59 samples/sec Loss 0.4160 LearningRate 0.0009 Epoch: 18 Global Step: 302640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:35:58,214-Speed 5218.19 samples/sec Loss 0.4363 LearningRate 0.0009 Epoch: 18 Global Step: 302650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:36:00,189-Speed 5187.61 samples/sec Loss 0.4380 LearningRate 0.0009 Epoch: 18 Global Step: 302660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:36:02,162-Speed 5190.90 samples/sec Loss 0.4480 LearningRate 0.0009 Epoch: 18 Global Step: 302670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:36:04,159-Speed 5143.67 samples/sec Loss 0.4115 LearningRate 0.0009 Epoch: 18 Global Step: 302680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:36:06,131-Speed 5194.59 samples/sec Loss 0.4192 LearningRate 0.0009 Epoch: 18 Global Step: 302690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:36:08,153-Speed 5064.48 samples/sec Loss 0.4012 LearningRate 
0.0009 Epoch: 18 Global Step: 302700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:36:10,121-Speed 5205.47 samples/sec Loss 0.4246 LearningRate 0.0009 Epoch: 18 Global Step: 302710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:36:12,104-Speed 5166.49 samples/sec Loss 0.4302 LearningRate 0.0009 Epoch: 18 Global Step: 302720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:36:14,086-Speed 5169.16 samples/sec Loss 0.3935 LearningRate 0.0009 Epoch: 18 Global Step: 302730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:36:16,067-Speed 5170.68 samples/sec Loss 0.4070 LearningRate 0.0009 Epoch: 18 Global Step: 302740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:36:18,036-Speed 5202.56 samples/sec Loss 0.4391 LearningRate 0.0009 Epoch: 18 Global Step: 302750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:36:20,013-Speed 5179.70 samples/sec Loss 0.4176 LearningRate 0.0009 Epoch: 18 Global Step: 302760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:36:21,985-Speed 5193.74 samples/sec Loss 0.4146 LearningRate 0.0009 Epoch: 18 Global Step: 302770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:36:23,957-Speed 5194.61 samples/sec Loss 0.4267 LearningRate 0.0009 Epoch: 18 Global Step: 302780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:36:25,939-Speed 5169.16 samples/sec Loss 0.4216 LearningRate 0.0009 Epoch: 18 Global Step: 302790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:36:27,909-Speed 5200.56 samples/sec Loss 0.4299 LearningRate 0.0009 Epoch: 18 Global Step: 302800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:36:29,883-Speed 5187.51 samples/sec Loss 0.4130 LearningRate 0.0009 Epoch: 18 Global Step: 302810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:36:31,868-Speed 5160.92 samples/sec Loss 0.4241 LearningRate 0.0009 Epoch: 18 Global Step: 302820 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:36:33,845-Speed 5181.54 samples/sec Loss 0.4317 LearningRate 0.0009 Epoch: 18 Global Step: 302830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:36:35,825-Speed 5175.43 samples/sec Loss 0.4293 LearningRate 0.0009 Epoch: 18 Global Step: 302840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:36:37,797-Speed 5193.03 samples/sec Loss 0.4251 LearningRate 0.0009 Epoch: 18 Global Step: 302850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:36:39,786-Speed 5150.85 samples/sec Loss 0.4092 LearningRate 0.0009 Epoch: 18 Global Step: 302860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:36:41,756-Speed 5199.26 samples/sec Loss 0.4070 LearningRate 0.0009 Epoch: 18 Global Step: 302870 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:36:43,719-Speed 5217.22 samples/sec Loss 0.4279 LearningRate 0.0009 Epoch: 18 Global Step: 302880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:36:45,688-Speed 5201.20 samples/sec Loss 0.4436 LearningRate 0.0009 Epoch: 18 Global Step: 302890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:36:47,698-Speed 5098.08 samples/sec Loss 0.4375 LearningRate 0.0009 Epoch: 18 Global Step: 302900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:36:49,667-Speed 5202.23 samples/sec Loss 0.4323 LearningRate 0.0009 Epoch: 18 Global Step: 302910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:36:51,639-Speed 5195.45 samples/sec Loss 0.4305 LearningRate 0.0009 Epoch: 18 Global Step: 302920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:36:53,620-Speed 5168.16 samples/sec Loss 0.4264 LearningRate 0.0009 Epoch: 18 Global Step: 302930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:36:55,589-Speed 5203.69 samples/sec Loss 0.4103 LearningRate 0.0009 Epoch: 18 Global Step: 302940 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 19:36:57,572-Speed 5167.79 samples/sec Loss 0.4117 LearningRate 0.0009 Epoch: 18 Global Step: 302950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:36:59,546-Speed 5187.70 samples/sec Loss 0.3963 LearningRate 0.0009 Epoch: 18 Global Step: 302960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:01,536-Speed 5147.30 samples/sec Loss 0.4173 LearningRate 0.0009 Epoch: 18 Global Step: 302970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:03,524-Speed 5153.70 samples/sec Loss 0.4145 LearningRate 0.0009 Epoch: 18 Global Step: 302980 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:37:05,514-Speed 5149.42 samples/sec Loss 0.4217 LearningRate 0.0009 Epoch: 18 Global Step: 302990 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:37:07,501-Speed 5156.71 samples/sec Loss 0.4225 LearningRate 0.0009 Epoch: 18 Global Step: 303000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:09,481-Speed 5171.42 samples/sec Loss 0.4120 LearningRate 0.0009 Epoch: 18 Global Step: 303010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:11,471-Speed 5147.04 samples/sec Loss 0.4128 LearningRate 0.0009 Epoch: 18 Global Step: 303020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:13,498-Speed 5053.80 samples/sec Loss 0.4196 LearningRate 0.0009 Epoch: 18 Global Step: 303030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:15,490-Speed 5141.97 samples/sec Loss 0.4229 LearningRate 0.0009 Epoch: 18 Global Step: 303040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:17,463-Speed 5192.98 samples/sec Loss 0.4324 LearningRate 0.0008 Epoch: 18 Global Step: 303050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:19,437-Speed 5188.75 samples/sec Loss 0.4158 LearningRate 0.0008 Epoch: 18 Global Step: 303060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:21,411-Speed 
5190.23 samples/sec Loss 0.4155 LearningRate 0.0008 Epoch: 18 Global Step: 303070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:23,412-Speed 5118.79 samples/sec Loss 0.4246 LearningRate 0.0008 Epoch: 18 Global Step: 303080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:25,405-Speed 5140.58 samples/sec Loss 0.4328 LearningRate 0.0008 Epoch: 18 Global Step: 303090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:27,402-Speed 5129.83 samples/sec Loss 0.4285 LearningRate 0.0008 Epoch: 18 Global Step: 303100 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:37:29,389-Speed 5154.35 samples/sec Loss 0.4119 LearningRate 0.0008 Epoch: 18 Global Step: 303110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:37:31,377-Speed 5152.92 samples/sec Loss 0.4432 LearningRate 0.0008 Epoch: 18 Global Step: 303120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:33,347-Speed 5199.21 samples/sec Loss 0.4103 LearningRate 0.0008 Epoch: 18 Global Step: 303130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:35,321-Speed 5188.21 samples/sec Loss 0.4112 LearningRate 0.0008 Epoch: 18 Global Step: 303140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:37,300-Speed 5176.75 samples/sec Loss 0.4203 LearningRate 0.0008 Epoch: 18 Global Step: 303150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:39,296-Speed 5133.30 samples/sec Loss 0.4141 LearningRate 0.0008 Epoch: 18 Global Step: 303160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:41,268-Speed 5192.80 samples/sec Loss 0.4349 LearningRate 0.0008 Epoch: 18 Global Step: 303170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:43,243-Speed 5187.79 samples/sec Loss 0.4003 LearningRate 0.0008 Epoch: 18 Global Step: 303180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:45,213-Speed 5198.19 samples/sec Loss 0.4110 
LearningRate 0.0008 Epoch: 18 Global Step: 303190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:47,201-Speed 5152.68 samples/sec Loss 0.4192 LearningRate 0.0008 Epoch: 18 Global Step: 303200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:49,174-Speed 5194.24 samples/sec Loss 0.4115 LearningRate 0.0008 Epoch: 18 Global Step: 303210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:51,170-Speed 5130.91 samples/sec Loss 0.4234 LearningRate 0.0008 Epoch: 18 Global Step: 303220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:53,191-Speed 5068.23 samples/sec Loss 0.4128 LearningRate 0.0008 Epoch: 18 Global Step: 303230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:55,165-Speed 5189.42 samples/sec Loss 0.4034 LearningRate 0.0008 Epoch: 18 Global Step: 303240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:57,147-Speed 5168.62 samples/sec Loss 0.4279 LearningRate 0.0008 Epoch: 18 Global Step: 303250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:37:59,140-Speed 5138.50 samples/sec Loss 0.4173 LearningRate 0.0008 Epoch: 18 Global Step: 303260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:38:01,126-Speed 5160.13 samples/sec Loss 0.4243 LearningRate 0.0008 Epoch: 18 Global Step: 303270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:38:03,099-Speed 5189.80 samples/sec Loss 0.4446 LearningRate 0.0008 Epoch: 18 Global Step: 303280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:38:05,073-Speed 5191.21 samples/sec Loss 0.4246 LearningRate 0.0008 Epoch: 18 Global Step: 303290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:38:07,049-Speed 5183.20 samples/sec Loss 0.4379 LearningRate 0.0008 Epoch: 18 Global Step: 303300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:38:09,022-Speed 5193.04 samples/sec Loss 0.4045 LearningRate 0.0008 Epoch: 18 Global Step: 
303310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:38:11,010-Speed 5152.52 samples/sec Loss 0.4121 LearningRate 0.0008 Epoch: 18 Global Step: 303320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:38:12,990-Speed 5171.75 samples/sec Loss 0.4350 LearningRate 0.0008 Epoch: 18 Global Step: 303330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:38:14,961-Speed 5198.28 samples/sec Loss 0.4261 LearningRate 0.0008 Epoch: 18 Global Step: 303340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:38:16,942-Speed 5169.20 samples/sec Loss 0.4108 LearningRate 0.0008 Epoch: 18 Global Step: 303350 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:38:18,934-Speed 5144.00 samples/sec Loss 0.4271 LearningRate 0.0008 Epoch: 18 Global Step: 303360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:38:20,910-Speed 5184.06 samples/sec Loss 0.3837 LearningRate 0.0008 Epoch: 18 Global Step: 303370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:38:22,881-Speed 5196.41 samples/sec Loss 0.4158 LearningRate 0.0008 Epoch: 18 Global Step: 303380 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:38:24,854-Speed 5192.51 samples/sec Loss 0.4157 LearningRate 0.0008 Epoch: 18 Global Step: 303390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:38:26,839-Speed 5160.52 samples/sec Loss 0.4160 LearningRate 0.0008 Epoch: 18 Global Step: 303400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:38:28,811-Speed 5193.79 samples/sec Loss 0.4172 LearningRate 0.0008 Epoch: 18 Global Step: 303410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:38:30,781-Speed 5199.89 samples/sec Loss 0.4431 LearningRate 0.0008 Epoch: 18 Global Step: 303420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:38:32,750-Speed 5202.34 samples/sec Loss 0.4209 LearningRate 0.0008 Epoch: 18 Global Step: 303430 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 19:38:34,720-Speed 5198.24 samples/sec Loss 0.4079 LearningRate 0.0008 Epoch: 18 Global Step: 303440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:38:36,708-Speed 5154.78 samples/sec Loss 0.4356 LearningRate 0.0008 Epoch: 18 Global Step: 303450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:38:38,680-Speed 5194.24 samples/sec Loss 0.4143 LearningRate 0.0008 Epoch: 18 Global Step: 303460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:38:40,650-Speed 5199.52 samples/sec Loss 0.4200 LearningRate 0.0008 Epoch: 18 Global Step: 303470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:38:42,618-Speed 5204.07 samples/sec Loss 0.4257 LearningRate 0.0008 Epoch: 18 Global Step: 303480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:38:44,589-Speed 5198.47 samples/sec Loss 0.4016 LearningRate 0.0008 Epoch: 18 Global Step: 303490 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:38:46,564-Speed 5185.87 samples/sec Loss 0.4276 LearningRate 0.0008 Epoch: 18 Global Step: 303500 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:38:48,534-Speed 5199.27 samples/sec Loss 0.4124 LearningRate 0.0008 Epoch: 18 Global Step: 303510 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:38:50,510-Speed 5185.40 samples/sec Loss 0.4326 LearningRate 0.0008 Epoch: 18 Global Step: 303520 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:38:52,501-Speed 5142.82 samples/sec Loss 0.4282 LearningRate 0.0008 Epoch: 18 Global Step: 303530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:38:54,466-Speed 5214.74 samples/sec Loss 0.4180 LearningRate 0.0008 Epoch: 18 Global Step: 303540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:38:56,438-Speed 5192.98 samples/sec Loss 0.4190 LearningRate 0.0008 Epoch: 18 Global Step: 303550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
19:38:58,406-Speed 5204.19 samples/sec Loss 0.4426 LearningRate 0.0008 Epoch: 18 Global Step: 303560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:39:00,393-Speed 5155.87 samples/sec Loss 0.4321 LearningRate 0.0008 Epoch: 18 Global Step: 303570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:39:02,367-Speed 5189.05 samples/sec Loss 0.4331 LearningRate 0.0008 Epoch: 18 Global Step: 303580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:39:04,375-Speed 5101.50 samples/sec Loss 0.4133 LearningRate 0.0008 Epoch: 18 Global Step: 303590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:39:06,345-Speed 5200.65 samples/sec Loss 0.4218 LearningRate 0.0008 Epoch: 18 Global Step: 303600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:39:08,316-Speed 5197.86 samples/sec Loss 0.3902 LearningRate 0.0008 Epoch: 18 Global Step: 303610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:39:10,290-Speed 5190.32 samples/sec Loss 0.4198 LearningRate 0.0008 Epoch: 18 Global Step: 303620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:39:12,282-Speed 5140.52 samples/sec Loss 0.4272 LearningRate 0.0008 Epoch: 18 Global Step: 303630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:39:14,261-Speed 5175.92 samples/sec Loss 0.4016 LearningRate 0.0008 Epoch: 18 Global Step: 303640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:39:16,237-Speed 5184.10 samples/sec Loss 0.4281 LearningRate 0.0008 Epoch: 18 Global Step: 303650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:39:18,226-Speed 5150.36 samples/sec Loss 0.4117 LearningRate 0.0008 Epoch: 18 Global Step: 303660 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:39:20,225-Speed 5123.61 samples/sec Loss 0.4469 LearningRate 0.0008 Epoch: 18 Global Step: 303670 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:39:22,203-Speed 5177.64 samples/sec 
Loss 0.4325 LearningRate 0.0008 Epoch: 18 Global Step: 303680 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:39:24,195-Speed 5145.12 samples/sec Loss 0.4508 LearningRate 0.0008 Epoch: 18 Global Step: 303690 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:39:26,178-Speed 5165.32 samples/sec Loss 0.4386 LearningRate 0.0008 Epoch: 18 Global Step: 303700 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:39:28,175-Speed 5129.44 samples/sec Loss 0.4176 LearningRate 0.0008 Epoch: 18 Global Step: 303710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:39:30,154-Speed 5175.94 samples/sec Loss 0.4346 LearningRate 0.0008 Epoch: 18 Global Step: 303720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:39:32,125-Speed 5198.88 samples/sec Loss 0.4151 LearningRate 0.0008 Epoch: 18 Global Step: 303730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:39:34,096-Speed 5194.49 samples/sec Loss 0.4494 LearningRate 0.0008 Epoch: 18 Global Step: 303740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:39:36,072-Speed 5184.96 samples/sec Loss 0.4351 LearningRate 0.0008 Epoch: 18 Global Step: 303750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:39:38,046-Speed 5189.64 samples/sec Loss 0.4060 LearningRate 0.0008 Epoch: 18 Global Step: 303760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:39:40,026-Speed 5172.32 samples/sec Loss 0.4520 LearningRate 0.0008 Epoch: 18 Global Step: 303770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:39:42,000-Speed 5189.34 samples/sec Loss 0.4433 LearningRate 0.0008 Epoch: 18 Global Step: 303780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:39:43,970-Speed 5201.07 samples/sec Loss 0.4326 LearningRate 0.0008 Epoch: 18 Global Step: 303790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:39:45,943-Speed 5191.52 samples/sec Loss 0.4470 LearningRate 0.0008 Epoch: 
18 Global Step: 303800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:39:47,943-Speed 5120.87 samples/sec Loss 0.4257 LearningRate 0.0008 Epoch: 18 Global Step: 303810 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:39:49,952-Speed 5100.80 samples/sec Loss 0.4329 LearningRate 0.0008 Epoch: 18 Global Step: 303820 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:39:51,930-Speed 5178.88 samples/sec Loss 0.4196 LearningRate 0.0008 Epoch: 18 Global Step: 303830 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:39:53,924-Speed 5138.85 samples/sec Loss 0.4321 LearningRate 0.0008 Epoch: 18 Global Step: 303840 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:39:55,894-Speed 5199.93 samples/sec Loss 0.4019 LearningRate 0.0008 Epoch: 18 Global Step: 303850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:39:57,874-Speed 5171.37 samples/sec Loss 0.4499 LearningRate 0.0008 Epoch: 18 Global Step: 303860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:39:59,849-Speed 5188.33 samples/sec Loss 0.4132 LearningRate 0.0008 Epoch: 18 Global Step: 303870 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:40:01,824-Speed 5185.71 samples/sec Loss 0.4232 LearningRate 0.0008 Epoch: 18 Global Step: 303880 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:40:03,818-Speed 5136.23 samples/sec Loss 0.4208 LearningRate 0.0008 Epoch: 18 Global Step: 303890 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:40:05,786-Speed 5206.26 samples/sec Loss 0.4396 LearningRate 0.0008 Epoch: 18 Global Step: 303900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:40:07,761-Speed 5184.89 samples/sec Loss 0.4163 LearningRate 0.0008 Epoch: 18 Global Step: 303910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:40:09,736-Speed 5187.44 samples/sec Loss 0.4234 LearningRate 0.0008 Epoch: 18 Global Step: 303920 Fp16 Grad 
Scale: 65536 Required: 2 hours
Training: 2022-04-11 19:40:11,709-Speed 5190.88 samples/sec Loss 0.4299 LearningRate 0.0008 Epoch: 18 Global Step: 303930 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 19:40:13,729-Speed 5072.74 samples/sec Loss 0.4349 LearningRate 0.0008 Epoch: 18 Global Step: 303940 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 19:40:15,726-Speed 5128.31 samples/sec Loss 0.4183 LearningRate 0.0008 Epoch: 18 Global Step: 303950 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 19:40:17,701-Speed 5187.33 samples/sec Loss 0.4327 LearningRate 0.0008 Epoch: 18 Global Step: 303960 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 19:40:19,671-Speed 5199.94 samples/sec Loss 0.4129 LearningRate 0.0008 Epoch: 18 Global Step: 303970 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 19:40:21,662-Speed 5145.28 samples/sec Loss 0.4447 LearningRate 0.0008 Epoch: 18 Global Step: 303980 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 19:40:23,644-Speed 5168.99 samples/sec Loss 0.4410 LearningRate 0.0008 Epoch: 18 Global Step: 303990 Fp16 Grad Scale: 65536 Required: 2 hours
Training: 2022-04-11 19:40:25,624-Speed 5171.35 samples/sec Loss 0.4371 LearningRate 0.0008 Epoch: 18 Global Step: 304000 Fp16 Grad Scale: 131072 Required: 2 hours
Training: 2022-04-11 19:40:52,304-[lfw][304000]XNorm: 21.722443
Training: 2022-04-11 19:40:52,305-[lfw][304000]Accuracy-Flip: 0.99833+-0.00236
Training: 2022-04-11 19:40:52,305-[lfw][304000]Accuracy-Highest: 0.99833
Training: 2022-04-11 19:41:23,167-[cfp_fp][304000]XNorm: 22.088611
Training: 2022-04-11 19:41:23,168-[cfp_fp][304000]Accuracy-Flip: 0.98943+-0.00415
Training: 2022-04-11 19:41:23,168-[cfp_fp][304000]Accuracy-Highest: 0.99014
Training: 2022-04-11 19:41:49,643-[agedb_30][304000]XNorm: 22.772920
Training: 2022-04-11 19:41:49,644-[agedb_30][304000]Accuracy-Flip: 0.98317+-0.00639
Training: 2022-04-11 19:41:49,644-[agedb_30][304000]Accuracy-Highest: 
0.98383 Training: 2022-04-11 19:41:51,621-Speed 119.07 samples/sec Loss 0.4415 LearningRate 0.0008 Epoch: 18 Global Step: 304010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:41:53,605-Speed 5163.12 samples/sec Loss 0.4037 LearningRate 0.0008 Epoch: 18 Global Step: 304020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:41:55,571-Speed 5209.85 samples/sec Loss 0.4147 LearningRate 0.0008 Epoch: 18 Global Step: 304030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:41:57,551-Speed 5174.92 samples/sec Loss 0.4281 LearningRate 0.0008 Epoch: 18 Global Step: 304040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:41:59,519-Speed 5203.60 samples/sec Loss 0.4374 LearningRate 0.0008 Epoch: 18 Global Step: 304050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:01,485-Speed 5210.70 samples/sec Loss 0.4298 LearningRate 0.0008 Epoch: 18 Global Step: 304060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:03,455-Speed 5199.84 samples/sec Loss 0.4186 LearningRate 0.0008 Epoch: 18 Global Step: 304070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:05,429-Speed 5190.70 samples/sec Loss 0.4120 LearningRate 0.0008 Epoch: 18 Global Step: 304080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:07,398-Speed 5201.25 samples/sec Loss 0.4528 LearningRate 0.0008 Epoch: 18 Global Step: 304090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:09,369-Speed 5198.46 samples/sec Loss 0.3998 LearningRate 0.0008 Epoch: 18 Global Step: 304100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:11,346-Speed 5179.23 samples/sec Loss 0.4265 LearningRate 0.0008 Epoch: 18 Global Step: 304110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:42:13,347-Speed 5120.61 samples/sec Loss 0.4202 LearningRate 0.0008 Epoch: 18 Global Step: 304120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
19:42:15,337-Speed 5147.55 samples/sec Loss 0.4437 LearningRate 0.0008 Epoch: 18 Global Step: 304130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:17,315-Speed 5177.11 samples/sec Loss 0.4133 LearningRate 0.0008 Epoch: 18 Global Step: 304140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:19,283-Speed 5207.09 samples/sec Loss 0.4447 LearningRate 0.0008 Epoch: 18 Global Step: 304150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:21,257-Speed 5187.14 samples/sec Loss 0.4273 LearningRate 0.0008 Epoch: 18 Global Step: 304160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:23,233-Speed 5186.01 samples/sec Loss 0.4140 LearningRate 0.0008 Epoch: 18 Global Step: 304170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:25,223-Speed 5146.08 samples/sec Loss 0.4227 LearningRate 0.0008 Epoch: 18 Global Step: 304180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:27,226-Speed 5114.24 samples/sec Loss 0.4212 LearningRate 0.0008 Epoch: 18 Global Step: 304190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:29,212-Speed 5158.15 samples/sec Loss 0.4273 LearningRate 0.0008 Epoch: 18 Global Step: 304200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:31,211-Speed 5123.02 samples/sec Loss 0.4316 LearningRate 0.0008 Epoch: 18 Global Step: 304210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:33,186-Speed 5186.75 samples/sec Loss 0.4144 LearningRate 0.0008 Epoch: 18 Global Step: 304220 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:42:35,188-Speed 5119.11 samples/sec Loss 0.4125 LearningRate 0.0008 Epoch: 18 Global Step: 304230 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:42:37,155-Speed 5206.14 samples/sec Loss 0.4114 LearningRate 0.0008 Epoch: 18 Global Step: 304240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:39,133-Speed 5179.99 samples/sec 
Loss 0.4261 LearningRate 0.0008 Epoch: 18 Global Step: 304250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:41,117-Speed 5163.50 samples/sec Loss 0.4375 LearningRate 0.0008 Epoch: 18 Global Step: 304260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:43,088-Speed 5196.63 samples/sec Loss 0.4166 LearningRate 0.0008 Epoch: 18 Global Step: 304270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:45,059-Speed 5197.37 samples/sec Loss 0.4217 LearningRate 0.0008 Epoch: 18 Global Step: 304280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:47,035-Speed 5183.43 samples/sec Loss 0.4143 LearningRate 0.0008 Epoch: 18 Global Step: 304290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:49,027-Speed 5142.03 samples/sec Loss 0.4371 LearningRate 0.0008 Epoch: 18 Global Step: 304300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:51,024-Speed 5129.15 samples/sec Loss 0.4244 LearningRate 0.0008 Epoch: 18 Global Step: 304310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:53,034-Speed 5094.47 samples/sec Loss 0.4404 LearningRate 0.0008 Epoch: 18 Global Step: 304320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:55,013-Speed 5177.65 samples/sec Loss 0.4413 LearningRate 0.0008 Epoch: 18 Global Step: 304330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:42:57,015-Speed 5118.19 samples/sec Loss 0.4200 LearningRate 0.0008 Epoch: 18 Global Step: 304340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:42:59,006-Speed 5149.88 samples/sec Loss 0.4232 LearningRate 0.0008 Epoch: 18 Global Step: 304350 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:43:01,005-Speed 5125.85 samples/sec Loss 0.4383 LearningRate 0.0008 Epoch: 18 Global Step: 304360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:43:02,989-Speed 5162.66 samples/sec Loss 0.4044 LearningRate 0.0008 Epoch: 
18 Global Step: 304370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:43:05,004-Speed 5082.91 samples/sec Loss 0.4207 LearningRate 0.0008 Epoch: 18 Global Step: 304380 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:43:06,976-Speed 5195.50 samples/sec Loss 0.4222 LearningRate 0.0008 Epoch: 18 Global Step: 304390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:08,972-Speed 5130.79 samples/sec Loss 0.4406 LearningRate 0.0008 Epoch: 18 Global Step: 304400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:10,981-Speed 5100.73 samples/sec Loss 0.4335 LearningRate 0.0008 Epoch: 18 Global Step: 304410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:12,976-Speed 5133.26 samples/sec Loss 0.4338 LearningRate 0.0008 Epoch: 18 Global Step: 304420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:14,962-Speed 5159.13 samples/sec Loss 0.4309 LearningRate 0.0008 Epoch: 18 Global Step: 304430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:16,951-Speed 5149.04 samples/sec Loss 0.4216 LearningRate 0.0008 Epoch: 18 Global Step: 304440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:18,927-Speed 5183.80 samples/sec Loss 0.4122 LearningRate 0.0008 Epoch: 18 Global Step: 304450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:20,904-Speed 5181.13 samples/sec Loss 0.4289 LearningRate 0.0008 Epoch: 18 Global Step: 304460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:22,897-Speed 5140.13 samples/sec Loss 0.4179 LearningRate 0.0008 Epoch: 18 Global Step: 304470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:24,872-Speed 5187.08 samples/sec Loss 0.4218 LearningRate 0.0008 Epoch: 18 Global Step: 304480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:26,848-Speed 5183.63 samples/sec Loss 0.4247 LearningRate 0.0008 Epoch: 18 Global Step: 304490 Fp16 Grad Scale: 
131072 Required: 2 hours Training: 2022-04-11 19:43:28,874-Speed 5056.51 samples/sec Loss 0.4312 LearningRate 0.0008 Epoch: 18 Global Step: 304500 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:43:30,864-Speed 5145.95 samples/sec Loss 0.4221 LearningRate 0.0008 Epoch: 18 Global Step: 304510 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:43:32,837-Speed 5192.73 samples/sec Loss 0.4597 LearningRate 0.0008 Epoch: 18 Global Step: 304520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:34,828-Speed 5146.55 samples/sec Loss 0.4174 LearningRate 0.0008 Epoch: 18 Global Step: 304530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:36,805-Speed 5180.72 samples/sec Loss 0.4230 LearningRate 0.0008 Epoch: 18 Global Step: 304540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:38,801-Speed 5132.10 samples/sec Loss 0.4600 LearningRate 0.0008 Epoch: 18 Global Step: 304550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:40,772-Speed 5197.65 samples/sec Loss 0.4302 LearningRate 0.0008 Epoch: 18 Global Step: 304560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:42,742-Speed 5198.12 samples/sec Loss 0.4515 LearningRate 0.0008 Epoch: 18 Global Step: 304570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:44,715-Speed 5193.04 samples/sec Loss 0.4499 LearningRate 0.0008 Epoch: 18 Global Step: 304580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:46,698-Speed 5164.30 samples/sec Loss 0.4356 LearningRate 0.0008 Epoch: 18 Global Step: 304590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:48,694-Speed 5132.69 samples/sec Loss 0.4474 LearningRate 0.0008 Epoch: 18 Global Step: 304600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:50,678-Speed 5162.33 samples/sec Loss 0.4289 LearningRate 0.0008 Epoch: 18 Global Step: 304610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 19:43:52,655-Speed 5182.68 samples/sec Loss 0.4313 LearningRate 0.0008 Epoch: 18 Global Step: 304620 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:43:54,630-Speed 5185.72 samples/sec Loss 0.4224 LearningRate 0.0008 Epoch: 18 Global Step: 304630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:56,620-Speed 5146.81 samples/sec Loss 0.4342 LearningRate 0.0008 Epoch: 18 Global Step: 304640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:43:58,602-Speed 5167.77 samples/sec Loss 0.4132 LearningRate 0.0008 Epoch: 18 Global Step: 304650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:00,573-Speed 5197.33 samples/sec Loss 0.4261 LearningRate 0.0008 Epoch: 18 Global Step: 304660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:02,546-Speed 5193.69 samples/sec Loss 0.4463 LearningRate 0.0008 Epoch: 18 Global Step: 304670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:04,517-Speed 5197.38 samples/sec Loss 0.4155 LearningRate 0.0008 Epoch: 18 Global Step: 304680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:06,500-Speed 5164.09 samples/sec Loss 0.4015 LearningRate 0.0008 Epoch: 18 Global Step: 304690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:08,476-Speed 5184.79 samples/sec Loss 0.4320 LearningRate 0.0008 Epoch: 18 Global Step: 304700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:10,450-Speed 5189.83 samples/sec Loss 0.4327 LearningRate 0.0008 Epoch: 18 Global Step: 304710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:12,423-Speed 5192.82 samples/sec Loss 0.4288 LearningRate 0.0008 Epoch: 18 Global Step: 304720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:14,404-Speed 5170.50 samples/sec Loss 0.4197 LearningRate 0.0008 Epoch: 18 Global Step: 304730 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:44:16,375-Speed 5197.82 
samples/sec Loss 0.4610 LearningRate 0.0008 Epoch: 18 Global Step: 304740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:18,380-Speed 5110.00 samples/sec Loss 0.4088 LearningRate 0.0008 Epoch: 18 Global Step: 304750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:20,355-Speed 5184.32 samples/sec Loss 0.4301 LearningRate 0.0008 Epoch: 18 Global Step: 304760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:22,328-Speed 5194.37 samples/sec Loss 0.3992 LearningRate 0.0008 Epoch: 18 Global Step: 304770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:24,298-Speed 5199.35 samples/sec Loss 0.4432 LearningRate 0.0008 Epoch: 18 Global Step: 304780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:26,269-Speed 5196.70 samples/sec Loss 0.4393 LearningRate 0.0008 Epoch: 18 Global Step: 304790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:28,240-Speed 5197.99 samples/sec Loss 0.4416 LearningRate 0.0008 Epoch: 18 Global Step: 304800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:30,223-Speed 5165.55 samples/sec Loss 0.4485 LearningRate 0.0008 Epoch: 18 Global Step: 304810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:32,193-Speed 5200.19 samples/sec Loss 0.4300 LearningRate 0.0008 Epoch: 18 Global Step: 304820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:34,189-Speed 5132.19 samples/sec Loss 0.4301 LearningRate 0.0008 Epoch: 18 Global Step: 304830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:36,178-Speed 5148.58 samples/sec Loss 0.4492 LearningRate 0.0008 Epoch: 18 Global Step: 304840 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:44:38,151-Speed 5192.02 samples/sec Loss 0.4383 LearningRate 0.0008 Epoch: 18 Global Step: 304850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:44:40,123-Speed 5195.52 samples/sec Loss 0.4394 LearningRate 
0.0008 Epoch: 18 Global Step: 304860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:44:42,096-Speed 5191.96 samples/sec Loss 0.4165 LearningRate 0.0008 Epoch: 18 Global Step: 304870 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:44:44,068-Speed 5194.66 samples/sec Loss 0.4506 LearningRate 0.0008 Epoch: 18 Global Step: 304880 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:44:46,032-Speed 5215.22 samples/sec Loss 0.4352 LearningRate 0.0008 Epoch: 18 Global Step: 304890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:48,002-Speed 5198.35 samples/sec Loss 0.4401 LearningRate 0.0008 Epoch: 18 Global Step: 304900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:49,977-Speed 5188.45 samples/sec Loss 0.4369 LearningRate 0.0008 Epoch: 18 Global Step: 304910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:51,946-Speed 5200.75 samples/sec Loss 0.4254 LearningRate 0.0007 Epoch: 18 Global Step: 304920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:53,938-Speed 5143.68 samples/sec Loss 0.4085 LearningRate 0.0007 Epoch: 18 Global Step: 304930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:55,912-Speed 5187.25 samples/sec Loss 0.4131 LearningRate 0.0007 Epoch: 18 Global Step: 304940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:57,884-Speed 5195.28 samples/sec Loss 0.4392 LearningRate 0.0007 Epoch: 18 Global Step: 304950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:44:59,875-Speed 5143.61 samples/sec Loss 0.4281 LearningRate 0.0007 Epoch: 18 Global Step: 304960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:01,864-Speed 5150.52 samples/sec Loss 0.4375 LearningRate 0.0007 Epoch: 18 Global Step: 304970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:03,839-Speed 5187.91 samples/sec Loss 0.4073 LearningRate 0.0007 Epoch: 18 Global Step: 304980 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:05,812-Speed 5191.72 samples/sec Loss 0.4163 LearningRate 0.0007 Epoch: 18 Global Step: 304990 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:45:07,789-Speed 5181.48 samples/sec Loss 0.4474 LearningRate 0.0007 Epoch: 18 Global Step: 305000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:45:09,780-Speed 5146.09 samples/sec Loss 0.4413 LearningRate 0.0007 Epoch: 18 Global Step: 305010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:11,766-Speed 5156.29 samples/sec Loss 0.4137 LearningRate 0.0007 Epoch: 18 Global Step: 305020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:13,756-Speed 5147.18 samples/sec Loss 0.4241 LearningRate 0.0007 Epoch: 18 Global Step: 305030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:15,741-Speed 5159.32 samples/sec Loss 0.4077 LearningRate 0.0007 Epoch: 18 Global Step: 305040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:17,726-Speed 5160.63 samples/sec Loss 0.4115 LearningRate 0.0007 Epoch: 18 Global Step: 305050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:19,704-Speed 5179.14 samples/sec Loss 0.4364 LearningRate 0.0007 Epoch: 18 Global Step: 305060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:21,714-Speed 5096.95 samples/sec Loss 0.4324 LearningRate 0.0007 Epoch: 18 Global Step: 305070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:23,695-Speed 5170.86 samples/sec Loss 0.4268 LearningRate 0.0007 Epoch: 18 Global Step: 305080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:25,696-Speed 5118.90 samples/sec Loss 0.4190 LearningRate 0.0007 Epoch: 18 Global Step: 305090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:27,737-Speed 5018.94 samples/sec Loss 0.4187 LearningRate 0.0007 Epoch: 18 Global Step: 305100 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 19:45:29,745-Speed 5101.90 samples/sec Loss 0.4620 LearningRate 0.0007 Epoch: 18 Global Step: 305110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:45:31,722-Speed 5181.12 samples/sec Loss 0.4119 LearningRate 0.0007 Epoch: 18 Global Step: 305120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:45:33,714-Speed 5142.37 samples/sec Loss 0.4634 LearningRate 0.0007 Epoch: 18 Global Step: 305130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:45:35,726-Speed 5093.13 samples/sec Loss 0.4112 LearningRate 0.0007 Epoch: 18 Global Step: 305140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:45:37,727-Speed 5119.83 samples/sec Loss 0.4249 LearningRate 0.0007 Epoch: 18 Global Step: 305150 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:45:39,728-Speed 5118.47 samples/sec Loss 0.4406 LearningRate 0.0007 Epoch: 18 Global Step: 305160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:45:41,710-Speed 5169.98 samples/sec Loss 0.4290 LearningRate 0.0007 Epoch: 18 Global Step: 305170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:43,681-Speed 5196.29 samples/sec Loss 0.4254 LearningRate 0.0007 Epoch: 18 Global Step: 305180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:45,676-Speed 5135.40 samples/sec Loss 0.4172 LearningRate 0.0007 Epoch: 18 Global Step: 305190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:47,677-Speed 5118.90 samples/sec Loss 0.4337 LearningRate 0.0007 Epoch: 18 Global Step: 305200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:49,676-Speed 5123.82 samples/sec Loss 0.4338 LearningRate 0.0007 Epoch: 18 Global Step: 305210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:51,661-Speed 5159.06 samples/sec Loss 0.4490 LearningRate 0.0007 Epoch: 18 Global Step: 305220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
19:45:53,660-Speed 5127.16 samples/sec Loss 0.4348 LearningRate 0.0007 Epoch: 18 Global Step: 305230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:55,645-Speed 5158.36 samples/sec Loss 0.4154 LearningRate 0.0007 Epoch: 18 Global Step: 305240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:57,620-Speed 5188.21 samples/sec Loss 0.4128 LearningRate 0.0007 Epoch: 18 Global Step: 305250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:45:59,592-Speed 5193.23 samples/sec Loss 0.4169 LearningRate 0.0007 Epoch: 18 Global Step: 305260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:46:01,581-Speed 5151.07 samples/sec Loss 0.4164 LearningRate 0.0007 Epoch: 18 Global Step: 305270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:46:03,571-Speed 5147.70 samples/sec Loss 0.4232 LearningRate 0.0007 Epoch: 18 Global Step: 305280 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:46:05,555-Speed 5162.82 samples/sec Loss 0.4444 LearningRate 0.0007 Epoch: 18 Global Step: 305290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:46:07,542-Speed 5156.66 samples/sec Loss 0.4174 LearningRate 0.0007 Epoch: 18 Global Step: 305300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:46:09,575-Speed 5036.49 samples/sec Loss 0.4253 LearningRate 0.0007 Epoch: 18 Global Step: 305310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:46:11,558-Speed 5168.28 samples/sec Loss 0.3963 LearningRate 0.0007 Epoch: 18 Global Step: 305320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:46:13,538-Speed 5171.44 samples/sec Loss 0.3977 LearningRate 0.0007 Epoch: 18 Global Step: 305330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:46:15,519-Speed 5171.78 samples/sec Loss 0.4110 LearningRate 0.0007 Epoch: 18 Global Step: 305340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:46:17,506-Speed 5154.82 
samples/sec Loss 0.4148 LearningRate 0.0007 Epoch: 18 Global Step: 305350 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:46:19,479-Speed 5191.52 samples/sec Loss 0.4382 LearningRate 0.0007 Epoch: 18 Global Step: 305360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:46:21,446-Speed 5209.12 samples/sec Loss 0.4252 LearningRate 0.0007 Epoch: 18 Global Step: 305370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:46:23,421-Speed 5186.10 samples/sec Loss 0.4324 LearningRate 0.0007 Epoch: 18 Global Step: 305380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:46:25,397-Speed 5185.46 samples/sec Loss 0.4447 LearningRate 0.0007 Epoch: 18 Global Step: 305390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:46:27,369-Speed 5191.85 samples/sec Loss 0.4183 LearningRate 0.0007 Epoch: 18 Global Step: 305400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:46:29,353-Speed 5164.09 samples/sec Loss 0.4168 LearningRate 0.0007 Epoch: 18 Global Step: 305410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:46:31,338-Speed 5160.02 samples/sec Loss 0.4487 LearningRate 0.0007 Epoch: 18 Global Step: 305420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:46:33,309-Speed 5196.66 samples/sec Loss 0.4178 LearningRate 0.0007 Epoch: 18 Global Step: 305430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:46:35,299-Speed 5147.40 samples/sec Loss 0.4319 LearningRate 0.0007 Epoch: 18 Global Step: 305440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:46:37,277-Speed 5178.73 samples/sec Loss 0.4484 LearningRate 0.0007 Epoch: 18 Global Step: 305450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:46:39,266-Speed 5150.77 samples/sec Loss 0.4428 LearningRate 0.0007 Epoch: 18 Global Step: 305460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:46:41,243-Speed 5181.64 samples/sec Loss 0.4342 LearningRate 
0.0007 Epoch: 18 Global Step: 305470 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:46:43,217-Speed 5189.11 samples/sec Loss 0.4243 LearningRate 0.0007 Epoch: 18 Global Step: 305480 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:46:45,192-Speed 5187.48 samples/sec Loss 0.4256 LearningRate 0.0007 Epoch: 18 Global Step: 305490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:46:47,196-Speed 5109.18 samples/sec Loss 0.4220 LearningRate 0.0007 Epoch: 18 Global Step: 305500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:46:49,173-Speed 5181.67 samples/sec Loss 0.4389 LearningRate 0.0007 Epoch: 18 Global Step: 305510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:46:51,146-Speed 5191.54 samples/sec Loss 0.4469 LearningRate 0.0007 Epoch: 18 Global Step: 305520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:46:53,128-Speed 5169.72 samples/sec Loss 0.4162 LearningRate 0.0007 Epoch: 18 Global Step: 305530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:46:55,101-Speed 5192.56 samples/sec Loss 0.4337 LearningRate 0.0007 Epoch: 18 Global Step: 305540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:46:57,074-Speed 5189.70 samples/sec Loss 0.4054 LearningRate 0.0007 Epoch: 18 Global Step: 305550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:46:59,045-Speed 5199.21 samples/sec Loss 0.4523 LearningRate 0.0007 Epoch: 18 Global Step: 305560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:47:01,037-Speed 5141.56 samples/sec Loss 0.4061 LearningRate 0.0007 Epoch: 18 Global Step: 305570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:47:03,016-Speed 5176.39 samples/sec Loss 0.4235 LearningRate 0.0007 Epoch: 18 Global Step: 305580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:47:04,991-Speed 5185.43 samples/sec Loss 0.4300 LearningRate 0.0007 Epoch: 18 Global Step: 305590 Fp16 
Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:47:06,966-Speed 5186.59 samples/sec Loss 0.4452 LearningRate 0.0007 Epoch: 18 Global Step: 305600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:47:08,947-Speed 5171.02 samples/sec Loss 0.4283 LearningRate 0.0007 Epoch: 18 Global Step: 305610 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:47:10,943-Speed 5131.96 samples/sec Loss 0.4391 LearningRate 0.0007 Epoch: 18 Global Step: 305620 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:47:12,916-Speed 5192.86 samples/sec Loss 0.4406 LearningRate 0.0007 Epoch: 18 Global Step: 305630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:47:14,896-Speed 5174.32 samples/sec Loss 0.4288 LearningRate 0.0007 Epoch: 18 Global Step: 305640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:47:16,891-Speed 5135.09 samples/sec Loss 0.4301 LearningRate 0.0007 Epoch: 18 Global Step: 305650 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:47:18,859-Speed 5205.07 samples/sec Loss 0.4443 LearningRate 0.0007 Epoch: 18 Global Step: 305660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:47:20,846-Speed 5153.11 samples/sec Loss 0.4096 LearningRate 0.0007 Epoch: 18 Global Step: 305670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:47:22,825-Speed 5176.69 samples/sec Loss 0.4103 LearningRate 0.0007 Epoch: 18 Global Step: 305680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:47:24,816-Speed 5145.59 samples/sec Loss 0.4397 LearningRate 0.0007 Epoch: 18 Global Step: 305690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:47:26,792-Speed 5184.80 samples/sec Loss 0.4225 LearningRate 0.0007 Epoch: 18 Global Step: 305700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:47:28,773-Speed 5170.78 samples/sec Loss 0.4200 LearningRate 0.0007 Epoch: 18 Global Step: 305710 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 19:47:30,748-Speed 5186.32 samples/sec Loss 0.4158 LearningRate 0.0007 Epoch: 18 Global Step: 305720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:47:32,723-Speed 5187.11 samples/sec Loss 0.4085 LearningRate 0.0007 Epoch: 18 Global Step: 305730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:47:34,696-Speed 5193.45 samples/sec Loss 0.4423 LearningRate 0.0007 Epoch: 18 Global Step: 305740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:47:36,672-Speed 5181.47 samples/sec Loss 0.4467 LearningRate 0.0007 Epoch: 18 Global Step: 305750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:47:38,673-Speed 5120.98 samples/sec Loss 0.4529 LearningRate 0.0007 Epoch: 18 Global Step: 305760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:47:40,648-Speed 5185.44 samples/sec Loss 0.4247 LearningRate 0.0007 Epoch: 18 Global Step: 305770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:47:42,622-Speed 5189.36 samples/sec Loss 0.4318 LearningRate 0.0007 Epoch: 18 Global Step: 305780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:47:44,607-Speed 5158.83 samples/sec Loss 0.4448 LearningRate 0.0007 Epoch: 18 Global Step: 305790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:47:46,583-Speed 5184.66 samples/sec Loss 0.4272 LearningRate 0.0007 Epoch: 18 Global Step: 305800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:47:48,557-Speed 5190.36 samples/sec Loss 0.4244 LearningRate 0.0007 Epoch: 18 Global Step: 305810 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:47:50,533-Speed 5182.92 samples/sec Loss 0.4398 LearningRate 0.0007 Epoch: 18 Global Step: 305820 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:47:52,504-Speed 5198.41 samples/sec Loss 0.4335 LearningRate 0.0007 Epoch: 18 Global Step: 305830 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 
19:47:54,485-Speed 5172.03 samples/sec Loss 0.4084 LearningRate 0.0007 Epoch: 18 Global Step: 305840 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:47:56,453-Speed 5203.17 samples/sec Loss 0.4264 LearningRate 0.0007 Epoch: 18 Global Step: 305850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:47:58,428-Speed 5185.46 samples/sec Loss 0.4198 LearningRate 0.0007 Epoch: 18 Global Step: 305860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:48:00,447-Speed 5075.41 samples/sec Loss 0.4337 LearningRate 0.0007 Epoch: 18 Global Step: 305870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:48:02,437-Speed 5146.36 samples/sec Loss 0.4299 LearningRate 0.0007 Epoch: 18 Global Step: 305880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:48:04,421-Speed 5162.51 samples/sec Loss 0.4457 LearningRate 0.0007 Epoch: 18 Global Step: 305890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:48:06,406-Speed 5160.94 samples/sec Loss 0.4108 LearningRate 0.0007 Epoch: 18 Global Step: 305900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:48:08,419-Speed 5088.38 samples/sec Loss 0.4044 LearningRate 0.0007 Epoch: 18 Global Step: 305910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:48:10,405-Speed 5159.99 samples/sec Loss 0.4370 LearningRate 0.0007 Epoch: 18 Global Step: 305920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:48:12,378-Speed 5189.93 samples/sec Loss 0.4331 LearningRate 0.0007 Epoch: 18 Global Step: 305930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:48:14,363-Speed 5160.52 samples/sec Loss 0.4380 LearningRate 0.0007 Epoch: 18 Global Step: 305940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:48:16,348-Speed 5161.56 samples/sec Loss 0.4480 LearningRate 0.0007 Epoch: 18 Global Step: 305950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:48:18,328-Speed 5173.02 samples/sec 
Loss 0.4502 LearningRate 0.0007 Epoch: 18 Global Step: 305960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:48:20,302-Speed 5189.04 samples/sec Loss 0.4168 LearningRate 0.0007 Epoch: 18 Global Step: 305970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 19:48:22,298-Speed 5131.05 samples/sec Loss 0.4193 LearningRate 0.0007 Epoch: 18 Global Step: 305980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:48:24,277-Speed 5176.73 samples/sec Loss 0.4402 LearningRate 0.0007 Epoch: 18 Global Step: 305990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:48:26,247-Speed 5198.41 samples/sec Loss 0.4311 LearningRate 0.0007 Epoch: 18 Global Step: 306000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:48:52,754-[lfw][306000]XNorm: 21.465516 Training: 2022-04-11 19:48:52,754-[lfw][306000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 19:48:52,755-[lfw][306000]Accuracy-Highest: 0.99833 Training: 2022-04-11 19:49:23,501-[cfp_fp][306000]XNorm: 21.873245 Training: 2022-04-11 19:49:23,502-[cfp_fp][306000]Accuracy-Flip: 0.99014+-0.00411 Training: 2022-04-11 19:49:23,502-[cfp_fp][306000]Accuracy-Highest: 0.99014 Training: 2022-04-11 19:49:49,999-[agedb_30][306000]XNorm: 22.496819 Training: 2022-04-11 19:49:49,999-[agedb_30][306000]Accuracy-Flip: 0.98350+-0.00626 Training: 2022-04-11 19:49:50,000-[agedb_30][306000]Accuracy-Highest: 0.98383 Training: 2022-04-11 19:49:51,983-Speed 119.44 samples/sec Loss 0.4259 LearningRate 0.0007 Epoch: 18 Global Step: 306010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:49:53,961-Speed 5177.79 samples/sec Loss 0.4322 LearningRate 0.0007 Epoch: 18 Global Step: 306020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:49:55,944-Speed 5164.33 samples/sec Loss 0.4213 LearningRate 0.0007 Epoch: 18 Global Step: 306030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:49:57,922-Speed 5180.04 samples/sec Loss 0.4365 LearningRate 
0.0007 Epoch: 18 Global Step: 306040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:49:59,909-Speed 5153.39 samples/sec Loss 0.4388 LearningRate 0.0007 Epoch: 18 Global Step: 306050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:50:01,884-Speed 5188.30 samples/sec Loss 0.4239 LearningRate 0.0007 Epoch: 18 Global Step: 306060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:50:03,855-Speed 5196.75 samples/sec Loss 0.4103 LearningRate 0.0007 Epoch: 18 Global Step: 306070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:50:05,827-Speed 5193.77 samples/sec Loss 0.4371 LearningRate 0.0007 Epoch: 18 Global Step: 306080 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:50:07,806-Speed 5176.59 samples/sec Loss 0.4264 LearningRate 0.0007 Epoch: 18 Global Step: 306090 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:50:09,789-Speed 5166.83 samples/sec Loss 0.4253 LearningRate 0.0007 Epoch: 18 Global Step: 306100 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:50:11,757-Speed 5205.36 samples/sec Loss 0.4195 LearningRate 0.0007 Epoch: 18 Global Step: 306110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:50:13,743-Speed 5156.06 samples/sec Loss 0.4240 LearningRate 0.0007 Epoch: 18 Global Step: 306120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:50:15,740-Speed 5130.76 samples/sec Loss 0.4430 LearningRate 0.0007 Epoch: 18 Global Step: 306130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:50:17,716-Speed 5184.00 samples/sec Loss 0.3950 LearningRate 0.0007 Epoch: 18 Global Step: 306140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:50:19,687-Speed 5196.71 samples/sec Loss 0.4313 LearningRate 0.0007 Epoch: 18 Global Step: 306150 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:50:21,665-Speed 5178.93 samples/sec Loss 0.4190 LearningRate 0.0007 Epoch: 18 Global Step: 
306160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:50:23,649-Speed 5161.16 samples/sec Loss 0.4402 LearningRate 0.0007 Epoch: 18 Global Step: 306170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:50:25,619-Speed 5200.48 samples/sec Loss 0.4265 LearningRate 0.0007 Epoch: 18 Global Step: 306180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:50:27,606-Speed 5156.62 samples/sec Loss 0.4612 LearningRate 0.0007 Epoch: 18 Global Step: 306190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:50:29,580-Speed 5186.72 samples/sec Loss 0.4215 LearningRate 0.0007 Epoch: 18 Global Step: 306200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:50:31,553-Speed 5193.13 samples/sec Loss 0.4233 LearningRate 0.0007 Epoch: 18 Global Step: 306210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:50:33,527-Speed 5190.48 samples/sec Loss 0.4128 LearningRate 0.0007 Epoch: 18 Global Step: 306220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:50:35,498-Speed 5195.97 samples/sec Loss 0.4410 LearningRate 0.0007 Epoch: 18 Global Step: 306230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:50:37,490-Speed 5143.72 samples/sec Loss 0.4192 LearningRate 0.0007 Epoch: 18 Global Step: 306240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:50:39,459-Speed 5200.51 samples/sec Loss 0.4299 LearningRate 0.0007 Epoch: 18 Global Step: 306250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:50:41,458-Speed 5125.76 samples/sec Loss 0.4546 LearningRate 0.0007 Epoch: 18 Global Step: 306260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:50:43,433-Speed 5185.52 samples/sec Loss 0.4342 LearningRate 0.0007 Epoch: 18 Global Step: 306270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:50:45,412-Speed 5177.44 samples/sec Loss 0.4264 LearningRate 0.0007 Epoch: 18 Global Step: 306280 Fp16 Grad Scale: 131072 Required: 
2 hours Training: 2022-04-11 19:50:47,394-Speed 5166.30 samples/sec Loss 0.4279 LearningRate 0.0007 Epoch: 18 Global Step: 306290 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:50:49,370-Speed 5183.40 samples/sec Loss 0.4363 LearningRate 0.0007 Epoch: 18 Global Step: 306300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:50:52,202-Speed 3617.05 samples/sec Loss 0.3977 LearningRate 0.0007 Epoch: 18 Global Step: 306310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:50:54,213-Speed 5093.05 samples/sec Loss 0.4245 LearningRate 0.0007 Epoch: 18 Global Step: 306320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:50:56,190-Speed 5183.13 samples/sec Loss 0.4329 LearningRate 0.0007 Epoch: 18 Global Step: 306330 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:50:58,165-Speed 5186.59 samples/sec Loss 0.4473 LearningRate 0.0007 Epoch: 18 Global Step: 306340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:00,136-Speed 5196.51 samples/sec Loss 0.4197 LearningRate 0.0007 Epoch: 18 Global Step: 306350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:02,151-Speed 5084.66 samples/sec Loss 0.4124 LearningRate 0.0007 Epoch: 18 Global Step: 306360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:04,117-Speed 5208.91 samples/sec Loss 0.4741 LearningRate 0.0007 Epoch: 18 Global Step: 306370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:06,092-Speed 5187.45 samples/sec Loss 0.3960 LearningRate 0.0007 Epoch: 18 Global Step: 306380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:08,063-Speed 5197.57 samples/sec Loss 0.4514 LearningRate 0.0007 Epoch: 18 Global Step: 306390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:10,046-Speed 5164.70 samples/sec Loss 0.4384 LearningRate 0.0007 Epoch: 18 Global Step: 306400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
19:51:12,017-Speed 5197.91 samples/sec Loss 0.4362 LearningRate 0.0007 Epoch: 18 Global Step: 306410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:14,008-Speed 5145.32 samples/sec Loss 0.4257 LearningRate 0.0007 Epoch: 18 Global Step: 306420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:15,988-Speed 5173.64 samples/sec Loss 0.4335 LearningRate 0.0007 Epoch: 18 Global Step: 306430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:17,968-Speed 5173.43 samples/sec Loss 0.4534 LearningRate 0.0007 Epoch: 18 Global Step: 306440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:51:19,966-Speed 5126.98 samples/sec Loss 0.4244 LearningRate 0.0007 Epoch: 18 Global Step: 306450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:51:21,940-Speed 5188.71 samples/sec Loss 0.4167 LearningRate 0.0007 Epoch: 18 Global Step: 306460 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:51:23,931-Speed 5145.43 samples/sec Loss 0.3919 LearningRate 0.0007 Epoch: 18 Global Step: 306470 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:51:25,907-Speed 5181.88 samples/sec Loss 0.4291 LearningRate 0.0007 Epoch: 18 Global Step: 306480 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:51:27,874-Speed 5207.42 samples/sec Loss 0.4241 LearningRate 0.0007 Epoch: 18 Global Step: 306490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:29,842-Speed 5206.49 samples/sec Loss 0.4247 LearningRate 0.0007 Epoch: 18 Global Step: 306500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:31,831-Speed 5150.71 samples/sec Loss 0.4008 LearningRate 0.0007 Epoch: 18 Global Step: 306510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:33,802-Speed 5196.42 samples/sec Loss 0.4121 LearningRate 0.0007 Epoch: 18 Global Step: 306520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:35,796-Speed 5136.00 samples/sec 
Loss 0.4322 LearningRate 0.0007 Epoch: 18 Global Step: 306530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:37,767-Speed 5198.95 samples/sec Loss 0.4399 LearningRate 0.0007 Epoch: 18 Global Step: 306540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:39,759-Speed 5140.85 samples/sec Loss 0.4162 LearningRate 0.0007 Epoch: 18 Global Step: 306550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:41,728-Speed 5203.13 samples/sec Loss 0.4193 LearningRate 0.0007 Epoch: 18 Global Step: 306560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:43,707-Speed 5175.75 samples/sec Loss 0.4369 LearningRate 0.0007 Epoch: 18 Global Step: 306570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:45,680-Speed 5192.11 samples/sec Loss 0.4348 LearningRate 0.0007 Epoch: 18 Global Step: 306580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:47,666-Speed 5158.67 samples/sec Loss 0.4241 LearningRate 0.0007 Epoch: 18 Global Step: 306590 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:51:49,649-Speed 5164.92 samples/sec Loss 0.4310 LearningRate 0.0007 Epoch: 18 Global Step: 306600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:51:51,623-Speed 5188.13 samples/sec Loss 0.4300 LearningRate 0.0007 Epoch: 18 Global Step: 306610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:53,589-Speed 5211.10 samples/sec Loss 0.4571 LearningRate 0.0007 Epoch: 18 Global Step: 306620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:55,571-Speed 5169.17 samples/sec Loss 0.4152 LearningRate 0.0007 Epoch: 18 Global Step: 306630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:57,542-Speed 5196.94 samples/sec Loss 0.4243 LearningRate 0.0007 Epoch: 18 Global Step: 306640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:51:59,511-Speed 5202.56 samples/sec Loss 0.4375 LearningRate 0.0007 Epoch: 18 
Global Step: 306650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:01,495-Speed 5161.44 samples/sec Loss 0.4167 LearningRate 0.0007 Epoch: 18 Global Step: 306660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:03,482-Speed 5155.37 samples/sec Loss 0.4239 LearningRate 0.0007 Epoch: 18 Global Step: 306670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:05,471-Speed 5148.95 samples/sec Loss 0.4126 LearningRate 0.0007 Epoch: 18 Global Step: 306680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:07,443-Speed 5194.84 samples/sec Loss 0.4368 LearningRate 0.0007 Epoch: 18 Global Step: 306690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:09,430-Speed 5157.89 samples/sec Loss 0.4097 LearningRate 0.0007 Epoch: 18 Global Step: 306700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:11,416-Speed 5157.10 samples/sec Loss 0.4263 LearningRate 0.0007 Epoch: 18 Global Step: 306710 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:52:13,425-Speed 5098.77 samples/sec Loss 0.4415 LearningRate 0.0007 Epoch: 18 Global Step: 306720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:15,407-Speed 5168.18 samples/sec Loss 0.4277 LearningRate 0.0007 Epoch: 18 Global Step: 306730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:17,377-Speed 5200.93 samples/sec Loss 0.4326 LearningRate 0.0007 Epoch: 18 Global Step: 306740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:19,362-Speed 5159.08 samples/sec Loss 0.4465 LearningRate 0.0007 Epoch: 18 Global Step: 306750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:21,334-Speed 5193.97 samples/sec Loss 0.4386 LearningRate 0.0007 Epoch: 18 Global Step: 306760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:23,316-Speed 5167.17 samples/sec Loss 0.4269 LearningRate 0.0007 Epoch: 18 Global Step: 306770 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-11 19:52:25,289-Speed 5191.73 samples/sec Loss 0.4389 LearningRate 0.0007 Epoch: 18 Global Step: 306780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:27,267-Speed 5179.11 samples/sec Loss 0.4417 LearningRate 0.0007 Epoch: 18 Global Step: 306790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:29,263-Speed 5132.20 samples/sec Loss 0.4628 LearningRate 0.0007 Epoch: 18 Global Step: 306800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:31,256-Speed 5140.22 samples/sec Loss 0.4242 LearningRate 0.0007 Epoch: 18 Global Step: 306810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:33,224-Speed 5206.08 samples/sec Loss 0.4065 LearningRate 0.0007 Epoch: 18 Global Step: 306820 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:52:35,195-Speed 5196.20 samples/sec Loss 0.4167 LearningRate 0.0007 Epoch: 18 Global Step: 306830 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:52:37,184-Speed 5150.84 samples/sec Loss 0.4101 LearningRate 0.0007 Epoch: 18 Global Step: 306840 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:52:39,163-Speed 5176.28 samples/sec Loss 0.4183 LearningRate 0.0007 Epoch: 18 Global Step: 306850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:52:41,143-Speed 5172.87 samples/sec Loss 0.4483 LearningRate 0.0007 Epoch: 18 Global Step: 306860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:52:43,114-Speed 5197.19 samples/sec Loss 0.4463 LearningRate 0.0007 Epoch: 18 Global Step: 306870 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:52:45,082-Speed 5205.42 samples/sec Loss 0.4230 LearningRate 0.0007 Epoch: 18 Global Step: 306880 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:52:47,056-Speed 5189.97 samples/sec Loss 0.4341 LearningRate 0.0007 Epoch: 18 Global Step: 306890 Fp16 Grad Scale: 131072 Required: 2 hours Training: 
2022-04-11 19:52:49,024-Speed 5204.57 samples/sec Loss 0.4342 LearningRate 0.0007 Epoch: 18 Global Step: 306900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:50,999-Speed 5187.74 samples/sec Loss 0.4351 LearningRate 0.0006 Epoch: 18 Global Step: 306910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:52,996-Speed 5131.34 samples/sec Loss 0.4271 LearningRate 0.0006 Epoch: 18 Global Step: 306920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:54,963-Speed 5207.10 samples/sec Loss 0.4092 LearningRate 0.0006 Epoch: 18 Global Step: 306930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:56,933-Speed 5198.98 samples/sec Loss 0.4410 LearningRate 0.0006 Epoch: 18 Global Step: 306940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:52:58,910-Speed 5180.25 samples/sec Loss 0.4351 LearningRate 0.0006 Epoch: 18 Global Step: 306950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:00,890-Speed 5173.87 samples/sec Loss 0.4547 LearningRate 0.0006 Epoch: 18 Global Step: 306960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:02,861-Speed 5197.83 samples/sec Loss 0.4352 LearningRate 0.0006 Epoch: 18 Global Step: 306970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:04,858-Speed 5129.90 samples/sec Loss 0.4192 LearningRate 0.0006 Epoch: 18 Global Step: 306980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:06,840-Speed 5166.28 samples/sec Loss 0.4124 LearningRate 0.0006 Epoch: 18 Global Step: 306990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:08,809-Speed 5203.76 samples/sec Loss 0.4197 LearningRate 0.0006 Epoch: 18 Global Step: 307000 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:53:10,807-Speed 5127.89 samples/sec Loss 0.4497 LearningRate 0.0006 Epoch: 18 Global Step: 307010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:12,790-Speed 5165.74 
samples/sec Loss 0.4421 LearningRate 0.0006 Epoch: 18 Global Step: 307020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:14,761-Speed 5195.87 samples/sec Loss 0.4242 LearningRate 0.0006 Epoch: 18 Global Step: 307030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:16,734-Speed 5191.30 samples/sec Loss 0.4690 LearningRate 0.0006 Epoch: 18 Global Step: 307040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:18,703-Speed 5202.34 samples/sec Loss 0.4292 LearningRate 0.0006 Epoch: 18 Global Step: 307050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:20,672-Speed 5202.96 samples/sec Loss 0.4479 LearningRate 0.0006 Epoch: 18 Global Step: 307060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:22,649-Speed 5180.90 samples/sec Loss 0.4227 LearningRate 0.0006 Epoch: 18 Global Step: 307070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:24,618-Speed 5203.30 samples/sec Loss 0.4235 LearningRate 0.0006 Epoch: 18 Global Step: 307080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:26,602-Speed 5161.96 samples/sec Loss 0.4308 LearningRate 0.0006 Epoch: 18 Global Step: 307090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:28,570-Speed 5204.55 samples/sec Loss 0.4164 LearningRate 0.0006 Epoch: 18 Global Step: 307100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:30,541-Speed 5199.16 samples/sec Loss 0.4222 LearningRate 0.0006 Epoch: 18 Global Step: 307110 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:53:32,513-Speed 5193.15 samples/sec Loss 0.4199 LearningRate 0.0006 Epoch: 18 Global Step: 307120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:53:34,484-Speed 5197.04 samples/sec Loss 0.4250 LearningRate 0.0006 Epoch: 18 Global Step: 307130 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:53:36,476-Speed 5143.42 samples/sec Loss 0.4354 LearningRate 
0.0006 Epoch: 18 Global Step: 307140 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:53:38,448-Speed 5194.93 samples/sec Loss 0.4378 LearningRate 0.0006 Epoch: 18 Global Step: 307150 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:53:40,423-Speed 5184.60 samples/sec Loss 0.4372 LearningRate 0.0006 Epoch: 18 Global Step: 307160 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:53:42,392-Speed 5202.59 samples/sec Loss 0.4264 LearningRate 0.0006 Epoch: 18 Global Step: 307170 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:53:44,367-Speed 5188.10 samples/sec Loss 0.4320 LearningRate 0.0006 Epoch: 18 Global Step: 307180 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:53:46,351-Speed 5163.74 samples/sec Loss 0.4157 LearningRate 0.0006 Epoch: 18 Global Step: 307190 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:53:48,326-Speed 5184.60 samples/sec Loss 0.4324 LearningRate 0.0006 Epoch: 18 Global Step: 307200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:50,307-Speed 5171.10 samples/sec Loss 0.4267 LearningRate 0.0006 Epoch: 18 Global Step: 307210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:52,278-Speed 5197.37 samples/sec Loss 0.4083 LearningRate 0.0006 Epoch: 18 Global Step: 307220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:54,253-Speed 5186.43 samples/sec Loss 0.4580 LearningRate 0.0006 Epoch: 18 Global Step: 307230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:56,221-Speed 5205.63 samples/sec Loss 0.4473 LearningRate 0.0006 Epoch: 18 Global Step: 307240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:53:58,207-Speed 5156.59 samples/sec Loss 0.4132 LearningRate 0.0006 Epoch: 18 Global Step: 307250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:00,192-Speed 5176.34 samples/sec Loss 0.4306 LearningRate 0.0006 Epoch: 18 Global Step: 307260 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:02,183-Speed 5145.10 samples/sec Loss 0.4228 LearningRate 0.0006 Epoch: 18 Global Step: 307270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:04,157-Speed 5188.60 samples/sec Loss 0.4291 LearningRate 0.0006 Epoch: 18 Global Step: 307280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:06,132-Speed 5187.05 samples/sec Loss 0.4247 LearningRate 0.0006 Epoch: 18 Global Step: 307290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:08,106-Speed 5189.01 samples/sec Loss 0.4455 LearningRate 0.0006 Epoch: 18 Global Step: 307300 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:54:10,103-Speed 5128.21 samples/sec Loss 0.4259 LearningRate 0.0006 Epoch: 18 Global Step: 307310 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:54:12,110-Speed 5103.74 samples/sec Loss 0.4581 LearningRate 0.0006 Epoch: 18 Global Step: 307320 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:54:14,084-Speed 5188.77 samples/sec Loss 0.4273 LearningRate 0.0006 Epoch: 18 Global Step: 307330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:16,056-Speed 5195.80 samples/sec Loss 0.4366 LearningRate 0.0006 Epoch: 18 Global Step: 307340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:18,026-Speed 5200.59 samples/sec Loss 0.4270 LearningRate 0.0006 Epoch: 18 Global Step: 307350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:19,998-Speed 5193.61 samples/sec Loss 0.4637 LearningRate 0.0006 Epoch: 18 Global Step: 307360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:21,988-Speed 5147.94 samples/sec Loss 0.4441 LearningRate 0.0006 Epoch: 18 Global Step: 307370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:23,984-Speed 5130.63 samples/sec Loss 0.4177 LearningRate 0.0006 Epoch: 18 Global Step: 307380 Fp16 Grad Scale: 65536 Required: 2 hours 
Training: 2022-04-11 19:54:25,960-Speed 5185.83 samples/sec Loss 0.4288 LearningRate 0.0006 Epoch: 18 Global Step: 307390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:27,936-Speed 5182.95 samples/sec Loss 0.4192 LearningRate 0.0006 Epoch: 18 Global Step: 307400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:29,912-Speed 5185.16 samples/sec Loss 0.4034 LearningRate 0.0006 Epoch: 18 Global Step: 307410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:31,888-Speed 5182.76 samples/sec Loss 0.4400 LearningRate 0.0006 Epoch: 18 Global Step: 307420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:33,909-Speed 5067.15 samples/sec Loss 0.4225 LearningRate 0.0006 Epoch: 18 Global Step: 307430 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:54:35,904-Speed 5136.02 samples/sec Loss 0.4107 LearningRate 0.0006 Epoch: 18 Global Step: 307440 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:54:37,887-Speed 5165.40 samples/sec Loss 0.4116 LearningRate 0.0006 Epoch: 18 Global Step: 307450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:54:39,879-Speed 5144.11 samples/sec Loss 0.4077 LearningRate 0.0006 Epoch: 18 Global Step: 307460 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:54:41,852-Speed 5190.66 samples/sec Loss 0.4159 LearningRate 0.0006 Epoch: 18 Global Step: 307470 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:54:43,827-Speed 5185.75 samples/sec Loss 0.4411 LearningRate 0.0006 Epoch: 18 Global Step: 307480 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:54:45,823-Speed 5134.24 samples/sec Loss 0.4206 LearningRate 0.0006 Epoch: 18 Global Step: 307490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:47,796-Speed 5191.24 samples/sec Loss 0.4457 LearningRate 0.0006 Epoch: 18 Global Step: 307500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 
19:54:49,791-Speed 5133.96 samples/sec Loss 0.4365 LearningRate 0.0006 Epoch: 18 Global Step: 307510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:51,770-Speed 5176.82 samples/sec Loss 0.4440 LearningRate 0.0006 Epoch: 18 Global Step: 307520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:53,747-Speed 5180.61 samples/sec Loss 0.4530 LearningRate 0.0006 Epoch: 18 Global Step: 307530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:55,726-Speed 5177.07 samples/sec Loss 0.4399 LearningRate 0.0006 Epoch: 18 Global Step: 307540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:57,722-Speed 5131.80 samples/sec Loss 0.4268 LearningRate 0.0006 Epoch: 18 Global Step: 307550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:54:59,710-Speed 5154.18 samples/sec Loss 0.4051 LearningRate 0.0006 Epoch: 18 Global Step: 307560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:55:01,700-Speed 5147.78 samples/sec Loss 0.4525 LearningRate 0.0006 Epoch: 18 Global Step: 307570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:55:03,692-Speed 5140.72 samples/sec Loss 0.4264 LearningRate 0.0006 Epoch: 18 Global Step: 307580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:55:05,691-Speed 5125.40 samples/sec Loss 0.4292 LearningRate 0.0006 Epoch: 18 Global Step: 307590 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:55:07,668-Speed 5182.28 samples/sec Loss 0.4274 LearningRate 0.0006 Epoch: 18 Global Step: 307600 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:55:09,642-Speed 5188.93 samples/sec Loss 0.4328 LearningRate 0.0006 Epoch: 18 Global Step: 307610 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:55:11,644-Speed 5115.80 samples/sec Loss 0.4300 LearningRate 0.0006 Epoch: 18 Global Step: 307620 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:55:13,643-Speed 5127.69 samples/sec 
Loss 0.4358 LearningRate 0.0006 Epoch: 18 Global Step: 307630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:55:15,651-Speed 5101.89 samples/sec Loss 0.4089 LearningRate 0.0006 Epoch: 18 Global Step: 307640 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:55:17,627-Speed 5182.86 samples/sec Loss 0.4238 LearningRate 0.0006 Epoch: 18 Global Step: 307650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:55:19,621-Speed 5137.41 samples/sec Loss 0.4307 LearningRate 0.0006 Epoch: 18 Global Step: 307660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:55:21,595-Speed 5191.05 samples/sec Loss 0.4249 LearningRate 0.0006 Epoch: 18 Global Step: 307670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:55:23,592-Speed 5128.81 samples/sec Loss 0.4306 LearningRate 0.0006 Epoch: 18 Global Step: 307680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:55:25,566-Speed 5188.47 samples/sec Loss 0.4330 LearningRate 0.0006 Epoch: 18 Global Step: 307690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:55:27,547-Speed 5172.46 samples/sec Loss 0.4555 LearningRate 0.0006 Epoch: 18 Global Step: 307700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:55:29,525-Speed 5178.34 samples/sec Loss 0.4337 LearningRate 0.0006 Epoch: 18 Global Step: 307710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:55:31,500-Speed 5184.36 samples/sec Loss 0.3954 LearningRate 0.0006 Epoch: 18 Global Step: 307720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:55:33,507-Speed 5105.01 samples/sec Loss 0.4424 LearningRate 0.0006 Epoch: 18 Global Step: 307730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:55:35,481-Speed 5190.22 samples/sec Loss 0.4272 LearningRate 0.0006 Epoch: 18 Global Step: 307740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:55:37,460-Speed 5173.75 samples/sec Loss 0.4199 LearningRate 0.0006 Epoch: 18 
Global Step: 307750 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:55:39,455-Speed 5136.02 samples/sec Loss 0.4340 LearningRate 0.0006 Epoch: 18 Global Step: 307760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:55:41,427-Speed 5194.19 samples/sec Loss 0.4059 LearningRate 0.0006 Epoch: 18 Global Step: 307770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:55:43,404-Speed 5180.75 samples/sec Loss 0.4316 LearningRate 0.0006 Epoch: 18 Global Step: 307780 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:55:45,388-Speed 5163.61 samples/sec Loss 0.4220 LearningRate 0.0006 Epoch: 18 Global Step: 307790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:55:47,369-Speed 5170.43 samples/sec Loss 0.4353 LearningRate 0.0006 Epoch: 18 Global Step: 307800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:55:49,369-Speed 5121.51 samples/sec Loss 0.4276 LearningRate 0.0006 Epoch: 18 Global Step: 307810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:55:51,355-Speed 5157.76 samples/sec Loss 0.4184 LearningRate 0.0006 Epoch: 18 Global Step: 307820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:55:53,328-Speed 5193.93 samples/sec Loss 0.4469 LearningRate 0.0006 Epoch: 18 Global Step: 307830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:55:55,304-Speed 5182.13 samples/sec Loss 0.4121 LearningRate 0.0006 Epoch: 18 Global Step: 307840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:55:57,283-Speed 5175.48 samples/sec Loss 0.4252 LearningRate 0.0006 Epoch: 18 Global Step: 307850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:55:59,270-Speed 5154.88 samples/sec Loss 0.4404 LearningRate 0.0006 Epoch: 18 Global Step: 307860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:56:01,246-Speed 5185.71 samples/sec Loss 0.4295 LearningRate 0.0006 Epoch: 18 Global Step: 307870 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-04-11 19:56:03,270-Speed 5060.37 samples/sec Loss 0.4102 LearningRate 0.0006 Epoch: 18 Global Step: 307880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:56:05,264-Speed 5138.05 samples/sec Loss 0.4479 LearningRate 0.0006 Epoch: 18 Global Step: 307890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:56:07,238-Speed 5189.06 samples/sec Loss 0.4209 LearningRate 0.0006 Epoch: 18 Global Step: 307900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:56:09,223-Speed 5161.15 samples/sec Loss 0.4259 LearningRate 0.0006 Epoch: 18 Global Step: 307910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:56:11,196-Speed 5190.64 samples/sec Loss 0.4231 LearningRate 0.0006 Epoch: 18 Global Step: 307920 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:56:13,172-Speed 5185.20 samples/sec Loss 0.4565 LearningRate 0.0006 Epoch: 18 Global Step: 307930 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:56:15,154-Speed 5169.19 samples/sec Loss 0.4316 LearningRate 0.0006 Epoch: 18 Global Step: 307940 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:56:17,130-Speed 5183.66 samples/sec Loss 0.4670 LearningRate 0.0006 Epoch: 18 Global Step: 307950 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:56:19,107-Speed 5179.26 samples/sec Loss 0.4344 LearningRate 0.0006 Epoch: 18 Global Step: 307960 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:56:21,099-Speed 5142.58 samples/sec Loss 0.4480 LearningRate 0.0006 Epoch: 18 Global Step: 307970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:56:23,092-Speed 5139.48 samples/sec Loss 0.4297 LearningRate 0.0006 Epoch: 18 Global Step: 307980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:56:25,091-Speed 5123.88 samples/sec Loss 0.4317 LearningRate 0.0006 Epoch: 18 Global Step: 307990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 19:56:27,085-Speed 5138.63 samples/sec Loss 0.4612 LearningRate 0.0006 Epoch: 18 Global Step: 308000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:56:53,644-[lfw][308000]XNorm: 21.329993 Training: 2022-04-11 19:56:53,645-[lfw][308000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 19:56:53,645-[lfw][308000]Accuracy-Highest: 0.99833 Training: 2022-04-11 19:57:24,580-[cfp_fp][308000]XNorm: 21.799297 Training: 2022-04-11 19:57:24,581-[cfp_fp][308000]Accuracy-Flip: 0.99029+-0.00460 Training: 2022-04-11 19:57:24,582-[cfp_fp][308000]Accuracy-Highest: 0.99029 Training: 2022-04-11 19:57:51,234-[agedb_30][308000]XNorm: 22.507954 Training: 2022-04-11 19:57:51,235-[agedb_30][308000]Accuracy-Flip: 0.98383+-0.00683 Training: 2022-04-11 19:57:51,235-[agedb_30][308000]Accuracy-Highest: 0.98383 Training: 2022-04-11 19:57:53,214-Speed 118.89 samples/sec Loss 0.4217 LearningRate 0.0006 Epoch: 18 Global Step: 308010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:57:55,176-Speed 5219.88 samples/sec Loss 0.4315 LearningRate 0.0006 Epoch: 18 Global Step: 308020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:57:57,139-Speed 5217.61 samples/sec Loss 0.4313 LearningRate 0.0006 Epoch: 18 Global Step: 308030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:57:59,109-Speed 5198.56 samples/sec Loss 0.4283 LearningRate 0.0006 Epoch: 18 Global Step: 308040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:01,077-Speed 5208.09 samples/sec Loss 0.4283 LearningRate 0.0006 Epoch: 18 Global Step: 308050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:03,051-Speed 5188.48 samples/sec Loss 0.4133 LearningRate 0.0006 Epoch: 18 Global Step: 308060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:05,016-Speed 5213.78 samples/sec Loss 0.4439 LearningRate 0.0006 Epoch: 18 Global Step: 308070 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 
19:58:06,986-Speed 5199.96 samples/sec Loss 0.4413 LearningRate 0.0006 Epoch: 18 Global Step: 308080 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:58:08,967-Speed 5170.59 samples/sec Loss 0.4068 LearningRate 0.0006 Epoch: 18 Global Step: 308090 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:58:10,953-Speed 5155.34 samples/sec Loss 0.4441 LearningRate 0.0006 Epoch: 18 Global Step: 308100 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:58:12,945-Speed 5142.55 samples/sec Loss 0.4489 LearningRate 0.0006 Epoch: 18 Global Step: 308110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:14,928-Speed 5166.89 samples/sec Loss 0.4133 LearningRate 0.0006 Epoch: 18 Global Step: 308120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:16,902-Speed 5187.85 samples/sec Loss 0.4254 LearningRate 0.0006 Epoch: 18 Global Step: 308130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:18,902-Speed 5121.39 samples/sec Loss 0.4396 LearningRate 0.0006 Epoch: 18 Global Step: 308140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:20,870-Speed 5207.41 samples/sec Loss 0.4239 LearningRate 0.0006 Epoch: 18 Global Step: 308150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:22,839-Speed 5199.83 samples/sec Loss 0.4418 LearningRate 0.0006 Epoch: 18 Global Step: 308160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:24,817-Speed 5178.49 samples/sec Loss 0.4494 LearningRate 0.0006 Epoch: 18 Global Step: 308170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:26,808-Speed 5146.68 samples/sec Loss 0.4334 LearningRate 0.0006 Epoch: 18 Global Step: 308180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:28,809-Speed 5119.23 samples/sec Loss 0.4183 LearningRate 0.0006 Epoch: 18 Global Step: 308190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:30,779-Speed 5199.64 samples/sec 
Loss 0.4242 LearningRate 0.0006 Epoch: 18 Global Step: 308200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:32,750-Speed 5196.04 samples/sec Loss 0.4410 LearningRate 0.0006 Epoch: 18 Global Step: 308210 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:58:34,733-Speed 5165.29 samples/sec Loss 0.4191 LearningRate 0.0006 Epoch: 18 Global Step: 308220 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:58:36,713-Speed 5175.22 samples/sec Loss 0.4314 LearningRate 0.0006 Epoch: 18 Global Step: 308230 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:58:38,683-Speed 5198.21 samples/sec Loss 0.4201 LearningRate 0.0006 Epoch: 18 Global Step: 308240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:58:40,647-Speed 5216.61 samples/sec Loss 0.4482 LearningRate 0.0006 Epoch: 18 Global Step: 308250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:42,618-Speed 5197.19 samples/sec Loss 0.4277 LearningRate 0.0006 Epoch: 18 Global Step: 308260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:44,599-Speed 5169.07 samples/sec Loss 0.4445 LearningRate 0.0006 Epoch: 18 Global Step: 308270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:46,581-Speed 5168.49 samples/sec Loss 0.4325 LearningRate 0.0006 Epoch: 18 Global Step: 308280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:48,574-Speed 5140.94 samples/sec Loss 0.4221 LearningRate 0.0006 Epoch: 18 Global Step: 308290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:50,579-Speed 5107.84 samples/sec Loss 0.4345 LearningRate 0.0006 Epoch: 18 Global Step: 308300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:52,571-Speed 5142.52 samples/sec Loss 0.4119 LearningRate 0.0006 Epoch: 18 Global Step: 308310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:54,560-Speed 5150.02 samples/sec Loss 0.4124 LearningRate 0.0006 Epoch: 
18 Global Step: 308320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:56,548-Speed 5154.97 samples/sec Loss 0.4308 LearningRate 0.0006 Epoch: 18 Global Step: 308330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:58:58,520-Speed 5193.50 samples/sec Loss 0.4399 LearningRate 0.0006 Epoch: 18 Global Step: 308340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:59:00,495-Speed 5186.65 samples/sec Loss 0.4370 LearningRate 0.0006 Epoch: 18 Global Step: 308350 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:59:02,464-Speed 5200.19 samples/sec Loss 0.4460 LearningRate 0.0006 Epoch: 18 Global Step: 308360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:59:04,447-Speed 5165.85 samples/sec Loss 0.4333 LearningRate 0.0006 Epoch: 18 Global Step: 308370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:59:06,416-Speed 5204.85 samples/sec Loss 0.3946 LearningRate 0.0006 Epoch: 18 Global Step: 308380 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:59:08,416-Speed 5120.82 samples/sec Loss 0.4260 LearningRate 0.0006 Epoch: 18 Global Step: 308390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:59:10,383-Speed 5206.00 samples/sec Loss 0.4282 LearningRate 0.0006 Epoch: 18 Global Step: 308400 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:59:12,354-Speed 5197.61 samples/sec Loss 0.4737 LearningRate 0.0006 Epoch: 18 Global Step: 308410 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:59:14,331-Speed 5181.10 samples/sec Loss 0.4313 LearningRate 0.0006 Epoch: 18 Global Step: 308420 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:59:16,303-Speed 5195.41 samples/sec Loss 0.4091 LearningRate 0.0006 Epoch: 18 Global Step: 308430 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:59:18,280-Speed 5182.95 samples/sec Loss 0.4314 LearningRate 0.0006 Epoch: 18 Global Step: 308440 Fp16 Grad 
Scale: 131072 Required: 2 hours Training: 2022-04-11 19:59:20,244-Speed 5215.78 samples/sec Loss 0.4300 LearningRate 0.0006 Epoch: 18 Global Step: 308450 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:59:22,225-Speed 5170.32 samples/sec Loss 0.4166 LearningRate 0.0006 Epoch: 18 Global Step: 308460 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 19:59:24,197-Speed 5196.17 samples/sec Loss 0.4466 LearningRate 0.0006 Epoch: 18 Global Step: 308470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:59:26,170-Speed 5190.99 samples/sec Loss 0.4453 LearningRate 0.0006 Epoch: 18 Global Step: 308480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:59:28,140-Speed 5200.32 samples/sec Loss 0.4254 LearningRate 0.0006 Epoch: 18 Global Step: 308490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:59:30,119-Speed 5174.80 samples/sec Loss 0.4188 LearningRate 0.0006 Epoch: 18 Global Step: 308500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:59:32,090-Speed 5197.65 samples/sec Loss 0.4154 LearningRate 0.0006 Epoch: 18 Global Step: 308510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:59:34,080-Speed 5145.54 samples/sec Loss 0.4420 LearningRate 0.0006 Epoch: 18 Global Step: 308520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:59:36,055-Speed 5188.79 samples/sec Loss 0.4391 LearningRate 0.0006 Epoch: 18 Global Step: 308530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:59:38,027-Speed 5195.71 samples/sec Loss 0.4179 LearningRate 0.0006 Epoch: 18 Global Step: 308540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:59:39,998-Speed 5197.61 samples/sec Loss 0.4255 LearningRate 0.0006 Epoch: 18 Global Step: 308550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:59:41,970-Speed 5192.51 samples/sec Loss 0.4523 LearningRate 0.0006 Epoch: 18 Global Step: 308560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 19:59:43,934-Speed 5215.64 samples/sec Loss 0.4328 LearningRate 0.0006 Epoch: 18 Global Step: 308570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:59:45,906-Speed 5195.07 samples/sec Loss 0.4138 LearningRate 0.0006 Epoch: 18 Global Step: 308580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:59:47,887-Speed 5170.40 samples/sec Loss 0.4502 LearningRate 0.0006 Epoch: 18 Global Step: 308590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:59:49,868-Speed 5170.07 samples/sec Loss 0.4170 LearningRate 0.0006 Epoch: 18 Global Step: 308600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:59:51,851-Speed 5167.78 samples/sec Loss 0.4383 LearningRate 0.0006 Epoch: 18 Global Step: 308610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:59:53,826-Speed 5184.92 samples/sec Loss 0.4341 LearningRate 0.0006 Epoch: 18 Global Step: 308620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:59:55,801-Speed 5186.61 samples/sec Loss 0.4480 LearningRate 0.0006 Epoch: 18 Global Step: 308630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:59:57,771-Speed 5200.04 samples/sec Loss 0.4438 LearningRate 0.0006 Epoch: 18 Global Step: 308640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 19:59:59,758-Speed 5156.65 samples/sec Loss 0.4043 LearningRate 0.0006 Epoch: 18 Global Step: 308650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:01,730-Speed 5192.94 samples/sec Loss 0.4370 LearningRate 0.0006 Epoch: 18 Global Step: 308660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:03,705-Speed 5186.33 samples/sec Loss 0.4139 LearningRate 0.0006 Epoch: 18 Global Step: 308670 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:00:05,691-Speed 5159.77 samples/sec Loss 0.4387 LearningRate 0.0006 Epoch: 18 Global Step: 308680 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:00:07,652-Speed 5221.66 
samples/sec Loss 0.4098 LearningRate 0.0006 Epoch: 18 Global Step: 308690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:09,650-Speed 5127.87 samples/sec Loss 0.4441 LearningRate 0.0006 Epoch: 18 Global Step: 308700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:11,642-Speed 5141.92 samples/sec Loss 0.4200 LearningRate 0.0006 Epoch: 18 Global Step: 308710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:13,634-Speed 5144.26 samples/sec Loss 0.4134 LearningRate 0.0006 Epoch: 18 Global Step: 308720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:15,612-Speed 5178.96 samples/sec Loss 0.4030 LearningRate 0.0006 Epoch: 18 Global Step: 308730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:17,597-Speed 5159.87 samples/sec Loss 0.4251 LearningRate 0.0006 Epoch: 18 Global Step: 308740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:19,567-Speed 5200.71 samples/sec Loss 0.4125 LearningRate 0.0006 Epoch: 18 Global Step: 308750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:21,540-Speed 5191.68 samples/sec Loss 0.4386 LearningRate 0.0006 Epoch: 18 Global Step: 308760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:23,515-Speed 5186.32 samples/sec Loss 0.4229 LearningRate 0.0006 Epoch: 18 Global Step: 308770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:25,486-Speed 5197.02 samples/sec Loss 0.4004 LearningRate 0.0006 Epoch: 18 Global Step: 308780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:27,456-Speed 5199.80 samples/sec Loss 0.4192 LearningRate 0.0006 Epoch: 18 Global Step: 308790 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:00:29,438-Speed 5168.35 samples/sec Loss 0.4455 LearningRate 0.0006 Epoch: 18 Global Step: 308800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:00:31,401-Speed 5217.70 samples/sec Loss 0.4215 LearningRate 
0.0006 Epoch: 18 Global Step: 308810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:33,381-Speed 5173.45 samples/sec Loss 0.4141 LearningRate 0.0006 Epoch: 18 Global Step: 308820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:35,350-Speed 5201.54 samples/sec Loss 0.4400 LearningRate 0.0006 Epoch: 18 Global Step: 308830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:37,327-Speed 5181.55 samples/sec Loss 0.4302 LearningRate 0.0006 Epoch: 18 Global Step: 308840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:39,326-Speed 5125.07 samples/sec Loss 0.4214 LearningRate 0.0006 Epoch: 18 Global Step: 308850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:41,306-Speed 5172.31 samples/sec Loss 0.4139 LearningRate 0.0006 Epoch: 18 Global Step: 308860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:43,280-Speed 5189.98 samples/sec Loss 0.4528 LearningRate 0.0006 Epoch: 18 Global Step: 308870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:45,269-Speed 5150.99 samples/sec Loss 0.4345 LearningRate 0.0006 Epoch: 18 Global Step: 308880 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:47,265-Speed 5133.17 samples/sec Loss 0.4308 LearningRate 0.0006 Epoch: 18 Global Step: 308890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:49,237-Speed 5193.65 samples/sec Loss 0.4448 LearningRate 0.0006 Epoch: 18 Global Step: 308900 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:51,206-Speed 5201.73 samples/sec Loss 0.4189 LearningRate 0.0006 Epoch: 18 Global Step: 308910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:53,189-Speed 5165.54 samples/sec Loss 0.4239 LearningRate 0.0006 Epoch: 18 Global Step: 308920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:55,162-Speed 5192.43 samples/sec Loss 0.4190 LearningRate 0.0006 Epoch: 18 Global Step: 308930 Fp16 
Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:57,133-Speed 5195.81 samples/sec Loss 0.4345 LearningRate 0.0006 Epoch: 18 Global Step: 308940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:00:59,128-Speed 5133.67 samples/sec Loss 0.4042 LearningRate 0.0006 Epoch: 18 Global Step: 308950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:01:01,130-Speed 5118.79 samples/sec Loss 0.4376 LearningRate 0.0006 Epoch: 18 Global Step: 308960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:01:03,150-Speed 5069.93 samples/sec Loss 0.3998 LearningRate 0.0006 Epoch: 18 Global Step: 308970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:01:05,176-Speed 5055.39 samples/sec Loss 0.4500 LearningRate 0.0006 Epoch: 18 Global Step: 308980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:01:07,148-Speed 5197.31 samples/sec Loss 0.4250 LearningRate 0.0006 Epoch: 18 Global Step: 308990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:01:09,133-Speed 5159.51 samples/sec Loss 0.4300 LearningRate 0.0006 Epoch: 18 Global Step: 309000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:01:11,109-Speed 5184.31 samples/sec Loss 0.4234 LearningRate 0.0006 Epoch: 18 Global Step: 309010 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:01:13,092-Speed 5164.19 samples/sec Loss 0.4455 LearningRate 0.0006 Epoch: 18 Global Step: 309020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:01:15,066-Speed 5190.04 samples/sec Loss 0.4416 LearningRate 0.0006 Epoch: 18 Global Step: 309030 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:01:17,048-Speed 5168.74 samples/sec Loss 0.4257 LearningRate 0.0006 Epoch: 18 Global Step: 309040 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:01:19,017-Speed 5200.20 samples/sec Loss 0.4443 LearningRate 0.0006 Epoch: 18 Global Step: 309050 Fp16 Grad Scale: 131072 Required: 2 hours 
Training: 2022-04-11 20:01:20,982-Speed 5212.81 samples/sec Loss 0.4238 LearningRate 0.0006 Epoch: 18 Global Step: 309060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:01:22,948-Speed 5210.48 samples/sec Loss 0.4165 LearningRate 0.0005 Epoch: 18 Global Step: 309070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 20:01:24,943-Speed 5138.56 samples/sec Loss 0.4212 LearningRate 0.0005 Epoch: 18 Global Step: 309080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 20:01:26,939-Speed 5132.78 samples/sec Loss 0.4387 LearningRate 0.0005 Epoch: 18 Global Step: 309090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 20:01:28,921-Speed 5168.38 samples/sec Loss 0.4202 LearningRate 0.0005 Epoch: 18 Global Step: 309100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 20:01:30,891-Speed 5200.30 samples/sec Loss 0.4389 LearningRate 0.0005 Epoch: 18 Global Step: 309110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 20:01:32,860-Speed 5201.88 samples/sec Loss 0.4348 LearningRate 0.0005 Epoch: 18 Global Step: 309120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 20:01:34,843-Speed 5165.19 samples/sec Loss 0.4536 LearningRate 0.0005 Epoch: 18 Global Step: 309130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 20:01:36,818-Speed 5188.22 samples/sec Loss 0.4614 LearningRate 0.0005 Epoch: 18 Global Step: 309140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 20:01:38,808-Speed 5146.80 samples/sec Loss 0.4358 LearningRate 0.0005 Epoch: 18 Global Step: 309150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 20:01:40,786-Speed 5177.76 samples/sec Loss 0.4264 LearningRate 0.0005 Epoch: 18 Global Step: 309160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-11 20:01:42,757-Speed 5196.14 samples/sec Loss 0.4163 LearningRate 0.0005 Epoch: 18 Global Step: 309170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:01:44,729-Speed 
5198.88 samples/sec Loss 0.4298 LearningRate 0.0005 Epoch: 18 Global Step: 309180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:01:46,707-Speed 5176.92 samples/sec Loss 0.4340 LearningRate 0.0005 Epoch: 18 Global Step: 309190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:01:48,677-Speed 5199.19 samples/sec Loss 0.4284 LearningRate 0.0005 Epoch: 18 Global Step: 309200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:01:50,674-Speed 5129.85 samples/sec Loss 0.4250 LearningRate 0.0005 Epoch: 18 Global Step: 309210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:01:52,653-Speed 5177.85 samples/sec Loss 0.4683 LearningRate 0.0005 Epoch: 18 Global Step: 309220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:01:54,636-Speed 5163.95 samples/sec Loss 0.4221 LearningRate 0.0005 Epoch: 18 Global Step: 309230 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:01:56,615-Speed 5178.01 samples/sec Loss 0.4346 LearningRate 0.0005 Epoch: 18 Global Step: 309240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:01:58,597-Speed 5168.58 samples/sec Loss 0.4312 LearningRate 0.0005 Epoch: 18 Global Step: 309250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:00,569-Speed 5191.94 samples/sec Loss 0.4297 LearningRate 0.0005 Epoch: 18 Global Step: 309260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:02,567-Speed 5126.79 samples/sec Loss 0.4220 LearningRate 0.0005 Epoch: 18 Global Step: 309270 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:02:04,534-Speed 5209.55 samples/sec Loss 0.4301 LearningRate 0.0005 Epoch: 18 Global Step: 309280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:06,504-Speed 5198.75 samples/sec Loss 0.4138 LearningRate 0.0005 Epoch: 18 Global Step: 309290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:08,479-Speed 5185.34 samples/sec Loss 0.4015 
LearningRate 0.0005 Epoch: 18 Global Step: 309300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:10,470-Speed 5144.75 samples/sec Loss 0.4159 LearningRate 0.0005 Epoch: 18 Global Step: 309310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:12,466-Speed 5133.78 samples/sec Loss 0.4381 LearningRate 0.0005 Epoch: 18 Global Step: 309320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:14,447-Speed 5172.12 samples/sec Loss 0.4517 LearningRate 0.0005 Epoch: 18 Global Step: 309330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:16,430-Speed 5165.27 samples/sec Loss 0.4442 LearningRate 0.0005 Epoch: 18 Global Step: 309340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:18,405-Speed 5186.44 samples/sec Loss 0.4443 LearningRate 0.0005 Epoch: 18 Global Step: 309350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:20,378-Speed 5191.51 samples/sec Loss 0.4302 LearningRate 0.0005 Epoch: 18 Global Step: 309360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:22,371-Speed 5140.52 samples/sec Loss 0.4567 LearningRate 0.0005 Epoch: 18 Global Step: 309370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:24,346-Speed 5185.90 samples/sec Loss 0.4449 LearningRate 0.0005 Epoch: 18 Global Step: 309380 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:02:26,335-Speed 5148.71 samples/sec Loss 0.4209 LearningRate 0.0005 Epoch: 18 Global Step: 309390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:02:28,313-Speed 5179.64 samples/sec Loss 0.4301 LearningRate 0.0005 Epoch: 18 Global Step: 309400 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:02:30,315-Speed 5116.48 samples/sec Loss 0.4364 LearningRate 0.0005 Epoch: 18 Global Step: 309410 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:02:32,288-Speed 5190.48 samples/sec Loss 0.4309 LearningRate 0.0005 Epoch: 18 Global 
Step: 309420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:34,274-Speed 5158.24 samples/sec Loss 0.4136 LearningRate 0.0005 Epoch: 18 Global Step: 309430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:36,283-Speed 5099.11 samples/sec Loss 0.4413 LearningRate 0.0005 Epoch: 18 Global Step: 309440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:38,272-Speed 5150.62 samples/sec Loss 0.4209 LearningRate 0.0005 Epoch: 18 Global Step: 309450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:40,257-Speed 5159.90 samples/sec Loss 0.4090 LearningRate 0.0005 Epoch: 18 Global Step: 309460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:42,252-Speed 5136.51 samples/sec Loss 0.4271 LearningRate 0.0005 Epoch: 18 Global Step: 309470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:44,225-Speed 5191.40 samples/sec Loss 0.4298 LearningRate 0.0005 Epoch: 18 Global Step: 309480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:46,220-Speed 5134.72 samples/sec Loss 0.4304 LearningRate 0.0005 Epoch: 18 Global Step: 309490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:48,199-Speed 5174.85 samples/sec Loss 0.4438 LearningRate 0.0005 Epoch: 18 Global Step: 309500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:50,173-Speed 5188.50 samples/sec Loss 0.4343 LearningRate 0.0005 Epoch: 18 Global Step: 309510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:02:52,163-Speed 5158.70 samples/sec Loss 0.4253 LearningRate 0.0005 Epoch: 18 Global Step: 309520 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:02:54,156-Speed 5139.86 samples/sec Loss 0.4246 LearningRate 0.0005 Epoch: 18 Global Step: 309530 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:02:56,160-Speed 5112.33 samples/sec Loss 0.4134 LearningRate 0.0005 Epoch: 18 Global Step: 309540 Fp16 Grad Scale: 131072 
Required: 2 hours Training: 2022-04-11 20:02:58,153-Speed 5137.35 samples/sec Loss 0.4322 LearningRate 0.0005 Epoch: 18 Global Step: 309550 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:03:00,160-Speed 5105.58 samples/sec Loss 0.4370 LearningRate 0.0005 Epoch: 18 Global Step: 309560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:02,149-Speed 5149.47 samples/sec Loss 0.4196 LearningRate 0.0005 Epoch: 18 Global Step: 309570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:04,127-Speed 5179.31 samples/sec Loss 0.4638 LearningRate 0.0005 Epoch: 18 Global Step: 309580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:06,100-Speed 5191.45 samples/sec Loss 0.4143 LearningRate 0.0005 Epoch: 18 Global Step: 309590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:08,096-Speed 5133.19 samples/sec Loss 0.4162 LearningRate 0.0005 Epoch: 18 Global Step: 309600 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:10,121-Speed 5057.96 samples/sec Loss 0.4346 LearningRate 0.0005 Epoch: 18 Global Step: 309610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:12,143-Speed 5067.04 samples/sec Loss 0.4329 LearningRate 0.0005 Epoch: 18 Global Step: 309620 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:14,141-Speed 5126.44 samples/sec Loss 0.4374 LearningRate 0.0005 Epoch: 18 Global Step: 309630 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:16,115-Speed 5189.28 samples/sec Loss 0.4237 LearningRate 0.0005 Epoch: 18 Global Step: 309640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:18,111-Speed 5132.69 samples/sec Loss 0.4629 LearningRate 0.0005 Epoch: 18 Global Step: 309650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:20,083-Speed 5193.89 samples/sec Loss 0.4228 LearningRate 0.0005 Epoch: 18 Global Step: 309660 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 
20:03:22,071-Speed 5155.26 samples/sec Loss 0.4458 LearningRate 0.0005 Epoch: 18 Global Step: 309670 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:03:24,064-Speed 5138.23 samples/sec Loss 0.4216 LearningRate 0.0005 Epoch: 18 Global Step: 309680 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:03:26,062-Speed 5128.40 samples/sec Loss 0.4109 LearningRate 0.0005 Epoch: 18 Global Step: 309690 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:03:28,051-Speed 5148.38 samples/sec Loss 0.4182 LearningRate 0.0005 Epoch: 18 Global Step: 309700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:30,021-Speed 5199.31 samples/sec Loss 0.4456 LearningRate 0.0005 Epoch: 18 Global Step: 309710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:32,002-Speed 5171.54 samples/sec Loss 0.4331 LearningRate 0.0005 Epoch: 18 Global Step: 309720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:33,978-Speed 5185.15 samples/sec Loss 0.4171 LearningRate 0.0005 Epoch: 18 Global Step: 309730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:35,962-Speed 5163.39 samples/sec Loss 0.4447 LearningRate 0.0005 Epoch: 18 Global Step: 309740 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:37,939-Speed 5181.07 samples/sec Loss 0.4395 LearningRate 0.0005 Epoch: 18 Global Step: 309750 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:39,913-Speed 5189.95 samples/sec Loss 0.4289 LearningRate 0.0005 Epoch: 18 Global Step: 309760 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:41,885-Speed 5192.60 samples/sec Loss 0.4387 LearningRate 0.0005 Epoch: 18 Global Step: 309770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:43,859-Speed 5190.74 samples/sec Loss 0.4246 LearningRate 0.0005 Epoch: 18 Global Step: 309780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:45,842-Speed 5165.53 samples/sec 
Loss 0.4116 LearningRate 0.0005 Epoch: 18 Global Step: 309790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:03:47,827-Speed 5158.51 samples/sec Loss 0.4451 LearningRate 0.0005 Epoch: 18 Global Step: 309800 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:03:49,811-Speed 5163.95 samples/sec Loss 0.4346 LearningRate 0.0005 Epoch: 18 Global Step: 309810 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:03:51,786-Speed 5188.05 samples/sec Loss 0.4007 LearningRate 0.0005 Epoch: 18 Global Step: 309820 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:03:53,780-Speed 5135.78 samples/sec Loss 0.4116 LearningRate 0.0005 Epoch: 18 Global Step: 309830 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:03:55,754-Speed 5190.94 samples/sec Loss 0.4562 LearningRate 0.0005 Epoch: 18 Global Step: 309840 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:03:57,726-Speed 5192.41 samples/sec Loss 0.4524 LearningRate 0.0005 Epoch: 18 Global Step: 309850 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:03:59,699-Speed 5191.73 samples/sec Loss 0.4255 LearningRate 0.0005 Epoch: 18 Global Step: 309860 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:04:01,693-Speed 5136.70 samples/sec Loss 0.4520 LearningRate 0.0005 Epoch: 18 Global Step: 309870 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:04:03,667-Speed 5190.13 samples/sec Loss 0.4210 LearningRate 0.0005 Epoch: 18 Global Step: 309880 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:04:05,653-Speed 5158.17 samples/sec Loss 0.4217 LearningRate 0.0005 Epoch: 18 Global Step: 309890 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:04:07,626-Speed 5191.52 samples/sec Loss 0.3935 LearningRate 0.0005 Epoch: 18 Global Step: 309900 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:04:09,626-Speed 5121.34 samples/sec Loss 0.4109 LearningRate 0.0005 
Epoch: 18 Global Step: 309910 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:04:11,598-Speed 5193.27 samples/sec Loss 0.4250 LearningRate 0.0005 Epoch: 18 Global Step: 309920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:04:13,585-Speed 5156.33 samples/sec Loss 0.4455 LearningRate 0.0005 Epoch: 18 Global Step: 309930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:04:15,583-Speed 5128.32 samples/sec Loss 0.4357 LearningRate 0.0005 Epoch: 18 Global Step: 309940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:04:17,572-Speed 5149.73 samples/sec Loss 0.4441 LearningRate 0.0005 Epoch: 18 Global Step: 309950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:04:19,542-Speed 5200.11 samples/sec Loss 0.4204 LearningRate 0.0005 Epoch: 18 Global Step: 309960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:04:21,515-Speed 5191.50 samples/sec Loss 0.4357 LearningRate 0.0005 Epoch: 18 Global Step: 309970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:04:23,491-Speed 5184.49 samples/sec Loss 0.4442 LearningRate 0.0005 Epoch: 18 Global Step: 309980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:04:25,466-Speed 5186.52 samples/sec Loss 0.4206 LearningRate 0.0005 Epoch: 18 Global Step: 309990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:04:27,442-Speed 5184.02 samples/sec Loss 0.4164 LearningRate 0.0005 Epoch: 18 Global Step: 310000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:04:54,010-[lfw][310000]XNorm: 21.424494 Training: 2022-04-11 20:04:54,011-[lfw][310000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 20:04:54,011-[lfw][310000]Accuracy-Highest: 0.99833 Training: 2022-04-11 20:05:24,741-[cfp_fp][310000]XNorm: 21.904959 Training: 2022-04-11 20:05:24,742-[cfp_fp][310000]Accuracy-Flip: 0.98900+-0.00409 Training: 2022-04-11 20:05:24,742-[cfp_fp][310000]Accuracy-Highest: 0.99029 Training: 2022-04-11 
20:05:51,235-[agedb_30][310000]XNorm: 22.523392 Training: 2022-04-11 20:05:51,235-[agedb_30][310000]Accuracy-Flip: 0.98333+-0.00645 Training: 2022-04-11 20:05:51,236-[agedb_30][310000]Accuracy-Highest: 0.98383 Training: 2022-04-11 20:05:53,234-Speed 119.36 samples/sec Loss 0.4135 LearningRate 0.0005 Epoch: 18 Global Step: 310010 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:05:55,198-Speed 5214.87 samples/sec Loss 0.4104 LearningRate 0.0005 Epoch: 18 Global Step: 310020 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:05:57,168-Speed 5199.36 samples/sec Loss 0.4457 LearningRate 0.0005 Epoch: 18 Global Step: 310030 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:05:59,143-Speed 5187.39 samples/sec Loss 0.4171 LearningRate 0.0005 Epoch: 18 Global Step: 310040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:01,108-Speed 5214.05 samples/sec Loss 0.4436 LearningRate 0.0005 Epoch: 18 Global Step: 310050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:03,071-Speed 5216.35 samples/sec Loss 0.4385 LearningRate 0.0005 Epoch: 18 Global Step: 310060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:05,035-Speed 5215.67 samples/sec Loss 0.4460 LearningRate 0.0005 Epoch: 18 Global Step: 310070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:06,997-Speed 5220.88 samples/sec Loss 0.4444 LearningRate 0.0005 Epoch: 18 Global Step: 310080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:08,962-Speed 5213.88 samples/sec Loss 0.4458 LearningRate 0.0005 Epoch: 18 Global Step: 310090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:10,934-Speed 5193.98 samples/sec Loss 0.4418 LearningRate 0.0005 Epoch: 18 Global Step: 310100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:12,908-Speed 5191.11 samples/sec Loss 0.4719 LearningRate 0.0005 Epoch: 18 Global Step: 310110 Fp16 Grad Scale: 65536 Required: 2 
hours Training: 2022-04-11 20:06:14,904-Speed 5131.50 samples/sec Loss 0.4644 LearningRate 0.0005 Epoch: 18 Global Step: 310120 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:06:16,866-Speed 5219.67 samples/sec Loss 0.4247 LearningRate 0.0005 Epoch: 18 Global Step: 310130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:18,836-Speed 5200.52 samples/sec Loss 0.4214 LearningRate 0.0005 Epoch: 18 Global Step: 310140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:20,803-Speed 5206.52 samples/sec Loss 0.4192 LearningRate 0.0005 Epoch: 18 Global Step: 310150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:22,787-Speed 5164.71 samples/sec Loss 0.4345 LearningRate 0.0005 Epoch: 18 Global Step: 310160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:24,767-Speed 5171.99 samples/sec Loss 0.4093 LearningRate 0.0005 Epoch: 18 Global Step: 310170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:26,752-Speed 5161.37 samples/sec Loss 0.4127 LearningRate 0.0005 Epoch: 18 Global Step: 310180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:28,739-Speed 5153.82 samples/sec Loss 0.4504 LearningRate 0.0005 Epoch: 18 Global Step: 310190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:30,721-Speed 5169.97 samples/sec Loss 0.4291 LearningRate 0.0005 Epoch: 18 Global Step: 310200 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:32,715-Speed 5136.45 samples/sec Loss 0.4289 LearningRate 0.0005 Epoch: 18 Global Step: 310210 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:34,687-Speed 5195.70 samples/sec Loss 0.4261 LearningRate 0.0005 Epoch: 18 Global Step: 310220 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:36,670-Speed 5164.38 samples/sec Loss 0.4232 LearningRate 0.0005 Epoch: 18 Global Step: 310230 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 
20:06:38,644-Speed 5189.05 samples/sec Loss 0.4460 LearningRate 0.0005 Epoch: 18 Global Step: 310240 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:40,614-Speed 5201.48 samples/sec Loss 0.4420 LearningRate 0.0005 Epoch: 18 Global Step: 310250 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:42,586-Speed 5193.19 samples/sec Loss 0.4100 LearningRate 0.0005 Epoch: 18 Global Step: 310260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:44,560-Speed 5188.68 samples/sec Loss 0.4422 LearningRate 0.0005 Epoch: 18 Global Step: 310270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:46,536-Speed 5183.36 samples/sec Loss 0.4812 LearningRate 0.0005 Epoch: 18 Global Step: 310280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:48,522-Speed 5158.28 samples/sec Loss 0.4157 LearningRate 0.0005 Epoch: 18 Global Step: 310290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:50,520-Speed 5127.14 samples/sec Loss 0.4675 LearningRate 0.0005 Epoch: 18 Global Step: 310300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:52,505-Speed 5159.50 samples/sec Loss 0.4284 LearningRate 0.0005 Epoch: 18 Global Step: 310310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:54,485-Speed 5175.45 samples/sec Loss 0.3945 LearningRate 0.0005 Epoch: 18 Global Step: 310320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:56,453-Speed 5204.03 samples/sec Loss 0.4221 LearningRate 0.0005 Epoch: 18 Global Step: 310330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:06:58,444-Speed 5145.86 samples/sec Loss 0.4187 LearningRate 0.0005 Epoch: 18 Global Step: 310340 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:07:00,417-Speed 5191.42 samples/sec Loss 0.4091 LearningRate 0.0005 Epoch: 18 Global Step: 310350 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:07:02,383-Speed 5211.91 samples/sec 
Loss 0.4096 LearningRate 0.0005 Epoch: 18 Global Step: 310360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:04,359-Speed 5183.13 samples/sec Loss 0.4062 LearningRate 0.0005 Epoch: 18 Global Step: 310370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:06,332-Speed 5192.71 samples/sec Loss 0.4216 LearningRate 0.0005 Epoch: 18 Global Step: 310380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:08,300-Speed 5204.31 samples/sec Loss 0.4344 LearningRate 0.0005 Epoch: 18 Global Step: 310390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:10,273-Speed 5192.44 samples/sec Loss 0.4411 LearningRate 0.0005 Epoch: 18 Global Step: 310400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:12,242-Speed 5201.87 samples/sec Loss 0.4245 LearningRate 0.0005 Epoch: 18 Global Step: 310410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:14,210-Speed 5204.75 samples/sec Loss 0.4485 LearningRate 0.0005 Epoch: 18 Global Step: 310420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:16,185-Speed 5186.94 samples/sec Loss 0.4495 LearningRate 0.0005 Epoch: 18 Global Step: 310430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:18,159-Speed 5188.59 samples/sec Loss 0.4142 LearningRate 0.0005 Epoch: 18 Global Step: 310440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:20,127-Speed 5205.43 samples/sec Loss 0.4161 LearningRate 0.0005 Epoch: 18 Global Step: 310450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:22,100-Speed 5192.77 samples/sec Loss 0.4389 LearningRate 0.0005 Epoch: 18 Global Step: 310460 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:07:24,068-Speed 5205.60 samples/sec Loss 0.4138 LearningRate 0.0005 Epoch: 18 Global Step: 310470 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:07:26,036-Speed 5205.48 samples/sec Loss 0.4338 LearningRate 0.0005 Epoch: 18 
Global Step: 310480 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:07:28,010-Speed 5188.46 samples/sec Loss 0.4477 LearningRate 0.0005 Epoch: 18 Global Step: 310490 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:07:29,990-Speed 5174.13 samples/sec Loss 0.4377 LearningRate 0.0005 Epoch: 18 Global Step: 310500 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:07:31,973-Speed 5164.81 samples/sec Loss 0.4464 LearningRate 0.0005 Epoch: 18 Global Step: 310510 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:07:33,932-Speed 5227.84 samples/sec Loss 0.4527 LearningRate 0.0005 Epoch: 18 Global Step: 310520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:35,911-Speed 5176.15 samples/sec Loss 0.4359 LearningRate 0.0005 Epoch: 18 Global Step: 310530 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:37,897-Speed 5160.08 samples/sec Loss 0.4441 LearningRate 0.0005 Epoch: 18 Global Step: 310540 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:39,876-Speed 5174.17 samples/sec Loss 0.4713 LearningRate 0.0005 Epoch: 18 Global Step: 310550 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:41,857-Speed 5172.33 samples/sec Loss 0.4672 LearningRate 0.0005 Epoch: 18 Global Step: 310560 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:43,823-Speed 5209.38 samples/sec Loss 0.4418 LearningRate 0.0005 Epoch: 18 Global Step: 310570 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:45,804-Speed 5169.86 samples/sec Loss 0.4286 LearningRate 0.0005 Epoch: 18 Global Step: 310580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:47,774-Speed 5201.81 samples/sec Loss 0.4257 LearningRate 0.0005 Epoch: 18 Global Step: 310590 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:49,743-Speed 5203.19 samples/sec Loss 0.4452 LearningRate 0.0005 Epoch: 18 Global Step: 310600 Fp16 Grad Scale: 
65536 Required: 2 hours Training: 2022-04-11 20:07:51,729-Speed 5156.79 samples/sec Loss 0.4102 LearningRate 0.0005 Epoch: 18 Global Step: 310610 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:53,693-Speed 5214.12 samples/sec Loss 0.4277 LearningRate 0.0005 Epoch: 18 Global Step: 310620 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:07:55,661-Speed 5204.84 samples/sec Loss 0.4592 LearningRate 0.0005 Epoch: 18 Global Step: 310630 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:07:57,619-Speed 5233.87 samples/sec Loss 0.4303 LearningRate 0.0005 Epoch: 18 Global Step: 310640 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:07:59,585-Speed 5210.19 samples/sec Loss 0.4327 LearningRate 0.0005 Epoch: 18 Global Step: 310650 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:01,582-Speed 5129.99 samples/sec Loss 0.4571 LearningRate 0.0005 Epoch: 18 Global Step: 310660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:03,573-Speed 5142.17 samples/sec Loss 0.4340 LearningRate 0.0005 Epoch: 18 Global Step: 310670 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:05,585-Speed 5092.56 samples/sec Loss 0.4128 LearningRate 0.0005 Epoch: 18 Global Step: 310680 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:07,552-Speed 5206.63 samples/sec Loss 0.4616 LearningRate 0.0005 Epoch: 18 Global Step: 310690 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:09,532-Speed 5174.57 samples/sec Loss 0.4319 LearningRate 0.0005 Epoch: 18 Global Step: 310700 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:11,507-Speed 5188.09 samples/sec Loss 0.4218 LearningRate 0.0005 Epoch: 18 Global Step: 310710 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:13,488-Speed 5171.46 samples/sec Loss 0.4322 LearningRate 0.0005 Epoch: 18 Global Step: 310720 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-11 20:08:15,456-Speed 5203.31 samples/sec Loss 0.4306 LearningRate 0.0005 Epoch: 18 Global Step: 310730 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:17,450-Speed 5138.14 samples/sec Loss 0.4373 LearningRate 0.0005 Epoch: 18 Global Step: 310740 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:08:19,421-Speed 5196.69 samples/sec Loss 0.4433 LearningRate 0.0005 Epoch: 18 Global Step: 310750 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:08:21,417-Speed 5130.39 samples/sec Loss 0.4086 LearningRate 0.0005 Epoch: 18 Global Step: 310760 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:08:23,386-Speed 5203.31 samples/sec Loss 0.4095 LearningRate 0.0005 Epoch: 18 Global Step: 310770 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:08:25,369-Speed 5164.23 samples/sec Loss 0.4411 LearningRate 0.0005 Epoch: 18 Global Step: 310780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:27,340-Speed 5199.77 samples/sec Loss 0.4525 LearningRate 0.0005 Epoch: 18 Global Step: 310790 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:29,320-Speed 5173.15 samples/sec Loss 0.4627 LearningRate 0.0005 Epoch: 18 Global Step: 310800 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:31,314-Speed 5136.89 samples/sec Loss 0.4575 LearningRate 0.0005 Epoch: 18 Global Step: 310810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:33,281-Speed 5207.34 samples/sec Loss 0.4353 LearningRate 0.0005 Epoch: 18 Global Step: 310820 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:35,261-Speed 5172.73 samples/sec Loss 0.4411 LearningRate 0.0005 Epoch: 18 Global Step: 310830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:37,244-Speed 5166.77 samples/sec Loss 0.4232 LearningRate 0.0005 Epoch: 18 Global Step: 310840 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:39,214-Speed 5199.93 
samples/sec Loss 0.4448 LearningRate 0.0005 Epoch: 18 Global Step: 310850 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:41,182-Speed 5205.49 samples/sec Loss 0.4511 LearningRate 0.0005 Epoch: 18 Global Step: 310860 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:43,150-Speed 5203.08 samples/sec Loss 0.4177 LearningRate 0.0005 Epoch: 18 Global Step: 310870 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:45,132-Speed 5167.32 samples/sec Loss 0.4376 LearningRate 0.0005 Epoch: 18 Global Step: 310880 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:08:47,109-Speed 5182.17 samples/sec Loss 0.4304 LearningRate 0.0005 Epoch: 18 Global Step: 310890 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:08:49,088-Speed 5175.43 samples/sec Loss 0.4124 LearningRate 0.0005 Epoch: 18 Global Step: 310900 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:08:51,082-Speed 5138.31 samples/sec Loss 0.4233 LearningRate 0.0005 Epoch: 18 Global Step: 310910 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:08:53,073-Speed 5145.78 samples/sec Loss 0.4442 LearningRate 0.0005 Epoch: 18 Global Step: 310920 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:55,041-Speed 5204.18 samples/sec Loss 0.4370 LearningRate 0.0005 Epoch: 18 Global Step: 310930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:57,023-Speed 5169.07 samples/sec Loss 0.4435 LearningRate 0.0005 Epoch: 18 Global Step: 310940 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:08:59,007-Speed 5161.74 samples/sec Loss 0.4412 LearningRate 0.0005 Epoch: 18 Global Step: 310950 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:09:00,980-Speed 5192.10 samples/sec Loss 0.4363 LearningRate 0.0005 Epoch: 18 Global Step: 310960 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:09:02,949-Speed 5203.92 samples/sec Loss 0.4164 LearningRate 
0.0005 Epoch: 18 Global Step: 310970 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:09:04,917-Speed 5205.07 samples/sec Loss 0.4273 LearningRate 0.0005 Epoch: 18 Global Step: 310980 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:09:06,885-Speed 5202.73 samples/sec Loss 0.4406 LearningRate 0.0005 Epoch: 18 Global Step: 310990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:09:08,865-Speed 5173.33 samples/sec Loss 0.4173 LearningRate 0.0005 Epoch: 18 Global Step: 311000 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:09:10,853-Speed 5154.81 samples/sec Loss 0.4357 LearningRate 0.0005 Epoch: 18 Global Step: 311010 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:09:12,836-Speed 5164.11 samples/sec Loss 0.4244 LearningRate 0.0005 Epoch: 18 Global Step: 311020 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:09:14,818-Speed 5168.67 samples/sec Loss 0.4247 LearningRate 0.0005 Epoch: 18 Global Step: 311030 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:09:16,797-Speed 5174.55 samples/sec Loss 0.4432 LearningRate 0.0005 Epoch: 18 Global Step: 311040 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:09:18,767-Speed 5201.82 samples/sec Loss 0.4518 LearningRate 0.0005 Epoch: 18 Global Step: 311050 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:09:20,739-Speed 5194.78 samples/sec Loss 0.4519 LearningRate 0.0005 Epoch: 18 Global Step: 311060 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:09:22,709-Speed 5199.32 samples/sec Loss 0.4245 LearningRate 0.0005 Epoch: 18 Global Step: 311070 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:09:24,680-Speed 5197.65 samples/sec Loss 0.4233 LearningRate 0.0005 Epoch: 18 Global Step: 311080 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:09:26,645-Speed 5213.47 samples/sec Loss 0.4615 LearningRate 0.0005 Epoch: 18 Global Step: 311090 
Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:09:28,625-Speed 5172.01 samples/sec Loss 0.4494 LearningRate 0.0005 Epoch: 18 Global Step: 311100 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:09:30,597-Speed 5196.17 samples/sec Loss 0.4362 LearningRate 0.0005 Epoch: 18 Global Step: 311110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:09:32,566-Speed 5201.00 samples/sec Loss 0.4505 LearningRate 0.0005 Epoch: 18 Global Step: 311120 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:09:34,538-Speed 5195.69 samples/sec Loss 0.4387 LearningRate 0.0005 Epoch: 18 Global Step: 311130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:09:36,524-Speed 5157.81 samples/sec Loss 0.4408 LearningRate 0.0005 Epoch: 18 Global Step: 311140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:09:38,496-Speed 5194.36 samples/sec Loss 0.4406 LearningRate 0.0005 Epoch: 18 Global Step: 311150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:09:40,468-Speed 5194.66 samples/sec Loss 0.4147 LearningRate 0.0005 Epoch: 18 Global Step: 311160 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:09:42,438-Speed 5200.28 samples/sec Loss 0.4354 LearningRate 0.0005 Epoch: 18 Global Step: 311170 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:09:44,411-Speed 5192.45 samples/sec Loss 0.4316 LearningRate 0.0005 Epoch: 18 Global Step: 311180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:09:46,405-Speed 5135.66 samples/sec Loss 0.4423 LearningRate 0.0005 Epoch: 18 Global Step: 311190 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:09:48,391-Speed 5158.73 samples/sec Loss 0.4406 LearningRate 0.0005 Epoch: 18 Global Step: 311200 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:09:50,366-Speed 5186.02 samples/sec Loss 0.4246 LearningRate 0.0005 Epoch: 18 Global Step: 311210 Fp16 Grad Scale: 131072 Required: 2 hours 
Training: 2022-04-11 20:09:52,354-Speed 5153.22 samples/sec Loss 0.4467 LearningRate 0.0005 Epoch: 18 Global Step: 311220 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:09:54,323-Speed 5201.73 samples/sec Loss 0.4250 LearningRate 0.0005 Epoch: 18 Global Step: 311230 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:09:56,289-Speed 5211.30 samples/sec Loss 0.4263 LearningRate 0.0005 Epoch: 18 Global Step: 311240 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:09:58,264-Speed 5186.67 samples/sec Loss 0.4320 LearningRate 0.0005 Epoch: 18 Global Step: 311250 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:10:00,226-Speed 5219.89 samples/sec Loss 0.4336 LearningRate 0.0005 Epoch: 18 Global Step: 311260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:10:02,195-Speed 5202.84 samples/sec Loss 0.4356 LearningRate 0.0005 Epoch: 18 Global Step: 311270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:10:04,172-Speed 5181.04 samples/sec Loss 0.4246 LearningRate 0.0005 Epoch: 18 Global Step: 311280 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:10:06,162-Speed 5148.02 samples/sec Loss 0.4272 LearningRate 0.0005 Epoch: 18 Global Step: 311290 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:10:08,138-Speed 5184.30 samples/sec Loss 0.4118 LearningRate 0.0005 Epoch: 18 Global Step: 311300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:10:10,108-Speed 5201.13 samples/sec Loss 0.4189 LearningRate 0.0005 Epoch: 18 Global Step: 311310 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:10:12,087-Speed 5174.87 samples/sec Loss 0.3997 LearningRate 0.0005 Epoch: 18 Global Step: 311320 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:10:14,069-Speed 5168.18 samples/sec Loss 0.4151 LearningRate 0.0005 Epoch: 18 Global Step: 311330 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:10:16,045-Speed 
5183.61 samples/sec Loss 0.4183 LearningRate 0.0005 Epoch: 18 Global Step: 311340 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:10:18,018-Speed 5191.57 samples/sec Loss 0.4380 LearningRate 0.0005 Epoch: 18 Global Step: 311350 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:10:19,991-Speed 5191.88 samples/sec Loss 0.4417 LearningRate 0.0005 Epoch: 18 Global Step: 311360 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:10:21,959-Speed 5206.16 samples/sec Loss 0.4279 LearningRate 0.0005 Epoch: 18 Global Step: 311370 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:10:23,949-Speed 5146.76 samples/sec Loss 0.4160 LearningRate 0.0005 Epoch: 18 Global Step: 311380 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:10:25,922-Speed 5192.18 samples/sec Loss 0.4437 LearningRate 0.0005 Epoch: 18 Global Step: 311390 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:10:27,908-Speed 5159.03 samples/sec Loss 0.4509 LearningRate 0.0005 Epoch: 18 Global Step: 311400 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:10:29,895-Speed 5153.52 samples/sec Loss 0.4347 LearningRate 0.0005 Epoch: 18 Global Step: 311410 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:10:31,872-Speed 5181.32 samples/sec Loss 0.4475 LearningRate 0.0005 Epoch: 18 Global Step: 311420 Fp16 Grad Scale: 131072 Required: 2 hours Training: 2022-04-11 20:10:33,838-Speed 5211.82 samples/sec Loss 0.4341 LearningRate 0.0004 Epoch: 18 Global Step: 311430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:10:35,820-Speed 5166.75 samples/sec Loss 0.4269 LearningRate 0.0004 Epoch: 18 Global Step: 311440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:10:37,816-Speed 5132.99 samples/sec Loss 0.4397 LearningRate 0.0004 Epoch: 18 Global Step: 311450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:10:39,799-Speed 5165.85 samples/sec Loss 0.4585 
LearningRate 0.0004 Epoch: 18 Global Step: 311460 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:10:41,811-Speed 5089.56 samples/sec Loss 0.4306 LearningRate 0.0004 Epoch: 18 Global Step: 311470 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:10:43,777-Speed 5210.53 samples/sec Loss 0.4161 LearningRate 0.0004 Epoch: 18 Global Step: 311480 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:10:45,753-Speed 5183.30 samples/sec Loss 0.4315 LearningRate 0.0004 Epoch: 18 Global Step: 311490 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-11 20:10:47,741-Speed 5153.54 samples/sec Loss 0.4296 LearningRate 0.0004 Epoch: 18 Global Step: 311500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:10:49,715-Speed 5189.98 samples/sec Loss 0.4435 LearningRate 0.0004 Epoch: 18 Global Step: 311510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:10:51,715-Speed 5121.15 samples/sec Loss 0.4273 LearningRate 0.0004 Epoch: 18 Global Step: 311520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:10:53,729-Speed 5085.81 samples/sec Loss 0.4121 LearningRate 0.0004 Epoch: 18 Global Step: 311530 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:10:55,722-Speed 5140.09 samples/sec Loss 0.4583 LearningRate 0.0004 Epoch: 18 Global Step: 311540 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:10:57,690-Speed 5207.29 samples/sec Loss 0.4357 LearningRate 0.0004 Epoch: 18 Global Step: 311550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:10:59,679-Speed 5148.80 samples/sec Loss 0.4736 LearningRate 0.0004 Epoch: 18 Global Step: 311560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:01,655-Speed 5184.51 samples/sec Loss 0.4438 LearningRate 0.0004 Epoch: 18 Global Step: 311570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:03,627-Speed 5193.24 samples/sec Loss 0.4415 LearningRate 0.0004 Epoch: 18 Global 
Step: 311580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:05,602-Speed 5186.25 samples/sec Loss 0.4390 LearningRate 0.0004 Epoch: 18 Global Step: 311590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:07,579-Speed 5183.04 samples/sec Loss 0.4210 LearningRate 0.0004 Epoch: 18 Global Step: 311600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:09,556-Speed 5179.43 samples/sec Loss 0.3920 LearningRate 0.0004 Epoch: 18 Global Step: 311610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:11,574-Speed 5077.35 samples/sec Loss 0.4486 LearningRate 0.0004 Epoch: 18 Global Step: 311620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:13,555-Speed 5169.13 samples/sec Loss 0.4256 LearningRate 0.0004 Epoch: 18 Global Step: 311630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:15,550-Speed 5136.01 samples/sec Loss 0.4315 LearningRate 0.0004 Epoch: 18 Global Step: 311640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:17,537-Speed 5155.23 samples/sec Loss 0.4132 LearningRate 0.0004 Epoch: 18 Global Step: 311650 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:11:19,520-Speed 5165.52 samples/sec Loss 0.4555 LearningRate 0.0004 Epoch: 18 Global Step: 311660 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:11:21,517-Speed 5130.56 samples/sec Loss 0.4073 LearningRate 0.0004 Epoch: 18 Global Step: 311670 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:11:23,500-Speed 5164.32 samples/sec Loss 0.4283 LearningRate 0.0004 Epoch: 18 Global Step: 311680 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:11:25,474-Speed 5190.38 samples/sec Loss 0.4399 LearningRate 0.0004 Epoch: 18 Global Step: 311690 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:11:27,476-Speed 5118.06 samples/sec Loss 0.4541 LearningRate 0.0004 Epoch: 18 Global Step: 311700 Fp16 Grad Scale: 131072 
Required: 1 hours Training: 2022-04-11 20:11:29,467-Speed 5142.56 samples/sec Loss 0.4197 LearningRate 0.0004 Epoch: 18 Global Step: 311710 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:11:31,435-Speed 5207.02 samples/sec Loss 0.4224 LearningRate 0.0004 Epoch: 18 Global Step: 311720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:33,413-Speed 5178.37 samples/sec Loss 0.4289 LearningRate 0.0004 Epoch: 18 Global Step: 311730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:35,390-Speed 5183.12 samples/sec Loss 0.4322 LearningRate 0.0004 Epoch: 18 Global Step: 311740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:37,381-Speed 5143.58 samples/sec Loss 0.4597 LearningRate 0.0004 Epoch: 18 Global Step: 311750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:39,365-Speed 5162.50 samples/sec Loss 0.4311 LearningRate 0.0004 Epoch: 18 Global Step: 311760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:41,343-Speed 5177.39 samples/sec Loss 0.4298 LearningRate 0.0004 Epoch: 18 Global Step: 311770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:43,314-Speed 5196.91 samples/sec Loss 0.4178 LearningRate 0.0004 Epoch: 18 Global Step: 311780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:45,297-Speed 5165.74 samples/sec Loss 0.4198 LearningRate 0.0004 Epoch: 18 Global Step: 311790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:47,297-Speed 5122.01 samples/sec Loss 0.4706 LearningRate 0.0004 Epoch: 18 Global Step: 311800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:49,271-Speed 5191.60 samples/sec Loss 0.4236 LearningRate 0.0004 Epoch: 18 Global Step: 311810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:51,253-Speed 5166.27 samples/sec Loss 0.4509 LearningRate 0.0004 Epoch: 18 Global Step: 311820 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 
20:11:53,233-Speed 5174.81 samples/sec Loss 0.4453 LearningRate 0.0004 Epoch: 18 Global Step: 311830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:55,206-Speed 5190.81 samples/sec Loss 0.4179 LearningRate 0.0004 Epoch: 18 Global Step: 311840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:57,192-Speed 5157.56 samples/sec Loss 0.4238 LearningRate 0.0004 Epoch: 18 Global Step: 311850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:11:59,199-Speed 5106.13 samples/sec Loss 0.4454 LearningRate 0.0004 Epoch: 18 Global Step: 311860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:12:01,195-Speed 5129.92 samples/sec Loss 0.4247 LearningRate 0.0004 Epoch: 18 Global Step: 311870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:12:03,212-Speed 5079.14 samples/sec Loss 0.4129 LearningRate 0.0004 Epoch: 18 Global Step: 311880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:12:05,204-Speed 5144.36 samples/sec Loss 0.4395 LearningRate 0.0004 Epoch: 18 Global Step: 311890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:12:07,179-Speed 5185.27 samples/sec Loss 0.4217 LearningRate 0.0004 Epoch: 18 Global Step: 311900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:12:09,185-Speed 5107.75 samples/sec Loss 0.4213 LearningRate 0.0004 Epoch: 18 Global Step: 311910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:12:11,178-Speed 5138.02 samples/sec Loss 0.4491 LearningRate 0.0004 Epoch: 18 Global Step: 311920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:12:13,176-Speed 5126.87 samples/sec Loss 0.4257 LearningRate 0.0004 Epoch: 18 Global Step: 311930 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:12:15,147-Speed 5199.14 samples/sec Loss 0.4331 LearningRate 0.0004 Epoch: 18 Global Step: 311940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:12:17,124-Speed 5179.17 samples/sec 
Loss 0.4452 LearningRate 0.0004 Epoch: 18 Global Step: 311950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:12:19,139-Speed 5083.72 samples/sec Loss 0.4560 LearningRate 0.0004 Epoch: 18 Global Step: 311960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:12:21,144-Speed 5110.83 samples/sec Loss 0.4274 LearningRate 0.0004 Epoch: 18 Global Step: 311970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:12:23,144-Speed 5120.49 samples/sec Loss 0.4158 LearningRate 0.0004 Epoch: 18 Global Step: 311980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:12:25,136-Speed 5143.45 samples/sec Loss 0.4217 LearningRate 0.0004 Epoch: 18 Global Step: 311990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:12:27,125-Speed 5149.95 samples/sec Loss 0.4282 LearningRate 0.0004 Epoch: 18 Global Step: 312000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:12:53,779-[lfw][312000]XNorm: 21.674008 Training: 2022-04-11 20:12:53,780-[lfw][312000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 20:12:53,780-[lfw][312000]Accuracy-Highest: 0.99833 Training: 2022-04-11 20:13:24,637-[cfp_fp][312000]XNorm: 22.150954 Training: 2022-04-11 20:13:24,637-[cfp_fp][312000]Accuracy-Flip: 0.99014+-0.00401 Training: 2022-04-11 20:13:24,638-[cfp_fp][312000]Accuracy-Highest: 0.99029 Training: 2022-04-11 20:13:51,189-[agedb_30][312000]XNorm: 22.794081 Training: 2022-04-11 20:13:51,190-[agedb_30][312000]Accuracy-Flip: 0.98367+-0.00690 Training: 2022-04-11 20:13:51,190-[agedb_30][312000]Accuracy-Highest: 0.98383 Training: 2022-04-11 20:13:53,177-Speed 119.00 samples/sec Loss 0.4289 LearningRate 0.0004 Epoch: 18 Global Step: 312010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:13:55,138-Speed 5223.92 samples/sec Loss 0.4377 LearningRate 0.0004 Epoch: 18 Global Step: 312020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:13:57,101-Speed 5217.52 samples/sec Loss 0.4222 LearningRate 
0.0004 Epoch: 18 Global Step: 312030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:13:59,094-Speed 5140.45 samples/sec Loss 0.4193 LearningRate 0.0004 Epoch: 18 Global Step: 312040 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:14:01,079-Speed 5161.64 samples/sec Loss 0.4446 LearningRate 0.0004 Epoch: 18 Global Step: 312050 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:14:03,049-Speed 5198.72 samples/sec Loss 0.4214 LearningRate 0.0004 Epoch: 18 Global Step: 312060 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:14:05,036-Speed 5155.56 samples/sec Loss 0.4323 LearningRate 0.0004 Epoch: 18 Global Step: 312070 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:14:06,997-Speed 5222.07 samples/sec Loss 0.4347 LearningRate 0.0004 Epoch: 18 Global Step: 312080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:08,962-Speed 5213.57 samples/sec Loss 0.4438 LearningRate 0.0004 Epoch: 18 Global Step: 312090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:10,932-Speed 5200.28 samples/sec Loss 0.4320 LearningRate 0.0004 Epoch: 18 Global Step: 312100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:12,913-Speed 5169.97 samples/sec Loss 0.4342 LearningRate 0.0004 Epoch: 18 Global Step: 312110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:14,879-Speed 5209.12 samples/sec Loss 0.4213 LearningRate 0.0004 Epoch: 18 Global Step: 312120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:16,847-Speed 5206.98 samples/sec Loss 0.4019 LearningRate 0.0004 Epoch: 18 Global Step: 312130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:18,822-Speed 5187.33 samples/sec Loss 0.4270 LearningRate 0.0004 Epoch: 18 Global Step: 312140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:20,788-Speed 5210.79 samples/sec Loss 0.3961 LearningRate 0.0004 Epoch: 18 Global Step: 312150 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:22,759-Speed 5197.22 samples/sec Loss 0.4557 LearningRate 0.0004 Epoch: 18 Global Step: 312160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:24,739-Speed 5171.72 samples/sec Loss 0.4140 LearningRate 0.0004 Epoch: 18 Global Step: 312170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:26,722-Speed 5165.63 samples/sec Loss 0.4352 LearningRate 0.0004 Epoch: 18 Global Step: 312180 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:14:28,706-Speed 5164.00 samples/sec Loss 0.4429 LearningRate 0.0004 Epoch: 18 Global Step: 312190 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:14:30,679-Speed 5190.77 samples/sec Loss 0.4387 LearningRate 0.0004 Epoch: 18 Global Step: 312200 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:14:32,657-Speed 5179.14 samples/sec Loss 0.4350 LearningRate 0.0004 Epoch: 18 Global Step: 312210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:34,630-Speed 5192.21 samples/sec Loss 0.4372 LearningRate 0.0004 Epoch: 18 Global Step: 312220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:36,614-Speed 5164.30 samples/sec Loss 0.4297 LearningRate 0.0004 Epoch: 18 Global Step: 312230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:38,620-Speed 5105.35 samples/sec Loss 0.4164 LearningRate 0.0004 Epoch: 18 Global Step: 312240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:40,594-Speed 5191.15 samples/sec Loss 0.4266 LearningRate 0.0004 Epoch: 18 Global Step: 312250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:42,567-Speed 5190.87 samples/sec Loss 0.4271 LearningRate 0.0004 Epoch: 18 Global Step: 312260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:44,539-Speed 5195.20 samples/sec Loss 0.4336 LearningRate 0.0004 Epoch: 18 Global Step: 312270 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-11 20:14:46,517-Speed 5177.65 samples/sec Loss 0.4167 LearningRate 0.0004 Epoch: 18 Global Step: 312280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:48,500-Speed 5165.13 samples/sec Loss 0.4193 LearningRate 0.0004 Epoch: 18 Global Step: 312290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:50,475-Speed 5185.16 samples/sec Loss 0.4388 LearningRate 0.0004 Epoch: 18 Global Step: 312300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:52,454-Speed 5176.74 samples/sec Loss 0.4218 LearningRate 0.0004 Epoch: 18 Global Step: 312310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:54,440-Speed 5157.27 samples/sec Loss 0.4284 LearningRate 0.0004 Epoch: 18 Global Step: 312320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:56,443-Speed 5115.91 samples/sec Loss 0.4301 LearningRate 0.0004 Epoch: 18 Global Step: 312330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:14:58,413-Speed 5199.67 samples/sec Loss 0.4619 LearningRate 0.0004 Epoch: 18 Global Step: 312340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:00,435-Speed 5063.95 samples/sec Loss 0.4346 LearningRate 0.0004 Epoch: 18 Global Step: 312350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:02,410-Speed 5188.31 samples/sec Loss 0.4416 LearningRate 0.0004 Epoch: 18 Global Step: 312360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:04,378-Speed 5204.59 samples/sec Loss 0.4357 LearningRate 0.0004 Epoch: 18 Global Step: 312370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:06,355-Speed 5183.02 samples/sec Loss 0.4167 LearningRate 0.0004 Epoch: 18 Global Step: 312380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:08,328-Speed 5190.44 samples/sec Loss 0.4319 LearningRate 0.0004 Epoch: 18 Global Step: 312390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:10,303-Speed 
5187.22 samples/sec Loss 0.4472 LearningRate 0.0004 Epoch: 18 Global Step: 312400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:12,281-Speed 5178.03 samples/sec Loss 0.4186 LearningRate 0.0004 Epoch: 18 Global Step: 312410 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:15:14,253-Speed 5193.68 samples/sec Loss 0.4292 LearningRate 0.0004 Epoch: 18 Global Step: 312420 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:15:16,223-Speed 5205.28 samples/sec Loss 0.4402 LearningRate 0.0004 Epoch: 18 Global Step: 312430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:18,190-Speed 5206.66 samples/sec Loss 0.4193 LearningRate 0.0004 Epoch: 18 Global Step: 312440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:20,177-Speed 5155.51 samples/sec Loss 0.4193 LearningRate 0.0004 Epoch: 18 Global Step: 312450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:22,148-Speed 5196.45 samples/sec Loss 0.4228 LearningRate 0.0004 Epoch: 18 Global Step: 312460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:24,113-Speed 5211.93 samples/sec Loss 0.4346 LearningRate 0.0004 Epoch: 18 Global Step: 312470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:26,086-Speed 5191.26 samples/sec Loss 0.4432 LearningRate 0.0004 Epoch: 18 Global Step: 312480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:28,074-Speed 5153.25 samples/sec Loss 0.4380 LearningRate 0.0004 Epoch: 18 Global Step: 312490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:30,044-Speed 5200.71 samples/sec Loss 0.4273 LearningRate 0.0004 Epoch: 18 Global Step: 312500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:32,009-Speed 5212.30 samples/sec Loss 0.4253 LearningRate 0.0004 Epoch: 18 Global Step: 312510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:33,979-Speed 5201.81 samples/sec Loss 0.4443 
LearningRate 0.0004 Epoch: 18 Global Step: 312520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:35,946-Speed 5207.68 samples/sec Loss 0.4382 LearningRate 0.0004 Epoch: 18 Global Step: 312530 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:15:37,949-Speed 5116.49 samples/sec Loss 0.4492 LearningRate 0.0004 Epoch: 18 Global Step: 312540 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:15:39,940-Speed 5142.85 samples/sec Loss 0.4573 LearningRate 0.0004 Epoch: 18 Global Step: 312550 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:15:41,907-Speed 5207.42 samples/sec Loss 0.4424 LearningRate 0.0004 Epoch: 18 Global Step: 312560 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:15:43,905-Speed 5128.39 samples/sec Loss 0.4243 LearningRate 0.0004 Epoch: 18 Global Step: 312570 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:15:45,879-Speed 5187.99 samples/sec Loss 0.4343 LearningRate 0.0004 Epoch: 18 Global Step: 312580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:47,868-Speed 5150.51 samples/sec Loss 0.4208 LearningRate 0.0004 Epoch: 18 Global Step: 312590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:49,851-Speed 5167.27 samples/sec Loss 0.4270 LearningRate 0.0004 Epoch: 18 Global Step: 312600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:51,838-Speed 5153.48 samples/sec Loss 0.4348 LearningRate 0.0004 Epoch: 18 Global Step: 312610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:53,808-Speed 5201.33 samples/sec Loss 0.4220 LearningRate 0.0004 Epoch: 18 Global Step: 312620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:55,772-Speed 5213.58 samples/sec Loss 0.4437 LearningRate 0.0004 Epoch: 18 Global Step: 312630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:57,741-Speed 5202.93 samples/sec Loss 0.4219 LearningRate 0.0004 Epoch: 18 Global 
Step: 312640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:15:59,722-Speed 5171.32 samples/sec Loss 0.3977 LearningRate 0.0004 Epoch: 18 Global Step: 312650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:01,688-Speed 5210.08 samples/sec Loss 0.4350 LearningRate 0.0004 Epoch: 18 Global Step: 312660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:03,658-Speed 5200.54 samples/sec Loss 0.4228 LearningRate 0.0004 Epoch: 18 Global Step: 312670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:05,652-Speed 5136.12 samples/sec Loss 0.4326 LearningRate 0.0004 Epoch: 18 Global Step: 312680 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:16:07,634-Speed 5167.44 samples/sec Loss 0.4292 LearningRate 0.0004 Epoch: 18 Global Step: 312690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:09,637-Speed 5116.52 samples/sec Loss 0.4287 LearningRate 0.0004 Epoch: 18 Global Step: 312700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:11,649-Speed 5089.66 samples/sec Loss 0.4495 LearningRate 0.0004 Epoch: 18 Global Step: 312710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:13,617-Speed 5206.44 samples/sec Loss 0.4658 LearningRate 0.0004 Epoch: 18 Global Step: 312720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:15,583-Speed 5209.55 samples/sec Loss 0.4412 LearningRate 0.0004 Epoch: 18 Global Step: 312730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:17,547-Speed 5215.10 samples/sec Loss 0.4359 LearningRate 0.0004 Epoch: 18 Global Step: 312740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:19,517-Speed 5201.38 samples/sec Loss 0.4279 LearningRate 0.0004 Epoch: 18 Global Step: 312750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:21,482-Speed 5211.62 samples/sec Loss 0.4374 LearningRate 0.0004 Epoch: 18 Global Step: 312760 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-11 20:16:23,471-Speed 5150.71 samples/sec Loss 0.4253 LearningRate 0.0004 Epoch: 18 Global Step: 312770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:25,455-Speed 5162.66 samples/sec Loss 0.4377 LearningRate 0.0004 Epoch: 18 Global Step: 312780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:27,425-Speed 5198.01 samples/sec Loss 0.4212 LearningRate 0.0004 Epoch: 18 Global Step: 312790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:29,393-Speed 5205.50 samples/sec Loss 0.4217 LearningRate 0.0004 Epoch: 18 Global Step: 312800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:31,360-Speed 5208.66 samples/sec Loss 0.4522 LearningRate 0.0004 Epoch: 18 Global Step: 312810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:33,338-Speed 5179.80 samples/sec Loss 0.4506 LearningRate 0.0004 Epoch: 18 Global Step: 312820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:35,305-Speed 5207.22 samples/sec Loss 0.4417 LearningRate 0.0004 Epoch: 18 Global Step: 312830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:37,269-Speed 5214.78 samples/sec Loss 0.4095 LearningRate 0.0004 Epoch: 18 Global Step: 312840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:39,235-Speed 5209.57 samples/sec Loss 0.4361 LearningRate 0.0004 Epoch: 18 Global Step: 312850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:41,203-Speed 5207.17 samples/sec Loss 0.4143 LearningRate 0.0004 Epoch: 18 Global Step: 312860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:43,173-Speed 5198.12 samples/sec Loss 0.4345 LearningRate 0.0004 Epoch: 18 Global Step: 312870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:45,147-Speed 5189.42 samples/sec Loss 0.4278 LearningRate 0.0004 Epoch: 18 Global Step: 312880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
20:16:47,114-Speed 5207.97 samples/sec Loss 0.4320 LearningRate 0.0004 Epoch: 18 Global Step: 312890 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:16:49,079-Speed 5211.82 samples/sec Loss 0.4306 LearningRate 0.0004 Epoch: 18 Global Step: 312900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:51,069-Speed 5148.86 samples/sec Loss 0.4209 LearningRate 0.0004 Epoch: 18 Global Step: 312910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:53,057-Speed 5152.66 samples/sec Loss 0.4452 LearningRate 0.0004 Epoch: 18 Global Step: 312920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:55,041-Speed 5162.05 samples/sec Loss 0.4417 LearningRate 0.0004 Epoch: 18 Global Step: 312930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:57,021-Speed 5175.15 samples/sec Loss 0.4376 LearningRate 0.0004 Epoch: 18 Global Step: 312940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:16:59,041-Speed 5069.90 samples/sec Loss 0.4331 LearningRate 0.0004 Epoch: 18 Global Step: 312950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:01,029-Speed 5154.68 samples/sec Loss 0.4209 LearningRate 0.0004 Epoch: 18 Global Step: 312960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:03,021-Speed 5140.77 samples/sec Loss 0.4488 LearningRate 0.0004 Epoch: 18 Global Step: 312970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:04,994-Speed 5190.81 samples/sec Loss 0.4343 LearningRate 0.0004 Epoch: 18 Global Step: 312980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:06,962-Speed 5205.56 samples/sec Loss 0.4297 LearningRate 0.0004 Epoch: 18 Global Step: 312990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:08,942-Speed 5174.21 samples/sec Loss 0.4351 LearningRate 0.0004 Epoch: 18 Global Step: 313000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:17:10,909-Speed 5206.83 samples/sec 
Loss 0.4321 LearningRate 0.0004 Epoch: 18 Global Step: 313010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:17:12,899-Speed 5147.74 samples/sec Loss 0.4281 LearningRate 0.0004 Epoch: 18 Global Step: 313020 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:17:14,877-Speed 5177.46 samples/sec Loss 0.4330 LearningRate 0.0004 Epoch: 18 Global Step: 313030 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:17:16,857-Speed 5173.38 samples/sec Loss 0.4314 LearningRate 0.0004 Epoch: 18 Global Step: 313040 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:17:18,829-Speed 5195.86 samples/sec Loss 0.4333 LearningRate 0.0004 Epoch: 18 Global Step: 313050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:20,821-Speed 5143.26 samples/sec Loss 0.4496 LearningRate 0.0004 Epoch: 18 Global Step: 313060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:22,795-Speed 5188.94 samples/sec Loss 0.4049 LearningRate 0.0004 Epoch: 18 Global Step: 313070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:24,788-Speed 5140.00 samples/sec Loss 0.4270 LearningRate 0.0004 Epoch: 18 Global Step: 313080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:26,767-Speed 5173.97 samples/sec Loss 0.4539 LearningRate 0.0004 Epoch: 18 Global Step: 313090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:28,755-Speed 5153.15 samples/sec Loss 0.4474 LearningRate 0.0004 Epoch: 18 Global Step: 313100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:30,727-Speed 5195.80 samples/sec Loss 0.4053 LearningRate 0.0004 Epoch: 18 Global Step: 313110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:32,711-Speed 5161.48 samples/sec Loss 0.4415 LearningRate 0.0004 Epoch: 18 Global Step: 313120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:34,694-Speed 5167.37 samples/sec Loss 0.4323 LearningRate 0.0004 Epoch: 
18 Global Step: 313130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:36,667-Speed 5191.97 samples/sec Loss 0.4102 LearningRate 0.0004 Epoch: 18 Global Step: 313140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:38,647-Speed 5171.34 samples/sec Loss 0.4333 LearningRate 0.0004 Epoch: 18 Global Step: 313150 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:17:40,620-Speed 5193.04 samples/sec Loss 0.4426 LearningRate 0.0004 Epoch: 18 Global Step: 313160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:42,598-Speed 5177.61 samples/sec Loss 0.4013 LearningRate 0.0004 Epoch: 18 Global Step: 313170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:44,578-Speed 5173.86 samples/sec Loss 0.4168 LearningRate 0.0004 Epoch: 18 Global Step: 313180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:46,562-Speed 5164.31 samples/sec Loss 0.4213 LearningRate 0.0004 Epoch: 18 Global Step: 313190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:48,536-Speed 5189.18 samples/sec Loss 0.4329 LearningRate 0.0004 Epoch: 18 Global Step: 313200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:50,510-Speed 5189.43 samples/sec Loss 0.4245 LearningRate 0.0004 Epoch: 18 Global Step: 313210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:52,482-Speed 5194.33 samples/sec Loss 0.4301 LearningRate 0.0004 Epoch: 18 Global Step: 313220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:54,454-Speed 5193.84 samples/sec Loss 0.4390 LearningRate 0.0004 Epoch: 18 Global Step: 313230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:56,423-Speed 5200.99 samples/sec Loss 0.4298 LearningRate 0.0004 Epoch: 18 Global Step: 313240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:17:58,397-Speed 5188.94 samples/sec Loss 0.4531 LearningRate 0.0004 Epoch: 18 Global Step: 313250 Fp16 Grad Scale: 
65536 Required: 1 hours Training: 2022-04-11 20:18:00,368-Speed 5197.52 samples/sec Loss 0.4470 LearningRate 0.0004 Epoch: 18 Global Step: 313260 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:18:02,339-Speed 5198.64 samples/sec Loss 0.4149 LearningRate 0.0004 Epoch: 18 Global Step: 313270 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:18:04,310-Speed 5195.88 samples/sec Loss 0.4271 LearningRate 0.0004 Epoch: 18 Global Step: 313280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:06,281-Speed 5196.98 samples/sec Loss 0.4425 LearningRate 0.0004 Epoch: 18 Global Step: 313290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:08,254-Speed 5192.51 samples/sec Loss 0.4336 LearningRate 0.0004 Epoch: 18 Global Step: 313300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:10,240-Speed 5157.75 samples/sec Loss 0.4386 LearningRate 0.0004 Epoch: 18 Global Step: 313310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:12,213-Speed 5192.72 samples/sec Loss 0.4165 LearningRate 0.0004 Epoch: 18 Global Step: 313320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:14,196-Speed 5164.50 samples/sec Loss 0.4217 LearningRate 0.0004 Epoch: 18 Global Step: 313330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:16,181-Speed 5163.04 samples/sec Loss 0.4430 LearningRate 0.0004 Epoch: 18 Global Step: 313340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:18,168-Speed 5152.90 samples/sec Loss 0.4245 LearningRate 0.0004 Epoch: 18 Global Step: 313350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:20,142-Speed 5189.55 samples/sec Loss 0.4367 LearningRate 0.0004 Epoch: 18 Global Step: 313360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:22,116-Speed 5188.97 samples/sec Loss 0.4237 LearningRate 0.0004 Epoch: 18 Global Step: 313370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 20:18:24,102-Speed 5156.63 samples/sec Loss 0.4393 LearningRate 0.0004 Epoch: 18 Global Step: 313380 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:18:26,098-Speed 5132.69 samples/sec Loss 0.4244 LearningRate 0.0004 Epoch: 18 Global Step: 313390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:28,079-Speed 5170.34 samples/sec Loss 0.4334 LearningRate 0.0004 Epoch: 18 Global Step: 313400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:30,068-Speed 5152.52 samples/sec Loss 0.4450 LearningRate 0.0004 Epoch: 18 Global Step: 313410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:32,038-Speed 5198.35 samples/sec Loss 0.4266 LearningRate 0.0004 Epoch: 18 Global Step: 313420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:34,011-Speed 5193.07 samples/sec Loss 0.4183 LearningRate 0.0004 Epoch: 18 Global Step: 313430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:35,996-Speed 5159.21 samples/sec Loss 0.4314 LearningRate 0.0004 Epoch: 18 Global Step: 313440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:37,987-Speed 5149.21 samples/sec Loss 0.4404 LearningRate 0.0004 Epoch: 18 Global Step: 313450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:40,003-Speed 5080.63 samples/sec Loss 0.4220 LearningRate 0.0004 Epoch: 18 Global Step: 313460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:41,977-Speed 5188.23 samples/sec Loss 0.4212 LearningRate 0.0004 Epoch: 18 Global Step: 313470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:43,946-Speed 5201.82 samples/sec Loss 0.4379 LearningRate 0.0004 Epoch: 18 Global Step: 313480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:45,917-Speed 5197.04 samples/sec Loss 0.4328 LearningRate 0.0004 Epoch: 18 Global Step: 313490 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:18:47,884-Speed 5209.71 
samples/sec Loss 0.4198 LearningRate 0.0004 Epoch: 18 Global Step: 313500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:49,868-Speed 5162.60 samples/sec Loss 0.4330 LearningRate 0.0004 Epoch: 18 Global Step: 313510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:51,837-Speed 5201.92 samples/sec Loss 0.3992 LearningRate 0.0004 Epoch: 18 Global Step: 313520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:53,835-Speed 5127.05 samples/sec Loss 0.4079 LearningRate 0.0004 Epoch: 18 Global Step: 313530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:55,814-Speed 5174.90 samples/sec Loss 0.4347 LearningRate 0.0004 Epoch: 18 Global Step: 313540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:57,783-Speed 5204.50 samples/sec Loss 0.4159 LearningRate 0.0004 Epoch: 18 Global Step: 313550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:18:59,770-Speed 5155.14 samples/sec Loss 0.4461 LearningRate 0.0004 Epoch: 18 Global Step: 313560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:01,769-Speed 5122.99 samples/sec Loss 0.4465 LearningRate 0.0004 Epoch: 18 Global Step: 313570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:03,741-Speed 5194.22 samples/sec Loss 0.4393 LearningRate 0.0004 Epoch: 18 Global Step: 313580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:05,711-Speed 5200.92 samples/sec Loss 0.4472 LearningRate 0.0004 Epoch: 18 Global Step: 313590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:07,679-Speed 5205.66 samples/sec Loss 0.4239 LearningRate 0.0004 Epoch: 18 Global Step: 313600 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:19:09,641-Speed 5220.27 samples/sec Loss 0.4538 LearningRate 0.0004 Epoch: 18 Global Step: 313610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:11,615-Speed 5187.57 samples/sec Loss 0.4375 LearningRate 
0.0004 Epoch: 18 Global Step: 313620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:13,585-Speed 5199.32 samples/sec Loss 0.4435 LearningRate 0.0004 Epoch: 18 Global Step: 313630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:15,566-Speed 5171.92 samples/sec Loss 0.4475 LearningRate 0.0004 Epoch: 18 Global Step: 313640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:17,539-Speed 5191.74 samples/sec Loss 0.4180 LearningRate 0.0004 Epoch: 18 Global Step: 313650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:19,510-Speed 5199.82 samples/sec Loss 0.4517 LearningRate 0.0004 Epoch: 18 Global Step: 313660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:21,497-Speed 5153.73 samples/sec Loss 0.4323 LearningRate 0.0004 Epoch: 18 Global Step: 313670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:23,473-Speed 5184.00 samples/sec Loss 0.4396 LearningRate 0.0004 Epoch: 18 Global Step: 313680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:25,450-Speed 5180.54 samples/sec Loss 0.4086 LearningRate 0.0004 Epoch: 18 Global Step: 313690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:27,457-Speed 5104.69 samples/sec Loss 0.4336 LearningRate 0.0004 Epoch: 18 Global Step: 313700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:29,431-Speed 5191.39 samples/sec Loss 0.4249 LearningRate 0.0004 Epoch: 18 Global Step: 313710 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:19:31,402-Speed 5196.27 samples/sec Loss 0.4258 LearningRate 0.0004 Epoch: 18 Global Step: 313720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:33,411-Speed 5098.63 samples/sec Loss 0.4319 LearningRate 0.0004 Epoch: 18 Global Step: 313730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:35,409-Speed 5126.69 samples/sec Loss 0.4398 LearningRate 0.0004 Epoch: 18 Global Step: 313740 Fp16 
Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:37,396-Speed 5155.70 samples/sec Loss 0.4356 LearningRate 0.0004 Epoch: 18 Global Step: 313750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:39,394-Speed 5127.95 samples/sec Loss 0.4567 LearningRate 0.0004 Epoch: 18 Global Step: 313760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:41,402-Speed 5100.81 samples/sec Loss 0.4487 LearningRate 0.0004 Epoch: 18 Global Step: 313770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:43,373-Speed 5198.65 samples/sec Loss 0.4494 LearningRate 0.0004 Epoch: 18 Global Step: 313780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:45,359-Speed 5156.54 samples/sec Loss 0.4111 LearningRate 0.0004 Epoch: 18 Global Step: 313790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:47,351-Speed 5143.56 samples/sec Loss 0.4269 LearningRate 0.0004 Epoch: 18 Global Step: 313800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:49,342-Speed 5145.11 samples/sec Loss 0.4575 LearningRate 0.0004 Epoch: 18 Global Step: 313810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:19:51,338-Speed 5132.47 samples/sec Loss 0.4246 LearningRate 0.0004 Epoch: 18 Global Step: 313820 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:19:53,319-Speed 5170.42 samples/sec Loss 0.4365 LearningRate 0.0004 Epoch: 18 Global Step: 313830 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:19:55,293-Speed 5187.55 samples/sec Loss 0.4618 LearningRate 0.0004 Epoch: 18 Global Step: 313840 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:19:57,266-Speed 5192.10 samples/sec Loss 0.4117 LearningRate 0.0004 Epoch: 18 Global Step: 313850 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:19:59,250-Speed 5163.80 samples/sec Loss 0.4520 LearningRate 0.0004 Epoch: 18 Global Step: 313860 Fp16 Grad Scale: 131072 Required: 1 hours 
Training: 2022-04-11 20:20:01,233-Speed 5166.10 samples/sec Loss 0.4344 LearningRate 0.0004 Epoch: 18 Global Step: 313870 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:20:03,226-Speed 5140.20 samples/sec Loss 0.4318 LearningRate 0.0004 Epoch: 18 Global Step: 313880 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:20:05,213-Speed 5154.81 samples/sec Loss 0.4284 LearningRate 0.0004 Epoch: 18 Global Step: 313890 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:20:07,185-Speed 5192.79 samples/sec Loss 0.4298 LearningRate 0.0004 Epoch: 18 Global Step: 313900 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:20:09,164-Speed 5178.96 samples/sec Loss 0.4316 LearningRate 0.0004 Epoch: 18 Global Step: 313910 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:20:11,131-Speed 5206.77 samples/sec Loss 0.4210 LearningRate 0.0004 Epoch: 18 Global Step: 313920 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:20:13,094-Speed 5217.22 samples/sec Loss 0.4250 LearningRate 0.0004 Epoch: 18 Global Step: 313930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:20:15,076-Speed 5167.55 samples/sec Loss 0.4434 LearningRate 0.0004 Epoch: 18 Global Step: 313940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:20:17,051-Speed 5188.03 samples/sec Loss 0.4227 LearningRate 0.0004 Epoch: 18 Global Step: 313950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:20:19,035-Speed 5162.79 samples/sec Loss 0.4347 LearningRate 0.0004 Epoch: 18 Global Step: 313960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:20:21,010-Speed 5187.03 samples/sec Loss 0.4168 LearningRate 0.0004 Epoch: 18 Global Step: 313970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:20:22,982-Speed 5193.58 samples/sec Loss 0.4401 LearningRate 0.0004 Epoch: 18 Global Step: 313980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
20:20:24,974-Speed 5142.50 samples/sec Loss 0.4518 LearningRate 0.0004 Epoch: 18 Global Step: 313990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:20:26,964-Speed 5146.67 samples/sec Loss 0.4614 LearningRate 0.0004 Epoch: 18 Global Step: 314000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:20:53,668-[lfw][314000]XNorm: 21.443996 Training: 2022-04-11 20:20:53,669-[lfw][314000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 20:20:53,669-[lfw][314000]Accuracy-Highest: 0.99833 Training: 2022-04-11 20:21:24,439-[cfp_fp][314000]XNorm: 22.019167 Training: 2022-04-11 20:21:24,439-[cfp_fp][314000]Accuracy-Flip: 0.98900+-0.00414 Training: 2022-04-11 20:21:24,440-[cfp_fp][314000]Accuracy-Highest: 0.99029 Training: 2022-04-11 20:21:50,939-[agedb_30][314000]XNorm: 22.625001 Training: 2022-04-11 20:21:50,940-[agedb_30][314000]Accuracy-Flip: 0.98383+-0.00654 Training: 2022-04-11 20:21:50,940-[agedb_30][314000]Accuracy-Highest: 0.98383 Training: 2022-04-11 20:21:52,945-Speed 119.10 samples/sec Loss 0.4368 LearningRate 0.0004 Epoch: 18 Global Step: 314010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:21:54,905-Speed 5227.53 samples/sec Loss 0.4374 LearningRate 0.0004 Epoch: 18 Global Step: 314020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:21:56,861-Speed 5235.10 samples/sec Loss 0.4327 LearningRate 0.0004 Epoch: 18 Global Step: 314030 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:21:58,821-Speed 5225.57 samples/sec Loss 0.4299 LearningRate 0.0004 Epoch: 18 Global Step: 314040 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:00,780-Speed 5230.82 samples/sec Loss 0.4173 LearningRate 0.0004 Epoch: 18 Global Step: 314050 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:02,735-Speed 5238.77 samples/sec Loss 0.4504 LearningRate 0.0004 Epoch: 18 Global Step: 314060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:22:04,706-Speed 
5197.49 samples/sec Loss 0.4453 LearningRate 0.0004 Epoch: 18 Global Step: 314070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:22:06,670-Speed 5213.83 samples/sec Loss 0.4473 LearningRate 0.0003 Epoch: 18 Global Step: 314080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:22:08,637-Speed 5209.85 samples/sec Loss 0.4154 LearningRate 0.0003 Epoch: 18 Global Step: 314090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:22:10,610-Speed 5192.54 samples/sec Loss 0.4281 LearningRate 0.0003 Epoch: 18 Global Step: 314100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:22:12,570-Speed 5224.27 samples/sec Loss 0.4161 LearningRate 0.0003 Epoch: 18 Global Step: 314110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:22:14,532-Speed 5220.75 samples/sec Loss 0.4467 LearningRate 0.0003 Epoch: 18 Global Step: 314120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:22:16,496-Speed 5216.30 samples/sec Loss 0.4209 LearningRate 0.0003 Epoch: 18 Global Step: 314130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:22:18,455-Speed 5228.50 samples/sec Loss 0.4237 LearningRate 0.0003 Epoch: 18 Global Step: 314140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:22:20,422-Speed 5209.43 samples/sec Loss 0.4409 LearningRate 0.0003 Epoch: 18 Global Step: 314150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:22:22,394-Speed 5194.00 samples/sec Loss 0.4757 LearningRate 0.0003 Epoch: 18 Global Step: 314160 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:24,384-Speed 5146.45 samples/sec Loss 0.4381 LearningRate 0.0003 Epoch: 18 Global Step: 314170 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:26,367-Speed 5165.49 samples/sec Loss 0.4237 LearningRate 0.0003 Epoch: 18 Global Step: 314180 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:28,331-Speed 5215.42 samples/sec Loss 0.4467 
LearningRate 0.0003 Epoch: 18 Global Step: 314190 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:30,323-Speed 5143.54 samples/sec Loss 0.4216 LearningRate 0.0003 Epoch: 18 Global Step: 314200 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:32,288-Speed 5212.94 samples/sec Loss 0.4694 LearningRate 0.0003 Epoch: 18 Global Step: 314210 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:34,268-Speed 5175.47 samples/sec Loss 0.4549 LearningRate 0.0003 Epoch: 18 Global Step: 314220 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:36,245-Speed 5180.10 samples/sec Loss 0.4123 LearningRate 0.0003 Epoch: 18 Global Step: 314230 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:38,225-Speed 5173.52 samples/sec Loss 0.4371 LearningRate 0.0003 Epoch: 18 Global Step: 314240 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:40,195-Speed 5200.35 samples/sec Loss 0.4254 LearningRate 0.0003 Epoch: 18 Global Step: 314250 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:42,154-Speed 5229.06 samples/sec Loss 0.4530 LearningRate 0.0003 Epoch: 18 Global Step: 314260 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:44,120-Speed 5210.52 samples/sec Loss 0.4380 LearningRate 0.0003 Epoch: 18 Global Step: 314270 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:46,102-Speed 5169.07 samples/sec Loss 0.4492 LearningRate 0.0003 Epoch: 18 Global Step: 314280 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:48,090-Speed 5152.26 samples/sec Loss 0.4344 LearningRate 0.0003 Epoch: 18 Global Step: 314290 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:50,084-Speed 5135.28 samples/sec Loss 0.4201 LearningRate 0.0003 Epoch: 18 Global Step: 314300 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:52,060-Speed 5184.56 samples/sec Loss 0.4333 LearningRate 0.0003 Epoch: 18 
Global Step: 314310 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:54,052-Speed 5144.05 samples/sec Loss 0.4471 LearningRate 0.0003 Epoch: 18 Global Step: 314320 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:56,019-Speed 5209.16 samples/sec Loss 0.4438 LearningRate 0.0003 Epoch: 18 Global Step: 314330 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:22:57,989-Speed 5198.12 samples/sec Loss 0.4408 LearningRate 0.0003 Epoch: 18 Global Step: 314340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:22:59,980-Speed 5145.43 samples/sec Loss 0.4366 LearningRate 0.0003 Epoch: 18 Global Step: 314350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:01,958-Speed 5181.01 samples/sec Loss 0.4577 LearningRate 0.0003 Epoch: 18 Global Step: 314360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:03,949-Speed 5144.80 samples/sec Loss 0.4242 LearningRate 0.0003 Epoch: 18 Global Step: 314370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:05,919-Speed 5199.91 samples/sec Loss 0.4114 LearningRate 0.0003 Epoch: 18 Global Step: 314380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:07,884-Speed 5213.54 samples/sec Loss 0.4336 LearningRate 0.0003 Epoch: 18 Global Step: 314390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:09,870-Speed 5156.87 samples/sec Loss 0.4380 LearningRate 0.0003 Epoch: 18 Global Step: 314400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:11,848-Speed 5179.45 samples/sec Loss 0.4474 LearningRate 0.0003 Epoch: 18 Global Step: 314410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:13,811-Speed 5217.06 samples/sec Loss 0.4455 LearningRate 0.0003 Epoch: 18 Global Step: 314420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:15,799-Speed 5153.34 samples/sec Loss 0.4120 LearningRate 0.0003 Epoch: 18 Global Step: 314430 Fp16 Grad Scale: 
65536 Required: 1 hours Training: 2022-04-11 20:23:17,766-Speed 5208.96 samples/sec Loss 0.4197 LearningRate 0.0003 Epoch: 18 Global Step: 314440 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:23:19,734-Speed 5205.83 samples/sec Loss 0.4493 LearningRate 0.0003 Epoch: 18 Global Step: 314450 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:23:21,694-Speed 5225.69 samples/sec Loss 0.4292 LearningRate 0.0003 Epoch: 18 Global Step: 314460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:23,664-Speed 5198.17 samples/sec Loss 0.4244 LearningRate 0.0003 Epoch: 18 Global Step: 314470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:25,654-Speed 5148.41 samples/sec Loss 0.4456 LearningRate 0.0003 Epoch: 18 Global Step: 314480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:27,632-Speed 5178.41 samples/sec Loss 0.4463 LearningRate 0.0003 Epoch: 18 Global Step: 314490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:29,608-Speed 5184.65 samples/sec Loss 0.4342 LearningRate 0.0003 Epoch: 18 Global Step: 314500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:31,580-Speed 5193.65 samples/sec Loss 0.4376 LearningRate 0.0003 Epoch: 18 Global Step: 314510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:33,543-Speed 5216.88 samples/sec Loss 0.4306 LearningRate 0.0003 Epoch: 18 Global Step: 314520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:35,525-Speed 5169.96 samples/sec Loss 0.4329 LearningRate 0.0003 Epoch: 18 Global Step: 314530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:37,498-Speed 5189.95 samples/sec Loss 0.4311 LearningRate 0.0003 Epoch: 18 Global Step: 314540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:39,471-Speed 5195.05 samples/sec Loss 0.4461 LearningRate 0.0003 Epoch: 18 Global Step: 314550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 20:23:41,437-Speed 5210.87 samples/sec Loss 0.4417 LearningRate 0.0003 Epoch: 18 Global Step: 314560 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:23:43,401-Speed 5215.21 samples/sec Loss 0.4360 LearningRate 0.0003 Epoch: 18 Global Step: 314570 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:23:45,364-Speed 5216.69 samples/sec Loss 0.4529 LearningRate 0.0003 Epoch: 18 Global Step: 314580 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:23:47,320-Speed 5237.84 samples/sec Loss 0.4267 LearningRate 0.0003 Epoch: 18 Global Step: 314590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:49,298-Speed 5179.02 samples/sec Loss 0.4326 LearningRate 0.0003 Epoch: 18 Global Step: 314600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:51,283-Speed 5159.64 samples/sec Loss 0.3961 LearningRate 0.0003 Epoch: 18 Global Step: 314610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:53,246-Speed 5217.31 samples/sec Loss 0.4208 LearningRate 0.0003 Epoch: 18 Global Step: 314620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:55,208-Speed 5221.86 samples/sec Loss 0.4098 LearningRate 0.0003 Epoch: 18 Global Step: 314630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:57,182-Speed 5189.16 samples/sec Loss 0.4496 LearningRate 0.0003 Epoch: 18 Global Step: 314640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:23:59,149-Speed 5206.51 samples/sec Loss 0.4241 LearningRate 0.0003 Epoch: 18 Global Step: 314650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:01,113-Speed 5217.84 samples/sec Loss 0.4277 LearningRate 0.0003 Epoch: 18 Global Step: 314660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:03,110-Speed 5128.27 samples/sec Loss 0.4466 LearningRate 0.0003 Epoch: 18 Global Step: 314670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:05,073-Speed 5218.44 
samples/sec Loss 0.4147 LearningRate 0.0003 Epoch: 18 Global Step: 314680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:07,037-Speed 5214.71 samples/sec Loss 0.4443 LearningRate 0.0003 Epoch: 18 Global Step: 314690 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:24:09,001-Speed 5218.07 samples/sec Loss 0.4100 LearningRate 0.0003 Epoch: 18 Global Step: 314700 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:24:11,017-Speed 5080.01 samples/sec Loss 0.4430 LearningRate 0.0003 Epoch: 18 Global Step: 314710 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:24:12,983-Speed 5209.95 samples/sec Loss 0.4344 LearningRate 0.0003 Epoch: 18 Global Step: 314720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:14,972-Speed 5150.69 samples/sec Loss 0.4290 LearningRate 0.0003 Epoch: 18 Global Step: 314730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:16,950-Speed 5176.89 samples/sec Loss 0.4364 LearningRate 0.0003 Epoch: 18 Global Step: 314740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:18,928-Speed 5178.88 samples/sec Loss 0.4352 LearningRate 0.0003 Epoch: 18 Global Step: 314750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:20,914-Speed 5159.36 samples/sec Loss 0.4408 LearningRate 0.0003 Epoch: 18 Global Step: 314760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:22,889-Speed 5186.10 samples/sec Loss 0.4340 LearningRate 0.0003 Epoch: 18 Global Step: 314770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:24,861-Speed 5195.61 samples/sec Loss 0.4527 LearningRate 0.0003 Epoch: 18 Global Step: 314780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:26,825-Speed 5213.55 samples/sec Loss 0.4352 LearningRate 0.0003 Epoch: 18 Global Step: 314790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:28,787-Speed 5221.35 samples/sec Loss 0.4440 LearningRate 
0.0003 Epoch: 18 Global Step: 314800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:30,750-Speed 5218.09 samples/sec Loss 0.4344 LearningRate 0.0003 Epoch: 18 Global Step: 314810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:32,716-Speed 5210.60 samples/sec Loss 0.4593 LearningRate 0.0003 Epoch: 18 Global Step: 314820 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:24:34,684-Speed 5205.12 samples/sec Loss 0.4216 LearningRate 0.0003 Epoch: 18 Global Step: 314830 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:24:36,656-Speed 5193.72 samples/sec Loss 0.4410 LearningRate 0.0003 Epoch: 18 Global Step: 314840 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:24:38,628-Speed 5196.05 samples/sec Loss 0.4238 LearningRate 0.0003 Epoch: 18 Global Step: 314850 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:24:40,622-Speed 5136.22 samples/sec Loss 0.4556 LearningRate 0.0003 Epoch: 18 Global Step: 314860 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:24:42,594-Speed 5195.28 samples/sec Loss 0.4392 LearningRate 0.0003 Epoch: 18 Global Step: 314870 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:24:44,565-Speed 5196.92 samples/sec Loss 0.4521 LearningRate 0.0003 Epoch: 18 Global Step: 314880 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:24:46,526-Speed 5223.52 samples/sec Loss 0.4338 LearningRate 0.0003 Epoch: 18 Global Step: 314890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:48,491-Speed 5212.85 samples/sec Loss 0.4022 LearningRate 0.0003 Epoch: 18 Global Step: 314900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:50,476-Speed 5161.85 samples/sec Loss 0.4348 LearningRate 0.0003 Epoch: 18 Global Step: 314910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:52,450-Speed 5187.14 samples/sec Loss 0.4122 LearningRate 0.0003 Epoch: 18 Global Step: 314920 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:54,421-Speed 5197.63 samples/sec Loss 0.4324 LearningRate 0.0003 Epoch: 18 Global Step: 314930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:56,388-Speed 5208.17 samples/sec Loss 0.4599 LearningRate 0.0003 Epoch: 18 Global Step: 314940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:24:58,353-Speed 5212.28 samples/sec Loss 0.4410 LearningRate 0.0003 Epoch: 18 Global Step: 314950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:00,319-Speed 5212.34 samples/sec Loss 0.4034 LearningRate 0.0003 Epoch: 18 Global Step: 314960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:02,287-Speed 5204.37 samples/sec Loss 0.4592 LearningRate 0.0003 Epoch: 18 Global Step: 314970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:04,261-Speed 5187.88 samples/sec Loss 0.4494 LearningRate 0.0003 Epoch: 18 Global Step: 314980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:06,232-Speed 5195.91 samples/sec Loss 0.4233 LearningRate 0.0003 Epoch: 18 Global Step: 314990 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:25:08,199-Speed 5209.11 samples/sec Loss 0.4383 LearningRate 0.0003 Epoch: 18 Global Step: 315000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:25:10,169-Speed 5200.84 samples/sec Loss 0.4287 LearningRate 0.0003 Epoch: 18 Global Step: 315010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:25:12,149-Speed 5174.26 samples/sec Loss 0.4314 LearningRate 0.0003 Epoch: 18 Global Step: 315020 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:25:14,123-Speed 5189.11 samples/sec Loss 0.4315 LearningRate 0.0003 Epoch: 18 Global Step: 315030 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:25:16,113-Speed 5147.21 samples/sec Loss 0.4355 LearningRate 0.0003 Epoch: 18 Global Step: 315040 Fp16 Grad Scale: 131072 Required: 1 
hours Training: 2022-04-11 20:25:18,079-Speed 5210.27 samples/sec Loss 0.4376 LearningRate 0.0003 Epoch: 18 Global Step: 315050 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:25:20,046-Speed 5207.46 samples/sec Loss 0.4268 LearningRate 0.0003 Epoch: 18 Global Step: 315060 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:25:22,051-Speed 5109.03 samples/sec Loss 0.4401 LearningRate 0.0003 Epoch: 18 Global Step: 315070 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:25:24,011-Speed 5226.91 samples/sec Loss 0.4365 LearningRate 0.0003 Epoch: 18 Global Step: 315080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:26,007-Speed 5129.69 samples/sec Loss 0.4301 LearningRate 0.0003 Epoch: 18 Global Step: 315090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:27,988-Speed 5172.07 samples/sec Loss 0.4338 LearningRate 0.0003 Epoch: 18 Global Step: 315100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:29,976-Speed 5151.74 samples/sec Loss 0.4583 LearningRate 0.0003 Epoch: 18 Global Step: 315110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:31,953-Speed 5183.87 samples/sec Loss 0.4353 LearningRate 0.0003 Epoch: 18 Global Step: 315120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:33,957-Speed 5111.32 samples/sec Loss 0.4389 LearningRate 0.0003 Epoch: 18 Global Step: 315130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:35,957-Speed 5121.13 samples/sec Loss 0.4216 LearningRate 0.0003 Epoch: 18 Global Step: 315140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:37,950-Speed 5139.87 samples/sec Loss 0.4136 LearningRate 0.0003 Epoch: 18 Global Step: 315150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:39,927-Speed 5181.57 samples/sec Loss 0.4230 LearningRate 0.0003 Epoch: 18 Global Step: 315160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
20:25:41,897-Speed 5197.54 samples/sec Loss 0.4414 LearningRate 0.0003 Epoch: 18 Global Step: 315170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:43,862-Speed 5213.19 samples/sec Loss 0.4465 LearningRate 0.0003 Epoch: 18 Global Step: 315180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:45,865-Speed 5114.88 samples/sec Loss 0.4311 LearningRate 0.0003 Epoch: 18 Global Step: 315190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:47,834-Speed 5203.25 samples/sec Loss 0.4433 LearningRate 0.0003 Epoch: 18 Global Step: 315200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:49,810-Speed 5181.79 samples/sec Loss 0.4238 LearningRate 0.0003 Epoch: 18 Global Step: 315210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:51,775-Speed 5214.02 samples/sec Loss 0.4119 LearningRate 0.0003 Epoch: 18 Global Step: 315220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:53,761-Speed 5157.30 samples/sec Loss 0.4423 LearningRate 0.0003 Epoch: 18 Global Step: 315230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:55,731-Speed 5202.87 samples/sec Loss 0.4349 LearningRate 0.0003 Epoch: 18 Global Step: 315240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:57,714-Speed 5166.70 samples/sec Loss 0.4519 LearningRate 0.0003 Epoch: 18 Global Step: 315250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:25:59,687-Speed 5191.51 samples/sec Loss 0.4337 LearningRate 0.0003 Epoch: 18 Global Step: 315260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:26:01,667-Speed 5175.00 samples/sec Loss 0.4226 LearningRate 0.0003 Epoch: 18 Global Step: 315270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:26:03,633-Speed 5209.61 samples/sec Loss 0.4469 LearningRate 0.0003 Epoch: 18 Global Step: 315280 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:26:05,600-Speed 5207.97 samples/sec 
Loss 0.4574 LearningRate 0.0003 Epoch: 18 Global Step: 315290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:26:07,568-Speed 5204.70 samples/sec Loss 0.4482 LearningRate 0.0003 Epoch: 18 Global Step: 315300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:26:09,541-Speed 5191.89 samples/sec Loss 0.4436 LearningRate 0.0003 Epoch: 18 Global Step: 315310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:26:11,517-Speed 5181.88 samples/sec Loss 0.4259 LearningRate 0.0003 Epoch: 18 Global Step: 315320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:26:13,505-Speed 5153.40 samples/sec Loss 0.4232 LearningRate 0.0003 Epoch: 18 Global Step: 315330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:26:15,503-Speed 5126.52 samples/sec Loss 0.4191 LearningRate 0.0003 Epoch: 18 Global Step: 315340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:26:17,473-Speed 5200.64 samples/sec Loss 0.4509 LearningRate 0.0003 Epoch: 18 Global Step: 315350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:26:19,440-Speed 5207.19 samples/sec Loss 0.4399 LearningRate 0.0003 Epoch: 18 Global Step: 315360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:26:21,417-Speed 5181.57 samples/sec Loss 0.4355 LearningRate 0.0003 Epoch: 18 Global Step: 315370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:26:27,055-Speed 1816.45 samples/sec Loss 0.4270 LearningRate 0.0003 Epoch: 18 Global Step: 315380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:26:29,021-Speed 5210.08 samples/sec Loss 0.4184 LearningRate 0.0003 Epoch: 18 Global Step: 315390 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:26:30,995-Speed 5190.69 samples/sec Loss 0.4427 LearningRate 0.0003 Epoch: 18 Global Step: 315400 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:26:32,962-Speed 5208.39 samples/sec Loss 0.4444 LearningRate 0.0003 Epoch: 18 
Global Step: 315410 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:26:34,944-Speed 5166.94 samples/sec Loss 0.4085 LearningRate 0.0003 Epoch: 18 Global Step: 315420 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:26:36,929-Speed 5160.31 samples/sec Loss 0.4341 LearningRate 0.0003 Epoch: 18 Global Step: 315430 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:26:38,919-Speed 5147.96 samples/sec Loss 0.4363 LearningRate 0.0003 Epoch: 18 Global Step: 315440 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:26:40,897-Speed 5179.83 samples/sec Loss 0.4419 LearningRate 0.0003 Epoch: 18 Global Step: 315450 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:26:42,863-Speed 5208.07 samples/sec Loss 0.4461 LearningRate 0.0003 Epoch: 18 Global Step: 315460 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:26:44,849-Speed 5159.62 samples/sec Loss 0.4540 LearningRate 0.0003 Epoch: 18 Global Step: 315470 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:26:46,874-Speed 5058.93 samples/sec Loss 0.4195 LearningRate 0.0003 Epoch: 18 Global Step: 315480 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:26:48,859-Speed 5158.35 samples/sec Loss 0.4313 LearningRate 0.0003 Epoch: 18 Global Step: 315490 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:26:50,854-Speed 5135.45 samples/sec Loss 0.4374 LearningRate 0.0003 Epoch: 18 Global Step: 315500 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:26:52,858-Speed 5112.11 samples/sec Loss 0.4375 LearningRate 0.0003 Epoch: 18 Global Step: 315510 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:26:54,819-Speed 5222.57 samples/sec Loss 0.4435 LearningRate 0.0003 Epoch: 18 Global Step: 315520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:26:56,831-Speed 5091.11 samples/sec Loss 0.4124 LearningRate 0.0003 Epoch: 18 Global Step: 315530 Fp16 Grad 
Scale: 65536 Required: 1 hours Training: 2022-04-11 20:26:58,846-Speed 5084.45 samples/sec Loss 0.4536 LearningRate 0.0003 Epoch: 18 Global Step: 315540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:00,833-Speed 5155.16 samples/sec Loss 0.4283 LearningRate 0.0003 Epoch: 18 Global Step: 315550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:02,825-Speed 5142.63 samples/sec Loss 0.4375 LearningRate 0.0003 Epoch: 18 Global Step: 315560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:04,808-Speed 5165.26 samples/sec Loss 0.4428 LearningRate 0.0003 Epoch: 18 Global Step: 315570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:06,818-Speed 5097.97 samples/sec Loss 0.4260 LearningRate 0.0003 Epoch: 18 Global Step: 315580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:08,849-Speed 5043.21 samples/sec Loss 0.4386 LearningRate 0.0003 Epoch: 18 Global Step: 315590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:10,834-Speed 5158.69 samples/sec Loss 0.4369 LearningRate 0.0003 Epoch: 18 Global Step: 315600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:12,804-Speed 5199.83 samples/sec Loss 0.3904 LearningRate 0.0003 Epoch: 18 Global Step: 315610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:14,776-Speed 5195.38 samples/sec Loss 0.4291 LearningRate 0.0003 Epoch: 18 Global Step: 315620 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:27:16,775-Speed 5126.98 samples/sec Loss 0.4520 LearningRate 0.0003 Epoch: 18 Global Step: 315630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:18,765-Speed 5145.59 samples/sec Loss 0.4380 LearningRate 0.0003 Epoch: 18 Global Step: 315640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:20,753-Speed 5153.29 samples/sec Loss 0.4401 LearningRate 0.0003 Epoch: 18 Global Step: 315650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 20:27:22,720-Speed 5208.10 samples/sec Loss 0.4420 LearningRate 0.0003 Epoch: 18 Global Step: 315660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:24,738-Speed 5075.09 samples/sec Loss 0.4286 LearningRate 0.0003 Epoch: 18 Global Step: 315670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:26,716-Speed 5178.42 samples/sec Loss 0.4287 LearningRate 0.0003 Epoch: 18 Global Step: 315680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:28,688-Speed 5194.78 samples/sec Loss 0.4610 LearningRate 0.0003 Epoch: 18 Global Step: 315690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:30,674-Speed 5158.23 samples/sec Loss 0.4400 LearningRate 0.0003 Epoch: 18 Global Step: 315700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:32,660-Speed 5157.82 samples/sec Loss 0.4228 LearningRate 0.0003 Epoch: 18 Global Step: 315710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:34,666-Speed 5105.98 samples/sec Loss 0.4452 LearningRate 0.0003 Epoch: 18 Global Step: 315720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:36,671-Speed 5107.54 samples/sec Loss 0.4158 LearningRate 0.0003 Epoch: 18 Global Step: 315730 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:27:38,665-Speed 5139.86 samples/sec Loss 0.4232 LearningRate 0.0003 Epoch: 18 Global Step: 315740 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:27:40,646-Speed 5169.38 samples/sec Loss 0.4283 LearningRate 0.0003 Epoch: 18 Global Step: 315750 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:27:42,618-Speed 5195.36 samples/sec Loss 0.4273 LearningRate 0.0003 Epoch: 18 Global Step: 315760 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:27:44,577-Speed 5229.01 samples/sec Loss 0.4282 LearningRate 0.0003 Epoch: 18 Global Step: 315770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:46,543-Speed 5209.13 
samples/sec Loss 0.4278 LearningRate 0.0003 Epoch: 18 Global Step: 315780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:48,513-Speed 5199.23 samples/sec Loss 0.4539 LearningRate 0.0003 Epoch: 18 Global Step: 315790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:50,494-Speed 5172.88 samples/sec Loss 0.4255 LearningRate 0.0003 Epoch: 18 Global Step: 315800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:52,468-Speed 5187.38 samples/sec Loss 0.4353 LearningRate 0.0003 Epoch: 18 Global Step: 315810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:54,434-Speed 5210.92 samples/sec Loss 0.4309 LearningRate 0.0003 Epoch: 18 Global Step: 315820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:56,403-Speed 5202.74 samples/sec Loss 0.4362 LearningRate 0.0003 Epoch: 18 Global Step: 315830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:27:58,388-Speed 5159.59 samples/sec Loss 0.4455 LearningRate 0.0003 Epoch: 18 Global Step: 315840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:28:00,374-Speed 5158.28 samples/sec Loss 0.3960 LearningRate 0.0003 Epoch: 18 Global Step: 315850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:28:02,349-Speed 5186.24 samples/sec Loss 0.4301 LearningRate 0.0003 Epoch: 18 Global Step: 315860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:28:04,328-Speed 5177.30 samples/sec Loss 0.4250 LearningRate 0.0003 Epoch: 18 Global Step: 315870 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:28:06,292-Speed 5216.61 samples/sec Loss 0.4165 LearningRate 0.0003 Epoch: 18 Global Step: 315880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:28:08,271-Speed 5173.59 samples/sec Loss 0.4421 LearningRate 0.0003 Epoch: 18 Global Step: 315890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:28:10,254-Speed 5165.92 samples/sec Loss 0.4343 LearningRate 
0.0003 Epoch: 18 Global Step: 315900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:28:12,220-Speed 5211.57 samples/sec Loss 0.4292 LearningRate 0.0003 Epoch: 18 Global Step: 315910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:28:14,197-Speed 5181.01 samples/sec Loss 0.4405 LearningRate 0.0003 Epoch: 18 Global Step: 315920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:28:16,175-Speed 5181.36 samples/sec Loss 0.4199 LearningRate 0.0003 Epoch: 18 Global Step: 315930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:28:18,156-Speed 5172.05 samples/sec Loss 0.4299 LearningRate 0.0003 Epoch: 18 Global Step: 315940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:28:20,142-Speed 5156.77 samples/sec Loss 0.4016 LearningRate 0.0003 Epoch: 18 Global Step: 315950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:28:22,113-Speed 5195.65 samples/sec Loss 0.4540 LearningRate 0.0003 Epoch: 18 Global Step: 315960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:28:24,120-Speed 5104.30 samples/sec Loss 0.4190 LearningRate 0.0003 Epoch: 18 Global Step: 315970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:28:26,098-Speed 5178.64 samples/sec Loss 0.4312 LearningRate 0.0003 Epoch: 18 Global Step: 315980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:28:28,095-Speed 5129.66 samples/sec Loss 0.4183 LearningRate 0.0003 Epoch: 18 Global Step: 315990 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:28:30,068-Speed 5192.50 samples/sec Loss 0.4289 LearningRate 0.0003 Epoch: 18 Global Step: 316000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:28:56,624-[lfw][316000]XNorm: 21.596550 Training: 2022-04-11 20:28:56,625-[lfw][316000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 20:28:56,626-[lfw][316000]Accuracy-Highest: 0.99833 Training: 2022-04-11 20:29:27,454-[cfp_fp][316000]XNorm: 22.058381 
Training: 2022-04-11 20:29:27,455-[cfp_fp][316000]Accuracy-Flip: 0.98957+-0.00404 Training: 2022-04-11 20:29:27,455-[cfp_fp][316000]Accuracy-Highest: 0.99029 Training: 2022-04-11 20:29:54,037-[agedb_30][316000]XNorm: 22.735019 Training: 2022-04-11 20:29:54,037-[agedb_30][316000]Accuracy-Flip: 0.98400+-0.00659 Training: 2022-04-11 20:29:54,038-[agedb_30][316000]Accuracy-Highest: 0.98400 Training: 2022-04-11 20:29:56,032-Speed 119.12 samples/sec Loss 0.4657 LearningRate 0.0003 Epoch: 18 Global Step: 316010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:29:58,020-Speed 5151.27 samples/sec Loss 0.4191 LearningRate 0.0003 Epoch: 18 Global Step: 316020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:00,042-Speed 5064.95 samples/sec Loss 0.4207 LearningRate 0.0003 Epoch: 18 Global Step: 316030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:02,012-Speed 5200.60 samples/sec Loss 0.3941 LearningRate 0.0003 Epoch: 18 Global Step: 316040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:03,987-Speed 5185.48 samples/sec Loss 0.4303 LearningRate 0.0003 Epoch: 18 Global Step: 316050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:05,962-Speed 5186.93 samples/sec Loss 0.4540 LearningRate 0.0003 Epoch: 18 Global Step: 316060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:07,929-Speed 5208.52 samples/sec Loss 0.4549 LearningRate 0.0003 Epoch: 18 Global Step: 316070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:09,942-Speed 5087.89 samples/sec Loss 0.4353 LearningRate 0.0003 Epoch: 18 Global Step: 316080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:11,935-Speed 5139.61 samples/sec Loss 0.4372 LearningRate 0.0003 Epoch: 18 Global Step: 316090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:13,909-Speed 5189.01 samples/sec Loss 0.4461 LearningRate 0.0003 Epoch: 18 Global Step: 316100 Fp16 Grad Scale: 
65536 Required: 1 hours Training: 2022-04-11 20:30:15,889-Speed 5174.01 samples/sec Loss 0.4284 LearningRate 0.0003 Epoch: 18 Global Step: 316110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:17,880-Speed 5145.99 samples/sec Loss 0.4268 LearningRate 0.0003 Epoch: 18 Global Step: 316120 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:30:19,866-Speed 5157.38 samples/sec Loss 0.4300 LearningRate 0.0003 Epoch: 18 Global Step: 316130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:21,854-Speed 5152.30 samples/sec Loss 0.4212 LearningRate 0.0003 Epoch: 18 Global Step: 316140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:23,840-Speed 5157.43 samples/sec Loss 0.4307 LearningRate 0.0003 Epoch: 18 Global Step: 316150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:25,813-Speed 5192.63 samples/sec Loss 0.4203 LearningRate 0.0003 Epoch: 18 Global Step: 316160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:27,809-Speed 5131.80 samples/sec Loss 0.4328 LearningRate 0.0003 Epoch: 18 Global Step: 316170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:29,785-Speed 5183.97 samples/sec Loss 0.4505 LearningRate 0.0003 Epoch: 18 Global Step: 316180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:31,757-Speed 5193.08 samples/sec Loss 0.4402 LearningRate 0.0003 Epoch: 18 Global Step: 316190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:33,753-Speed 5131.43 samples/sec Loss 0.4445 LearningRate 0.0003 Epoch: 18 Global Step: 316200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:35,730-Speed 5181.30 samples/sec Loss 0.4327 LearningRate 0.0003 Epoch: 18 Global Step: 316210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:37,757-Speed 5055.16 samples/sec Loss 0.4187 LearningRate 0.0003 Epoch: 18 Global Step: 316220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 20:30:39,729-Speed 5192.60 samples/sec Loss 0.4149 LearningRate 0.0003 Epoch: 18 Global Step: 316230 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:30:41,721-Speed 5143.35 samples/sec Loss 0.4318 LearningRate 0.0003 Epoch: 18 Global Step: 316240 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:30:43,710-Speed 5149.50 samples/sec Loss 0.4164 LearningRate 0.0003 Epoch: 18 Global Step: 316250 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:30:45,681-Speed 5197.94 samples/sec Loss 0.4508 LearningRate 0.0003 Epoch: 18 Global Step: 316260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:47,673-Speed 5143.82 samples/sec Loss 0.4437 LearningRate 0.0003 Epoch: 18 Global Step: 316270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:49,681-Speed 5101.15 samples/sec Loss 0.4465 LearningRate 0.0003 Epoch: 18 Global Step: 316280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:51,683-Speed 5116.07 samples/sec Loss 0.4369 LearningRate 0.0003 Epoch: 18 Global Step: 316290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:53,684-Speed 5118.26 samples/sec Loss 0.3895 LearningRate 0.0003 Epoch: 18 Global Step: 316300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:55,669-Speed 5160.79 samples/sec Loss 0.4388 LearningRate 0.0003 Epoch: 18 Global Step: 316310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:57,683-Speed 5086.80 samples/sec Loss 0.4201 LearningRate 0.0003 Epoch: 18 Global Step: 316320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:30:59,701-Speed 5074.26 samples/sec Loss 0.4313 LearningRate 0.0003 Epoch: 18 Global Step: 316330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:01,693-Speed 5144.38 samples/sec Loss 0.4119 LearningRate 0.0003 Epoch: 18 Global Step: 316340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:03,679-Speed 5155.63 
samples/sec Loss 0.4301 LearningRate 0.0003 Epoch: 18 Global Step: 316350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:05,658-Speed 5177.75 samples/sec Loss 0.4418 LearningRate 0.0003 Epoch: 18 Global Step: 316360 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:31:07,637-Speed 5176.96 samples/sec Loss 0.4417 LearningRate 0.0003 Epoch: 18 Global Step: 316370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:09,637-Speed 5119.48 samples/sec Loss 0.4268 LearningRate 0.0003 Epoch: 18 Global Step: 316380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:11,609-Speed 5195.57 samples/sec Loss 0.4250 LearningRate 0.0003 Epoch: 18 Global Step: 316390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:13,598-Speed 5151.10 samples/sec Loss 0.4704 LearningRate 0.0003 Epoch: 18 Global Step: 316400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:15,579-Speed 5169.26 samples/sec Loss 0.4436 LearningRate 0.0003 Epoch: 18 Global Step: 316410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:17,560-Speed 5171.11 samples/sec Loss 0.4422 LearningRate 0.0003 Epoch: 18 Global Step: 316420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:19,536-Speed 5183.94 samples/sec Loss 0.4293 LearningRate 0.0003 Epoch: 18 Global Step: 316430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:21,586-Speed 4997.98 samples/sec Loss 0.4436 LearningRate 0.0003 Epoch: 18 Global Step: 316440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:23,582-Speed 5131.81 samples/sec Loss 0.4291 LearningRate 0.0003 Epoch: 18 Global Step: 316450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:25,583-Speed 5118.24 samples/sec Loss 0.4266 LearningRate 0.0003 Epoch: 18 Global Step: 316460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:27,562-Speed 5175.52 samples/sec Loss 0.4293 LearningRate 
0.0003 Epoch: 18 Global Step: 316470 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:31:29,546-Speed 5164.51 samples/sec Loss 0.4265 LearningRate 0.0003 Epoch: 18 Global Step: 316480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:31,516-Speed 5199.69 samples/sec Loss 0.4388 LearningRate 0.0003 Epoch: 18 Global Step: 316490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:33,500-Speed 5163.75 samples/sec Loss 0.4256 LearningRate 0.0003 Epoch: 18 Global Step: 316500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:35,492-Speed 5140.61 samples/sec Loss 0.4381 LearningRate 0.0003 Epoch: 18 Global Step: 316510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:37,473-Speed 5175.42 samples/sec Loss 0.4372 LearningRate 0.0003 Epoch: 18 Global Step: 316520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:39,453-Speed 5171.28 samples/sec Loss 0.4238 LearningRate 0.0003 Epoch: 18 Global Step: 316530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:41,425-Speed 5196.20 samples/sec Loss 0.4339 LearningRate 0.0003 Epoch: 18 Global Step: 316540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:43,396-Speed 5195.89 samples/sec Loss 0.4589 LearningRate 0.0003 Epoch: 18 Global Step: 316550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:45,389-Speed 5139.96 samples/sec Loss 0.4251 LearningRate 0.0003 Epoch: 18 Global Step: 316560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:47,355-Speed 5208.79 samples/sec Loss 0.4564 LearningRate 0.0003 Epoch: 18 Global Step: 316570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:49,347-Speed 5143.03 samples/sec Loss 0.4340 LearningRate 0.0003 Epoch: 18 Global Step: 316580 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:31:51,314-Speed 5207.98 samples/sec Loss 0.4589 LearningRate 0.0003 Epoch: 18 Global Step: 316590 Fp16 
Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:53,319-Speed 5109.34 samples/sec Loss 0.4182 LearningRate 0.0003 Epoch: 18 Global Step: 316600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:55,285-Speed 5210.04 samples/sec Loss 0.4557 LearningRate 0.0003 Epoch: 18 Global Step: 316610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:57,263-Speed 5178.30 samples/sec Loss 0.4398 LearningRate 0.0003 Epoch: 18 Global Step: 316620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:31:59,283-Speed 5074.36 samples/sec Loss 0.4554 LearningRate 0.0003 Epoch: 18 Global Step: 316630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:01,257-Speed 5188.25 samples/sec Loss 0.4442 LearningRate 0.0003 Epoch: 18 Global Step: 316640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:03,238-Speed 5172.29 samples/sec Loss 0.4581 LearningRate 0.0003 Epoch: 18 Global Step: 316650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:05,206-Speed 5203.05 samples/sec Loss 0.4128 LearningRate 0.0003 Epoch: 18 Global Step: 316660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:07,176-Speed 5201.62 samples/sec Loss 0.4229 LearningRate 0.0003 Epoch: 18 Global Step: 316670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:09,139-Speed 5216.16 samples/sec Loss 0.4257 LearningRate 0.0003 Epoch: 18 Global Step: 316680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:11,132-Speed 5139.75 samples/sec Loss 0.4235 LearningRate 0.0003 Epoch: 18 Global Step: 316690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:13,129-Speed 5129.16 samples/sec Loss 0.4259 LearningRate 0.0003 Epoch: 18 Global Step: 316700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:15,119-Speed 5148.63 samples/sec Loss 0.4213 LearningRate 0.0003 Epoch: 18 Global Step: 316710 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-11 20:32:17,093-Speed 5188.29 samples/sec Loss 0.4148 LearningRate 0.0003 Epoch: 18 Global Step: 316720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:19,074-Speed 5170.01 samples/sec Loss 0.4427 LearningRate 0.0003 Epoch: 18 Global Step: 316730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:21,041-Speed 5210.46 samples/sec Loss 0.4140 LearningRate 0.0003 Epoch: 18 Global Step: 316740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:23,019-Speed 5177.35 samples/sec Loss 0.4140 LearningRate 0.0003 Epoch: 18 Global Step: 316750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:24,994-Speed 5187.19 samples/sec Loss 0.4268 LearningRate 0.0003 Epoch: 18 Global Step: 316760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:26,972-Speed 5179.04 samples/sec Loss 0.4397 LearningRate 0.0003 Epoch: 18 Global Step: 316770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:28,959-Speed 5155.32 samples/sec Loss 0.4399 LearningRate 0.0003 Epoch: 18 Global Step: 316780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:30,939-Speed 5174.10 samples/sec Loss 0.4185 LearningRate 0.0003 Epoch: 18 Global Step: 316790 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:32:32,916-Speed 5181.03 samples/sec Loss 0.4436 LearningRate 0.0003 Epoch: 18 Global Step: 316800 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:32:34,926-Speed 5096.21 samples/sec Loss 0.4160 LearningRate 0.0003 Epoch: 18 Global Step: 316810 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:32:36,921-Speed 5134.99 samples/sec Loss 0.4318 LearningRate 0.0003 Epoch: 18 Global Step: 316820 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:32:38,919-Speed 5128.39 samples/sec Loss 0.4145 LearningRate 0.0003 Epoch: 18 Global Step: 316830 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 
20:32:40,906-Speed 5154.96 samples/sec Loss 0.4365 LearningRate 0.0003 Epoch: 18 Global Step: 316840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:42,872-Speed 5210.76 samples/sec Loss 0.4255 LearningRate 0.0003 Epoch: 18 Global Step: 316850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:44,839-Speed 5206.25 samples/sec Loss 0.4404 LearningRate 0.0003 Epoch: 18 Global Step: 316860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:46,854-Speed 5083.69 samples/sec Loss 0.4462 LearningRate 0.0003 Epoch: 18 Global Step: 316870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:48,856-Speed 5117.45 samples/sec Loss 0.4335 LearningRate 0.0003 Epoch: 18 Global Step: 316880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:50,860-Speed 5111.54 samples/sec Loss 0.4558 LearningRate 0.0003 Epoch: 18 Global Step: 316890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:52,891-Speed 5043.30 samples/sec Loss 0.4252 LearningRate 0.0003 Epoch: 18 Global Step: 316900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:54,893-Speed 5117.71 samples/sec Loss 0.4391 LearningRate 0.0003 Epoch: 18 Global Step: 316910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:56,880-Speed 5154.60 samples/sec Loss 0.4338 LearningRate 0.0003 Epoch: 18 Global Step: 316920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:32:58,853-Speed 5191.60 samples/sec Loss 0.4537 LearningRate 0.0003 Epoch: 18 Global Step: 316930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:33:00,845-Speed 5142.85 samples/sec Loss 0.4308 LearningRate 0.0003 Epoch: 18 Global Step: 316940 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:33:02,848-Speed 5114.56 samples/sec Loss 0.4176 LearningRate 0.0003 Epoch: 18 Global Step: 316950 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:33:04,842-Speed 5135.90 samples/sec 
Loss 0.4546 LearningRate 0.0003 Epoch: 18 Global Step: 316960 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:33:06,810-Speed 5205.84 samples/sec Loss 0.3881 LearningRate 0.0003 Epoch: 18 Global Step: 316970 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:33:08,818-Speed 5099.47 samples/sec Loss 0.4232 LearningRate 0.0003 Epoch: 18 Global Step: 316980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:33:10,806-Speed 5152.52 samples/sec Loss 0.4312 LearningRate 0.0003 Epoch: 18 Global Step: 316990 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:33:12,785-Speed 5177.59 samples/sec Loss 0.4153 LearningRate 0.0003 Epoch: 18 Global Step: 317000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:33:14,753-Speed 5205.02 samples/sec Loss 0.4197 LearningRate 0.0003 Epoch: 18 Global Step: 317010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:33:16,731-Speed 5178.08 samples/sec Loss 0.4504 LearningRate 0.0003 Epoch: 18 Global Step: 317020 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:33:18,706-Speed 5186.58 samples/sec Loss 0.4225 LearningRate 0.0003 Epoch: 18 Global Step: 317030 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:33:20,673-Speed 5207.23 samples/sec Loss 0.4324 LearningRate 0.0003 Epoch: 18 Global Step: 317040 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-04-11 20:33:22,647-Speed 5191.38 samples/sec Loss 0.4484 LearningRate 0.0003 Epoch: 18 Global Step: 317050 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:33:24,623-Speed 5183.96 samples/sec Loss 0.4222 LearningRate 0.0003 Epoch: 18 Global Step: 317060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:33:26,657-Speed 5035.15 samples/sec Loss 0.4331 LearningRate 0.0003 Epoch: 18 Global Step: 317070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:33:28,682-Speed 5057.11 samples/sec Loss 0.4390 LearningRate 0.0003 
Epoch: 18 Global Step: 317080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:33:30,661-Speed 5178.22 samples/sec Loss 0.4359 LearningRate 0.0003 Epoch: 18 Global Step: 317090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:33:32,664-Speed 5112.95 samples/sec Loss 0.4207 LearningRate 0.0003 Epoch: 18 Global Step: 317100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:33:34,647-Speed 5164.04 samples/sec Loss 0.4272 LearningRate 0.0003 Epoch: 18 Global Step: 317110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:33:36,930-Speed 4487.02 samples/sec Loss 0.4179 LearningRate 0.0003 Epoch: 18 Global Step: 317120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:34:08,447-Speed 324.92 samples/sec Loss 0.3974 LearningRate 0.0002 Epoch: 19 Global Step: 317130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:34:10,449-Speed 5115.98 samples/sec Loss 0.3974 LearningRate 0.0002 Epoch: 19 Global Step: 317140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:34:12,428-Speed 5177.35 samples/sec Loss 0.4088 LearningRate 0.0002 Epoch: 19 Global Step: 317150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:34:14,728-Speed 4453.82 samples/sec Loss 0.4105 LearningRate 0.0002 Epoch: 19 Global Step: 317160 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:34:16,720-Speed 5141.86 samples/sec Loss 0.3943 LearningRate 0.0002 Epoch: 19 Global Step: 317170 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:34:18,683-Speed 5219.17 samples/sec Loss 0.3813 LearningRate 0.0002 Epoch: 19 Global Step: 317180 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:34:20,673-Speed 5145.08 samples/sec Loss 0.3848 LearningRate 0.0002 Epoch: 19 Global Step: 317190 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:34:22,681-Speed 5101.21 samples/sec Loss 0.3781 LearningRate 0.0002 Epoch: 19 Global Step: 317200 Fp16 Grad 
Scale: 131072 Required: 1 hours Training: 2022-04-11 20:34:24,653-Speed 5196.36 samples/sec Loss 0.3900 LearningRate 0.0002 Epoch: 19 Global Step: 317210 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:34:26,627-Speed 5187.96 samples/sec Loss 0.3791 LearningRate 0.0002 Epoch: 19 Global Step: 317220 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:34:28,604-Speed 5179.96 samples/sec Loss 0.3736 LearningRate 0.0002 Epoch: 19 Global Step: 317230 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:34:30,568-Speed 5216.77 samples/sec Loss 0.3597 LearningRate 0.0002 Epoch: 19 Global Step: 317240 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:34:32,537-Speed 5202.41 samples/sec Loss 0.3794 LearningRate 0.0002 Epoch: 19 Global Step: 317250 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:34:34,528-Speed 5145.19 samples/sec Loss 0.3985 LearningRate 0.0002 Epoch: 19 Global Step: 317260 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:34:36,490-Speed 5220.11 samples/sec Loss 0.3734 LearningRate 0.0002 Epoch: 19 Global Step: 317270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:34:38,480-Speed 5147.42 samples/sec Loss 0.3803 LearningRate 0.0002 Epoch: 19 Global Step: 317280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:34:40,459-Speed 5176.14 samples/sec Loss 0.3857 LearningRate 0.0002 Epoch: 19 Global Step: 317290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:34:42,447-Speed 5153.96 samples/sec Loss 0.3905 LearningRate 0.0002 Epoch: 19 Global Step: 317300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:34:44,416-Speed 5201.01 samples/sec Loss 0.3654 LearningRate 0.0002 Epoch: 19 Global Step: 317310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:34:46,412-Speed 5133.64 samples/sec Loss 0.3707 LearningRate 0.0002 Epoch: 19 Global Step: 317320 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-11 20:34:48,384-Speed 5192.60 samples/sec Loss 0.3838 LearningRate 0.0002 Epoch: 19 Global Step: 317330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:34:50,360-Speed 5184.51 samples/sec Loss 0.3873 LearningRate 0.0002 Epoch: 19 Global Step: 317340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:34:52,374-Speed 5085.74 samples/sec Loss 0.3936 LearningRate 0.0002 Epoch: 19 Global Step: 317350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:34:54,355-Speed 5172.93 samples/sec Loss 0.3905 LearningRate 0.0002 Epoch: 19 Global Step: 317360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:34:56,344-Speed 5149.57 samples/sec Loss 0.3867 LearningRate 0.0002 Epoch: 19 Global Step: 317370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:34:58,332-Speed 5154.06 samples/sec Loss 0.3769 LearningRate 0.0002 Epoch: 19 Global Step: 317380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:35:01,029-Speed 3797.38 samples/sec Loss 0.3728 LearningRate 0.0002 Epoch: 19 Global Step: 317390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:35:03,035-Speed 5105.83 samples/sec Loss 0.4073 LearningRate 0.0002 Epoch: 19 Global Step: 317400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:35:05,014-Speed 5176.67 samples/sec Loss 0.3861 LearningRate 0.0002 Epoch: 19 Global Step: 317410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:35:07,004-Speed 5147.26 samples/sec Loss 0.3882 LearningRate 0.0002 Epoch: 19 Global Step: 317420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:35:08,987-Speed 5169.06 samples/sec Loss 0.3772 LearningRate 0.0002 Epoch: 19 Global Step: 317430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:35:11,005-Speed 5074.24 samples/sec Loss 0.3867 LearningRate 0.0002 Epoch: 19 Global Step: 317440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:35:12,983-Speed 
5180.30 samples/sec Loss 0.3830 LearningRate 0.0002 Epoch: 19 Global Step: 317450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:35:14,956-Speed 5190.32 samples/sec Loss 0.4107 LearningRate 0.0002 Epoch: 19 Global Step: 317460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:35:16,940-Speed 5164.12 samples/sec Loss 0.3855 LearningRate 0.0002 Epoch: 19 Global Step: 317470 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:35:18,929-Speed 5149.61 samples/sec Loss 0.3936 LearningRate 0.0002 Epoch: 19 Global Step: 317480 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:35:20,951-Speed 5067.47 samples/sec Loss 0.3673 LearningRate 0.0002 Epoch: 19 Global Step: 317490 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:35:22,995-Speed 5011.55 samples/sec Loss 0.3728 LearningRate 0.0002 Epoch: 19 Global Step: 317500 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:35:25,042-Speed 5002.14 samples/sec Loss 0.3862 LearningRate 0.0002 Epoch: 19 Global Step: 317510 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:35:27,028-Speed 5159.01 samples/sec Loss 0.3958 LearningRate 0.0002 Epoch: 19 Global Step: 317520 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:35:29,014-Speed 5157.12 samples/sec Loss 0.3898 LearningRate 0.0002 Epoch: 19 Global Step: 317530 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:35:30,988-Speed 5188.10 samples/sec Loss 0.3982 LearningRate 0.0002 Epoch: 19 Global Step: 317540 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:35:32,963-Speed 5188.79 samples/sec Loss 0.3960 LearningRate 0.0002 Epoch: 19 Global Step: 317550 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:35:34,944-Speed 5169.51 samples/sec Loss 0.3734 LearningRate 0.0002 Epoch: 19 Global Step: 317560 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:35:36,973-Speed 5048.90 samples/sec Loss 0.4135 
LearningRate 0.0002 Epoch: 19 Global Step: 317570 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:35:38,948-Speed 5188.13 samples/sec Loss 0.4054 LearningRate 0.0002 Epoch: 19 Global Step: 317580 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:35:40,925-Speed 5180.43 samples/sec Loss 0.3874 LearningRate 0.0002 Epoch: 19 Global Step: 317590 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:35:42,937-Speed 5091.67 samples/sec Loss 0.3885 LearningRate 0.0002 Epoch: 19 Global Step: 317600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:35:44,915-Speed 5178.52 samples/sec Loss 0.4199 LearningRate 0.0002 Epoch: 19 Global Step: 317610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:35:46,907-Speed 5142.94 samples/sec Loss 0.3726 LearningRate 0.0002 Epoch: 19 Global Step: 317620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:35:49,079-Speed 4715.41 samples/sec Loss 0.3883 LearningRate 0.0002 Epoch: 19 Global Step: 317630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:35:51,096-Speed 5078.80 samples/sec Loss 0.3766 LearningRate 0.0002 Epoch: 19 Global Step: 317640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:35:53,078-Speed 5166.13 samples/sec Loss 0.3845 LearningRate 0.0002 Epoch: 19 Global Step: 317650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:35:55,054-Speed 5184.26 samples/sec Loss 0.3844 LearningRate 0.0002 Epoch: 19 Global Step: 317660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:35:57,046-Speed 5144.21 samples/sec Loss 0.3866 LearningRate 0.0002 Epoch: 19 Global Step: 317670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:35:59,018-Speed 5194.94 samples/sec Loss 0.3718 LearningRate 0.0002 Epoch: 19 Global Step: 317680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:01,000-Speed 5166.27 samples/sec Loss 0.3798 LearningRate 0.0002 Epoch: 19 Global 
Step: 317690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:03,012-Speed 5091.10 samples/sec Loss 0.3657 LearningRate 0.0002 Epoch: 19 Global Step: 317700 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:36:05,002-Speed 5149.21 samples/sec Loss 0.3823 LearningRate 0.0002 Epoch: 19 Global Step: 317710 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:36:06,982-Speed 5172.20 samples/sec Loss 0.3610 LearningRate 0.0002 Epoch: 19 Global Step: 317720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:08,983-Speed 5119.53 samples/sec Loss 0.4044 LearningRate 0.0002 Epoch: 19 Global Step: 317730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:11,006-Speed 5063.16 samples/sec Loss 0.3890 LearningRate 0.0002 Epoch: 19 Global Step: 317740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:13,007-Speed 5118.34 samples/sec Loss 0.3948 LearningRate 0.0002 Epoch: 19 Global Step: 317750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:15,005-Speed 5129.83 samples/sec Loss 0.3945 LearningRate 0.0002 Epoch: 19 Global Step: 317760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:16,980-Speed 5185.77 samples/sec Loss 0.4015 LearningRate 0.0002 Epoch: 19 Global Step: 317770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:18,957-Speed 5180.94 samples/sec Loss 0.3772 LearningRate 0.0002 Epoch: 19 Global Step: 317780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:20,930-Speed 5191.65 samples/sec Loss 0.3891 LearningRate 0.0002 Epoch: 19 Global Step: 317790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:22,905-Speed 5188.02 samples/sec Loss 0.3932 LearningRate 0.0002 Epoch: 19 Global Step: 317800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:24,896-Speed 5143.70 samples/sec Loss 0.3932 LearningRate 0.0002 Epoch: 19 Global Step: 317810 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-11 20:36:26,895-Speed 5123.42 samples/sec Loss 0.4160 LearningRate 0.0002 Epoch: 19 Global Step: 317820 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:36:28,875-Speed 5175.17 samples/sec Loss 0.3990 LearningRate 0.0002 Epoch: 19 Global Step: 317830 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:36:30,855-Speed 5172.36 samples/sec Loss 0.3985 LearningRate 0.0002 Epoch: 19 Global Step: 317840 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:36:32,838-Speed 5165.14 samples/sec Loss 0.3842 LearningRate 0.0002 Epoch: 19 Global Step: 317850 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:36:34,820-Speed 5169.17 samples/sec Loss 0.4062 LearningRate 0.0002 Epoch: 19 Global Step: 317860 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:36:36,809-Speed 5151.31 samples/sec Loss 0.3868 LearningRate 0.0002 Epoch: 19 Global Step: 317870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:38,789-Speed 5174.26 samples/sec Loss 0.3828 LearningRate 0.0002 Epoch: 19 Global Step: 317880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:40,793-Speed 5110.13 samples/sec Loss 0.3745 LearningRate 0.0002 Epoch: 19 Global Step: 317890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:42,767-Speed 5188.90 samples/sec Loss 0.3750 LearningRate 0.0002 Epoch: 19 Global Step: 317900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:44,753-Speed 5159.66 samples/sec Loss 0.3791 LearningRate 0.0002 Epoch: 19 Global Step: 317910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:46,743-Speed 5146.78 samples/sec Loss 0.3780 LearningRate 0.0002 Epoch: 19 Global Step: 317920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:48,745-Speed 5115.91 samples/sec Loss 0.3848 LearningRate 0.0002 Epoch: 19 Global Step: 317930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
20:36:50,733-Speed 5153.68 samples/sec Loss 0.3806 LearningRate 0.0002 Epoch: 19 Global Step: 317940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:52,733-Speed 5120.43 samples/sec Loss 0.3819 LearningRate 0.0002 Epoch: 19 Global Step: 317950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:54,709-Speed 5184.42 samples/sec Loss 0.3868 LearningRate 0.0002 Epoch: 19 Global Step: 317960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:36:56,708-Speed 5124.99 samples/sec Loss 0.4002 LearningRate 0.0002 Epoch: 19 Global Step: 317970 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:36:58,691-Speed 5164.60 samples/sec Loss 0.3825 LearningRate 0.0002 Epoch: 19 Global Step: 317980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:37:00,721-Speed 5047.08 samples/sec Loss 0.3767 LearningRate 0.0002 Epoch: 19 Global Step: 317990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:37:02,758-Speed 5028.97 samples/sec Loss 0.3986 LearningRate 0.0002 Epoch: 19 Global Step: 318000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:37:29,400-[lfw][318000]XNorm: 21.579660 Training: 2022-04-11 20:37:29,400-[lfw][318000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 20:37:29,401-[lfw][318000]Accuracy-Highest: 0.99833 Training: 2022-04-11 20:38:00,175-[cfp_fp][318000]XNorm: 22.097112 Training: 2022-04-11 20:38:00,176-[cfp_fp][318000]Accuracy-Flip: 0.98986+-0.00396 Training: 2022-04-11 20:38:00,176-[cfp_fp][318000]Accuracy-Highest: 0.99029 Training: 2022-04-11 20:38:26,759-[agedb_30][318000]XNorm: 22.722179 Training: 2022-04-11 20:38:26,760-[agedb_30][318000]Accuracy-Flip: 0.98383+-0.00654 Training: 2022-04-11 20:38:26,760-[agedb_30][318000]Accuracy-Highest: 0.98400 Training: 2022-04-11 20:38:28,745-Speed 119.09 samples/sec Loss 0.3805 LearningRate 0.0002 Epoch: 19 Global Step: 318010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:38:30,731-Speed 
5159.60 samples/sec Loss 0.3935 LearningRate 0.0002 Epoch: 19 Global Step: 318020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:38:32,712-Speed 5168.54 samples/sec Loss 0.3671 LearningRate 0.0002 Epoch: 19 Global Step: 318030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:38:34,688-Speed 5184.47 samples/sec Loss 0.3797 LearningRate 0.0002 Epoch: 19 Global Step: 318040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:38:36,680-Speed 5143.95 samples/sec Loss 0.4052 LearningRate 0.0002 Epoch: 19 Global Step: 318050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:38:38,655-Speed 5184.63 samples/sec Loss 0.3814 LearningRate 0.0002 Epoch: 19 Global Step: 318060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:38:40,635-Speed 5172.62 samples/sec Loss 0.3855 LearningRate 0.0002 Epoch: 19 Global Step: 318070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:38:42,606-Speed 5196.76 samples/sec Loss 0.3918 LearningRate 0.0002 Epoch: 19 Global Step: 318080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:38:44,576-Speed 5202.10 samples/sec Loss 0.3839 LearningRate 0.0002 Epoch: 19 Global Step: 318090 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:38:46,560-Speed 5161.27 samples/sec Loss 0.3840 LearningRate 0.0002 Epoch: 19 Global Step: 318100 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:38:48,530-Speed 5201.57 samples/sec Loss 0.3787 LearningRate 0.0002 Epoch: 19 Global Step: 318110 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:38:50,512-Speed 5166.54 samples/sec Loss 0.3828 LearningRate 0.0002 Epoch: 19 Global Step: 318120 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:38:52,483-Speed 5196.63 samples/sec Loss 0.3806 LearningRate 0.0002 Epoch: 19 Global Step: 318130 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:38:54,483-Speed 5122.77 samples/sec Loss 0.3950 
LearningRate 0.0002 Epoch: 19 Global Step: 318140 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:38:56,460-Speed 5182.74 samples/sec Loss 0.3823 LearningRate 0.0002 Epoch: 19 Global Step: 318150 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:38:58,444-Speed 5162.42 samples/sec Loss 0.3862 LearningRate 0.0002 Epoch: 19 Global Step: 318160 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:39:00,452-Speed 5101.39 samples/sec Loss 0.3616 LearningRate 0.0002 Epoch: 19 Global Step: 318170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:02,435-Speed 5166.23 samples/sec Loss 0.3878 LearningRate 0.0002 Epoch: 19 Global Step: 318180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:04,420-Speed 5159.26 samples/sec Loss 0.3778 LearningRate 0.0002 Epoch: 19 Global Step: 318190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:06,412-Speed 5142.34 samples/sec Loss 0.3829 LearningRate 0.0002 Epoch: 19 Global Step: 318200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:08,386-Speed 5188.49 samples/sec Loss 0.3996 LearningRate 0.0002 Epoch: 19 Global Step: 318210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:10,373-Speed 5155.78 samples/sec Loss 0.3749 LearningRate 0.0002 Epoch: 19 Global Step: 318220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:12,361-Speed 5152.41 samples/sec Loss 0.3857 LearningRate 0.0002 Epoch: 19 Global Step: 318230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:14,354-Speed 5139.18 samples/sec Loss 0.3719 LearningRate 0.0002 Epoch: 19 Global Step: 318240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:16,359-Speed 5110.34 samples/sec Loss 0.3888 LearningRate 0.0002 Epoch: 19 Global Step: 318250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:18,342-Speed 5166.34 samples/sec Loss 0.3831 LearningRate 0.0002 Epoch: 19 Global 
Step: 318260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:20,327-Speed 5161.26 samples/sec Loss 0.3818 LearningRate 0.0002 Epoch: 19 Global Step: 318270 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:39:22,290-Speed 5218.59 samples/sec Loss 0.3952 LearningRate 0.0002 Epoch: 19 Global Step: 318280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:24,306-Speed 5079.71 samples/sec Loss 0.3718 LearningRate 0.0002 Epoch: 19 Global Step: 318290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:26,315-Speed 5098.35 samples/sec Loss 0.3734 LearningRate 0.0002 Epoch: 19 Global Step: 318300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:28,290-Speed 5185.92 samples/sec Loss 0.3787 LearningRate 0.0002 Epoch: 19 Global Step: 318310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:30,282-Speed 5142.07 samples/sec Loss 0.3892 LearningRate 0.0002 Epoch: 19 Global Step: 318320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:32,268-Speed 5159.60 samples/sec Loss 0.3732 LearningRate 0.0002 Epoch: 19 Global Step: 318330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:34,247-Speed 5175.15 samples/sec Loss 0.3768 LearningRate 0.0002 Epoch: 19 Global Step: 318340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:36,238-Speed 5145.45 samples/sec Loss 0.4088 LearningRate 0.0002 Epoch: 19 Global Step: 318350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:38,215-Speed 5183.39 samples/sec Loss 0.3822 LearningRate 0.0002 Epoch: 19 Global Step: 318360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:40,207-Speed 5143.71 samples/sec Loss 0.3999 LearningRate 0.0002 Epoch: 19 Global Step: 318370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:42,186-Speed 5175.33 samples/sec Loss 0.3984 LearningRate 0.0002 Epoch: 19 Global Step: 318380 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-11 20:39:44,153-Speed 5207.08 samples/sec Loss 0.3862 LearningRate 0.0002 Epoch: 19 Global Step: 318390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:46,149-Speed 5133.59 samples/sec Loss 0.3713 LearningRate 0.0002 Epoch: 19 Global Step: 318400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:48,121-Speed 5193.68 samples/sec Loss 0.3995 LearningRate 0.0002 Epoch: 19 Global Step: 318410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:50,117-Speed 5131.94 samples/sec Loss 0.3515 LearningRate 0.0002 Epoch: 19 Global Step: 318420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:52,127-Speed 5095.55 samples/sec Loss 0.3702 LearningRate 0.0002 Epoch: 19 Global Step: 318430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:54,120-Speed 5139.86 samples/sec Loss 0.3956 LearningRate 0.0002 Epoch: 19 Global Step: 318440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:56,106-Speed 5157.98 samples/sec Loss 0.3885 LearningRate 0.0002 Epoch: 19 Global Step: 318450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:39:58,109-Speed 5113.76 samples/sec Loss 0.3784 LearningRate 0.0002 Epoch: 19 Global Step: 318460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:40:00,089-Speed 5173.81 samples/sec Loss 0.3734 LearningRate 0.0002 Epoch: 19 Global Step: 318470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:40:02,077-Speed 5152.19 samples/sec Loss 0.3803 LearningRate 0.0002 Epoch: 19 Global Step: 318480 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:40:04,057-Speed 5172.60 samples/sec Loss 0.3868 LearningRate 0.0002 Epoch: 19 Global Step: 318490 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:40:06,022-Speed 5212.78 samples/sec Loss 0.3748 LearningRate 0.0002 Epoch: 19 Global Step: 318500 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 
20:40:07,990-Speed 5207.78 samples/sec Loss 0.4076 LearningRate 0.0002 Epoch: 19 Global Step: 318510 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:40:09,993-Speed 5111.79 samples/sec Loss 0.4078 LearningRate 0.0002 Epoch: 19 Global Step: 318520 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:40:11,966-Speed 5192.50 samples/sec Loss 0.3904 LearningRate 0.0002 Epoch: 19 Global Step: 318530 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:40:13,957-Speed 5144.17 samples/sec Loss 0.3968 LearningRate 0.0002 Epoch: 19 Global Step: 318540 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:40:15,932-Speed 5188.48 samples/sec Loss 0.4007 LearningRate 0.0002 Epoch: 19 Global Step: 318550 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:40:17,911-Speed 5174.30 samples/sec Loss 0.3814 LearningRate 0.0002 Epoch: 19 Global Step: 318560 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:40:19,876-Speed 5212.84 samples/sec Loss 0.3634 LearningRate 0.0002 Epoch: 19 Global Step: 318570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:40:21,854-Speed 5180.38 samples/sec Loss 0.3700 LearningRate 0.0002 Epoch: 19 Global Step: 318580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:40:23,845-Speed 5143.75 samples/sec Loss 0.3663 LearningRate 0.0002 Epoch: 19 Global Step: 318590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:40:25,821-Speed 5184.67 samples/sec Loss 0.3904 LearningRate 0.0002 Epoch: 19 Global Step: 318600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:40:27,789-Speed 5205.08 samples/sec Loss 0.4156 LearningRate 0.0002 Epoch: 19 Global Step: 318610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:40:29,760-Speed 5196.55 samples/sec Loss 0.3835 LearningRate 0.0002 Epoch: 19 Global Step: 318620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:40:31,733-Speed 5193.10 
samples/sec Loss 0.3971 LearningRate 0.0002 Epoch: 19 Global Step: 318630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:40:33,731-Speed 5126.38 samples/sec Loss 0.3773 LearningRate 0.0002 Epoch: 19 Global Step: 318640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:40:35,698-Speed 5206.09 samples/sec Loss 0.3762 LearningRate 0.0002 Epoch: 19 Global Step: 318650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:40:37,694-Speed 5132.77 samples/sec Loss 0.3889 LearningRate 0.0002 Epoch: 19 Global Step: 318660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:40:39,685-Speed 5146.26 samples/sec Loss 0.3811 LearningRate 0.0002 Epoch: 19 Global Step: 318670 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:40:41,654-Speed 5201.79 samples/sec Loss 0.4080 LearningRate 0.0002 Epoch: 19 Global Step: 318680 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:40:43,620-Speed 5208.47 samples/sec Loss 0.3762 LearningRate 0.0002 Epoch: 19 Global Step: 318690 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:40:45,609-Speed 5150.67 samples/sec Loss 0.4015 LearningRate 0.0002 Epoch: 19 Global Step: 318700 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:40:47,578-Speed 5203.39 samples/sec Loss 0.3887 LearningRate 0.0002 Epoch: 19 Global Step: 318710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:40:49,572-Speed 5136.73 samples/sec Loss 0.4112 LearningRate 0.0002 Epoch: 19 Global Step: 318720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:40:51,550-Speed 5177.80 samples/sec Loss 0.3799 LearningRate 0.0002 Epoch: 19 Global Step: 318730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:40:53,555-Speed 5110.57 samples/sec Loss 0.4065 LearningRate 0.0002 Epoch: 19 Global Step: 318740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:40:55,523-Speed 5203.58 samples/sec Loss 0.3951 LearningRate 
0.0002 Epoch: 19 Global Step: 318750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:40:57,511-Speed 5153.40 samples/sec Loss 0.3600 LearningRate 0.0002 Epoch: 19 Global Step: 318760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:40:59,493-Speed 5166.61 samples/sec Loss 0.3849 LearningRate 0.0002 Epoch: 19 Global Step: 318770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:01,474-Speed 5170.47 samples/sec Loss 0.3870 LearningRate 0.0002 Epoch: 19 Global Step: 318780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:03,457-Speed 5166.84 samples/sec Loss 0.4064 LearningRate 0.0002 Epoch: 19 Global Step: 318790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:05,445-Speed 5151.05 samples/sec Loss 0.3758 LearningRate 0.0002 Epoch: 19 Global Step: 318800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:07,408-Speed 5218.97 samples/sec Loss 0.3993 LearningRate 0.0002 Epoch: 19 Global Step: 318810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:09,376-Speed 5205.92 samples/sec Loss 0.3878 LearningRate 0.0002 Epoch: 19 Global Step: 318820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:11,354-Speed 5178.12 samples/sec Loss 0.3831 LearningRate 0.0002 Epoch: 19 Global Step: 318830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:13,330-Speed 5185.48 samples/sec Loss 0.3793 LearningRate 0.0002 Epoch: 19 Global Step: 318840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:15,297-Speed 5206.18 samples/sec Loss 0.3770 LearningRate 0.0002 Epoch: 19 Global Step: 318850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:17,269-Speed 5194.49 samples/sec Loss 0.3814 LearningRate 0.0002 Epoch: 19 Global Step: 318860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:19,290-Speed 5067.99 samples/sec Loss 0.3893 LearningRate 0.0002 Epoch: 19 Global Step: 318870 Fp16 
Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:21,259-Speed 5202.12 samples/sec Loss 0.3968 LearningRate 0.0002 Epoch: 19 Global Step: 318880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:23,245-Speed 5160.61 samples/sec Loss 0.4029 LearningRate 0.0002 Epoch: 19 Global Step: 318890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:25,240-Speed 5134.40 samples/sec Loss 0.3855 LearningRate 0.0002 Epoch: 19 Global Step: 318900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:27,279-Speed 5023.41 samples/sec Loss 0.3702 LearningRate 0.0002 Epoch: 19 Global Step: 318910 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:41:29,269-Speed 5145.75 samples/sec Loss 0.3903 LearningRate 0.0002 Epoch: 19 Global Step: 318920 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:41:31,237-Speed 5206.17 samples/sec Loss 0.3890 LearningRate 0.0002 Epoch: 19 Global Step: 318930 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:41:33,206-Speed 5200.63 samples/sec Loss 0.3966 LearningRate 0.0002 Epoch: 19 Global Step: 318940 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:41:35,178-Speed 5196.85 samples/sec Loss 0.3914 LearningRate 0.0002 Epoch: 19 Global Step: 318950 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:41:37,140-Speed 5221.42 samples/sec Loss 0.3722 LearningRate 0.0002 Epoch: 19 Global Step: 318960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:39,119-Speed 5175.48 samples/sec Loss 0.4033 LearningRate 0.0002 Epoch: 19 Global Step: 318970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:41,093-Speed 5189.11 samples/sec Loss 0.3727 LearningRate 0.0002 Epoch: 19 Global Step: 318980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:43,081-Speed 5152.57 samples/sec Loss 0.3848 LearningRate 0.0002 Epoch: 19 Global Step: 318990 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-11 20:41:45,058-Speed 5179.90 samples/sec Loss 0.3938 LearningRate 0.0002 Epoch: 19 Global Step: 319000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:47,052-Speed 5136.49 samples/sec Loss 0.4017 LearningRate 0.0002 Epoch: 19 Global Step: 319010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:49,036-Speed 5163.12 samples/sec Loss 0.3934 LearningRate 0.0002 Epoch: 19 Global Step: 319020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:51,029-Speed 5138.95 samples/sec Loss 0.4175 LearningRate 0.0002 Epoch: 19 Global Step: 319030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:53,006-Speed 5182.33 samples/sec Loss 0.3706 LearningRate 0.0002 Epoch: 19 Global Step: 319040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:54,983-Speed 5180.62 samples/sec Loss 0.3780 LearningRate 0.0002 Epoch: 19 Global Step: 319050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:56,970-Speed 5155.80 samples/sec Loss 0.4161 LearningRate 0.0002 Epoch: 19 Global Step: 319060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:41:58,953-Speed 5166.97 samples/sec Loss 0.3961 LearningRate 0.0002 Epoch: 19 Global Step: 319070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:00,937-Speed 5161.80 samples/sec Loss 0.3790 LearningRate 0.0002 Epoch: 19 Global Step: 319080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:02,929-Speed 5141.38 samples/sec Loss 0.3942 LearningRate 0.0002 Epoch: 19 Global Step: 319090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:04,903-Speed 5190.49 samples/sec Loss 0.3981 LearningRate 0.0002 Epoch: 19 Global Step: 319100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:06,871-Speed 5205.07 samples/sec Loss 0.3798 LearningRate 0.0002 Epoch: 19 Global Step: 319110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:08,864-Speed 
5140.43 samples/sec Loss 0.3788 LearningRate 0.0002 Epoch: 19 Global Step: 319120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:10,844-Speed 5171.25 samples/sec Loss 0.3761 LearningRate 0.0002 Epoch: 19 Global Step: 319130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:12,845-Speed 5119.21 samples/sec Loss 0.3801 LearningRate 0.0002 Epoch: 19 Global Step: 319140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:14,855-Speed 5097.87 samples/sec Loss 0.3671 LearningRate 0.0002 Epoch: 19 Global Step: 319150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:16,850-Speed 5132.96 samples/sec Loss 0.3747 LearningRate 0.0002 Epoch: 19 Global Step: 319160 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:42:18,827-Speed 5182.33 samples/sec Loss 0.3909 LearningRate 0.0002 Epoch: 19 Global Step: 319170 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:42:20,819-Speed 5143.82 samples/sec Loss 0.3923 LearningRate 0.0002 Epoch: 19 Global Step: 319180 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:42:22,782-Speed 5217.05 samples/sec Loss 0.3816 LearningRate 0.0002 Epoch: 19 Global Step: 319190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:24,760-Speed 5180.01 samples/sec Loss 0.3679 LearningRate 0.0002 Epoch: 19 Global Step: 319200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:26,742-Speed 5167.94 samples/sec Loss 0.4046 LearningRate 0.0002 Epoch: 19 Global Step: 319210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:28,723-Speed 5168.28 samples/sec Loss 0.3924 LearningRate 0.0002 Epoch: 19 Global Step: 319220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:30,691-Speed 5206.06 samples/sec Loss 0.3959 LearningRate 0.0002 Epoch: 19 Global Step: 319230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:32,691-Speed 5120.58 samples/sec Loss 0.3905 
LearningRate 0.0002 Epoch: 19 Global Step: 319240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:34,699-Speed 5103.00 samples/sec Loss 0.3737 LearningRate 0.0002 Epoch: 19 Global Step: 319250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:36,680-Speed 5170.69 samples/sec Loss 0.4144 LearningRate 0.0002 Epoch: 19 Global Step: 319260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:38,668-Speed 5152.08 samples/sec Loss 0.3728 LearningRate 0.0002 Epoch: 19 Global Step: 319270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:40,694-Speed 5055.53 samples/sec Loss 0.3584 LearningRate 0.0002 Epoch: 19 Global Step: 319280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:42,701-Speed 5105.76 samples/sec Loss 0.4032 LearningRate 0.0002 Epoch: 19 Global Step: 319290 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:42:44,689-Speed 5150.45 samples/sec Loss 0.3934 LearningRate 0.0002 Epoch: 19 Global Step: 319300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:46,661-Speed 5195.82 samples/sec Loss 0.3781 LearningRate 0.0002 Epoch: 19 Global Step: 319310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:48,637-Speed 5183.18 samples/sec Loss 0.3875 LearningRate 0.0002 Epoch: 19 Global Step: 319320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:50,631-Speed 5138.39 samples/sec Loss 0.3912 LearningRate 0.0002 Epoch: 19 Global Step: 319330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:52,607-Speed 5182.80 samples/sec Loss 0.4246 LearningRate 0.0002 Epoch: 19 Global Step: 319340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:54,592-Speed 5160.42 samples/sec Loss 0.3902 LearningRate 0.0002 Epoch: 19 Global Step: 319350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:56,604-Speed 5092.34 samples/sec Loss 0.3985 LearningRate 0.0002 Epoch: 19 Global Step: 
319360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:42:58,612-Speed 5100.80 samples/sec Loss 0.3954 LearningRate 0.0002 Epoch: 19 Global Step: 319370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:43:00,611-Speed 5124.66 samples/sec Loss 0.3999 LearningRate 0.0002 Epoch: 19 Global Step: 319380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:43:02,625-Speed 5085.82 samples/sec Loss 0.3973 LearningRate 0.0002 Epoch: 19 Global Step: 319390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:43:04,633-Speed 5101.76 samples/sec Loss 0.3598 LearningRate 0.0002 Epoch: 19 Global Step: 319400 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:43:06,607-Speed 5188.47 samples/sec Loss 0.3800 LearningRate 0.0002 Epoch: 19 Global Step: 319410 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:43:08,594-Speed 5155.50 samples/sec Loss 0.3885 LearningRate 0.0002 Epoch: 19 Global Step: 319420 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:43:10,577-Speed 5164.09 samples/sec Loss 0.3859 LearningRate 0.0002 Epoch: 19 Global Step: 319430 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:43:12,550-Speed 5194.05 samples/sec Loss 0.3562 LearningRate 0.0002 Epoch: 19 Global Step: 319440 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:43:14,528-Speed 5176.54 samples/sec Loss 0.3890 LearningRate 0.0002 Epoch: 19 Global Step: 319450 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:43:16,509-Speed 5170.83 samples/sec Loss 0.3669 LearningRate 0.0002 Epoch: 19 Global Step: 319460 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:43:18,470-Speed 5223.39 samples/sec Loss 0.3627 LearningRate 0.0002 Epoch: 19 Global Step: 319470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:43:20,455-Speed 5161.95 samples/sec Loss 0.4419 LearningRate 0.0002 Epoch: 19 Global Step: 319480 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-11 20:43:22,420-Speed 5211.35 samples/sec Loss 0.3823 LearningRate 0.0002 Epoch: 19 Global Step: 319490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:43:24,407-Speed 5156.36 samples/sec Loss 0.3995 LearningRate 0.0002 Epoch: 19 Global Step: 319500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:43:26,408-Speed 5119.31 samples/sec Loss 0.3877 LearningRate 0.0002 Epoch: 19 Global Step: 319510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:43:28,391-Speed 5164.97 samples/sec Loss 0.3888 LearningRate 0.0002 Epoch: 19 Global Step: 319520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:43:30,386-Speed 5134.50 samples/sec Loss 0.3819 LearningRate 0.0002 Epoch: 19 Global Step: 319530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:43:32,362-Speed 5185.90 samples/sec Loss 0.3948 LearningRate 0.0002 Epoch: 19 Global Step: 319540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:43:34,354-Speed 5141.18 samples/sec Loss 0.3974 LearningRate 0.0002 Epoch: 19 Global Step: 319550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:43:36,359-Speed 5107.94 samples/sec Loss 0.3831 LearningRate 0.0002 Epoch: 19 Global Step: 319560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:43:38,343-Speed 5164.30 samples/sec Loss 0.3633 LearningRate 0.0002 Epoch: 19 Global Step: 319570 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:43:40,331-Speed 5151.72 samples/sec Loss 0.3880 LearningRate 0.0002 Epoch: 19 Global Step: 319580 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:43:42,320-Speed 5148.96 samples/sec Loss 0.3729 LearningRate 0.0002 Epoch: 19 Global Step: 319590 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:43:44,297-Speed 5183.69 samples/sec Loss 0.3847 LearningRate 0.0002 Epoch: 19 Global Step: 319600 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 
20:43:46,286-Speed 5149.99 samples/sec Loss 0.3731 LearningRate 0.0002 Epoch: 19 Global Step: 319610 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:43:48,289-Speed 5113.29 samples/sec Loss 0.3835 LearningRate 0.0002 Epoch: 19 Global Step: 319620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:43:50,276-Speed 5156.96 samples/sec Loss 0.3754 LearningRate 0.0002 Epoch: 19 Global Step: 319630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:43:52,305-Speed 5046.72 samples/sec Loss 0.3708 LearningRate 0.0002 Epoch: 19 Global Step: 319640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:43:54,276-Speed 5196.82 samples/sec Loss 0.4009 LearningRate 0.0002 Epoch: 19 Global Step: 319650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:43:56,249-Speed 5192.08 samples/sec Loss 0.3886 LearningRate 0.0002 Epoch: 19 Global Step: 319660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:43:58,220-Speed 5196.84 samples/sec Loss 0.4130 LearningRate 0.0002 Epoch: 19 Global Step: 319670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:44:00,212-Speed 5143.55 samples/sec Loss 0.3871 LearningRate 0.0002 Epoch: 19 Global Step: 319680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:44:02,195-Speed 5163.42 samples/sec Loss 0.3990 LearningRate 0.0002 Epoch: 19 Global Step: 319690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:44:04,166-Speed 5198.56 samples/sec Loss 0.3749 LearningRate 0.0002 Epoch: 19 Global Step: 319700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:44:06,139-Speed 5191.24 samples/sec Loss 0.3970 LearningRate 0.0002 Epoch: 19 Global Step: 319710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:44:08,108-Speed 5203.61 samples/sec Loss 0.3949 LearningRate 0.0002 Epoch: 19 Global Step: 319720 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:44:10,138-Speed 5045.30 samples/sec 
Loss 0.3713 LearningRate 0.0002 Epoch: 19 Global Step: 319730 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:44:12,118-Speed 5173.51 samples/sec Loss 0.3891 LearningRate 0.0002 Epoch: 19 Global Step: 319740 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:44:14,091-Speed 5191.19 samples/sec Loss 0.3773 LearningRate 0.0002 Epoch: 19 Global Step: 319750 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:44:16,082-Speed 5146.84 samples/sec Loss 0.3868 LearningRate 0.0002 Epoch: 19 Global Step: 319760 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:44:18,095-Speed 5087.95 samples/sec Loss 0.3729 LearningRate 0.0002 Epoch: 19 Global Step: 319770 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:44:20,068-Speed 5190.50 samples/sec Loss 0.3709 LearningRate 0.0002 Epoch: 19 Global Step: 319780 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:44:22,057-Speed 5149.40 samples/sec Loss 0.3901 LearningRate 0.0002 Epoch: 19 Global Step: 319790 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:44:24,066-Speed 5100.32 samples/sec Loss 0.3935 LearningRate 0.0002 Epoch: 19 Global Step: 319800 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:44:26,040-Speed 5189.70 samples/sec Loss 0.3708 LearningRate 0.0002 Epoch: 19 Global Step: 319810 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:44:28,031-Speed 5144.32 samples/sec Loss 0.3842 LearningRate 0.0002 Epoch: 19 Global Step: 319820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:44:30,024-Speed 5138.67 samples/sec Loss 0.4026 LearningRate 0.0002 Epoch: 19 Global Step: 319830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:44:31,992-Speed 5206.97 samples/sec Loss 0.3614 LearningRate 0.0002 Epoch: 19 Global Step: 319840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:44:33,967-Speed 5185.84 samples/sec Loss 0.3866 LearningRate 0.0002 
Epoch: 19 Global Step: 319850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:44:35,951-Speed 5162.19 samples/sec Loss 0.3953 LearningRate 0.0002 Epoch: 19 Global Step: 319860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:44:37,949-Speed 5127.66 samples/sec Loss 0.3675 LearningRate 0.0002 Epoch: 19 Global Step: 319870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:44:39,958-Speed 5097.72 samples/sec Loss 0.4092 LearningRate 0.0002 Epoch: 19 Global Step: 319880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:44:41,933-Speed 5186.31 samples/sec Loss 0.3868 LearningRate 0.0002 Epoch: 19 Global Step: 319890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:44:43,907-Speed 5188.99 samples/sec Loss 0.3611 LearningRate 0.0002 Epoch: 19 Global Step: 319900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:44:45,902-Speed 5136.53 samples/sec Loss 0.3733 LearningRate 0.0002 Epoch: 19 Global Step: 319910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:44:47,888-Speed 5158.11 samples/sec Loss 0.3829 LearningRate 0.0002 Epoch: 19 Global Step: 319920 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:44:49,895-Speed 5102.47 samples/sec Loss 0.3641 LearningRate 0.0002 Epoch: 19 Global Step: 319930 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:44:51,914-Speed 5072.09 samples/sec Loss 0.3662 LearningRate 0.0002 Epoch: 19 Global Step: 319940 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:44:53,905-Speed 5147.64 samples/sec Loss 0.3855 LearningRate 0.0002 Epoch: 19 Global Step: 319950 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:44:55,882-Speed 5179.77 samples/sec Loss 0.4004 LearningRate 0.0002 Epoch: 19 Global Step: 319960 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:44:57,862-Speed 5175.17 samples/sec Loss 0.4052 LearningRate 0.0002 Epoch: 19 Global Step: 319970 Fp16 
Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:44:59,857-Speed 5132.71 samples/sec Loss 0.4053 LearningRate 0.0002 Epoch: 19 Global Step: 319980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:45:01,837-Speed 5172.99 samples/sec Loss 0.3744 LearningRate 0.0002 Epoch: 19 Global Step: 319990 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:45:03,834-Speed 5130.59 samples/sec Loss 0.4040 LearningRate 0.0002 Epoch: 19 Global Step: 320000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:45:30,392-[lfw][320000]XNorm: 21.531423 Training: 2022-04-11 20:45:30,392-[lfw][320000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 20:45:30,393-[lfw][320000]Accuracy-Highest: 0.99833 Training: 2022-04-11 20:46:01,189-[cfp_fp][320000]XNorm: 22.066853 Training: 2022-04-11 20:46:01,189-[cfp_fp][320000]Accuracy-Flip: 0.98914+-0.00405 Training: 2022-04-11 20:46:01,190-[cfp_fp][320000]Accuracy-Highest: 0.99029 Training: 2022-04-11 20:46:27,733-[agedb_30][320000]XNorm: 22.653325 Training: 2022-04-11 20:46:27,734-[agedb_30][320000]Accuracy-Flip: 0.98450+-0.00650 Training: 2022-04-11 20:46:27,734-[agedb_30][320000]Accuracy-Highest: 0.98450 Training: 2022-04-11 20:46:29,726-Speed 119.22 samples/sec Loss 0.3922 LearningRate 0.0002 Epoch: 19 Global Step: 320010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:46:31,697-Speed 5197.34 samples/sec Loss 0.3760 LearningRate 0.0002 Epoch: 19 Global Step: 320020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:46:33,653-Speed 5234.13 samples/sec Loss 0.3965 LearningRate 0.0002 Epoch: 19 Global Step: 320030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:46:35,617-Speed 5216.94 samples/sec Loss 0.3866 LearningRate 0.0002 Epoch: 19 Global Step: 320040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:46:37,585-Speed 5205.08 samples/sec Loss 0.3716 LearningRate 0.0002 Epoch: 19 Global Step: 320050 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-11 20:46:39,579-Speed 5136.40 samples/sec Loss 0.3707 LearningRate 0.0002 Epoch: 19 Global Step: 320060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:46:41,558-Speed 5175.83 samples/sec Loss 0.3932 LearningRate 0.0002 Epoch: 19 Global Step: 320070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:46:43,522-Speed 5216.66 samples/sec Loss 0.3926 LearningRate 0.0002 Epoch: 19 Global Step: 320080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:46:45,492-Speed 5198.39 samples/sec Loss 0.4019 LearningRate 0.0002 Epoch: 19 Global Step: 320090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:46:47,459-Speed 5207.50 samples/sec Loss 0.3786 LearningRate 0.0002 Epoch: 19 Global Step: 320100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:46:49,424-Speed 5214.24 samples/sec Loss 0.3774 LearningRate 0.0002 Epoch: 19 Global Step: 320110 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:46:51,400-Speed 5184.08 samples/sec Loss 0.3758 LearningRate 0.0002 Epoch: 19 Global Step: 320120 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:46:53,396-Speed 5131.29 samples/sec Loss 0.3812 LearningRate 0.0002 Epoch: 19 Global Step: 320130 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:46:55,366-Speed 5200.12 samples/sec Loss 0.3545 LearningRate 0.0002 Epoch: 19 Global Step: 320140 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:46:57,348-Speed 5168.63 samples/sec Loss 0.3908 LearningRate 0.0002 Epoch: 19 Global Step: 320150 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:46:59,353-Speed 5107.40 samples/sec Loss 0.3918 LearningRate 0.0002 Epoch: 19 Global Step: 320160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:47:01,369-Speed 5083.07 samples/sec Loss 0.4150 LearningRate 0.0002 Epoch: 19 Global Step: 320170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
20:47:03,368-Speed 5122.14 samples/sec Loss 0.3941 LearningRate 0.0002 Epoch: 19 Global Step: 320180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:47:05,339-Speed 5197.00 samples/sec Loss 0.3959 LearningRate 0.0002 Epoch: 19 Global Step: 320190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:47:07,310-Speed 5196.83 samples/sec Loss 0.4008 LearningRate 0.0002 Epoch: 19 Global Step: 320200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:47:09,298-Speed 5154.09 samples/sec Loss 0.3998 LearningRate 0.0002 Epoch: 19 Global Step: 320210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:47:11,280-Speed 5169.21 samples/sec Loss 0.3775 LearningRate 0.0002 Epoch: 19 Global Step: 320220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:47:13,263-Speed 5165.71 samples/sec Loss 0.4110 LearningRate 0.0002 Epoch: 19 Global Step: 320230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:47:15,237-Speed 5188.52 samples/sec Loss 0.4152 LearningRate 0.0002 Epoch: 19 Global Step: 320240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:47:17,215-Speed 5180.40 samples/sec Loss 0.3798 LearningRate 0.0002 Epoch: 19 Global Step: 320250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:47:19,185-Speed 5197.54 samples/sec Loss 0.3751 LearningRate 0.0002 Epoch: 19 Global Step: 320260 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:47:21,160-Speed 5189.06 samples/sec Loss 0.4178 LearningRate 0.0002 Epoch: 19 Global Step: 320270 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:47:23,138-Speed 5178.39 samples/sec Loss 0.3869 LearningRate 0.0002 Epoch: 19 Global Step: 320280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:47:25,141-Speed 5113.18 samples/sec Loss 0.3966 LearningRate 0.0002 Epoch: 19 Global Step: 320290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:47:27,135-Speed 5136.10 samples/sec 
Loss 0.3783 LearningRate 0.0002 Epoch: 19 Global Step: 320300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:47:29,107-Speed 5196.03 samples/sec Loss 0.3900 LearningRate 0.0002 Epoch: 19 Global Step: 320310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:47:31,074-Speed 5207.28 samples/sec Loss 0.3789 LearningRate 0.0002 Epoch: 19 Global Step: 320320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:47:33,048-Speed 5188.67 samples/sec Loss 0.3550 LearningRate 0.0002 Epoch: 19 Global Step: 320330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:47:35,037-Speed 5150.41 samples/sec Loss 0.3977 LearningRate 0.0002 Epoch: 19 Global Step: 320340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:47:37,008-Speed 5196.87 samples/sec Loss 0.3732 LearningRate 0.0002 Epoch: 19 Global Step: 320350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:47:38,976-Speed 5203.70 samples/sec Loss 0.4174 LearningRate 0.0002 Epoch: 19 Global Step: 320360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:47:40,949-Speed 5193.81 samples/sec Loss 0.3565 LearningRate 0.0002 Epoch: 19 Global Step: 320370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:47:42,920-Speed 5197.37 samples/sec Loss 0.3656 LearningRate 0.0002 Epoch: 19 Global Step: 320380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:47:44,885-Speed 5212.51 samples/sec Loss 0.4096 LearningRate 0.0002 Epoch: 19 Global Step: 320390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:47:46,870-Speed 5160.21 samples/sec Loss 0.3660 LearningRate 0.0002 Epoch: 19 Global Step: 320400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:47:48,837-Speed 5206.50 samples/sec Loss 0.3822 LearningRate 0.0002 Epoch: 19 Global Step: 320410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:47:50,822-Speed 5160.59 samples/sec Loss 0.3815 LearningRate 0.0002 Epoch: 19 
Global Step: 320420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:47:52,810-Speed 5153.95 samples/sec Loss 0.3857 LearningRate 0.0002 Epoch: 19 Global Step: 320430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:47:54,781-Speed 5196.80 samples/sec Loss 0.3784 LearningRate 0.0002 Epoch: 19 Global Step: 320440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:47:56,751-Speed 5200.30 samples/sec Loss 0.3774 LearningRate 0.0002 Epoch: 19 Global Step: 320450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:47:58,742-Speed 5144.03 samples/sec Loss 0.3769 LearningRate 0.0002 Epoch: 19 Global Step: 320460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:48:00,732-Speed 5147.63 samples/sec Loss 0.3951 LearningRate 0.0002 Epoch: 19 Global Step: 320470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:48:02,701-Speed 5203.67 samples/sec Loss 0.3816 LearningRate 0.0002 Epoch: 19 Global Step: 320480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:48:04,676-Speed 5185.27 samples/sec Loss 0.3846 LearningRate 0.0002 Epoch: 19 Global Step: 320490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:48:06,654-Speed 5179.44 samples/sec Loss 0.3974 LearningRate 0.0002 Epoch: 19 Global Step: 320500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:48:08,623-Speed 5202.64 samples/sec Loss 0.3836 LearningRate 0.0002 Epoch: 19 Global Step: 320510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:48:10,591-Speed 5204.88 samples/sec Loss 0.3794 LearningRate 0.0002 Epoch: 19 Global Step: 320520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:48:12,562-Speed 5196.46 samples/sec Loss 0.3985 LearningRate 0.0002 Epoch: 19 Global Step: 320530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:48:14,529-Speed 5208.27 samples/sec Loss 0.3991 LearningRate 0.0002 Epoch: 19 Global Step: 320540 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-11 20:48:16,496-Speed 5205.41 samples/sec Loss 0.3616 LearningRate 0.0002 Epoch: 19 Global Step: 320550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:48:18,476-Speed 5175.08 samples/sec Loss 0.3889 LearningRate 0.0002 Epoch: 19 Global Step: 320560 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:48:20,438-Speed 5221.92 samples/sec Loss 0.3764 LearningRate 0.0002 Epoch: 19 Global Step: 320570 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:48:22,417-Speed 5174.87 samples/sec Loss 0.3742 LearningRate 0.0002 Epoch: 19 Global Step: 320580 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:48:24,418-Speed 5119.51 samples/sec Loss 0.3734 LearningRate 0.0002 Epoch: 19 Global Step: 320590 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:48:26,399-Speed 5171.23 samples/sec Loss 0.3753 LearningRate 0.0002 Epoch: 19 Global Step: 320600 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:48:28,381-Speed 5167.34 samples/sec Loss 0.3840 LearningRate 0.0002 Epoch: 19 Global Step: 320610 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:48:30,372-Speed 5144.19 samples/sec Loss 0.3648 LearningRate 0.0002 Epoch: 19 Global Step: 320620 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:48:32,337-Speed 5214.63 samples/sec Loss 0.3832 LearningRate 0.0002 Epoch: 19 Global Step: 320630 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:48:34,345-Speed 5100.37 samples/sec Loss 0.3966 LearningRate 0.0002 Epoch: 19 Global Step: 320640 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:48:36,325-Speed 5174.43 samples/sec Loss 0.3735 LearningRate 0.0002 Epoch: 19 Global Step: 320650 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:48:38,308-Speed 5164.80 samples/sec Loss 0.3992 LearningRate 0.0002 Epoch: 19 Global Step: 320660 Fp16 Grad Scale: 131072 Required: 1 hours Training: 
2022-04-11 20:48:40,323-Speed 5084.47 samples/sec Loss 0.3855 LearningRate 0.0002 Epoch: 19 Global Step: 320670 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:48:42,279-Speed 5235.37 samples/sec Loss 0.3840 LearningRate 0.0002 Epoch: 19 Global Step: 320680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:48:44,245-Speed 5212.28 samples/sec Loss 0.3649 LearningRate 0.0002 Epoch: 19 Global Step: 320690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:48:46,249-Speed 5110.20 samples/sec Loss 0.3844 LearningRate 0.0002 Epoch: 19 Global Step: 320700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:48:48,215-Speed 5210.58 samples/sec Loss 0.3775 LearningRate 0.0002 Epoch: 19 Global Step: 320710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:48:50,212-Speed 5130.68 samples/sec Loss 0.3824 LearningRate 0.0002 Epoch: 19 Global Step: 320720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:48:52,203-Speed 5142.47 samples/sec Loss 0.3802 LearningRate 0.0002 Epoch: 19 Global Step: 320730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:48:54,183-Speed 5174.38 samples/sec Loss 0.3793 LearningRate 0.0002 Epoch: 19 Global Step: 320740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:48:56,159-Speed 5185.52 samples/sec Loss 0.3807 LearningRate 0.0002 Epoch: 19 Global Step: 320750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:48:58,157-Speed 5124.58 samples/sec Loss 0.3803 LearningRate 0.0002 Epoch: 19 Global Step: 320760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:49:00,124-Speed 5209.01 samples/sec Loss 0.3714 LearningRate 0.0002 Epoch: 19 Global Step: 320770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:49:02,128-Speed 5110.97 samples/sec Loss 0.3936 LearningRate 0.0002 Epoch: 19 Global Step: 320780 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:49:04,140-Speed 5091.38 
samples/sec Loss 0.3691 LearningRate 0.0002 Epoch: 19 Global Step: 320790 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:49:06,129-Speed 5151.52 samples/sec Loss 0.3870 LearningRate 0.0002 Epoch: 19 Global Step: 320800 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:49:08,101-Speed 5194.63 samples/sec Loss 0.3885 LearningRate 0.0002 Epoch: 19 Global Step: 320810 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:49:10,085-Speed 5160.84 samples/sec Loss 0.3717 LearningRate 0.0002 Epoch: 19 Global Step: 320820 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:49:12,053-Speed 5204.75 samples/sec Loss 0.3838 LearningRate 0.0002 Epoch: 19 Global Step: 320830 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:49:14,056-Speed 5114.59 samples/sec Loss 0.3806 LearningRate 0.0002 Epoch: 19 Global Step: 320840 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:49:16,037-Speed 5171.26 samples/sec Loss 0.3706 LearningRate 0.0002 Epoch: 19 Global Step: 320850 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:49:18,002-Speed 5211.93 samples/sec Loss 0.4154 LearningRate 0.0002 Epoch: 19 Global Step: 320860 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:49:19,968-Speed 5211.88 samples/sec Loss 0.3698 LearningRate 0.0002 Epoch: 19 Global Step: 320870 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:49:21,951-Speed 5163.89 samples/sec Loss 0.3890 LearningRate 0.0002 Epoch: 19 Global Step: 320880 Fp16 Grad Scale: 262144 Required: 1 hours Training: 2022-04-11 20:49:23,948-Speed 5128.94 samples/sec Loss 0.3751 LearningRate 0.0002 Epoch: 19 Global Step: 320890 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:49:25,925-Speed 5182.81 samples/sec Loss 0.4000 LearningRate 0.0001 Epoch: 19 Global Step: 320900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:49:27,906-Speed 5172.24 samples/sec Loss 0.3975 
LearningRate 0.0001 Epoch: 19 Global Step: 320910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:49:29,936-Speed 5045.57 samples/sec Loss 0.3972 LearningRate 0.0001 Epoch: 19 Global Step: 320920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:49:31,903-Speed 5207.62 samples/sec Loss 0.4060 LearningRate 0.0001 Epoch: 19 Global Step: 320930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:49:33,872-Speed 5200.96 samples/sec Loss 0.3926 LearningRate 0.0001 Epoch: 19 Global Step: 320940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:49:35,842-Speed 5200.27 samples/sec Loss 0.3795 LearningRate 0.0001 Epoch: 19 Global Step: 320950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:49:37,834-Speed 5141.79 samples/sec Loss 0.3927 LearningRate 0.0001 Epoch: 19 Global Step: 320960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:49:39,824-Speed 5148.93 samples/sec Loss 0.3944 LearningRate 0.0001 Epoch: 19 Global Step: 320970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:49:41,793-Speed 5201.38 samples/sec Loss 0.3960 LearningRate 0.0001 Epoch: 19 Global Step: 320980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:49:43,764-Speed 5196.45 samples/sec Loss 0.3760 LearningRate 0.0001 Epoch: 19 Global Step: 320990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:49:45,734-Speed 5199.84 samples/sec Loss 0.3811 LearningRate 0.0001 Epoch: 19 Global Step: 321000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:49:47,732-Speed 5126.04 samples/sec Loss 0.3825 LearningRate 0.0001 Epoch: 19 Global Step: 321010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:49:49,694-Speed 5222.43 samples/sec Loss 0.4105 LearningRate 0.0001 Epoch: 19 Global Step: 321020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:49:51,686-Speed 5142.47 samples/sec Loss 0.3795 LearningRate 0.0001 Epoch: 19 Global 
Step: 321030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:49:53,666-Speed 5173.92 samples/sec Loss 0.3937 LearningRate 0.0001 Epoch: 19 Global Step: 321040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:49:55,634-Speed 5203.63 samples/sec Loss 0.3872 LearningRate 0.0001 Epoch: 19 Global Step: 321050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:49:57,617-Speed 5167.02 samples/sec Loss 0.3763 LearningRate 0.0001 Epoch: 19 Global Step: 321060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:49:59,587-Speed 5198.05 samples/sec Loss 0.3945 LearningRate 0.0001 Epoch: 19 Global Step: 321070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:50:01,593-Speed 5107.68 samples/sec Loss 0.3846 LearningRate 0.0001 Epoch: 19 Global Step: 321080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:50:03,592-Speed 5123.64 samples/sec Loss 0.3790 LearningRate 0.0001 Epoch: 19 Global Step: 321090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:50:05,581-Speed 5150.58 samples/sec Loss 0.4071 LearningRate 0.0001 Epoch: 19 Global Step: 321100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:50:07,558-Speed 5181.59 samples/sec Loss 0.3742 LearningRate 0.0001 Epoch: 19 Global Step: 321110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:50:09,539-Speed 5168.66 samples/sec Loss 0.3726 LearningRate 0.0001 Epoch: 19 Global Step: 321120 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:50:11,525-Speed 5159.37 samples/sec Loss 0.4035 LearningRate 0.0001 Epoch: 19 Global Step: 321130 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:50:13,500-Speed 5184.70 samples/sec Loss 0.3697 LearningRate 0.0001 Epoch: 19 Global Step: 321140 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:50:15,484-Speed 5164.43 samples/sec Loss 0.3832 LearningRate 0.0001 Epoch: 19 Global Step: 321150 Fp16 Grad Scale: 131072 
Required: 1 hours Training: 2022-04-11 20:50:17,457-Speed 5191.92 samples/sec Loss 0.3762 LearningRate 0.0001 Epoch: 19 Global Step: 321160 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:50:19,431-Speed 5189.28 samples/sec Loss 0.4084 LearningRate 0.0001 Epoch: 19 Global Step: 321170 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:50:21,415-Speed 5162.34 samples/sec Loss 0.3699 LearningRate 0.0001 Epoch: 19 Global Step: 321180 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:50:23,401-Speed 5159.05 samples/sec Loss 0.3798 LearningRate 0.0001 Epoch: 19 Global Step: 321190 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:50:25,372-Speed 5197.32 samples/sec Loss 0.3771 LearningRate 0.0001 Epoch: 19 Global Step: 321200 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:50:27,336-Speed 5215.78 samples/sec Loss 0.3825 LearningRate 0.0001 Epoch: 19 Global Step: 321210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:50:29,313-Speed 5180.67 samples/sec Loss 0.4086 LearningRate 0.0001 Epoch: 19 Global Step: 321220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:50:31,292-Speed 5176.49 samples/sec Loss 0.3961 LearningRate 0.0001 Epoch: 19 Global Step: 321230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:50:33,269-Speed 5179.30 samples/sec Loss 0.3762 LearningRate 0.0001 Epoch: 19 Global Step: 321240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:50:35,269-Speed 5123.64 samples/sec Loss 0.3741 LearningRate 0.0001 Epoch: 19 Global Step: 321250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:50:37,285-Speed 5081.00 samples/sec Loss 0.3931 LearningRate 0.0001 Epoch: 19 Global Step: 321260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:50:39,267-Speed 5167.95 samples/sec Loss 0.3869 LearningRate 0.0001 Epoch: 19 Global Step: 321270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
20:50:41,254-Speed 5154.88 samples/sec Loss 0.3869 LearningRate 0.0001 Epoch: 19 Global Step: 321280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:50:43,225-Speed 5197.87 samples/sec Loss 0.3974 LearningRate 0.0001 Epoch: 19 Global Step: 321290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:50:45,193-Speed 5204.26 samples/sec Loss 0.3916 LearningRate 0.0001 Epoch: 19 Global Step: 321300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:50:47,182-Speed 5149.73 samples/sec Loss 0.3975 LearningRate 0.0001 Epoch: 19 Global Step: 321310 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:50:49,181-Speed 5125.03 samples/sec Loss 0.3609 LearningRate 0.0001 Epoch: 19 Global Step: 321320 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:50:51,156-Speed 5186.59 samples/sec Loss 0.3795 LearningRate 0.0001 Epoch: 19 Global Step: 321330 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:50:53,154-Speed 5125.23 samples/sec Loss 0.3674 LearningRate 0.0001 Epoch: 19 Global Step: 321340 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:50:55,131-Speed 5181.89 samples/sec Loss 0.3781 LearningRate 0.0001 Epoch: 19 Global Step: 321350 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:50:57,124-Speed 5140.36 samples/sec Loss 0.4046 LearningRate 0.0001 Epoch: 19 Global Step: 321360 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:50:59,100-Speed 5183.81 samples/sec Loss 0.4160 LearningRate 0.0001 Epoch: 19 Global Step: 321370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:01,079-Speed 5177.11 samples/sec Loss 0.3579 LearningRate 0.0001 Epoch: 19 Global Step: 321380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:03,068-Speed 5149.23 samples/sec Loss 0.3714 LearningRate 0.0001 Epoch: 19 Global Step: 321390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:05,041-Speed 5191.54 
samples/sec Loss 0.3795 LearningRate 0.0001 Epoch: 19 Global Step: 321400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:07,023-Speed 5167.27 samples/sec Loss 0.3761 LearningRate 0.0001 Epoch: 19 Global Step: 321410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:08,993-Speed 5199.36 samples/sec Loss 0.3866 LearningRate 0.0001 Epoch: 19 Global Step: 321420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:10,985-Speed 5143.60 samples/sec Loss 0.3813 LearningRate 0.0001 Epoch: 19 Global Step: 321430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:12,958-Speed 5192.41 samples/sec Loss 0.3936 LearningRate 0.0001 Epoch: 19 Global Step: 321440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:14,933-Speed 5184.07 samples/sec Loss 0.3855 LearningRate 0.0001 Epoch: 19 Global Step: 321450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:16,926-Speed 5140.20 samples/sec Loss 0.3931 LearningRate 0.0001 Epoch: 19 Global Step: 321460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:18,896-Speed 5199.18 samples/sec Loss 0.3702 LearningRate 0.0001 Epoch: 19 Global Step: 321470 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:51:20,874-Speed 5179.49 samples/sec Loss 0.3788 LearningRate 0.0001 Epoch: 19 Global Step: 321480 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:51:22,869-Speed 5135.17 samples/sec Loss 0.3861 LearningRate 0.0001 Epoch: 19 Global Step: 321490 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:51:24,852-Speed 5166.39 samples/sec Loss 0.3889 LearningRate 0.0001 Epoch: 19 Global Step: 321500 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:51:26,835-Speed 5166.40 samples/sec Loss 0.3695 LearningRate 0.0001 Epoch: 19 Global Step: 321510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:28,813-Speed 5178.32 samples/sec Loss 0.3654 LearningRate 
0.0001 Epoch: 19 Global Step: 321520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:30,787-Speed 5187.81 samples/sec Loss 0.3868 LearningRate 0.0001 Epoch: 19 Global Step: 321530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:32,767-Speed 5173.53 samples/sec Loss 0.3916 LearningRate 0.0001 Epoch: 19 Global Step: 321540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:34,761-Speed 5137.17 samples/sec Loss 0.3703 LearningRate 0.0001 Epoch: 19 Global Step: 321550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:36,741-Speed 5174.82 samples/sec Loss 0.3880 LearningRate 0.0001 Epoch: 19 Global Step: 321560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:38,725-Speed 5161.11 samples/sec Loss 0.3931 LearningRate 0.0001 Epoch: 19 Global Step: 321570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:40,712-Speed 5155.04 samples/sec Loss 0.3874 LearningRate 0.0001 Epoch: 19 Global Step: 321580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:42,692-Speed 5174.25 samples/sec Loss 0.3981 LearningRate 0.0001 Epoch: 19 Global Step: 321590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:44,679-Speed 5155.44 samples/sec Loss 0.3943 LearningRate 0.0001 Epoch: 19 Global Step: 321600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:46,667-Speed 5152.91 samples/sec Loss 0.3969 LearningRate 0.0001 Epoch: 19 Global Step: 321610 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:51:48,664-Speed 5130.47 samples/sec Loss 0.3778 LearningRate 0.0001 Epoch: 19 Global Step: 321620 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:51:50,646-Speed 5166.58 samples/sec Loss 0.3753 LearningRate 0.0001 Epoch: 19 Global Step: 321630 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:51:52,636-Speed 5148.03 samples/sec Loss 0.3684 LearningRate 0.0001 Epoch: 19 Global Step: 321640 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:54,619-Speed 5166.18 samples/sec Loss 0.3747 LearningRate 0.0001 Epoch: 19 Global Step: 321650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:56,589-Speed 5198.88 samples/sec Loss 0.4008 LearningRate 0.0001 Epoch: 19 Global Step: 321660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:51:58,596-Speed 5103.02 samples/sec Loss 0.3693 LearningRate 0.0001 Epoch: 19 Global Step: 321670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:00,571-Speed 5187.79 samples/sec Loss 0.3894 LearningRate 0.0001 Epoch: 19 Global Step: 321680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:02,544-Speed 5191.49 samples/sec Loss 0.3853 LearningRate 0.0001 Epoch: 19 Global Step: 321690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:04,542-Speed 5127.31 samples/sec Loss 0.3748 LearningRate 0.0001 Epoch: 19 Global Step: 321700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:06,524-Speed 5168.90 samples/sec Loss 0.3598 LearningRate 0.0001 Epoch: 19 Global Step: 321710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:08,519-Speed 5135.04 samples/sec Loss 0.3783 LearningRate 0.0001 Epoch: 19 Global Step: 321720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:10,494-Speed 5185.86 samples/sec Loss 0.3823 LearningRate 0.0001 Epoch: 19 Global Step: 321730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:12,500-Speed 5105.66 samples/sec Loss 0.3705 LearningRate 0.0001 Epoch: 19 Global Step: 321740 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:52:14,494-Speed 5137.29 samples/sec Loss 0.3996 LearningRate 0.0001 Epoch: 19 Global Step: 321750 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:52:16,472-Speed 5177.73 samples/sec Loss 0.4015 LearningRate 0.0001 Epoch: 19 Global Step: 321760 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-11 20:52:18,446-Speed 5191.50 samples/sec Loss 0.3857 LearningRate 0.0001 Epoch: 19 Global Step: 321770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:20,423-Speed 5179.84 samples/sec Loss 0.3851 LearningRate 0.0001 Epoch: 19 Global Step: 321780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:22,392-Speed 5202.69 samples/sec Loss 0.3991 LearningRate 0.0001 Epoch: 19 Global Step: 321790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:24,366-Speed 5190.01 samples/sec Loss 0.3892 LearningRate 0.0001 Epoch: 19 Global Step: 321800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:26,349-Speed 5166.09 samples/sec Loss 0.3854 LearningRate 0.0001 Epoch: 19 Global Step: 321810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:28,331-Speed 5166.67 samples/sec Loss 0.3814 LearningRate 0.0001 Epoch: 19 Global Step: 321820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:30,308-Speed 5180.91 samples/sec Loss 0.3882 LearningRate 0.0001 Epoch: 19 Global Step: 321830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:32,286-Speed 5179.89 samples/sec Loss 0.3758 LearningRate 0.0001 Epoch: 19 Global Step: 321840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:34,292-Speed 5107.86 samples/sec Loss 0.3967 LearningRate 0.0001 Epoch: 19 Global Step: 321850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:36,277-Speed 5158.36 samples/sec Loss 0.4135 LearningRate 0.0001 Epoch: 19 Global Step: 321860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:38,275-Speed 5128.39 samples/sec Loss 0.3722 LearningRate 0.0001 Epoch: 19 Global Step: 321870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:40,290-Speed 5083.92 samples/sec Loss 0.3686 LearningRate 0.0001 Epoch: 19 Global Step: 321880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:42,271-Speed 
5168.18 samples/sec Loss 0.4192 LearningRate 0.0001 Epoch: 19 Global Step: 321890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:44,251-Speed 5175.02 samples/sec Loss 0.4003 LearningRate 0.0001 Epoch: 19 Global Step: 321900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:46,223-Speed 5192.68 samples/sec Loss 0.3833 LearningRate 0.0001 Epoch: 19 Global Step: 321910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:48,226-Speed 5115.61 samples/sec Loss 0.3856 LearningRate 0.0001 Epoch: 19 Global Step: 321920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:50,210-Speed 5162.24 samples/sec Loss 0.3764 LearningRate 0.0001 Epoch: 19 Global Step: 321930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:52,199-Speed 5149.87 samples/sec Loss 0.3881 LearningRate 0.0001 Epoch: 19 Global Step: 321940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:54,185-Speed 5159.55 samples/sec Loss 0.3892 LearningRate 0.0001 Epoch: 19 Global Step: 321950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:52:56,155-Speed 5198.20 samples/sec Loss 0.3760 LearningRate 0.0001 Epoch: 19 Global Step: 321960 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:52:58,154-Speed 5125.44 samples/sec Loss 0.3825 LearningRate 0.0001 Epoch: 19 Global Step: 321970 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:53:00,129-Speed 5187.03 samples/sec Loss 0.3867 LearningRate 0.0001 Epoch: 19 Global Step: 321980 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:53:02,102-Speed 5191.99 samples/sec Loss 0.4000 LearningRate 0.0001 Epoch: 19 Global Step: 321990 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:53:04,082-Speed 5172.89 samples/sec Loss 0.4024 LearningRate 0.0001 Epoch: 19 Global Step: 322000 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:53:30,863-[lfw][322000]XNorm: 21.519585 Training: 
2022-04-11 20:53:30,863-[lfw][322000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 20:53:30,864-[lfw][322000]Accuracy-Highest: 0.99833 Training: 2022-04-11 20:54:01,790-[cfp_fp][322000]XNorm: 22.042390 Training: 2022-04-11 20:54:01,791-[cfp_fp][322000]Accuracy-Flip: 0.98929+-0.00420 Training: 2022-04-11 20:54:01,791-[cfp_fp][322000]Accuracy-Highest: 0.99029 Training: 2022-04-11 20:54:28,388-[agedb_30][322000]XNorm: 22.671760 Training: 2022-04-11 20:54:28,388-[agedb_30][322000]Accuracy-Flip: 0.98383+-0.00691 Training: 2022-04-11 20:54:28,389-[agedb_30][322000]Accuracy-Highest: 0.98450 Training: 2022-04-11 20:54:30,371-Speed 118.67 samples/sec Loss 0.3725 LearningRate 0.0001 Epoch: 19 Global Step: 322010 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:54:32,340-Speed 5201.68 samples/sec Loss 0.3886 LearningRate 0.0001 Epoch: 19 Global Step: 322020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:54:34,310-Speed 5198.93 samples/sec Loss 0.3716 LearningRate 0.0001 Epoch: 19 Global Step: 322030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:54:36,283-Speed 5193.42 samples/sec Loss 0.3776 LearningRate 0.0001 Epoch: 19 Global Step: 322040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:54:38,280-Speed 5128.20 samples/sec Loss 0.3830 LearningRate 0.0001 Epoch: 19 Global Step: 322050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:54:40,245-Speed 5213.53 samples/sec Loss 0.3987 LearningRate 0.0001 Epoch: 19 Global Step: 322060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:54:42,206-Speed 5223.05 samples/sec Loss 0.3847 LearningRate 0.0001 Epoch: 19 Global Step: 322070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:54:44,174-Speed 5206.40 samples/sec Loss 0.4094 LearningRate 0.0001 Epoch: 19 Global Step: 322080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:54:46,145-Speed 5194.81 samples/sec Loss 0.3859 LearningRate 0.0001 Epoch: 
19 Global Step: 322090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:54:48,104-Speed 5228.71 samples/sec Loss 0.3699 LearningRate 0.0001 Epoch: 19 Global Step: 322100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:54:50,145-Speed 5020.19 samples/sec Loss 0.3627 LearningRate 0.0001 Epoch: 19 Global Step: 322110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:54:52,146-Speed 5118.47 samples/sec Loss 0.3946 LearningRate 0.0001 Epoch: 19 Global Step: 322120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:54:54,114-Speed 5205.50 samples/sec Loss 0.3794 LearningRate 0.0001 Epoch: 19 Global Step: 322130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:54:56,073-Speed 5228.93 samples/sec Loss 0.3913 LearningRate 0.0001 Epoch: 19 Global Step: 322140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:54:58,036-Speed 5219.57 samples/sec Loss 0.3674 LearningRate 0.0001 Epoch: 19 Global Step: 322150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:55:00,039-Speed 5112.84 samples/sec Loss 0.3836 LearningRate 0.0001 Epoch: 19 Global Step: 322160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:55:02,049-Speed 5095.86 samples/sec Loss 0.3815 LearningRate 0.0001 Epoch: 19 Global Step: 322170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:55:04,025-Speed 5185.30 samples/sec Loss 0.3991 LearningRate 0.0001 Epoch: 19 Global Step: 322180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:55:05,988-Speed 5217.39 samples/sec Loss 0.3896 LearningRate 0.0001 Epoch: 19 Global Step: 322190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:55:07,954-Speed 5209.28 samples/sec Loss 0.3785 LearningRate 0.0001 Epoch: 19 Global Step: 322200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:55:09,927-Speed 5192.41 samples/sec Loss 0.3712 LearningRate 0.0001 Epoch: 19 Global Step: 322210 Fp16 Grad Scale: 
32768 Required: 1 hours Training: 2022-04-11 20:55:11,922-Speed 5135.58 samples/sec Loss 0.4021 LearningRate 0.0001 Epoch: 19 Global Step: 322220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:55:13,907-Speed 5160.38 samples/sec Loss 0.4002 LearningRate 0.0001 Epoch: 19 Global Step: 322230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:55:15,890-Speed 5165.76 samples/sec Loss 0.4022 LearningRate 0.0001 Epoch: 19 Global Step: 322240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:17,861-Speed 5197.38 samples/sec Loss 0.3830 LearningRate 0.0001 Epoch: 19 Global Step: 322250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:19,846-Speed 5159.18 samples/sec Loss 0.4232 LearningRate 0.0001 Epoch: 19 Global Step: 322260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:21,839-Speed 5139.21 samples/sec Loss 0.4066 LearningRate 0.0001 Epoch: 19 Global Step: 322270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:23,826-Speed 5156.14 samples/sec Loss 0.4004 LearningRate 0.0001 Epoch: 19 Global Step: 322280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:25,812-Speed 5156.85 samples/sec Loss 0.3746 LearningRate 0.0001 Epoch: 19 Global Step: 322290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:27,802-Speed 5148.29 samples/sec Loss 0.3905 LearningRate 0.0001 Epoch: 19 Global Step: 322300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:29,790-Speed 5152.69 samples/sec Loss 0.3954 LearningRate 0.0001 Epoch: 19 Global Step: 322310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:31,760-Speed 5198.87 samples/sec Loss 0.4046 LearningRate 0.0001 Epoch: 19 Global Step: 322320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:33,729-Speed 5203.49 samples/sec Loss 0.3786 LearningRate 0.0001 Epoch: 19 Global Step: 322330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 20:55:35,719-Speed 5148.51 samples/sec Loss 0.3875 LearningRate 0.0001 Epoch: 19 Global Step: 322340 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:55:37,698-Speed 5175.15 samples/sec Loss 0.3809 LearningRate 0.0001 Epoch: 19 Global Step: 322350 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:55:39,670-Speed 5195.07 samples/sec Loss 0.3620 LearningRate 0.0001 Epoch: 19 Global Step: 322360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:41,653-Speed 5164.89 samples/sec Loss 0.3837 LearningRate 0.0001 Epoch: 19 Global Step: 322370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:43,615-Speed 5219.80 samples/sec Loss 0.3898 LearningRate 0.0001 Epoch: 19 Global Step: 322380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:45,590-Speed 5187.79 samples/sec Loss 0.3752 LearningRate 0.0001 Epoch: 19 Global Step: 322390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:47,598-Speed 5100.42 samples/sec Loss 0.3770 LearningRate 0.0001 Epoch: 19 Global Step: 322400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:49,670-Speed 4943.82 samples/sec Loss 0.3897 LearningRate 0.0001 Epoch: 19 Global Step: 322410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:51,738-Speed 4953.98 samples/sec Loss 0.3876 LearningRate 0.0001 Epoch: 19 Global Step: 322420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:53,711-Speed 5191.40 samples/sec Loss 0.3878 LearningRate 0.0001 Epoch: 19 Global Step: 322430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:55,680-Speed 5201.99 samples/sec Loss 0.4182 LearningRate 0.0001 Epoch: 19 Global Step: 322440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:57,671-Speed 5147.04 samples/sec Loss 0.3858 LearningRate 0.0001 Epoch: 19 Global Step: 322450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:55:59,639-Speed 5204.06 
samples/sec Loss 0.3687 LearningRate 0.0001 Epoch: 19 Global Step: 322460 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:56:01,623-Speed 5168.34 samples/sec Loss 0.3875 LearningRate 0.0001 Epoch: 19 Global Step: 322470 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:56:03,605-Speed 5166.74 samples/sec Loss 0.3730 LearningRate 0.0001 Epoch: 19 Global Step: 322480 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:56:05,569-Speed 5216.65 samples/sec Loss 0.3909 LearningRate 0.0001 Epoch: 19 Global Step: 322490 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:56:07,547-Speed 5178.91 samples/sec Loss 0.3839 LearningRate 0.0001 Epoch: 19 Global Step: 322500 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:56:09,539-Speed 5141.29 samples/sec Loss 0.3911 LearningRate 0.0001 Epoch: 19 Global Step: 322510 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:56:11,511-Speed 5193.19 samples/sec Loss 0.3783 LearningRate 0.0001 Epoch: 19 Global Step: 322520 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:56:13,482-Speed 5197.42 samples/sec Loss 0.3847 LearningRate 0.0001 Epoch: 19 Global Step: 322530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:56:15,458-Speed 5185.48 samples/sec Loss 0.3875 LearningRate 0.0001 Epoch: 19 Global Step: 322540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:56:17,424-Speed 5210.62 samples/sec Loss 0.4140 LearningRate 0.0001 Epoch: 19 Global Step: 322550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:56:19,386-Speed 5220.87 samples/sec Loss 0.3780 LearningRate 0.0001 Epoch: 19 Global Step: 322560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:56:21,371-Speed 5159.89 samples/sec Loss 0.3665 LearningRate 0.0001 Epoch: 19 Global Step: 322570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:56:23,369-Speed 5127.71 samples/sec Loss 0.3864 LearningRate 
0.0001 Epoch: 19 Global Step: 322580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:56:25,335-Speed 5208.41 samples/sec Loss 0.4120 LearningRate 0.0001 Epoch: 19 Global Step: 322590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:56:27,316-Speed 5172.13 samples/sec Loss 0.3906 LearningRate 0.0001 Epoch: 19 Global Step: 322600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:56:29,287-Speed 5197.01 samples/sec Loss 0.3901 LearningRate 0.0001 Epoch: 19 Global Step: 322610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:56:31,295-Speed 5101.44 samples/sec Loss 0.3958 LearningRate 0.0001 Epoch: 19 Global Step: 322620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:56:33,270-Speed 5187.69 samples/sec Loss 0.4081 LearningRate 0.0001 Epoch: 19 Global Step: 322630 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:56:35,247-Speed 5181.76 samples/sec Loss 0.4176 LearningRate 0.0001 Epoch: 19 Global Step: 322640 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:56:37,223-Speed 5183.70 samples/sec Loss 0.3841 LearningRate 0.0001 Epoch: 19 Global Step: 322650 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:56:39,230-Speed 5102.99 samples/sec Loss 0.3811 LearningRate 0.0001 Epoch: 19 Global Step: 322660 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:56:41,230-Speed 5121.53 samples/sec Loss 0.3861 LearningRate 0.0001 Epoch: 19 Global Step: 322670 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:56:43,190-Speed 5226.42 samples/sec Loss 0.3920 LearningRate 0.0001 Epoch: 19 Global Step: 322680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:56:45,170-Speed 5173.30 samples/sec Loss 0.3860 LearningRate 0.0001 Epoch: 19 Global Step: 322690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:56:47,154-Speed 5164.11 samples/sec Loss 0.4039 LearningRate 0.0001 Epoch: 19 Global Step: 322700 
Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:56:49,137-Speed 5163.12 samples/sec Loss 0.3893 LearningRate 0.0001 Epoch: 19 Global Step: 322710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:56:51,104-Speed 5208.00 samples/sec Loss 0.3873 LearningRate 0.0001 Epoch: 19 Global Step: 322720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:56:53,115-Speed 5093.20 samples/sec Loss 0.3868 LearningRate 0.0001 Epoch: 19 Global Step: 322730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:56:55,090-Speed 5186.67 samples/sec Loss 0.3958 LearningRate 0.0001 Epoch: 19 Global Step: 322740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:56:57,109-Speed 5074.00 samples/sec Loss 0.3944 LearningRate 0.0001 Epoch: 19 Global Step: 322750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:56:59,130-Speed 5069.09 samples/sec Loss 0.4040 LearningRate 0.0001 Epoch: 19 Global Step: 322760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:57:01,123-Speed 5139.47 samples/sec Loss 0.4195 LearningRate 0.0001 Epoch: 19 Global Step: 322770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:57:03,131-Speed 5101.65 samples/sec Loss 0.3831 LearningRate 0.0001 Epoch: 19 Global Step: 322780 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:57:05,107-Speed 5184.03 samples/sec Loss 0.4036 LearningRate 0.0001 Epoch: 19 Global Step: 322790 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:57:07,075-Speed 5205.70 samples/sec Loss 0.3697 LearningRate 0.0001 Epoch: 19 Global Step: 322800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:57:09,050-Speed 5187.57 samples/sec Loss 0.3609 LearningRate 0.0001 Epoch: 19 Global Step: 322810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:57:11,024-Speed 5188.72 samples/sec Loss 0.3989 LearningRate 0.0001 Epoch: 19 Global Step: 322820 Fp16 Grad Scale: 65536 Required: 1 hours 
Training: 2022-04-11 20:57:13,009-Speed 5159.29 samples/sec Loss 0.3924 LearningRate 0.0001 Epoch: 19 Global Step: 322830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:57:15,007-Speed 5127.85 samples/sec Loss 0.4045 LearningRate 0.0001 Epoch: 19 Global Step: 322840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:57:17,008-Speed 5118.08 samples/sec Loss 0.3914 LearningRate 0.0001 Epoch: 19 Global Step: 322850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:57:19,012-Speed 5112.38 samples/sec Loss 0.3744 LearningRate 0.0001 Epoch: 19 Global Step: 322860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:57:20,979-Speed 5208.46 samples/sec Loss 0.3859 LearningRate 0.0001 Epoch: 19 Global Step: 322870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:57:22,963-Speed 5162.71 samples/sec Loss 0.3906 LearningRate 0.0001 Epoch: 19 Global Step: 322880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:57:24,929-Speed 5208.74 samples/sec Loss 0.3742 LearningRate 0.0001 Epoch: 19 Global Step: 322890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:57:26,913-Speed 5163.84 samples/sec Loss 0.3679 LearningRate 0.0001 Epoch: 19 Global Step: 322900 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:57:28,911-Speed 5127.19 samples/sec Loss 0.3716 LearningRate 0.0001 Epoch: 19 Global Step: 322910 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:57:30,889-Speed 5177.94 samples/sec Loss 0.3645 LearningRate 0.0001 Epoch: 19 Global Step: 322920 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:57:32,863-Speed 5189.35 samples/sec Loss 0.3688 LearningRate 0.0001 Epoch: 19 Global Step: 322930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:57:34,848-Speed 5160.38 samples/sec Loss 0.3980 LearningRate 0.0001 Epoch: 19 Global Step: 322940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:57:36,852-Speed 
5112.87 samples/sec Loss 0.4001 LearningRate 0.0001 Epoch: 19 Global Step: 322950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:57:38,854-Speed 5114.20 samples/sec Loss 0.4026 LearningRate 0.0001 Epoch: 19 Global Step: 322960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:57:40,835-Speed 5171.91 samples/sec Loss 0.3984 LearningRate 0.0001 Epoch: 19 Global Step: 322970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:57:42,799-Speed 5215.69 samples/sec Loss 0.3937 LearningRate 0.0001 Epoch: 19 Global Step: 322980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:57:44,804-Speed 5109.96 samples/sec Loss 0.3832 LearningRate 0.0001 Epoch: 19 Global Step: 322990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:57:46,782-Speed 5179.60 samples/sec Loss 0.4147 LearningRate 0.0001 Epoch: 19 Global Step: 323000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:57:48,777-Speed 5134.23 samples/sec Loss 0.3929 LearningRate 0.0001 Epoch: 19 Global Step: 323010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:57:50,758-Speed 5170.29 samples/sec Loss 0.3592 LearningRate 0.0001 Epoch: 19 Global Step: 323020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:57:52,782-Speed 5060.18 samples/sec Loss 0.3827 LearningRate 0.0001 Epoch: 19 Global Step: 323030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:57:54,749-Speed 5207.49 samples/sec Loss 0.3872 LearningRate 0.0001 Epoch: 19 Global Step: 323040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:57:56,717-Speed 5204.32 samples/sec Loss 0.3904 LearningRate 0.0001 Epoch: 19 Global Step: 323050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:57:58,696-Speed 5176.07 samples/sec Loss 0.3746 LearningRate 0.0001 Epoch: 19 Global Step: 323060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:58:00,692-Speed 5131.75 samples/sec Loss 0.3647 
LearningRate 0.0001 Epoch: 19 Global Step: 323070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:58:02,673-Speed 5173.16 samples/sec Loss 0.3895 LearningRate 0.0001 Epoch: 19 Global Step: 323080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:58:04,639-Speed 5209.73 samples/sec Loss 0.3908 LearningRate 0.0001 Epoch: 19 Global Step: 323090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-11 20:58:06,608-Speed 5201.93 samples/sec Loss 0.3727 LearningRate 0.0001 Epoch: 19 Global Step: 323100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:08,595-Speed 5154.87 samples/sec Loss 0.4092 LearningRate 0.0001 Epoch: 19 Global Step: 323110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:10,575-Speed 5174.46 samples/sec Loss 0.3946 LearningRate 0.0001 Epoch: 19 Global Step: 323120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:12,573-Speed 5128.59 samples/sec Loss 0.4040 LearningRate 0.0001 Epoch: 19 Global Step: 323130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:14,546-Speed 5191.39 samples/sec Loss 0.3819 LearningRate 0.0001 Epoch: 19 Global Step: 323140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:16,530-Speed 5160.95 samples/sec Loss 0.4008 LearningRate 0.0001 Epoch: 19 Global Step: 323150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:18,511-Speed 5170.93 samples/sec Loss 0.4078 LearningRate 0.0001 Epoch: 19 Global Step: 323160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:20,485-Speed 5189.00 samples/sec Loss 0.3740 LearningRate 0.0001 Epoch: 19 Global Step: 323170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:22,453-Speed 5205.59 samples/sec Loss 0.3975 LearningRate 0.0001 Epoch: 19 Global Step: 323180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:24,437-Speed 5163.03 samples/sec Loss 0.4197 LearningRate 0.0001 Epoch: 19 Global Step: 
323190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:26,411-Speed 5189.19 samples/sec Loss 0.3655 LearningRate 0.0001 Epoch: 19 Global Step: 323200 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:58:28,408-Speed 5130.08 samples/sec Loss 0.3851 LearningRate 0.0001 Epoch: 19 Global Step: 323210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:30,398-Speed 5146.99 samples/sec Loss 0.3803 LearningRate 0.0001 Epoch: 19 Global Step: 323220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:32,375-Speed 5181.48 samples/sec Loss 0.3869 LearningRate 0.0001 Epoch: 19 Global Step: 323230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:34,347-Speed 5194.29 samples/sec Loss 0.4208 LearningRate 0.0001 Epoch: 19 Global Step: 323240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:36,328-Speed 5169.90 samples/sec Loss 0.3900 LearningRate 0.0001 Epoch: 19 Global Step: 323250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:38,298-Speed 5200.94 samples/sec Loss 0.3981 LearningRate 0.0001 Epoch: 19 Global Step: 323260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:40,282-Speed 5163.76 samples/sec Loss 0.3873 LearningRate 0.0001 Epoch: 19 Global Step: 323270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:42,255-Speed 5190.41 samples/sec Loss 0.3788 LearningRate 0.0001 Epoch: 19 Global Step: 323280 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:44,221-Speed 5209.49 samples/sec Loss 0.3866 LearningRate 0.0001 Epoch: 19 Global Step: 323290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:46,192-Speed 5198.34 samples/sec Loss 0.3903 LearningRate 0.0001 Epoch: 19 Global Step: 323300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:48,176-Speed 5162.23 samples/sec Loss 0.3863 LearningRate 0.0001 Epoch: 19 Global Step: 323310 Fp16 Grad Scale: 131072 Required: 1 
hours Training: 2022-04-11 20:58:50,154-Speed 5180.07 samples/sec Loss 0.3973 LearningRate 0.0001 Epoch: 19 Global Step: 323320 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:58:52,148-Speed 5136.03 samples/sec Loss 0.3777 LearningRate 0.0001 Epoch: 19 Global Step: 323330 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:58:54,151-Speed 5115.65 samples/sec Loss 0.3686 LearningRate 0.0001 Epoch: 19 Global Step: 323340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:56,120-Speed 5200.40 samples/sec Loss 0.3969 LearningRate 0.0001 Epoch: 19 Global Step: 323350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:58:58,133-Speed 5089.63 samples/sec Loss 0.3890 LearningRate 0.0001 Epoch: 19 Global Step: 323360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:00,110-Speed 5182.02 samples/sec Loss 0.3997 LearningRate 0.0001 Epoch: 19 Global Step: 323370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:02,112-Speed 5115.03 samples/sec Loss 0.3951 LearningRate 0.0001 Epoch: 19 Global Step: 323380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:04,108-Speed 5131.26 samples/sec Loss 0.3811 LearningRate 0.0001 Epoch: 19 Global Step: 323390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:06,076-Speed 5207.22 samples/sec Loss 0.3753 LearningRate 0.0001 Epoch: 19 Global Step: 323400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:08,044-Speed 5204.58 samples/sec Loss 0.3695 LearningRate 0.0001 Epoch: 19 Global Step: 323410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:10,031-Speed 5156.38 samples/sec Loss 0.3727 LearningRate 0.0001 Epoch: 19 Global Step: 323420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:12,003-Speed 5192.89 samples/sec Loss 0.3856 LearningRate 0.0001 Epoch: 19 Global Step: 323430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
20:59:13,963-Speed 5227.29 samples/sec Loss 0.3722 LearningRate 0.0001 Epoch: 19 Global Step: 323440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:15,932-Speed 5202.64 samples/sec Loss 0.3728 LearningRate 0.0001 Epoch: 19 Global Step: 323450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:17,901-Speed 5201.11 samples/sec Loss 0.3969 LearningRate 0.0001 Epoch: 19 Global Step: 323460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:19,869-Speed 5204.62 samples/sec Loss 0.3819 LearningRate 0.0001 Epoch: 19 Global Step: 323470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:21,865-Speed 5132.62 samples/sec Loss 0.3770 LearningRate 0.0001 Epoch: 19 Global Step: 323480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:23,853-Speed 5151.81 samples/sec Loss 0.4108 LearningRate 0.0001 Epoch: 19 Global Step: 323490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:25,840-Speed 5154.21 samples/sec Loss 0.3912 LearningRate 0.0001 Epoch: 19 Global Step: 323500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:27,825-Speed 5161.93 samples/sec Loss 0.3766 LearningRate 0.0001 Epoch: 19 Global Step: 323510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:29,820-Speed 5132.81 samples/sec Loss 0.3652 LearningRate 0.0001 Epoch: 19 Global Step: 323520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:31,790-Speed 5201.28 samples/sec Loss 0.3887 LearningRate 0.0001 Epoch: 19 Global Step: 323530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:33,774-Speed 5162.96 samples/sec Loss 0.3807 LearningRate 0.0001 Epoch: 19 Global Step: 323540 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:59:35,751-Speed 5181.54 samples/sec Loss 0.3815 LearningRate 0.0001 Epoch: 19 Global Step: 323550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:37,734-Speed 5165.16 samples/sec 
Loss 0.3640 LearningRate 0.0001 Epoch: 19 Global Step: 323560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:39,708-Speed 5188.68 samples/sec Loss 0.3847 LearningRate 0.0001 Epoch: 19 Global Step: 323570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:41,676-Speed 5207.07 samples/sec Loss 0.3985 LearningRate 0.0001 Epoch: 19 Global Step: 323580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:43,642-Speed 5209.98 samples/sec Loss 0.3556 LearningRate 0.0001 Epoch: 19 Global Step: 323590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:45,613-Speed 5197.48 samples/sec Loss 0.3559 LearningRate 0.0001 Epoch: 19 Global Step: 323600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:47,612-Speed 5123.38 samples/sec Loss 0.3943 LearningRate 0.0001 Epoch: 19 Global Step: 323610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:49,588-Speed 5183.91 samples/sec Loss 0.3889 LearningRate 0.0001 Epoch: 19 Global Step: 323620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:51,598-Speed 5094.73 samples/sec Loss 0.4010 LearningRate 0.0001 Epoch: 19 Global Step: 323630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:53,584-Speed 5158.73 samples/sec Loss 0.3811 LearningRate 0.0001 Epoch: 19 Global Step: 323640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 20:59:55,569-Speed 5160.98 samples/sec Loss 0.4050 LearningRate 0.0001 Epoch: 19 Global Step: 323650 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:59:57,540-Speed 5197.15 samples/sec Loss 0.3561 LearningRate 0.0001 Epoch: 19 Global Step: 323660 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 20:59:59,548-Speed 5101.77 samples/sec Loss 0.3844 LearningRate 0.0001 Epoch: 19 Global Step: 323670 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:00:01,521-Speed 5189.73 samples/sec Loss 0.4102 LearningRate 0.0001 Epoch: 19 
Global Step: 323680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:00:03,502-Speed 5172.74 samples/sec Loss 0.3709 LearningRate 0.0001 Epoch: 19 Global Step: 323690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:00:05,475-Speed 5191.43 samples/sec Loss 0.3864 LearningRate 0.0001 Epoch: 19 Global Step: 323700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:00:07,446-Speed 5197.09 samples/sec Loss 0.3760 LearningRate 0.0001 Epoch: 19 Global Step: 323710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:00:09,425-Speed 5175.24 samples/sec Loss 0.3994 LearningRate 0.0001 Epoch: 19 Global Step: 323720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:00:11,406-Speed 5171.87 samples/sec Loss 0.3818 LearningRate 0.0001 Epoch: 19 Global Step: 323730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:00:13,401-Speed 5133.29 samples/sec Loss 0.3980 LearningRate 0.0001 Epoch: 19 Global Step: 323740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:00:15,415-Speed 5085.95 samples/sec Loss 0.3827 LearningRate 0.0001 Epoch: 19 Global Step: 323750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:00:17,457-Speed 5017.62 samples/sec Loss 0.3822 LearningRate 0.0001 Epoch: 19 Global Step: 323760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:00:19,431-Speed 5188.66 samples/sec Loss 0.3731 LearningRate 0.0001 Epoch: 19 Global Step: 323770 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:00:21,418-Speed 5154.48 samples/sec Loss 0.3854 LearningRate 0.0001 Epoch: 19 Global Step: 323780 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:00:23,415-Speed 5130.53 samples/sec Loss 0.3908 LearningRate 0.0001 Epoch: 19 Global Step: 323790 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:00:25,418-Speed 5115.16 samples/sec Loss 0.3887 LearningRate 0.0001 Epoch: 19 Global Step: 323800 Fp16 Grad Scale: 
131072 Required: 1 hours Training: 2022-04-11 21:00:27,408-Speed 5147.57 samples/sec Loss 0.3636 LearningRate 0.0001 Epoch: 19 Global Step: 323810 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:00:29,386-Speed 5178.10 samples/sec Loss 0.3867 LearningRate 0.0001 Epoch: 19 Global Step: 323820 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:00:31,358-Speed 5193.99 samples/sec Loss 0.3984 LearningRate 0.0001 Epoch: 19 Global Step: 323830 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:00:33,332-Speed 5189.64 samples/sec Loss 0.3814 LearningRate 0.0001 Epoch: 19 Global Step: 323840 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:00:35,324-Speed 5141.95 samples/sec Loss 0.3896 LearningRate 0.0001 Epoch: 19 Global Step: 323850 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:00:37,304-Speed 5171.51 samples/sec Loss 0.3926 LearningRate 0.0001 Epoch: 19 Global Step: 323860 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:00:39,302-Speed 5128.92 samples/sec Loss 0.3955 LearningRate 0.0001 Epoch: 19 Global Step: 323870 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:00:41,326-Speed 5059.90 samples/sec Loss 0.3770 LearningRate 0.0001 Epoch: 19 Global Step: 323880 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:00:43,292-Speed 5211.02 samples/sec Loss 0.3838 LearningRate 0.0001 Epoch: 19 Global Step: 323890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:00:45,276-Speed 5164.16 samples/sec Loss 0.3675 LearningRate 0.0001 Epoch: 19 Global Step: 323900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:00:47,247-Speed 5195.92 samples/sec Loss 0.4027 LearningRate 0.0001 Epoch: 19 Global Step: 323910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:00:49,232-Speed 5159.24 samples/sec Loss 0.3910 LearningRate 0.0001 Epoch: 19 Global Step: 323920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 21:00:51,219-Speed 5154.81 samples/sec Loss 0.3794 LearningRate 0.0001 Epoch: 19 Global Step: 323930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:00:53,212-Speed 5140.04 samples/sec Loss 0.3883 LearningRate 0.0001 Epoch: 19 Global Step: 323940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:00:55,201-Speed 5150.19 samples/sec Loss 0.3707 LearningRate 0.0001 Epoch: 19 Global Step: 323950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:00:57,228-Speed 5053.52 samples/sec Loss 0.3930 LearningRate 0.0001 Epoch: 19 Global Step: 323960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:00:59,229-Speed 5119.17 samples/sec Loss 0.3921 LearningRate 0.0001 Epoch: 19 Global Step: 323970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:01:01,241-Speed 5091.15 samples/sec Loss 0.3565 LearningRate 0.0001 Epoch: 19 Global Step: 323980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:01:03,233-Speed 5143.07 samples/sec Loss 0.3889 LearningRate 0.0001 Epoch: 19 Global Step: 323990 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:01:05,224-Speed 5144.40 samples/sec Loss 0.3847 LearningRate 0.0001 Epoch: 19 Global Step: 324000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:01:31,868-[lfw][324000]XNorm: 21.546946 Training: 2022-04-11 21:01:31,869-[lfw][324000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 21:01:31,869-[lfw][324000]Accuracy-Highest: 0.99833 Training: 2022-04-11 21:02:02,692-[cfp_fp][324000]XNorm: 22.032609 Training: 2022-04-11 21:02:02,693-[cfp_fp][324000]Accuracy-Flip: 0.99071+-0.00400 Training: 2022-04-11 21:02:02,693-[cfp_fp][324000]Accuracy-Highest: 0.99071 Training: 2022-04-11 21:02:29,309-[agedb_30][324000]XNorm: 22.684689 Training: 2022-04-11 21:02:29,309-[agedb_30][324000]Accuracy-Flip: 0.98317+-0.00612 Training: 2022-04-11 21:02:29,310-[agedb_30][324000]Accuracy-Highest: 0.98450 Training: 2022-04-11 
21:02:31,298-Speed 118.97 samples/sec Loss 0.3825 LearningRate 0.0001 Epoch: 19 Global Step: 324010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:02:33,267-Speed 5202.10 samples/sec Loss 0.3793 LearningRate 0.0001 Epoch: 19 Global Step: 324020 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:02:35,222-Speed 5239.87 samples/sec Loss 0.3862 LearningRate 0.0001 Epoch: 19 Global Step: 324030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:02:37,182-Speed 5225.73 samples/sec Loss 0.3862 LearningRate 0.0001 Epoch: 19 Global Step: 324040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:02:39,143-Speed 5224.13 samples/sec Loss 0.3832 LearningRate 0.0001 Epoch: 19 Global Step: 324050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:02:41,127-Speed 5163.64 samples/sec Loss 0.4062 LearningRate 0.0001 Epoch: 19 Global Step: 324060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:02:43,088-Speed 5223.26 samples/sec Loss 0.3773 LearningRate 0.0001 Epoch: 19 Global Step: 324070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:02:45,048-Speed 5224.18 samples/sec Loss 0.4009 LearningRate 0.0001 Epoch: 19 Global Step: 324080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:02:47,011-Speed 5220.41 samples/sec Loss 0.3871 LearningRate 0.0001 Epoch: 19 Global Step: 324090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:02:48,984-Speed 5191.77 samples/sec Loss 0.3856 LearningRate 0.0001 Epoch: 19 Global Step: 324100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:02:50,944-Speed 5224.77 samples/sec Loss 0.3820 LearningRate 0.0001 Epoch: 19 Global Step: 324110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:02:52,965-Speed 5069.64 samples/sec Loss 0.4075 LearningRate 0.0001 Epoch: 19 Global Step: 324120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:02:54,930-Speed 5212.95 samples/sec 
Loss 0.3825 LearningRate 0.0001 Epoch: 19 Global Step: 324130 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:02:56,919-Speed 5149.14 samples/sec Loss 0.3707 LearningRate 0.0001 Epoch: 19 Global Step: 324140 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:02:58,952-Speed 5037.97 samples/sec Loss 0.3887 LearningRate 0.0001 Epoch: 19 Global Step: 324150 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:03:01,004-Speed 4992.56 samples/sec Loss 0.3956 LearningRate 0.0001 Epoch: 19 Global Step: 324160 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:03:03,035-Speed 5044.73 samples/sec Loss 0.3780 LearningRate 0.0001 Epoch: 19 Global Step: 324170 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:03:05,030-Speed 5133.68 samples/sec Loss 0.3911 LearningRate 0.0001 Epoch: 19 Global Step: 324180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:07,008-Speed 5178.78 samples/sec Loss 0.3895 LearningRate 0.0001 Epoch: 19 Global Step: 324190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:08,983-Speed 5188.40 samples/sec Loss 0.4114 LearningRate 0.0001 Epoch: 19 Global Step: 324200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:10,954-Speed 5196.57 samples/sec Loss 0.3847 LearningRate 0.0001 Epoch: 19 Global Step: 324210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:12,959-Speed 5107.07 samples/sec Loss 0.4043 LearningRate 0.0001 Epoch: 19 Global Step: 324220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:14,949-Speed 5147.58 samples/sec Loss 0.3969 LearningRate 0.0001 Epoch: 19 Global Step: 324230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:16,927-Speed 5178.74 samples/sec Loss 0.3596 LearningRate 0.0001 Epoch: 19 Global Step: 324240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:18,898-Speed 5196.33 samples/sec Loss 0.3824 LearningRate 0.0001 Epoch: 
19 Global Step: 324250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:20,867-Speed 5202.02 samples/sec Loss 0.3895 LearningRate 0.0001 Epoch: 19 Global Step: 324260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:22,872-Speed 5109.84 samples/sec Loss 0.3919 LearningRate 0.0001 Epoch: 19 Global Step: 324270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:24,890-Speed 5077.21 samples/sec Loss 0.3670 LearningRate 0.0001 Epoch: 19 Global Step: 324280 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:03:26,888-Speed 5125.48 samples/sec Loss 0.3785 LearningRate 0.0001 Epoch: 19 Global Step: 324290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:28,873-Speed 5162.89 samples/sec Loss 0.3702 LearningRate 0.0001 Epoch: 19 Global Step: 324300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:30,856-Speed 5163.45 samples/sec Loss 0.3845 LearningRate 0.0001 Epoch: 19 Global Step: 324310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:32,827-Speed 5198.58 samples/sec Loss 0.3769 LearningRate 0.0001 Epoch: 19 Global Step: 324320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:34,794-Speed 5207.50 samples/sec Loss 0.3876 LearningRate 0.0001 Epoch: 19 Global Step: 324330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:36,763-Speed 5201.70 samples/sec Loss 0.3886 LearningRate 0.0001 Epoch: 19 Global Step: 324340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:38,740-Speed 5182.23 samples/sec Loss 0.4050 LearningRate 0.0001 Epoch: 19 Global Step: 324350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:40,722-Speed 5168.87 samples/sec Loss 0.3615 LearningRate 0.0001 Epoch: 19 Global Step: 324360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:42,695-Speed 5189.89 samples/sec Loss 0.3922 LearningRate 0.0001 Epoch: 19 Global Step: 324370 Fp16 Grad Scale: 
65536 Required: 1 hours Training: 2022-04-11 21:03:44,663-Speed 5206.56 samples/sec Loss 0.4047 LearningRate 0.0001 Epoch: 19 Global Step: 324380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:03:46,650-Speed 5154.01 samples/sec Loss 0.3875 LearningRate 0.0001 Epoch: 19 Global Step: 324390 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:03:48,629-Speed 5177.50 samples/sec Loss 0.3612 LearningRate 0.0001 Epoch: 19 Global Step: 324400 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:03:50,607-Speed 5178.94 samples/sec Loss 0.3695 LearningRate 0.0001 Epoch: 19 Global Step: 324410 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:03:52,589-Speed 5168.30 samples/sec Loss 0.3839 LearningRate 0.0001 Epoch: 19 Global Step: 324420 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:03:54,564-Speed 5185.30 samples/sec Loss 0.4080 LearningRate 0.0001 Epoch: 19 Global Step: 324430 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:03:56,534-Speed 5200.34 samples/sec Loss 0.3828 LearningRate 0.0001 Epoch: 19 Global Step: 324440 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:03:58,509-Speed 5187.16 samples/sec Loss 0.3729 LearningRate 0.0001 Epoch: 19 Global Step: 324450 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:04:00,509-Speed 5120.39 samples/sec Loss 0.4131 LearningRate 0.0001 Epoch: 19 Global Step: 324460 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:04:02,485-Speed 5185.17 samples/sec Loss 0.3648 LearningRate 0.0001 Epoch: 19 Global Step: 324470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:04,503-Speed 5075.05 samples/sec Loss 0.3763 LearningRate 0.0001 Epoch: 19 Global Step: 324480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:06,473-Speed 5200.68 samples/sec Loss 0.4033 LearningRate 0.0001 Epoch: 19 Global Step: 324490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 21:04:08,448-Speed 5185.59 samples/sec Loss 0.3839 LearningRate 0.0001 Epoch: 19 Global Step: 324500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:10,418-Speed 5201.62 samples/sec Loss 0.3611 LearningRate 0.0001 Epoch: 19 Global Step: 324510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:12,384-Speed 5210.57 samples/sec Loss 0.4161 LearningRate 0.0001 Epoch: 19 Global Step: 324520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:14,351-Speed 5206.25 samples/sec Loss 0.3608 LearningRate 0.0001 Epoch: 19 Global Step: 324530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:16,314-Speed 5218.52 samples/sec Loss 0.3923 LearningRate 0.0001 Epoch: 19 Global Step: 324540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:18,291-Speed 5181.05 samples/sec Loss 0.3772 LearningRate 0.0001 Epoch: 19 Global Step: 324550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:20,261-Speed 5200.86 samples/sec Loss 0.3751 LearningRate 0.0001 Epoch: 19 Global Step: 324560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:22,224-Speed 5216.63 samples/sec Loss 0.3969 LearningRate 0.0001 Epoch: 19 Global Step: 324570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:24,191-Speed 5209.44 samples/sec Loss 0.3939 LearningRate 0.0001 Epoch: 19 Global Step: 324580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:26,151-Speed 5224.11 samples/sec Loss 0.3940 LearningRate 0.0001 Epoch: 19 Global Step: 324590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:28,128-Speed 5182.89 samples/sec Loss 0.3822 LearningRate 0.0001 Epoch: 19 Global Step: 324600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:30,100-Speed 5193.59 samples/sec Loss 0.4215 LearningRate 0.0001 Epoch: 19 Global Step: 324610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:32,066-Speed 5210.58 
samples/sec Loss 0.4150 LearningRate 0.0001 Epoch: 19 Global Step: 324620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:34,035-Speed 5202.22 samples/sec Loss 0.3960 LearningRate 0.0001 Epoch: 19 Global Step: 324630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:36,018-Speed 5165.06 samples/sec Loss 0.3955 LearningRate 0.0001 Epoch: 19 Global Step: 324640 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:37,994-Speed 5185.24 samples/sec Loss 0.3824 LearningRate 0.0001 Epoch: 19 Global Step: 324650 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:39,973-Speed 5175.81 samples/sec Loss 0.4004 LearningRate 0.0001 Epoch: 19 Global Step: 324660 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:41,978-Speed 5108.90 samples/sec Loss 0.3868 LearningRate 0.0001 Epoch: 19 Global Step: 324670 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:04:43,935-Speed 5233.05 samples/sec Loss 0.3656 LearningRate 0.0001 Epoch: 19 Global Step: 324680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:45,904-Speed 5204.16 samples/sec Loss 0.3987 LearningRate 0.0001 Epoch: 19 Global Step: 324690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:47,895-Speed 5145.19 samples/sec Loss 0.4065 LearningRate 0.0001 Epoch: 19 Global Step: 324700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:49,871-Speed 5182.53 samples/sec Loss 0.3743 LearningRate 0.0001 Epoch: 19 Global Step: 324710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:51,846-Speed 5185.29 samples/sec Loss 0.4063 LearningRate 0.0001 Epoch: 19 Global Step: 324720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:53,814-Speed 5204.66 samples/sec Loss 0.3699 LearningRate 0.0001 Epoch: 19 Global Step: 324730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:55,793-Speed 5178.80 samples/sec Loss 0.3961 LearningRate 
0.0001 Epoch: 19 Global Step: 324740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:57,770-Speed 5181.07 samples/sec Loss 0.3835 LearningRate 0.0001 Epoch: 19 Global Step: 324750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:04:59,736-Speed 5210.64 samples/sec Loss 0.3863 LearningRate 0.0001 Epoch: 19 Global Step: 324760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:05:01,754-Speed 5076.55 samples/sec Loss 0.4065 LearningRate 0.0001 Epoch: 19 Global Step: 324770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:05:03,739-Speed 5158.67 samples/sec Loss 0.3644 LearningRate 0.0001 Epoch: 19 Global Step: 324780 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:05:05,706-Speed 5207.20 samples/sec Loss 0.3820 LearningRate 0.0001 Epoch: 19 Global Step: 324790 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:05:07,680-Speed 5188.82 samples/sec Loss 0.4062 LearningRate 0.0001 Epoch: 19 Global Step: 324800 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:05:09,683-Speed 5115.83 samples/sec Loss 0.3894 LearningRate 0.0001 Epoch: 19 Global Step: 324810 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:05:11,682-Speed 5123.20 samples/sec Loss 0.3823 LearningRate 0.0001 Epoch: 19 Global Step: 324820 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:05:13,681-Speed 5124.36 samples/sec Loss 0.3807 LearningRate 0.0001 Epoch: 19 Global Step: 324830 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:05:15,667-Speed 5157.45 samples/sec Loss 0.3953 LearningRate 0.0001 Epoch: 19 Global Step: 324840 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:05:17,637-Speed 5200.54 samples/sec Loss 0.4014 LearningRate 0.0001 Epoch: 19 Global Step: 324850 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:05:19,602-Speed 5212.04 samples/sec Loss 0.3776 LearningRate 0.0001 Epoch: 19 Global Step: 
324860 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:05:21,613-Speed 5095.31 samples/sec Loss 0.3741 LearningRate 0.0001 Epoch: 19 Global Step: 324870 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:05:23,594-Speed 5170.01 samples/sec Loss 0.3850 LearningRate 0.0001 Epoch: 19 Global Step: 324880 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:05:25,611-Speed 5079.75 samples/sec Loss 0.3664 LearningRate 0.0001 Epoch: 19 Global Step: 324890 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:05:27,669-Speed 4976.86 samples/sec Loss 0.4012 LearningRate 0.0001 Epoch: 19 Global Step: 324900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:05:29,663-Speed 5136.94 samples/sec Loss 0.4104 LearningRate 0.0001 Epoch: 19 Global Step: 324910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:05:31,627-Speed 5213.47 samples/sec Loss 0.3754 LearningRate 0.0001 Epoch: 19 Global Step: 324920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:05:33,604-Speed 5183.30 samples/sec Loss 0.3889 LearningRate 0.0001 Epoch: 19 Global Step: 324930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:05:35,597-Speed 5139.43 samples/sec Loss 0.4045 LearningRate 0.0001 Epoch: 19 Global Step: 324940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:05:37,582-Speed 5159.34 samples/sec Loss 0.4193 LearningRate 0.0001 Epoch: 19 Global Step: 324950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:05:39,568-Speed 5157.92 samples/sec Loss 0.3962 LearningRate 0.0001 Epoch: 19 Global Step: 324960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:05:41,532-Speed 5217.03 samples/sec Loss 0.3916 LearningRate 0.0001 Epoch: 19 Global Step: 324970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:05:43,500-Speed 5204.17 samples/sec Loss 0.3811 LearningRate 0.0001 Epoch: 19 Global Step: 324980 Fp16 Grad Scale: 65536 Required: 
1 hours Training: 2022-04-11 21:05:45,478-Speed 5178.46 samples/sec Loss 0.3944 LearningRate 0.0001 Epoch: 19 Global Step: 324990 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:05:47,457-Speed 5176.59 samples/sec Loss 0.4053 LearningRate 0.0001 Epoch: 19 Global Step: 325000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:05:49,457-Speed 5122.19 samples/sec Loss 0.3648 LearningRate 0.0001 Epoch: 19 Global Step: 325010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:05:51,433-Speed 5184.06 samples/sec Loss 0.4060 LearningRate 0.0001 Epoch: 19 Global Step: 325020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:05:53,427-Speed 5136.61 samples/sec Loss 0.3740 LearningRate 0.0001 Epoch: 19 Global Step: 325030 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:05:55,394-Speed 5206.22 samples/sec Loss 0.3840 LearningRate 0.0001 Epoch: 19 Global Step: 325040 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:05:57,376-Speed 5169.17 samples/sec Loss 0.4005 LearningRate 0.0001 Epoch: 19 Global Step: 325050 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:05:59,379-Speed 5113.74 samples/sec Loss 0.3931 LearningRate 0.0001 Epoch: 19 Global Step: 325060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:01,364-Speed 5159.23 samples/sec Loss 0.3903 LearningRate 0.0001 Epoch: 19 Global Step: 325070 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:03,347-Speed 5166.90 samples/sec Loss 0.3920 LearningRate 0.0001 Epoch: 19 Global Step: 325080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:05,336-Speed 5151.44 samples/sec Loss 0.3783 LearningRate 0.0001 Epoch: 19 Global Step: 325090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:07,302-Speed 5209.89 samples/sec Loss 0.3492 LearningRate 0.0001 Epoch: 19 Global Step: 325100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
21:06:09,273-Speed 5196.51 samples/sec Loss 0.3819 LearningRate 0.0001 Epoch: 19 Global Step: 325110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:11,262-Speed 5150.04 samples/sec Loss 0.3942 LearningRate 0.0001 Epoch: 19 Global Step: 325120 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:06:13,273-Speed 5094.99 samples/sec Loss 0.4157 LearningRate 0.0001 Epoch: 19 Global Step: 325130 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:06:15,263-Speed 5147.03 samples/sec Loss 0.3943 LearningRate 0.0001 Epoch: 19 Global Step: 325140 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:06:17,239-Speed 5184.62 samples/sec Loss 0.4031 LearningRate 0.0001 Epoch: 19 Global Step: 325150 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:06:19,204-Speed 5210.76 samples/sec Loss 0.3981 LearningRate 0.0001 Epoch: 19 Global Step: 325160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:21,189-Speed 5160.12 samples/sec Loss 0.3849 LearningRate 0.0001 Epoch: 19 Global Step: 325170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:23,199-Speed 5097.44 samples/sec Loss 0.3819 LearningRate 0.0001 Epoch: 19 Global Step: 325180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:25,193-Speed 5136.96 samples/sec Loss 0.4015 LearningRate 0.0001 Epoch: 19 Global Step: 325190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:27,159-Speed 5209.45 samples/sec Loss 0.3814 LearningRate 0.0001 Epoch: 19 Global Step: 325200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:29,126-Speed 5209.52 samples/sec Loss 0.3738 LearningRate 0.0001 Epoch: 19 Global Step: 325210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:31,090-Speed 5214.41 samples/sec Loss 0.3921 LearningRate 0.0001 Epoch: 19 Global Step: 325220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:33,056-Speed 5212.08 samples/sec 
Loss 0.4064 LearningRate 0.0001 Epoch: 19 Global Step: 325230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:35,024-Speed 5205.74 samples/sec Loss 0.3982 LearningRate 0.0001 Epoch: 19 Global Step: 325240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:36,989-Speed 5213.01 samples/sec Loss 0.3737 LearningRate 0.0001 Epoch: 19 Global Step: 325250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:38,963-Speed 5189.18 samples/sec Loss 0.3809 LearningRate 0.0001 Epoch: 19 Global Step: 325260 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:06:40,942-Speed 5175.55 samples/sec Loss 0.3851 LearningRate 0.0001 Epoch: 19 Global Step: 325270 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:06:42,935-Speed 5139.10 samples/sec Loss 0.3859 LearningRate 0.0001 Epoch: 19 Global Step: 325280 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:06:44,912-Speed 5183.64 samples/sec Loss 0.3947 LearningRate 0.0001 Epoch: 19 Global Step: 325290 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:06:46,880-Speed 5204.33 samples/sec Loss 0.3969 LearningRate 0.0001 Epoch: 19 Global Step: 325300 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:06:48,858-Speed 5178.48 samples/sec Loss 0.3864 LearningRate 0.0001 Epoch: 19 Global Step: 325310 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:06:50,844-Speed 5158.91 samples/sec Loss 0.3850 LearningRate 0.0001 Epoch: 19 Global Step: 325320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:52,822-Speed 5177.26 samples/sec Loss 0.3825 LearningRate 0.0001 Epoch: 19 Global Step: 325330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:54,788-Speed 5211.71 samples/sec Loss 0.3991 LearningRate 0.0001 Epoch: 19 Global Step: 325340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:56,766-Speed 5176.80 samples/sec Loss 0.3775 LearningRate 0.0001 
Epoch: 19 Global Step: 325350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:06:58,761-Speed 5133.73 samples/sec Loss 0.3832 LearningRate 0.0001 Epoch: 19 Global Step: 325360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:00,757-Speed 5132.96 samples/sec Loss 0.3986 LearningRate 0.0001 Epoch: 19 Global Step: 325370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:02,754-Speed 5129.93 samples/sec Loss 0.3981 LearningRate 0.0001 Epoch: 19 Global Step: 325380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:04,732-Speed 5179.10 samples/sec Loss 0.3998 LearningRate 0.0001 Epoch: 19 Global Step: 325390 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:06,705-Speed 5192.62 samples/sec Loss 0.3900 LearningRate 0.0001 Epoch: 19 Global Step: 325400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:08,674-Speed 5202.68 samples/sec Loss 0.3922 LearningRate 0.0001 Epoch: 19 Global Step: 325410 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:10,641-Speed 5207.59 samples/sec Loss 0.3622 LearningRate 0.0001 Epoch: 19 Global Step: 325420 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:07:12,619-Speed 5178.00 samples/sec Loss 0.3989 LearningRate 0.0001 Epoch: 19 Global Step: 325430 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:07:14,598-Speed 5176.23 samples/sec Loss 0.3753 LearningRate 0.0001 Epoch: 19 Global Step: 325440 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:16,567-Speed 5202.65 samples/sec Loss 0.3578 LearningRate 0.0001 Epoch: 19 Global Step: 325450 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:18,551-Speed 5162.80 samples/sec Loss 0.3990 LearningRate 0.0001 Epoch: 19 Global Step: 325460 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:20,529-Speed 5178.46 samples/sec Loss 0.3804 LearningRate 0.0001 Epoch: 19 Global Step: 325470 Fp16 Grad 
Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:22,530-Speed 5117.56 samples/sec Loss 0.3994 LearningRate 0.0001 Epoch: 19 Global Step: 325480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:24,499-Speed 5203.77 samples/sec Loss 0.3792 LearningRate 0.0001 Epoch: 19 Global Step: 325490 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:26,485-Speed 5156.66 samples/sec Loss 0.4065 LearningRate 0.0001 Epoch: 19 Global Step: 325500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:28,465-Speed 5173.46 samples/sec Loss 0.3903 LearningRate 0.0001 Epoch: 19 Global Step: 325510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:30,456-Speed 5146.33 samples/sec Loss 0.3756 LearningRate 0.0001 Epoch: 19 Global Step: 325520 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:32,425-Speed 5203.27 samples/sec Loss 0.3767 LearningRate 0.0001 Epoch: 19 Global Step: 325530 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:34,381-Speed 5237.21 samples/sec Loss 0.4014 LearningRate 0.0001 Epoch: 19 Global Step: 325540 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:36,360-Speed 5175.20 samples/sec Loss 0.3974 LearningRate 0.0001 Epoch: 19 Global Step: 325550 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:38,332-Speed 5193.86 samples/sec Loss 0.3674 LearningRate 0.0001 Epoch: 19 Global Step: 325560 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:40,299-Speed 5209.77 samples/sec Loss 0.4061 LearningRate 0.0001 Epoch: 19 Global Step: 325570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:42,269-Speed 5198.11 samples/sec Loss 0.3919 LearningRate 0.0001 Epoch: 19 Global Step: 325580 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:44,235-Speed 5210.72 samples/sec Loss 0.4081 LearningRate 0.0001 Epoch: 19 Global Step: 325590 Fp16 Grad Scale: 65536 Required: 1 hours Training: 
2022-04-11 21:07:46,212-Speed 5181.33 samples/sec Loss 0.3829 LearningRate 0.0001 Epoch: 19 Global Step: 325600 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:48,201-Speed 5151.66 samples/sec Loss 0.3933 LearningRate 0.0001 Epoch: 19 Global Step: 325610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:50,201-Speed 5119.59 samples/sec Loss 0.3964 LearningRate 0.0001 Epoch: 19 Global Step: 325620 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:52,188-Speed 5156.49 samples/sec Loss 0.3994 LearningRate 0.0001 Epoch: 19 Global Step: 325630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:07:54,157-Speed 5201.72 samples/sec Loss 0.3781 LearningRate 0.0001 Epoch: 19 Global Step: 325640 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:07:56,123-Speed 5209.58 samples/sec Loss 0.3934 LearningRate 0.0001 Epoch: 19 Global Step: 325650 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:07:58,116-Speed 5141.30 samples/sec Loss 0.3854 LearningRate 0.0001 Epoch: 19 Global Step: 325660 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:08:00,108-Speed 5141.23 samples/sec Loss 0.3920 LearningRate 0.0001 Epoch: 19 Global Step: 325670 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:08:02,100-Speed 5142.67 samples/sec Loss 0.3865 LearningRate 0.0001 Epoch: 19 Global Step: 325680 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:04,068-Speed 5207.00 samples/sec Loss 0.3934 LearningRate 0.0001 Epoch: 19 Global Step: 325690 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:06,034-Speed 5209.43 samples/sec Loss 0.3877 LearningRate 0.0001 Epoch: 19 Global Step: 325700 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:07,997-Speed 5217.56 samples/sec Loss 0.3824 LearningRate 0.0001 Epoch: 19 Global Step: 325710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:10,027-Speed 5046.06 
samples/sec Loss 0.3734 LearningRate 0.0001 Epoch: 19 Global Step: 325720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:12,022-Speed 5134.37 samples/sec Loss 0.3943 LearningRate 0.0001 Epoch: 19 Global Step: 325730 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:14,034-Speed 5090.71 samples/sec Loss 0.3926 LearningRate 0.0001 Epoch: 19 Global Step: 325740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:16,026-Speed 5143.74 samples/sec Loss 0.4047 LearningRate 0.0001 Epoch: 19 Global Step: 325750 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:18,007-Speed 5168.98 samples/sec Loss 0.3892 LearningRate 0.0001 Epoch: 19 Global Step: 325760 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:19,981-Speed 5189.88 samples/sec Loss 0.3515 LearningRate 0.0001 Epoch: 19 Global Step: 325770 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:21,946-Speed 5212.54 samples/sec Loss 0.3986 LearningRate 0.0001 Epoch: 19 Global Step: 325780 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:08:23,917-Speed 5198.35 samples/sec Loss 0.4212 LearningRate 0.0001 Epoch: 19 Global Step: 325790 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:25,889-Speed 5194.12 samples/sec Loss 0.3901 LearningRate 0.0001 Epoch: 19 Global Step: 325800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:27,870-Speed 5171.14 samples/sec Loss 0.3915 LearningRate 0.0001 Epoch: 19 Global Step: 325810 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:29,834-Speed 5215.52 samples/sec Loss 0.4003 LearningRate 0.0001 Epoch: 19 Global Step: 325820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:31,815-Speed 5171.16 samples/sec Loss 0.3832 LearningRate 0.0001 Epoch: 19 Global Step: 325830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:33,817-Speed 5115.31 samples/sec Loss 0.3898 LearningRate 
0.0001 Epoch: 19 Global Step: 325840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:35,827-Speed 5097.28 samples/sec Loss 0.4161 LearningRate 0.0001 Epoch: 19 Global Step: 325850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:37,817-Speed 5146.59 samples/sec Loss 0.3871 LearningRate 0.0001 Epoch: 19 Global Step: 325860 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:39,823-Speed 5126.38 samples/sec Loss 0.3832 LearningRate 0.0001 Epoch: 19 Global Step: 325870 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:41,798-Speed 5185.01 samples/sec Loss 0.3677 LearningRate 0.0001 Epoch: 19 Global Step: 325880 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:43,767-Speed 5204.47 samples/sec Loss 0.4051 LearningRate 0.0001 Epoch: 19 Global Step: 325890 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:45,749-Speed 5166.74 samples/sec Loss 0.3651 LearningRate 0.0001 Epoch: 19 Global Step: 325900 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:47,730-Speed 5172.71 samples/sec Loss 0.3902 LearningRate 0.0001 Epoch: 19 Global Step: 325910 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:49,718-Speed 5151.57 samples/sec Loss 0.3778 LearningRate 0.0001 Epoch: 19 Global Step: 325920 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:51,695-Speed 5181.45 samples/sec Loss 0.4043 LearningRate 0.0001 Epoch: 19 Global Step: 325930 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:53,663-Speed 5205.77 samples/sec Loss 0.3848 LearningRate 0.0001 Epoch: 19 Global Step: 325940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:55,632-Speed 5202.46 samples/sec Loss 0.3839 LearningRate 0.0001 Epoch: 19 Global Step: 325950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:57,626-Speed 5135.01 samples/sec Loss 0.3832 LearningRate 0.0001 Epoch: 19 Global Step: 325960 Fp16 
Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:08:59,672-Speed 5007.89 samples/sec Loss 0.3819 LearningRate 0.0001 Epoch: 19 Global Step: 325970 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:09:01,691-Speed 5072.18 samples/sec Loss 0.3912 LearningRate 0.0001 Epoch: 19 Global Step: 325980 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:09:03,688-Speed 5129.11 samples/sec Loss 0.3680 LearningRate 0.0001 Epoch: 19 Global Step: 325990 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:09:05,663-Speed 5188.08 samples/sec Loss 0.3917 LearningRate 0.0001 Epoch: 19 Global Step: 326000 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:09:32,390-[lfw][326000]XNorm: 21.583963 Training: 2022-04-11 21:09:32,390-[lfw][326000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 21:09:32,391-[lfw][326000]Accuracy-Highest: 0.99833 Training: 2022-04-11 21:10:03,157-[cfp_fp][326000]XNorm: 22.094932 Training: 2022-04-11 21:10:03,157-[cfp_fp][326000]Accuracy-Flip: 0.99029+-0.00398 Training: 2022-04-11 21:10:03,158-[cfp_fp][326000]Accuracy-Highest: 0.99071 Training: 2022-04-11 21:10:29,738-[agedb_30][326000]XNorm: 22.721378 Training: 2022-04-11 21:10:29,738-[agedb_30][326000]Accuracy-Flip: 0.98333+-0.00596 Training: 2022-04-11 21:10:29,739-[agedb_30][326000]Accuracy-Highest: 0.98450 Training: 2022-04-11 21:10:31,711-Speed 119.00 samples/sec Loss 0.3718 LearningRate 0.0001 Epoch: 19 Global Step: 326010 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:10:33,673-Speed 5219.76 samples/sec Loss 0.3950 LearningRate 0.0001 Epoch: 19 Global Step: 326020 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:10:35,691-Speed 5075.33 samples/sec Loss 0.3843 LearningRate 0.0001 Epoch: 19 Global Step: 326030 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:10:37,666-Speed 5188.12 samples/sec Loss 0.3968 LearningRate 0.0001 Epoch: 19 Global Step: 326040 Fp16 Grad Scale: 131072 
Required: 1 hours Training: 2022-04-11 21:10:39,680-Speed 5086.24 samples/sec Loss 0.4258 LearningRate 0.0001 Epoch: 19 Global Step: 326050 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:10:41,669-Speed 5151.45 samples/sec Loss 0.3776 LearningRate 0.0001 Epoch: 19 Global Step: 326060 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:10:43,628-Speed 5226.43 samples/sec Loss 0.4229 LearningRate 0.0001 Epoch: 19 Global Step: 326070 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:10:45,599-Speed 5197.58 samples/sec Loss 0.3889 LearningRate 0.0001 Epoch: 19 Global Step: 326080 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:10:47,582-Speed 5166.37 samples/sec Loss 0.4129 LearningRate 0.0001 Epoch: 19 Global Step: 326090 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:10:49,546-Speed 5215.74 samples/sec Loss 0.3940 LearningRate 0.0001 Epoch: 19 Global Step: 326100 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:10:51,508-Speed 5220.16 samples/sec Loss 0.3877 LearningRate 0.0001 Epoch: 19 Global Step: 326110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:10:53,476-Speed 5205.78 samples/sec Loss 0.3904 LearningRate 0.0001 Epoch: 19 Global Step: 326120 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:10:55,456-Speed 5171.56 samples/sec Loss 0.3861 LearningRate 0.0001 Epoch: 19 Global Step: 326130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:10:57,426-Speed 5200.55 samples/sec Loss 0.4015 LearningRate 0.0001 Epoch: 19 Global Step: 326140 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:10:59,406-Speed 5172.41 samples/sec Loss 0.3720 LearningRate 0.0001 Epoch: 19 Global Step: 326150 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:01,384-Speed 5179.52 samples/sec Loss 0.4085 LearningRate 0.0001 Epoch: 19 Global Step: 326160 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 
21:11:03,357-Speed 5194.07 samples/sec Loss 0.3929 LearningRate 0.0001 Epoch: 19 Global Step: 326170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:05,320-Speed 5218.21 samples/sec Loss 0.3968 LearningRate 0.0001 Epoch: 19 Global Step: 326180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:07,281-Speed 5223.57 samples/sec Loss 0.3453 LearningRate 0.0001 Epoch: 19 Global Step: 326190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:09,266-Speed 5159.57 samples/sec Loss 0.3858 LearningRate 0.0001 Epoch: 19 Global Step: 326200 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:11,248-Speed 5169.25 samples/sec Loss 0.3837 LearningRate 0.0001 Epoch: 19 Global Step: 326210 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:13,216-Speed 5204.10 samples/sec Loss 0.4008 LearningRate 0.0001 Epoch: 19 Global Step: 326220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:15,185-Speed 5202.66 samples/sec Loss 0.3923 LearningRate 0.0001 Epoch: 19 Global Step: 326230 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:17,160-Speed 5187.12 samples/sec Loss 0.3655 LearningRate 0.0001 Epoch: 19 Global Step: 326240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:19,128-Speed 5204.35 samples/sec Loss 0.3932 LearningRate 0.0001 Epoch: 19 Global Step: 326250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:21,095-Speed 5206.34 samples/sec Loss 0.4026 LearningRate 0.0001 Epoch: 19 Global Step: 326260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:23,062-Speed 5208.20 samples/sec Loss 0.3966 LearningRate 0.0001 Epoch: 19 Global Step: 326270 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:25,024-Speed 5220.69 samples/sec Loss 0.3727 LearningRate 0.0001 Epoch: 19 Global Step: 326280 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:11:26,995-Speed 5197.14 samples/sec 
Loss 0.4190 LearningRate 0.0001 Epoch: 19 Global Step: 326290 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:11:28,969-Speed 5191.32 samples/sec Loss 0.3620 LearningRate 0.0001 Epoch: 19 Global Step: 326300 Fp16 Grad Scale: 131072 Required: 1 hours Training: 2022-04-11 21:11:30,930-Speed 5221.80 samples/sec Loss 0.3947 LearningRate 0.0001 Epoch: 19 Global Step: 326310 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:32,913-Speed 5167.06 samples/sec Loss 0.3952 LearningRate 0.0001 Epoch: 19 Global Step: 326320 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:34,903-Speed 5146.63 samples/sec Loss 0.3918 LearningRate 0.0001 Epoch: 19 Global Step: 326330 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:36,871-Speed 5203.54 samples/sec Loss 0.4058 LearningRate 0.0001 Epoch: 19 Global Step: 326340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:38,841-Speed 5200.69 samples/sec Loss 0.3861 LearningRate 0.0001 Epoch: 19 Global Step: 326350 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:40,809-Speed 5206.18 samples/sec Loss 0.3952 LearningRate 0.0000 Epoch: 19 Global Step: 326360 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:42,797-Speed 5151.95 samples/sec Loss 0.4033 LearningRate 0.0000 Epoch: 19 Global Step: 326370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:44,762-Speed 5212.07 samples/sec Loss 0.3864 LearningRate 0.0000 Epoch: 19 Global Step: 326380 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-11 21:11:46,787-Speed 5060.07 samples/sec Loss 0.3640 LearningRate 0.0000 Epoch: 19 Global Step: 326390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:11:48,767-Speed 5173.67 samples/sec Loss 0.3921 LearningRate 0.0000 Epoch: 19 Global Step: 326400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:11:50,737-Speed 5200.06 samples/sec Loss 0.3920 LearningRate 0.0000 Epoch: 19 
Global Step: 326410 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:11:52,768-Speed 5042.82 samples/sec Loss 0.3940 LearningRate 0.0000 Epoch: 19 Global Step: 326420 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:11:54,738-Speed 5199.37 samples/sec Loss 0.3766 LearningRate 0.0000 Epoch: 19 Global Step: 326430 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:11:56,700-Speed 5220.25 samples/sec Loss 0.3827 LearningRate 0.0000 Epoch: 19 Global Step: 326440 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:11:58,667-Speed 5207.03 samples/sec Loss 0.4059 LearningRate 0.0000 Epoch: 19 Global Step: 326450 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:12:00,637-Speed 5200.86 samples/sec Loss 0.3570 LearningRate 0.0000 Epoch: 19 Global Step: 326460 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:12:02,601-Speed 5215.74 samples/sec Loss 0.3676 LearningRate 0.0000 Epoch: 19 Global Step: 326470 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:12:04,563-Speed 5221.59 samples/sec Loss 0.3686 LearningRate 0.0000 Epoch: 19 Global Step: 326480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:12:06,541-Speed 5177.54 samples/sec Loss 0.3837 LearningRate 0.0000 Epoch: 19 Global Step: 326490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:12:08,509-Speed 5207.68 samples/sec Loss 0.3652 LearningRate 0.0000 Epoch: 19 Global Step: 326500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:12:10,499-Speed 5148.50 samples/sec Loss 0.3878 LearningRate 0.0000 Epoch: 19 Global Step: 326510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:12:12,483-Speed 5163.11 samples/sec Loss 0.3946 LearningRate 0.0000 Epoch: 19 Global Step: 326520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:12:14,503-Speed 5071.74 samples/sec Loss 0.3778 LearningRate 0.0000 Epoch: 19 Global Step: 326530 Fp16 Grad Scale: 
65536 Required: 0 hours Training: 2022-04-11 21:12:16,481-Speed 5178.99 samples/sec Loss 0.3915 LearningRate 0.0000 Epoch: 19 Global Step: 326540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:12:18,460-Speed 5173.81 samples/sec Loss 0.3814 LearningRate 0.0000 Epoch: 19 Global Step: 326550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:12:20,423-Speed 5219.23 samples/sec Loss 0.4014 LearningRate 0.0000 Epoch: 19 Global Step: 326560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:12:22,388-Speed 5212.38 samples/sec Loss 0.4008 LearningRate 0.0000 Epoch: 19 Global Step: 326570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:12:24,356-Speed 5203.54 samples/sec Loss 0.3717 LearningRate 0.0000 Epoch: 19 Global Step: 326580 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:12:26,336-Speed 5175.53 samples/sec Loss 0.3982 LearningRate 0.0000 Epoch: 19 Global Step: 326590 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:12:28,311-Speed 5184.41 samples/sec Loss 0.3853 LearningRate 0.0000 Epoch: 19 Global Step: 326600 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:12:30,276-Speed 5216.73 samples/sec Loss 0.4010 LearningRate 0.0000 Epoch: 19 Global Step: 326610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:12:32,237-Speed 5221.94 samples/sec Loss 0.3617 LearningRate 0.0000 Epoch: 19 Global Step: 326620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:12:34,234-Speed 5131.21 samples/sec Loss 0.3889 LearningRate 0.0000 Epoch: 19 Global Step: 326630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:12:36,214-Speed 5172.30 samples/sec Loss 0.3998 LearningRate 0.0000 Epoch: 19 Global Step: 326640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:12:38,248-Speed 5037.42 samples/sec Loss 0.3933 LearningRate 0.0000 Epoch: 19 Global Step: 326650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-04-11 21:12:40,242-Speed 5136.87 samples/sec Loss 0.3853 LearningRate 0.0000 Epoch: 19 Global Step: 326660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:12:42,204-Speed 5220.30 samples/sec Loss 0.3964 LearningRate 0.0000 Epoch: 19 Global Step: 326670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:12:44,166-Speed 5221.85 samples/sec Loss 0.3984 LearningRate 0.0000 Epoch: 19 Global Step: 326680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:12:46,139-Speed 5190.57 samples/sec Loss 0.3747 LearningRate 0.0000 Epoch: 19 Global Step: 326690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:12:48,124-Speed 5160.29 samples/sec Loss 0.3596 LearningRate 0.0000 Epoch: 19 Global Step: 326700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:12:50,125-Speed 5118.58 samples/sec Loss 0.3911 LearningRate 0.0000 Epoch: 19 Global Step: 326710 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:12:52,126-Speed 5120.38 samples/sec Loss 0.3689 LearningRate 0.0000 Epoch: 19 Global Step: 326720 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:12:54,148-Speed 5064.42 samples/sec Loss 0.3812 LearningRate 0.0000 Epoch: 19 Global Step: 326730 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:12:56,118-Speed 5202.04 samples/sec Loss 0.3784 LearningRate 0.0000 Epoch: 19 Global Step: 326740 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:12:58,115-Speed 5129.25 samples/sec Loss 0.3962 LearningRate 0.0000 Epoch: 19 Global Step: 326750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:00,084-Speed 5201.71 samples/sec Loss 0.3962 LearningRate 0.0000 Epoch: 19 Global Step: 326760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:02,057-Speed 5193.78 samples/sec Loss 0.3840 LearningRate 0.0000 Epoch: 19 Global Step: 326770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:04,035-Speed 5178.04 
samples/sec Loss 0.4095 LearningRate 0.0000 Epoch: 19 Global Step: 326780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:06,008-Speed 5190.92 samples/sec Loss 0.4026 LearningRate 0.0000 Epoch: 19 Global Step: 326790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:07,969-Speed 5222.27 samples/sec Loss 0.3677 LearningRate 0.0000 Epoch: 19 Global Step: 326800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:09,952-Speed 5167.98 samples/sec Loss 0.4069 LearningRate 0.0000 Epoch: 19 Global Step: 326810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:11,946-Speed 5137.38 samples/sec Loss 0.3718 LearningRate 0.0000 Epoch: 19 Global Step: 326820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:13,918-Speed 5193.41 samples/sec Loss 0.3978 LearningRate 0.0000 Epoch: 19 Global Step: 326830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:15,918-Speed 5122.35 samples/sec Loss 0.3821 LearningRate 0.0000 Epoch: 19 Global Step: 326840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:17,910-Speed 5173.10 samples/sec Loss 0.3799 LearningRate 0.0000 Epoch: 19 Global Step: 326850 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:13:19,876-Speed 5208.89 samples/sec Loss 0.3949 LearningRate 0.0000 Epoch: 19 Global Step: 326860 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:13:21,865-Speed 5149.71 samples/sec Loss 0.3798 LearningRate 0.0000 Epoch: 19 Global Step: 326870 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:13:23,840-Speed 5187.28 samples/sec Loss 0.3991 LearningRate 0.0000 Epoch: 19 Global Step: 326880 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:13:25,802-Speed 5221.29 samples/sec Loss 0.3760 LearningRate 0.0000 Epoch: 19 Global Step: 326890 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:13:27,782-Speed 5172.35 samples/sec Loss 0.4012 LearningRate 
0.0000 Epoch: 19 Global Step: 326900 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:13:29,764-Speed 5167.21 samples/sec Loss 0.3732 LearningRate 0.0000 Epoch: 19 Global Step: 326910 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:13:31,728-Speed 5215.58 samples/sec Loss 0.3977 LearningRate 0.0000 Epoch: 19 Global Step: 326920 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:13:33,695-Speed 5207.98 samples/sec Loss 0.3803 LearningRate 0.0000 Epoch: 19 Global Step: 326930 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:13:35,667-Speed 5194.20 samples/sec Loss 0.4003 LearningRate 0.0000 Epoch: 19 Global Step: 326940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:37,654-Speed 5156.36 samples/sec Loss 0.3967 LearningRate 0.0000 Epoch: 19 Global Step: 326950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:39,633-Speed 5174.47 samples/sec Loss 0.3947 LearningRate 0.0000 Epoch: 19 Global Step: 326960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:41,657-Speed 5062.19 samples/sec Loss 0.3861 LearningRate 0.0000 Epoch: 19 Global Step: 326970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:43,618-Speed 5223.49 samples/sec Loss 0.3671 LearningRate 0.0000 Epoch: 19 Global Step: 326980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:45,581-Speed 5219.62 samples/sec Loss 0.3984 LearningRate 0.0000 Epoch: 19 Global Step: 326990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:47,577-Speed 5130.59 samples/sec Loss 0.4015 LearningRate 0.0000 Epoch: 19 Global Step: 327000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:49,587-Speed 5095.60 samples/sec Loss 0.4060 LearningRate 0.0000 Epoch: 19 Global Step: 327010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:51,566-Speed 5177.33 samples/sec Loss 0.3923 LearningRate 0.0000 Epoch: 19 Global Step: 327020 
Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:53,528-Speed 5221.48 samples/sec Loss 0.3868 LearningRate 0.0000 Epoch: 19 Global Step: 327030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:13:55,488-Speed 5224.42 samples/sec Loss 0.3818 LearningRate 0.0000 Epoch: 19 Global Step: 327040 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:13:57,469-Speed 5171.80 samples/sec Loss 0.3869 LearningRate 0.0000 Epoch: 19 Global Step: 327050 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:13:59,462-Speed 5138.34 samples/sec Loss 0.3642 LearningRate 0.0000 Epoch: 19 Global Step: 327060 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:14:01,433-Speed 5197.23 samples/sec Loss 0.3893 LearningRate 0.0000 Epoch: 19 Global Step: 327070 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:14:03,418-Speed 5161.31 samples/sec Loss 0.3835 LearningRate 0.0000 Epoch: 19 Global Step: 327080 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:14:05,382-Speed 5215.66 samples/sec Loss 0.3813 LearningRate 0.0000 Epoch: 19 Global Step: 327090 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:14:07,338-Speed 5236.33 samples/sec Loss 0.3597 LearningRate 0.0000 Epoch: 19 Global Step: 327100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:14:09,301-Speed 5218.99 samples/sec Loss 0.3826 LearningRate 0.0000 Epoch: 19 Global Step: 327110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:14:11,268-Speed 5206.61 samples/sec Loss 0.4035 LearningRate 0.0000 Epoch: 19 Global Step: 327120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:14:13,270-Speed 5118.25 samples/sec Loss 0.4024 LearningRate 0.0000 Epoch: 19 Global Step: 327130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:14:15,264-Speed 5137.19 samples/sec Loss 0.4114 LearningRate 0.0000 Epoch: 19 Global Step: 327140 Fp16 Grad Scale: 65536 Required: 0 
hours Training: 2022-04-11 21:14:17,238-Speed 5188.77 samples/sec Loss 0.3974 LearningRate 0.0000 Epoch: 19 Global Step: 327150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:14:19,200-Speed 5221.07 samples/sec Loss 0.4309 LearningRate 0.0000 Epoch: 19 Global Step: 327160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:14:21,176-Speed 5182.70 samples/sec Loss 0.4009 LearningRate 0.0000 Epoch: 19 Global Step: 327170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:14:23,142-Speed 5211.72 samples/sec Loss 0.3918 LearningRate 0.0000 Epoch: 19 Global Step: 327180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:14:25,116-Speed 5187.84 samples/sec Loss 0.3994 LearningRate 0.0000 Epoch: 19 Global Step: 327190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:14:27,119-Speed 5115.35 samples/sec Loss 0.4195 LearningRate 0.0000 Epoch: 19 Global Step: 327200 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:14:29,095-Speed 5181.54 samples/sec Loss 0.4003 LearningRate 0.0000 Epoch: 19 Global Step: 327210 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:14:31,061-Speed 5213.16 samples/sec Loss 0.3831 LearningRate 0.0000 Epoch: 19 Global Step: 327220 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:14:33,040-Speed 5176.31 samples/sec Loss 0.3916 LearningRate 0.0000 Epoch: 19 Global Step: 327230 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:14:35,023-Speed 5164.84 samples/sec Loss 0.3889 LearningRate 0.0000 Epoch: 19 Global Step: 327240 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:14:37,033-Speed 5096.12 samples/sec Loss 0.4228 LearningRate 0.0000 Epoch: 19 Global Step: 327250 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:14:39,068-Speed 5033.41 samples/sec Loss 0.3739 LearningRate 0.0000 Epoch: 19 Global Step: 327260 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 
21:14:41,026-Speed 5231.72 samples/sec Loss 0.3906 LearningRate 0.0000 Epoch: 19 Global Step: 327270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:14:42,996-Speed 5197.89 samples/sec Loss 0.3757 LearningRate 0.0000 Epoch: 19 Global Step: 327280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:14:44,964-Speed 5205.32 samples/sec Loss 0.3823 LearningRate 0.0000 Epoch: 19 Global Step: 327290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:14:46,931-Speed 5207.05 samples/sec Loss 0.4025 LearningRate 0.0000 Epoch: 19 Global Step: 327300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:14:48,896-Speed 5214.28 samples/sec Loss 0.3833 LearningRate 0.0000 Epoch: 19 Global Step: 327310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:14:50,861-Speed 5214.73 samples/sec Loss 0.3947 LearningRate 0.0000 Epoch: 19 Global Step: 327320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:14:52,862-Speed 5118.97 samples/sec Loss 0.3905 LearningRate 0.0000 Epoch: 19 Global Step: 327330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:14:54,856-Speed 5135.08 samples/sec Loss 0.3762 LearningRate 0.0000 Epoch: 19 Global Step: 327340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:14:56,827-Speed 5198.49 samples/sec Loss 0.3750 LearningRate 0.0000 Epoch: 19 Global Step: 327350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:14:58,825-Speed 5127.81 samples/sec Loss 0.3620 LearningRate 0.0000 Epoch: 19 Global Step: 327360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:00,820-Speed 5134.66 samples/sec Loss 0.3772 LearningRate 0.0000 Epoch: 19 Global Step: 327370 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:15:02,791-Speed 5196.81 samples/sec Loss 0.4063 LearningRate 0.0000 Epoch: 19 Global Step: 327380 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:15:04,788-Speed 5130.83 samples/sec 
Loss 0.3741 LearningRate 0.0000 Epoch: 19 Global Step: 327390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:06,768-Speed 5173.89 samples/sec Loss 0.3992 LearningRate 0.0000 Epoch: 19 Global Step: 327400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:08,745-Speed 5181.72 samples/sec Loss 0.3972 LearningRate 0.0000 Epoch: 19 Global Step: 327410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:10,727-Speed 5166.54 samples/sec Loss 0.3859 LearningRate 0.0000 Epoch: 19 Global Step: 327420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:12,720-Speed 5141.61 samples/sec Loss 0.4006 LearningRate 0.0000 Epoch: 19 Global Step: 327430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:14,683-Speed 5218.27 samples/sec Loss 0.3811 LearningRate 0.0000 Epoch: 19 Global Step: 327440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:16,662-Speed 5176.27 samples/sec Loss 0.3912 LearningRate 0.0000 Epoch: 19 Global Step: 327450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:18,624-Speed 5219.27 samples/sec Loss 0.3889 LearningRate 0.0000 Epoch: 19 Global Step: 327460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:20,594-Speed 5200.07 samples/sec Loss 0.4016 LearningRate 0.0000 Epoch: 19 Global Step: 327470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:22,566-Speed 5193.57 samples/sec Loss 0.3861 LearningRate 0.0000 Epoch: 19 Global Step: 327480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:24,534-Speed 5204.33 samples/sec Loss 0.4018 LearningRate 0.0000 Epoch: 19 Global Step: 327490 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:15:26,504-Speed 5199.78 samples/sec Loss 0.3791 LearningRate 0.0000 Epoch: 19 Global Step: 327500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:28,476-Speed 5196.23 samples/sec Loss 0.4045 LearningRate 0.0000 Epoch: 19 
Global Step: 327510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:30,445-Speed 5201.69 samples/sec Loss 0.3849 LearningRate 0.0000 Epoch: 19 Global Step: 327520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:32,428-Speed 5164.87 samples/sec Loss 0.3974 LearningRate 0.0000 Epoch: 19 Global Step: 327530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:34,393-Speed 5213.74 samples/sec Loss 0.3858 LearningRate 0.0000 Epoch: 19 Global Step: 327540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:36,363-Speed 5198.68 samples/sec Loss 0.4189 LearningRate 0.0000 Epoch: 19 Global Step: 327550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:38,333-Speed 5201.62 samples/sec Loss 0.4038 LearningRate 0.0000 Epoch: 19 Global Step: 327560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:40,308-Speed 5186.61 samples/sec Loss 0.4023 LearningRate 0.0000 Epoch: 19 Global Step: 327570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:42,275-Speed 5207.67 samples/sec Loss 0.3797 LearningRate 0.0000 Epoch: 19 Global Step: 327580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:44,238-Speed 5216.22 samples/sec Loss 0.3758 LearningRate 0.0000 Epoch: 19 Global Step: 327590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:46,204-Speed 5210.06 samples/sec Loss 0.3900 LearningRate 0.0000 Epoch: 19 Global Step: 327600 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:15:48,184-Speed 5174.68 samples/sec Loss 0.3649 LearningRate 0.0000 Epoch: 19 Global Step: 327610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:50,193-Speed 5097.54 samples/sec Loss 0.3946 LearningRate 0.0000 Epoch: 19 Global Step: 327620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:52,180-Speed 5155.88 samples/sec Loss 0.3845 LearningRate 0.0000 Epoch: 19 Global Step: 327630 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-11 21:15:54,170-Speed 5146.44 samples/sec Loss 0.3797 LearningRate 0.0000 Epoch: 19 Global Step: 327640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:56,143-Speed 5194.49 samples/sec Loss 0.3684 LearningRate 0.0000 Epoch: 19 Global Step: 327650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:15:58,140-Speed 5129.26 samples/sec Loss 0.4109 LearningRate 0.0000 Epoch: 19 Global Step: 327660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:00,105-Speed 5211.28 samples/sec Loss 0.3842 LearningRate 0.0000 Epoch: 19 Global Step: 327670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:02,076-Speed 5198.12 samples/sec Loss 0.3661 LearningRate 0.0000 Epoch: 19 Global Step: 327680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:04,045-Speed 5202.11 samples/sec Loss 0.3785 LearningRate 0.0000 Epoch: 19 Global Step: 327690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:06,020-Speed 5188.06 samples/sec Loss 0.3907 LearningRate 0.0000 Epoch: 19 Global Step: 327700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:07,987-Speed 5206.76 samples/sec Loss 0.4140 LearningRate 0.0000 Epoch: 19 Global Step: 327710 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:16:09,992-Speed 5108.28 samples/sec Loss 0.3766 LearningRate 0.0000 Epoch: 19 Global Step: 327720 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:16:11,967-Speed 5186.04 samples/sec Loss 0.3794 LearningRate 0.0000 Epoch: 19 Global Step: 327730 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:16:13,957-Speed 5147.40 samples/sec Loss 0.3944 LearningRate 0.0000 Epoch: 19 Global Step: 327740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:15,950-Speed 5138.88 samples/sec Loss 0.4055 LearningRate 0.0000 Epoch: 19 Global Step: 327750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 
21:16:17,935-Speed 5163.43 samples/sec Loss 0.3783 LearningRate 0.0000 Epoch: 19 Global Step: 327760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:19,903-Speed 5203.87 samples/sec Loss 0.3973 LearningRate 0.0000 Epoch: 19 Global Step: 327770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:21,871-Speed 5204.97 samples/sec Loss 0.3833 LearningRate 0.0000 Epoch: 19 Global Step: 327780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:23,842-Speed 5197.41 samples/sec Loss 0.3953 LearningRate 0.0000 Epoch: 19 Global Step: 327790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:25,820-Speed 5179.29 samples/sec Loss 0.3980 LearningRate 0.0000 Epoch: 19 Global Step: 327800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:27,792-Speed 5193.03 samples/sec Loss 0.3974 LearningRate 0.0000 Epoch: 19 Global Step: 327810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:29,755-Speed 5218.23 samples/sec Loss 0.3922 LearningRate 0.0000 Epoch: 19 Global Step: 327820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:31,721-Speed 5211.22 samples/sec Loss 0.3842 LearningRate 0.0000 Epoch: 19 Global Step: 327830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:33,696-Speed 5185.29 samples/sec Loss 0.3786 LearningRate 0.0000 Epoch: 19 Global Step: 327840 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:16:35,712-Speed 5081.86 samples/sec Loss 0.3778 LearningRate 0.0000 Epoch: 19 Global Step: 327850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:37,714-Speed 5116.28 samples/sec Loss 0.3859 LearningRate 0.0000 Epoch: 19 Global Step: 327860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:39,689-Speed 5186.45 samples/sec Loss 0.3635 LearningRate 0.0000 Epoch: 19 Global Step: 327870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:41,662-Speed 5194.08 samples/sec 
Loss 0.3812 LearningRate 0.0000 Epoch: 19 Global Step: 327880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:43,631-Speed 5201.71 samples/sec Loss 0.4236 LearningRate 0.0000 Epoch: 19 Global Step: 327890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:45,604-Speed 5191.68 samples/sec Loss 0.3954 LearningRate 0.0000 Epoch: 19 Global Step: 327900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:47,592-Speed 5152.86 samples/sec Loss 0.3910 LearningRate 0.0000 Epoch: 19 Global Step: 327910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:49,567-Speed 5185.30 samples/sec Loss 0.3805 LearningRate 0.0000 Epoch: 19 Global Step: 327920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:51,552-Speed 5162.08 samples/sec Loss 0.3781 LearningRate 0.0000 Epoch: 19 Global Step: 327930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:53,545-Speed 5138.94 samples/sec Loss 0.3992 LearningRate 0.0000 Epoch: 19 Global Step: 327940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:16:55,508-Speed 5217.24 samples/sec Loss 0.3727 LearningRate 0.0000 Epoch: 19 Global Step: 327950 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:16:57,473-Speed 5212.81 samples/sec Loss 0.3829 LearningRate 0.0000 Epoch: 19 Global Step: 327960 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:16:59,462-Speed 5149.54 samples/sec Loss 0.3880 LearningRate 0.0000 Epoch: 19 Global Step: 327970 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:17:01,451-Speed 5150.10 samples/sec Loss 0.3904 LearningRate 0.0000 Epoch: 19 Global Step: 327980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:17:03,437-Speed 5160.16 samples/sec Loss 0.4018 LearningRate 0.0000 Epoch: 19 Global Step: 327990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:17:05,418-Speed 5170.95 samples/sec Loss 0.4125 LearningRate 0.0000 Epoch: 
19 Global Step: 328000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:17:32,217-[lfw][328000]XNorm: 21.678740 Training: 2022-04-11 21:17:32,218-[lfw][328000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 21:17:32,218-[lfw][328000]Accuracy-Highest: 0.99833 Training: 2022-04-11 21:18:03,021-[cfp_fp][328000]XNorm: 22.152766 Training: 2022-04-11 21:18:03,022-[cfp_fp][328000]Accuracy-Flip: 0.99000+-0.00404 Training: 2022-04-11 21:18:03,022-[cfp_fp][328000]Accuracy-Highest: 0.99071 Training: 2022-04-11 21:18:29,621-[agedb_30][328000]XNorm: 22.783601 Training: 2022-04-11 21:18:29,622-[agedb_30][328000]Accuracy-Flip: 0.98283+-0.00633 Training: 2022-04-11 21:18:29,622-[agedb_30][328000]Accuracy-Highest: 0.98450 Training: 2022-04-11 21:18:31,604-Speed 118.81 samples/sec Loss 0.3863 LearningRate 0.0000 Epoch: 19 Global Step: 328010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:18:33,564-Speed 5226.25 samples/sec Loss 0.4055 LearningRate 0.0000 Epoch: 19 Global Step: 328020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:18:35,530-Speed 5211.93 samples/sec Loss 0.3670 LearningRate 0.0000 Epoch: 19 Global Step: 328030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:18:37,507-Speed 5179.60 samples/sec Loss 0.3939 LearningRate 0.0000 Epoch: 19 Global Step: 328040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:18:39,473-Speed 5211.40 samples/sec Loss 0.3983 LearningRate 0.0000 Epoch: 19 Global Step: 328050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:18:41,502-Speed 5048.08 samples/sec Loss 0.3923 LearningRate 0.0000 Epoch: 19 Global Step: 328060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:18:43,475-Speed 5191.11 samples/sec Loss 0.4035 LearningRate 0.0000 Epoch: 19 Global Step: 328070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:18:45,444-Speed 5202.42 samples/sec Loss 0.3874 LearningRate 0.0000 Epoch: 19 Global Step: 328080 
Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:18:47,473-Speed 5048.36 samples/sec Loss 0.3902 LearningRate 0.0000 Epoch: 19 Global Step: 328090 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:18:49,448-Speed 5187.30 samples/sec Loss 0.3928 LearningRate 0.0000 Epoch: 19 Global Step: 328100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:18:51,419-Speed 5198.39 samples/sec Loss 0.3866 LearningRate 0.0000 Epoch: 19 Global Step: 328110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:18:53,402-Speed 5164.40 samples/sec Loss 0.3825 LearningRate 0.0000 Epoch: 19 Global Step: 328120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:18:55,372-Speed 5198.26 samples/sec Loss 0.4013 LearningRate 0.0000 Epoch: 19 Global Step: 328130 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:18:57,354-Speed 5170.90 samples/sec Loss 0.3904 LearningRate 0.0000 Epoch: 19 Global Step: 328140 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:18:59,388-Speed 5035.49 samples/sec Loss 0.3760 LearningRate 0.0000 Epoch: 19 Global Step: 328150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:19:01,392-Speed 5111.48 samples/sec Loss 0.3896 LearningRate 0.0000 Epoch: 19 Global Step: 328160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:19:03,370-Speed 5177.30 samples/sec Loss 0.3806 LearningRate 0.0000 Epoch: 19 Global Step: 328170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:19:05,343-Speed 5191.55 samples/sec Loss 0.3721 LearningRate 0.0000 Epoch: 19 Global Step: 328180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:19:07,310-Speed 5210.02 samples/sec Loss 0.3749 LearningRate 0.0000 Epoch: 19 Global Step: 328190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:19:09,290-Speed 5172.09 samples/sec Loss 0.3726 LearningRate 0.0000 Epoch: 19 Global Step: 328200 Fp16 Grad Scale: 131072 Required: 0 hours 
Training: 2022-04-11 21:19:11,271-Speed 5169.77 samples/sec Loss 0.3898 LearningRate 0.0000 Epoch: 19 Global Step: 328210 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:19:13,271-Speed 5121.46 samples/sec Loss 0.3649 LearningRate 0.0000 Epoch: 19 Global Step: 328220 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:19:15,258-Speed 5157.38 samples/sec Loss 0.3574 LearningRate 0.0000 Epoch: 19 Global Step: 328230 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:19:17,245-Speed 5153.80 samples/sec Loss 0.3736 LearningRate 0.0000 Epoch: 19 Global Step: 328240 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:19:19,235-Speed 5148.03 samples/sec Loss 0.3726 LearningRate 0.0000 Epoch: 19 Global Step: 328250 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:19:21,222-Speed 5156.29 samples/sec Loss 0.3751 LearningRate 0.0000 Epoch: 19 Global Step: 328260 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:19:23,204-Speed 5167.46 samples/sec Loss 0.3855 LearningRate 0.0000 Epoch: 19 Global Step: 328270 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:19:25,197-Speed 5138.75 samples/sec Loss 0.3861 LearningRate 0.0000 Epoch: 19 Global Step: 328280 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:19:27,221-Speed 5060.73 samples/sec Loss 0.3703 LearningRate 0.0000 Epoch: 19 Global Step: 328290 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:19:29,188-Speed 5208.77 samples/sec Loss 0.3782 LearningRate 0.0000 Epoch: 19 Global Step: 328300 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:19:31,153-Speed 5212.88 samples/sec Loss 0.3860 LearningRate 0.0000 Epoch: 19 Global Step: 328310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:19:33,140-Speed 5156.09 samples/sec Loss 0.3863 LearningRate 0.0000 Epoch: 19 Global Step: 328320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 
21:19:35,120-Speed 5173.14 samples/sec Loss 0.4020 LearningRate 0.0000 Epoch: 19 Global Step: 328330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:19:37,102-Speed 5167.20 samples/sec Loss 0.3984 LearningRate 0.0000 Epoch: 19 Global Step: 328340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:19:39,072-Speed 5200.92 samples/sec Loss 0.3982 LearningRate 0.0000 Epoch: 19 Global Step: 328350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:19:41,045-Speed 5190.27 samples/sec Loss 0.3847 LearningRate 0.0000 Epoch: 19 Global Step: 328360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:19:43,015-Speed 5200.86 samples/sec Loss 0.4009 LearningRate 0.0000 Epoch: 19 Global Step: 328370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:19:45,005-Speed 5148.36 samples/sec Loss 0.3817 LearningRate 0.0000 Epoch: 19 Global Step: 328380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:19:47,057-Speed 4991.23 samples/sec Loss 0.4028 LearningRate 0.0000 Epoch: 19 Global Step: 328390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:19:49,047-Speed 5148.72 samples/sec Loss 0.3848 LearningRate 0.0000 Epoch: 19 Global Step: 328400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:19:51,033-Speed 5157.63 samples/sec Loss 0.3721 LearningRate 0.0000 Epoch: 19 Global Step: 328410 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:19:53,009-Speed 5181.56 samples/sec Loss 0.3561 LearningRate 0.0000 Epoch: 19 Global Step: 328420 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:19:54,980-Speed 5196.72 samples/sec Loss 0.3898 LearningRate 0.0000 Epoch: 19 Global Step: 328430 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:19:56,952-Speed 5195.15 samples/sec Loss 0.3832 LearningRate 0.0000 Epoch: 19 Global Step: 328440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:19:58,934-Speed 5168.81 samples/sec 
Loss 0.3829 LearningRate 0.0000 Epoch: 19 Global Step: 328450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:00,920-Speed 5156.83 samples/sec Loss 0.3659 LearningRate 0.0000 Epoch: 19 Global Step: 328460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:02,905-Speed 5161.78 samples/sec Loss 0.3861 LearningRate 0.0000 Epoch: 19 Global Step: 328470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:04,885-Speed 5173.09 samples/sec Loss 0.3817 LearningRate 0.0000 Epoch: 19 Global Step: 328480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:06,871-Speed 5157.78 samples/sec Loss 0.3793 LearningRate 0.0000 Epoch: 19 Global Step: 328490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:08,844-Speed 5193.05 samples/sec Loss 0.3717 LearningRate 0.0000 Epoch: 19 Global Step: 328500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:10,835-Speed 5144.16 samples/sec Loss 0.3792 LearningRate 0.0000 Epoch: 19 Global Step: 328510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:12,831-Speed 5131.74 samples/sec Loss 0.3865 LearningRate 0.0000 Epoch: 19 Global Step: 328520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:14,807-Speed 5183.25 samples/sec Loss 0.3924 LearningRate 0.0000 Epoch: 19 Global Step: 328530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:16,784-Speed 5182.49 samples/sec Loss 0.3886 LearningRate 0.0000 Epoch: 19 Global Step: 328540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:18,770-Speed 5156.21 samples/sec Loss 0.4006 LearningRate 0.0000 Epoch: 19 Global Step: 328550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:20,741-Speed 5197.33 samples/sec Loss 0.3979 LearningRate 0.0000 Epoch: 19 Global Step: 328560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:22,712-Speed 5197.29 samples/sec Loss 0.4104 LearningRate 0.0000 Epoch: 19 
Global Step: 328570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:24,695-Speed 5164.16 samples/sec Loss 0.3820 LearningRate 0.0000 Epoch: 19 Global Step: 328580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:26,692-Speed 5131.12 samples/sec Loss 0.4039 LearningRate 0.0000 Epoch: 19 Global Step: 328590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:28,683-Speed 5144.09 samples/sec Loss 0.3949 LearningRate 0.0000 Epoch: 19 Global Step: 328600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:30,650-Speed 5208.72 samples/sec Loss 0.3832 LearningRate 0.0000 Epoch: 19 Global Step: 328610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 21:20:32,619-Speed 5201.03 samples/sec Loss 0.3912 LearningRate 0.0000 Epoch: 19 Global Step: 328620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 21:20:34,605-Speed 5159.26 samples/sec Loss 0.3837 LearningRate 0.0000 Epoch: 19 Global Step: 328630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 21:20:36,602-Speed 5129.48 samples/sec Loss 0.3883 LearningRate 0.0000 Epoch: 19 Global Step: 328640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 21:20:38,584-Speed 5168.43 samples/sec Loss 0.3966 LearningRate 0.0000 Epoch: 19 Global Step: 328650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 21:20:40,585-Speed 5118.78 samples/sec Loss 0.3796 LearningRate 0.0000 Epoch: 19 Global Step: 328660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 21:20:42,563-Speed 5178.01 samples/sec Loss 0.3944 LearningRate 0.0000 Epoch: 19 Global Step: 328670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 21:20:44,530-Speed 5206.14 samples/sec Loss 0.3674 LearningRate 0.0000 Epoch: 19 Global Step: 328680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 21:20:46,523-Speed 5140.60 samples/sec Loss 0.3830 LearningRate 0.0000 Epoch: 19 Global Step: 328690 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-04-11 21:20:48,507-Speed 5162.55 samples/sec Loss 0.4088 LearningRate 0.0000 Epoch: 19 Global Step: 328700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-11 21:20:50,502-Speed 5136.16 samples/sec Loss 0.3845 LearningRate 0.0000 Epoch: 19 Global Step: 328710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:52,476-Speed 5189.09 samples/sec Loss 0.4153 LearningRate 0.0000 Epoch: 19 Global Step: 328720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:54,476-Speed 5121.81 samples/sec Loss 0.3816 LearningRate 0.0000 Epoch: 19 Global Step: 328730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:56,453-Speed 5181.77 samples/sec Loss 0.3922 LearningRate 0.0000 Epoch: 19 Global Step: 328740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:20:58,419-Speed 5208.38 samples/sec Loss 0.3738 LearningRate 0.0000 Epoch: 19 Global Step: 328750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:00,387-Speed 5206.43 samples/sec Loss 0.3853 LearningRate 0.0000 Epoch: 19 Global Step: 328760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:02,388-Speed 5118.77 samples/sec Loss 0.3834 LearningRate 0.0000 Epoch: 19 Global Step: 328770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:04,373-Speed 5161.28 samples/sec Loss 0.3768 LearningRate 0.0000 Epoch: 19 Global Step: 328780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:06,365-Speed 5142.27 samples/sec Loss 0.3877 LearningRate 0.0000 Epoch: 19 Global Step: 328790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:08,372-Speed 5101.82 samples/sec Loss 0.3779 LearningRate 0.0000 Epoch: 19 Global Step: 328800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:10,353-Speed 5172.67 samples/sec Loss 0.3782 LearningRate 0.0000 Epoch: 19 Global Step: 328810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 
21:21:12,339-Speed 5157.05 samples/sec Loss 0.4154 LearningRate 0.0000 Epoch: 19 Global Step: 328820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:14,324-Speed 5159.97 samples/sec Loss 0.3785 LearningRate 0.0000 Epoch: 19 Global Step: 328830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:16,315-Speed 5145.30 samples/sec Loss 0.3568 LearningRate 0.0000 Epoch: 19 Global Step: 328840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:18,280-Speed 5213.29 samples/sec Loss 0.4055 LearningRate 0.0000 Epoch: 19 Global Step: 328850 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:20,248-Speed 5206.38 samples/sec Loss 0.3717 LearningRate 0.0000 Epoch: 19 Global Step: 328860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:22,217-Speed 5201.04 samples/sec Loss 0.4035 LearningRate 0.0000 Epoch: 19 Global Step: 328870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:24,234-Speed 5078.27 samples/sec Loss 0.3913 LearningRate 0.0000 Epoch: 19 Global Step: 328880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:26,223-Speed 5149.81 samples/sec Loss 0.3872 LearningRate 0.0000 Epoch: 19 Global Step: 328890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:28,211-Speed 5153.46 samples/sec Loss 0.3857 LearningRate 0.0000 Epoch: 19 Global Step: 328900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:30,195-Speed 5162.35 samples/sec Loss 0.3878 LearningRate 0.0000 Epoch: 19 Global Step: 328910 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:21:32,196-Speed 5120.32 samples/sec Loss 0.3903 LearningRate 0.0000 Epoch: 19 Global Step: 328920 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:21:34,173-Speed 5181.60 samples/sec Loss 0.3803 LearningRate 0.0000 Epoch: 19 Global Step: 328930 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:21:36,150-Speed 5181.25 samples/sec 
Loss 0.3873 LearningRate 0.0000 Epoch: 19 Global Step: 328940 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:21:38,143-Speed 5137.98 samples/sec Loss 0.3772 LearningRate 0.0000 Epoch: 19 Global Step: 328950 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:21:40,118-Speed 5188.18 samples/sec Loss 0.3865 LearningRate 0.0000 Epoch: 19 Global Step: 328960 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:21:42,083-Speed 5212.63 samples/sec Loss 0.3931 LearningRate 0.0000 Epoch: 19 Global Step: 328970 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:21:44,047-Speed 5214.54 samples/sec Loss 0.4299 LearningRate 0.0000 Epoch: 19 Global Step: 328980 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:21:46,021-Speed 5190.52 samples/sec Loss 0.3636 LearningRate 0.0000 Epoch: 19 Global Step: 328990 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:21:48,040-Speed 5074.04 samples/sec Loss 0.3965 LearningRate 0.0000 Epoch: 19 Global Step: 329000 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:21:50,025-Speed 5158.57 samples/sec Loss 0.3743 LearningRate 0.0000 Epoch: 19 Global Step: 329010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:52,016-Speed 5146.24 samples/sec Loss 0.4250 LearningRate 0.0000 Epoch: 19 Global Step: 329020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:53,983-Speed 5206.57 samples/sec Loss 0.3914 LearningRate 0.0000 Epoch: 19 Global Step: 329030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:55,954-Speed 5198.58 samples/sec Loss 0.3977 LearningRate 0.0000 Epoch: 19 Global Step: 329040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:57,923-Speed 5201.91 samples/sec Loss 0.3717 LearningRate 0.0000 Epoch: 19 Global Step: 329050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:21:59,892-Speed 5200.41 samples/sec Loss 0.4052 LearningRate 0.0000 
Epoch: 19 Global Step: 329060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:01,867-Speed 5188.64 samples/sec Loss 0.3806 LearningRate 0.0000 Epoch: 19 Global Step: 329070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:03,843-Speed 5182.57 samples/sec Loss 0.3993 LearningRate 0.0000 Epoch: 19 Global Step: 329080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:05,813-Speed 5199.70 samples/sec Loss 0.3908 LearningRate 0.0000 Epoch: 19 Global Step: 329090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:07,779-Speed 5210.17 samples/sec Loss 0.3738 LearningRate 0.0000 Epoch: 19 Global Step: 329100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:09,781-Speed 5116.87 samples/sec Loss 0.3855 LearningRate 0.0000 Epoch: 19 Global Step: 329110 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:22:11,761-Speed 5173.66 samples/sec Loss 0.3662 LearningRate 0.0000 Epoch: 19 Global Step: 329120 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:22:13,734-Speed 5192.54 samples/sec Loss 0.3991 LearningRate 0.0000 Epoch: 19 Global Step: 329130 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:22:15,708-Speed 5186.79 samples/sec Loss 0.3852 LearningRate 0.0000 Epoch: 19 Global Step: 329140 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:22:17,680-Speed 5197.79 samples/sec Loss 0.3822 LearningRate 0.0000 Epoch: 19 Global Step: 329150 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:19,647-Speed 5207.06 samples/sec Loss 0.3859 LearningRate 0.0000 Epoch: 19 Global Step: 329160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:21,613-Speed 5209.63 samples/sec Loss 0.3818 LearningRate 0.0000 Epoch: 19 Global Step: 329170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:23,611-Speed 5127.11 samples/sec Loss 0.3976 LearningRate 0.0000 Epoch: 19 Global Step: 329180 Fp16 Grad 
Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:25,608-Speed 5129.29 samples/sec Loss 0.3879 LearningRate 0.0000 Epoch: 19 Global Step: 329190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:27,598-Speed 5146.26 samples/sec Loss 0.3818 LearningRate 0.0000 Epoch: 19 Global Step: 329200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:29,571-Speed 5192.09 samples/sec Loss 0.3792 LearningRate 0.0000 Epoch: 19 Global Step: 329210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:31,541-Speed 5199.78 samples/sec Loss 0.4016 LearningRate 0.0000 Epoch: 19 Global Step: 329220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:33,544-Speed 5113.16 samples/sec Loss 0.4087 LearningRate 0.0000 Epoch: 19 Global Step: 329230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:35,568-Speed 5061.34 samples/sec Loss 0.4020 LearningRate 0.0000 Epoch: 19 Global Step: 329240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:37,559-Speed 5144.95 samples/sec Loss 0.3559 LearningRate 0.0000 Epoch: 19 Global Step: 329250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:39,534-Speed 5187.54 samples/sec Loss 0.3886 LearningRate 0.0000 Epoch: 19 Global Step: 329260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:41,505-Speed 5196.84 samples/sec Loss 0.3926 LearningRate 0.0000 Epoch: 19 Global Step: 329270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:43,472-Speed 5208.63 samples/sec Loss 0.4015 LearningRate 0.0000 Epoch: 19 Global Step: 329280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:45,439-Speed 5205.81 samples/sec Loss 0.3809 LearningRate 0.0000 Epoch: 19 Global Step: 329290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:47,452-Speed 5090.65 samples/sec Loss 0.3853 LearningRate 0.0000 Epoch: 19 Global Step: 329300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-04-11 21:22:49,435-Speed 5166.17 samples/sec Loss 0.3994 LearningRate 0.0000 Epoch: 19 Global Step: 329310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:51,422-Speed 5154.26 samples/sec Loss 0.3921 LearningRate 0.0000 Epoch: 19 Global Step: 329320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:53,407-Speed 5160.37 samples/sec Loss 0.3836 LearningRate 0.0000 Epoch: 19 Global Step: 329330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:55,384-Speed 5179.82 samples/sec Loss 0.3926 LearningRate 0.0000 Epoch: 19 Global Step: 329340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:22:57,379-Speed 5134.05 samples/sec Loss 0.3706 LearningRate 0.0000 Epoch: 19 Global Step: 329350 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:22:59,377-Speed 5126.30 samples/sec Loss 0.3876 LearningRate 0.0000 Epoch: 19 Global Step: 329360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:23:01,388-Speed 5096.97 samples/sec Loss 0.4040 LearningRate 0.0000 Epoch: 19 Global Step: 329370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:23:03,425-Speed 5028.97 samples/sec Loss 0.3820 LearningRate 0.0000 Epoch: 19 Global Step: 329380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:23:05,402-Speed 5179.92 samples/sec Loss 0.3863 LearningRate 0.0000 Epoch: 19 Global Step: 329390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:23:07,372-Speed 5199.10 samples/sec Loss 0.3985 LearningRate 0.0000 Epoch: 19 Global Step: 329400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:23:09,349-Speed 5182.06 samples/sec Loss 0.3725 LearningRate 0.0000 Epoch: 19 Global Step: 329410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:23:11,332-Speed 5165.70 samples/sec Loss 0.3859 LearningRate 0.0000 Epoch: 19 Global Step: 329420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:23:13,304-Speed 5193.69 
samples/sec Loss 0.3833 LearningRate 0.0000 Epoch: 19 Global Step: 329430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:23:15,315-Speed 5094.96 samples/sec Loss 0.3972 LearningRate 0.0000 Epoch: 19 Global Step: 329440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:23:17,309-Speed 5135.93 samples/sec Loss 0.3966 LearningRate 0.0000 Epoch: 19 Global Step: 329450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:23:19,275-Speed 5210.10 samples/sec Loss 0.4041 LearningRate 0.0000 Epoch: 19 Global Step: 329460 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:23:21,247-Speed 5196.55 samples/sec Loss 0.3884 LearningRate 0.0000 Epoch: 19 Global Step: 329470 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:23:23,231-Speed 5161.68 samples/sec Loss 0.3575 LearningRate 0.0000 Epoch: 19 Global Step: 329480 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:23:25,259-Speed 5052.42 samples/sec Loss 0.3911 LearningRate 0.0000 Epoch: 19 Global Step: 329490 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:23:27,262-Speed 5112.39 samples/sec Loss 0.4039 LearningRate 0.0000 Epoch: 19 Global Step: 329500 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:23:29,284-Speed 5066.31 samples/sec Loss 0.3935 LearningRate 0.0000 Epoch: 19 Global Step: 329510 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:23:31,271-Speed 5154.28 samples/sec Loss 0.4066 LearningRate 0.0000 Epoch: 19 Global Step: 329520 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:23:33,247-Speed 5186.10 samples/sec Loss 0.3874 LearningRate 0.0000 Epoch: 19 Global Step: 329530 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:23:35,223-Speed 5181.38 samples/sec Loss 0.3973 LearningRate 0.0000 Epoch: 19 Global Step: 329540 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:23:37,224-Speed 5121.17 samples/sec Loss 0.3674 
LearningRate 0.0000 Epoch: 19 Global Step: 329550 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:23:39,209-Speed 5159.54 samples/sec Loss 0.4114 LearningRate 0.0000 Epoch: 19 Global Step: 329560 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:23:41,207-Speed 5125.99 samples/sec Loss 0.3727 LearningRate 0.0000 Epoch: 19 Global Step: 329570 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:23:43,185-Speed 5179.99 samples/sec Loss 0.3705 LearningRate 0.0000 Epoch: 19 Global Step: 329580 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:23:45,155-Speed 5198.93 samples/sec Loss 0.4093 LearningRate 0.0000 Epoch: 19 Global Step: 329590 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:23:47,133-Speed 5180.54 samples/sec Loss 0.3910 LearningRate 0.0000 Epoch: 19 Global Step: 329600 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:23:49,094-Speed 5223.54 samples/sec Loss 0.3802 LearningRate 0.0000 Epoch: 19 Global Step: 329610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:23:51,090-Speed 5130.41 samples/sec Loss 0.3670 LearningRate 0.0000 Epoch: 19 Global Step: 329620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:23:53,110-Speed 5070.22 samples/sec Loss 0.3778 LearningRate 0.0000 Epoch: 19 Global Step: 329630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:23:55,082-Speed 5196.21 samples/sec Loss 0.3672 LearningRate 0.0000 Epoch: 19 Global Step: 329640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:23:57,062-Speed 5171.15 samples/sec Loss 0.4039 LearningRate 0.0000 Epoch: 19 Global Step: 329650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:23:59,056-Speed 5136.94 samples/sec Loss 0.3818 LearningRate 0.0000 Epoch: 19 Global Step: 329660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:01,042-Speed 5158.31 samples/sec Loss 0.3850 LearningRate 0.0000 Epoch: 19 Global 
Step: 329670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:03,029-Speed 5155.84 samples/sec Loss 0.3924 LearningRate 0.0000 Epoch: 19 Global Step: 329680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:05,001-Speed 5195.06 samples/sec Loss 0.3799 LearningRate 0.0000 Epoch: 19 Global Step: 329690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:06,993-Speed 5142.58 samples/sec Loss 0.3602 LearningRate 0.0000 Epoch: 19 Global Step: 329700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:08,958-Speed 5211.87 samples/sec Loss 0.3705 LearningRate 0.0000 Epoch: 19 Global Step: 329710 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:24:10,967-Speed 5101.28 samples/sec Loss 0.3986 LearningRate 0.0000 Epoch: 19 Global Step: 329720 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:24:12,971-Speed 5110.95 samples/sec Loss 0.4075 LearningRate 0.0000 Epoch: 19 Global Step: 329730 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:24:14,936-Speed 5212.86 samples/sec Loss 0.3953 LearningRate 0.0000 Epoch: 19 Global Step: 329740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:16,908-Speed 5194.01 samples/sec Loss 0.3906 LearningRate 0.0000 Epoch: 19 Global Step: 329750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:18,892-Speed 5161.43 samples/sec Loss 0.3872 LearningRate 0.0000 Epoch: 19 Global Step: 329760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:20,860-Speed 5206.94 samples/sec Loss 0.3954 LearningRate 0.0000 Epoch: 19 Global Step: 329770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:22,846-Speed 5157.05 samples/sec Loss 0.3760 LearningRate 0.0000 Epoch: 19 Global Step: 329780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:24,830-Speed 5162.67 samples/sec Loss 0.3792 LearningRate 0.0000 Epoch: 19 Global Step: 329790 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-11 21:24:26,815-Speed 5160.49 samples/sec Loss 0.3926 LearningRate 0.0000 Epoch: 19 Global Step: 329800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:28,790-Speed 5187.80 samples/sec Loss 0.3633 LearningRate 0.0000 Epoch: 19 Global Step: 329810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:30,760-Speed 5198.65 samples/sec Loss 0.3968 LearningRate 0.0000 Epoch: 19 Global Step: 329820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:32,742-Speed 5168.89 samples/sec Loss 0.3844 LearningRate 0.0000 Epoch: 19 Global Step: 329830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:34,755-Speed 5087.18 samples/sec Loss 0.3909 LearningRate 0.0000 Epoch: 19 Global Step: 329840 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:24:36,740-Speed 5160.84 samples/sec Loss 0.3864 LearningRate 0.0000 Epoch: 19 Global Step: 329850 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:24:38,728-Speed 5152.05 samples/sec Loss 0.3921 LearningRate 0.0000 Epoch: 19 Global Step: 329860 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:24:40,696-Speed 5206.86 samples/sec Loss 0.4117 LearningRate 0.0000 Epoch: 19 Global Step: 329870 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:24:42,663-Speed 5207.53 samples/sec Loss 0.3961 LearningRate 0.0000 Epoch: 19 Global Step: 329880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:44,635-Speed 5192.66 samples/sec Loss 0.3773 LearningRate 0.0000 Epoch: 19 Global Step: 329890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:46,606-Speed 5197.46 samples/sec Loss 0.4050 LearningRate 0.0000 Epoch: 19 Global Step: 329900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:48,582-Speed 5184.74 samples/sec Loss 0.4121 LearningRate 0.0000 Epoch: 19 Global Step: 329910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 
21:24:50,560-Speed 5180.60 samples/sec Loss 0.3699 LearningRate 0.0000 Epoch: 19 Global Step: 329920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:52,563-Speed 5112.66 samples/sec Loss 0.3649 LearningRate 0.0000 Epoch: 19 Global Step: 329930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:54,562-Speed 5124.30 samples/sec Loss 0.3904 LearningRate 0.0000 Epoch: 19 Global Step: 329940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:56,541-Speed 5176.61 samples/sec Loss 0.3937 LearningRate 0.0000 Epoch: 19 Global Step: 329950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:24:58,545-Speed 5109.84 samples/sec Loss 0.3862 LearningRate 0.0000 Epoch: 19 Global Step: 329960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:25:00,527-Speed 5169.14 samples/sec Loss 0.3942 LearningRate 0.0000 Epoch: 19 Global Step: 329970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:25:02,525-Speed 5126.52 samples/sec Loss 0.4014 LearningRate 0.0000 Epoch: 19 Global Step: 329980 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:25:04,534-Speed 5099.86 samples/sec Loss 0.3588 LearningRate 0.0000 Epoch: 19 Global Step: 329990 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:25:06,533-Speed 5123.26 samples/sec Loss 0.4057 LearningRate 0.0000 Epoch: 19 Global Step: 330000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:25:33,021-[lfw][330000]XNorm: 21.603245 Training: 2022-04-11 21:25:33,021-[lfw][330000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 21:25:33,022-[lfw][330000]Accuracy-Highest: 0.99833 Training: 2022-04-11 21:26:03,654-[cfp_fp][330000]XNorm: 22.109502 Training: 2022-04-11 21:26:03,655-[cfp_fp][330000]Accuracy-Flip: 0.99000+-0.00424 Training: 2022-04-11 21:26:03,655-[cfp_fp][330000]Accuracy-Highest: 0.99071 Training: 2022-04-11 21:26:30,126-[agedb_30][330000]XNorm: 22.731949 Training: 2022-04-11 
21:26:30,127-[agedb_30][330000]Accuracy-Flip: 0.98317+-0.00626 Training: 2022-04-11 21:26:30,127-[agedb_30][330000]Accuracy-Highest: 0.98450 Training: 2022-04-11 21:26:32,109-Speed 119.66 samples/sec Loss 0.3772 LearningRate 0.0000 Epoch: 19 Global Step: 330010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:26:34,072-Speed 5216.93 samples/sec Loss 0.3723 LearningRate 0.0000 Epoch: 19 Global Step: 330020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:26:36,070-Speed 5128.25 samples/sec Loss 0.3996 LearningRate 0.0000 Epoch: 19 Global Step: 330030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:26:38,073-Speed 5113.79 samples/sec Loss 0.3693 LearningRate 0.0000 Epoch: 19 Global Step: 330040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:26:40,056-Speed 5164.25 samples/sec Loss 0.4149 LearningRate 0.0000 Epoch: 19 Global Step: 330050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:26:42,031-Speed 5186.17 samples/sec Loss 0.3576 LearningRate 0.0000 Epoch: 19 Global Step: 330060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:26:44,003-Speed 5195.45 samples/sec Loss 0.3992 LearningRate 0.0000 Epoch: 19 Global Step: 330070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:26:45,986-Speed 5165.07 samples/sec Loss 0.3802 LearningRate 0.0000 Epoch: 19 Global Step: 330080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:26:47,952-Speed 5212.05 samples/sec Loss 0.3900 LearningRate 0.0000 Epoch: 19 Global Step: 330090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:26:49,942-Speed 5145.59 samples/sec Loss 0.3803 LearningRate 0.0000 Epoch: 19 Global Step: 330100 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:26:51,943-Speed 5119.31 samples/sec Loss 0.3917 LearningRate 0.0000 Epoch: 19 Global Step: 330110 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:26:53,915-Speed 5196.20 samples/sec Loss 
0.3978 LearningRate 0.0000 Epoch: 19 Global Step: 330120 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:26:55,884-Speed 5202.40 samples/sec Loss 0.4020 LearningRate 0.0000 Epoch: 19 Global Step: 330130 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:26:57,858-Speed 5188.86 samples/sec Loss 0.3834 LearningRate 0.0000 Epoch: 19 Global Step: 330140 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:26:59,857-Speed 5121.85 samples/sec Loss 0.3995 LearningRate 0.0000 Epoch: 19 Global Step: 330150 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:27:01,829-Speed 5196.05 samples/sec Loss 0.3844 LearningRate 0.0000 Epoch: 19 Global Step: 330160 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:27:03,824-Speed 5135.14 samples/sec Loss 0.3940 LearningRate 0.0000 Epoch: 19 Global Step: 330170 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:27:05,801-Speed 5181.04 samples/sec Loss 0.3652 LearningRate 0.0000 Epoch: 19 Global Step: 330180 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:27:07,787-Speed 5155.89 samples/sec Loss 0.3839 LearningRate 0.0000 Epoch: 19 Global Step: 330190 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:27:09,795-Speed 5104.53 samples/sec Loss 0.3763 LearningRate 0.0000 Epoch: 19 Global Step: 330200 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:27:11,783-Speed 5152.31 samples/sec Loss 0.3894 LearningRate 0.0000 Epoch: 19 Global Step: 330210 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:27:13,751-Speed 5203.74 samples/sec Loss 0.3752 LearningRate 0.0000 Epoch: 19 Global Step: 330220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:15,753-Speed 5117.37 samples/sec Loss 0.3788 LearningRate 0.0000 Epoch: 19 Global Step: 330230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:17,724-Speed 5196.06 samples/sec Loss 0.3890 LearningRate 0.0000 Epoch: 
19 Global Step: 330240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:19,703-Speed 5177.77 samples/sec Loss 0.3940 LearningRate 0.0000 Epoch: 19 Global Step: 330250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:21,691-Speed 5152.72 samples/sec Loss 0.3751 LearningRate 0.0000 Epoch: 19 Global Step: 330260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:23,667-Speed 5182.50 samples/sec Loss 0.3782 LearningRate 0.0000 Epoch: 19 Global Step: 330270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:25,669-Speed 5115.74 samples/sec Loss 0.3793 LearningRate 0.0000 Epoch: 19 Global Step: 330280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:27,666-Speed 5129.96 samples/sec Loss 0.3799 LearningRate 0.0000 Epoch: 19 Global Step: 330290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:29,681-Speed 5082.81 samples/sec Loss 0.3868 LearningRate 0.0000 Epoch: 19 Global Step: 330300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:31,676-Speed 5136.43 samples/sec Loss 0.3977 LearningRate 0.0000 Epoch: 19 Global Step: 330310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:33,652-Speed 5185.30 samples/sec Loss 0.3834 LearningRate 0.0000 Epoch: 19 Global Step: 330320 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:27:35,631-Speed 5173.69 samples/sec Loss 0.3735 LearningRate 0.0000 Epoch: 19 Global Step: 330330 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:27:37,628-Speed 5129.92 samples/sec Loss 0.4060 LearningRate 0.0000 Epoch: 19 Global Step: 330340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:39,614-Speed 5158.49 samples/sec Loss 0.3970 LearningRate 0.0000 Epoch: 19 Global Step: 330350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:41,609-Speed 5134.80 samples/sec Loss 0.3599 LearningRate 0.0000 Epoch: 19 Global Step: 330360 Fp16 Grad Scale: 
65536 Required: 0 hours Training: 2022-04-11 21:27:43,585-Speed 5183.10 samples/sec Loss 0.3728 LearningRate 0.0000 Epoch: 19 Global Step: 330370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:45,568-Speed 5167.33 samples/sec Loss 0.3735 LearningRate 0.0000 Epoch: 19 Global Step: 330380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:47,568-Speed 5121.93 samples/sec Loss 0.3869 LearningRate 0.0000 Epoch: 19 Global Step: 330390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:49,552-Speed 5160.37 samples/sec Loss 0.3750 LearningRate 0.0000 Epoch: 19 Global Step: 330400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:51,565-Speed 5090.62 samples/sec Loss 0.3924 LearningRate 0.0000 Epoch: 19 Global Step: 330410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:53,607-Speed 5015.75 samples/sec Loss 0.3726 LearningRate 0.0000 Epoch: 19 Global Step: 330420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:55,592-Speed 5160.37 samples/sec Loss 0.4268 LearningRate 0.0000 Epoch: 19 Global Step: 330430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:27:57,589-Speed 5130.11 samples/sec Loss 0.3815 LearningRate 0.0000 Epoch: 19 Global Step: 330440 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:27:59,581-Speed 5140.99 samples/sec Loss 0.3947 LearningRate 0.0000 Epoch: 19 Global Step: 330450 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:28:01,567-Speed 5158.57 samples/sec Loss 0.3934 LearningRate 0.0000 Epoch: 19 Global Step: 330460 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:28:03,539-Speed 5194.55 samples/sec Loss 0.3831 LearningRate 0.0000 Epoch: 19 Global Step: 330470 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:28:05,519-Speed 5174.16 samples/sec Loss 0.3776 LearningRate 0.0000 Epoch: 19 Global Step: 330480 Fp16 Grad Scale: 131072 Required: 0 hours Training: 
2022-04-11 21:28:07,505-Speed 5155.85 samples/sec Loss 0.3828 LearningRate 0.0000 Epoch: 19 Global Step: 330490 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:28:09,484-Speed 5176.91 samples/sec Loss 0.3897 LearningRate 0.0000 Epoch: 19 Global Step: 330500 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:28:11,471-Speed 5154.53 samples/sec Loss 0.3861 LearningRate 0.0000 Epoch: 19 Global Step: 330510 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:28:13,469-Speed 5128.21 samples/sec Loss 0.3917 LearningRate 0.0000 Epoch: 19 Global Step: 330520 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:28:15,452-Speed 5166.02 samples/sec Loss 0.3846 LearningRate 0.0000 Epoch: 19 Global Step: 330530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:28:17,448-Speed 5132.36 samples/sec Loss 0.4038 LearningRate 0.0000 Epoch: 19 Global Step: 330540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:28:19,416-Speed 5202.97 samples/sec Loss 0.4128 LearningRate 0.0000 Epoch: 19 Global Step: 330550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:28:21,400-Speed 5164.19 samples/sec Loss 0.3873 LearningRate 0.0000 Epoch: 19 Global Step: 330560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:28:23,370-Speed 5199.81 samples/sec Loss 0.3779 LearningRate 0.0000 Epoch: 19 Global Step: 330570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:28:25,365-Speed 5133.30 samples/sec Loss 0.3789 LearningRate 0.0000 Epoch: 19 Global Step: 330580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:28:27,344-Speed 5175.92 samples/sec Loss 0.3726 LearningRate 0.0000 Epoch: 19 Global Step: 330590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:28:29,310-Speed 5209.60 samples/sec Loss 0.3893 LearningRate 0.0000 Epoch: 19 Global Step: 330600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:28:31,275-Speed 5214.10 
samples/sec Loss 0.3771 LearningRate 0.0000 Epoch: 19 Global Step: 330610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:28:33,269-Speed 5137.59 samples/sec Loss 0.3907 LearningRate 0.0000 Epoch: 19 Global Step: 330620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:28:35,266-Speed 5129.02 samples/sec Loss 0.3913 LearningRate 0.0000 Epoch: 19 Global Step: 330630 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:28:37,277-Speed 5094.16 samples/sec Loss 0.3911 LearningRate 0.0000 Epoch: 19 Global Step: 330640 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:28:39,305-Speed 5051.72 samples/sec Loss 0.3953 LearningRate 0.0000 Epoch: 19 Global Step: 330650 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:28:41,279-Speed 5187.82 samples/sec Loss 0.3822 LearningRate 0.0000 Epoch: 19 Global Step: 330660 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:28:43,249-Speed 5201.51 samples/sec Loss 0.3994 LearningRate 0.0000 Epoch: 19 Global Step: 330670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:28:45,218-Speed 5201.52 samples/sec Loss 0.3729 LearningRate 0.0000 Epoch: 19 Global Step: 330680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:28:47,189-Speed 5196.88 samples/sec Loss 0.3944 LearningRate 0.0000 Epoch: 19 Global Step: 330690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:28:49,172-Speed 5164.89 samples/sec Loss 0.3883 LearningRate 0.0000 Epoch: 19 Global Step: 330700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:28:51,159-Speed 5154.78 samples/sec Loss 0.3742 LearningRate 0.0000 Epoch: 19 Global Step: 330710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:28:53,141-Speed 5169.19 samples/sec Loss 0.3888 LearningRate 0.0000 Epoch: 19 Global Step: 330720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:28:55,114-Speed 5190.48 samples/sec Loss 0.3931 LearningRate 
0.0000 Epoch: 19 Global Step: 330730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:28:57,080-Speed 5210.55 samples/sec Loss 0.4041 LearningRate 0.0000 Epoch: 19 Global Step: 330740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:28:59,057-Speed 5180.56 samples/sec Loss 0.3979 LearningRate 0.0000 Epoch: 19 Global Step: 330750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:29:01,038-Speed 5171.92 samples/sec Loss 0.3730 LearningRate 0.0000 Epoch: 19 Global Step: 330760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:29:03,032-Speed 5138.23 samples/sec Loss 0.3946 LearningRate 0.0000 Epoch: 19 Global Step: 330770 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:29:05,050-Speed 5075.15 samples/sec Loss 0.3557 LearningRate 0.0000 Epoch: 19 Global Step: 330780 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:29:07,029-Speed 5178.18 samples/sec Loss 0.4069 LearningRate 0.0000 Epoch: 19 Global Step: 330790 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:29:09,025-Speed 5130.50 samples/sec Loss 0.3828 LearningRate 0.0000 Epoch: 19 Global Step: 330800 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:29:11,023-Speed 5127.40 samples/sec Loss 0.3991 LearningRate 0.0000 Epoch: 19 Global Step: 330810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:29:12,991-Speed 5205.52 samples/sec Loss 0.3917 LearningRate 0.0000 Epoch: 19 Global Step: 330820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:29:14,987-Speed 5131.88 samples/sec Loss 0.3744 LearningRate 0.0000 Epoch: 19 Global Step: 330830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:29:16,966-Speed 5175.23 samples/sec Loss 0.3665 LearningRate 0.0000 Epoch: 19 Global Step: 330840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:29:18,937-Speed 5197.91 samples/sec Loss 0.4040 LearningRate 0.0000 Epoch: 19 Global Step: 330850 
Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:29:20,929-Speed 5140.40 samples/sec Loss 0.3896 LearningRate 0.0000 Epoch: 19 Global Step: 330860 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:29:22,900-Speed 5196.87 samples/sec Loss 0.3870 LearningRate 0.0000 Epoch: 19 Global Step: 330870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:29:24,897-Speed 5131.16 samples/sec Loss 0.3824 LearningRate 0.0000 Epoch: 19 Global Step: 330880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:29:26,864-Speed 5208.31 samples/sec Loss 0.3776 LearningRate 0.0000 Epoch: 19 Global Step: 330890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:29:28,832-Speed 5203.03 samples/sec Loss 0.3895 LearningRate 0.0000 Epoch: 19 Global Step: 330900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:29:30,803-Speed 5199.50 samples/sec Loss 0.4110 LearningRate 0.0000 Epoch: 19 Global Step: 330910 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:29:32,779-Speed 5181.74 samples/sec Loss 0.3919 LearningRate 0.0000 Epoch: 19 Global Step: 330920 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:29:34,753-Speed 5189.40 samples/sec Loss 0.4045 LearningRate 0.0000 Epoch: 19 Global Step: 330930 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:29:36,743-Speed 5147.51 samples/sec Loss 0.3888 LearningRate 0.0000 Epoch: 19 Global Step: 330940 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:29:38,737-Speed 5136.12 samples/sec Loss 0.3817 LearningRate 0.0000 Epoch: 19 Global Step: 330950 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:29:40,718-Speed 5172.49 samples/sec Loss 0.4030 LearningRate 0.0000 Epoch: 19 Global Step: 330960 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:29:42,705-Speed 5153.85 samples/sec Loss 0.3934 LearningRate 0.0000 Epoch: 19 Global Step: 330970 Fp16 Grad Scale: 131072 Required: 0 
hours Training: 2022-04-11 21:29:44,690-Speed 5160.60 samples/sec Loss 0.4055 LearningRate 0.0000 Epoch: 19 Global Step: 330980 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:29:46,684-Speed 5137.57 samples/sec Loss 0.3812 LearningRate 0.0000 Epoch: 19 Global Step: 330990 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:29:48,682-Speed 5128.39 samples/sec Loss 0.4045 LearningRate 0.0000 Epoch: 19 Global Step: 331000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:29:50,660-Speed 5177.57 samples/sec Loss 0.3849 LearningRate 0.0000 Epoch: 19 Global Step: 331010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:29:52,657-Speed 5130.75 samples/sec Loss 0.3775 LearningRate 0.0000 Epoch: 19 Global Step: 331020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:29:54,629-Speed 5193.95 samples/sec Loss 0.3673 LearningRate 0.0000 Epoch: 19 Global Step: 331030 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:29:56,595-Speed 5209.31 samples/sec Loss 0.3891 LearningRate 0.0000 Epoch: 19 Global Step: 331040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:29:58,563-Speed 5204.69 samples/sec Loss 0.3946 LearningRate 0.0000 Epoch: 19 Global Step: 331050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:30:00,532-Speed 5201.67 samples/sec Loss 0.3811 LearningRate 0.0000 Epoch: 19 Global Step: 331060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:30:02,513-Speed 5172.41 samples/sec Loss 0.3906 LearningRate 0.0000 Epoch: 19 Global Step: 331070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:30:04,481-Speed 5203.18 samples/sec Loss 0.3953 LearningRate 0.0000 Epoch: 19 Global Step: 331080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:30:06,448-Speed 5207.79 samples/sec Loss 0.3843 LearningRate 0.0000 Epoch: 19 Global Step: 331090 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 
21:30:08,433-Speed 5161.21 samples/sec Loss 0.3966 LearningRate 0.0000 Epoch: 19 Global Step: 331100 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:30:10,445-Speed 5091.68 samples/sec Loss 0.3962 LearningRate 0.0000 Epoch: 19 Global Step: 331110 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:30:12,513-Speed 4954.94 samples/sec Loss 0.3919 LearningRate 0.0000 Epoch: 19 Global Step: 331120 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:30:14,529-Speed 5079.56 samples/sec Loss 0.3740 LearningRate 0.0000 Epoch: 19 Global Step: 331130 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:30:16,559-Speed 5046.26 samples/sec Loss 0.4062 LearningRate 0.0000 Epoch: 19 Global Step: 331140 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:30:18,537-Speed 5179.39 samples/sec Loss 0.3677 LearningRate 0.0000 Epoch: 19 Global Step: 331150 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:30:20,527-Speed 5147.60 samples/sec Loss 0.3996 LearningRate 0.0000 Epoch: 19 Global Step: 331160 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:30:22,516-Speed 5149.37 samples/sec Loss 0.3723 LearningRate 0.0000 Epoch: 19 Global Step: 331170 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:30:24,520-Speed 5111.89 samples/sec Loss 0.3593 LearningRate 0.0000 Epoch: 19 Global Step: 331180 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:30:26,491-Speed 5196.72 samples/sec Loss 0.3939 LearningRate 0.0000 Epoch: 19 Global Step: 331190 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:30:28,473-Speed 5167.63 samples/sec Loss 0.3900 LearningRate 0.0000 Epoch: 19 Global Step: 331200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:30:30,456-Speed 5163.80 samples/sec Loss 0.3865 LearningRate 0.0000 Epoch: 19 Global Step: 331210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:30:32,438-Speed 5170.31 
samples/sec Loss 0.3914 LearningRate 0.0000 Epoch: 19 Global Step: 331220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:30:34,406-Speed 5205.32 samples/sec Loss 0.3752 LearningRate 0.0000 Epoch: 19 Global Step: 331230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:30:36,395-Speed 5149.12 samples/sec Loss 0.3877 LearningRate 0.0000 Epoch: 19 Global Step: 331240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:30:38,386-Speed 5144.58 samples/sec Loss 0.3822 LearningRate 0.0000 Epoch: 19 Global Step: 331250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:30:40,365-Speed 5175.55 samples/sec Loss 0.3909 LearningRate 0.0000 Epoch: 19 Global Step: 331260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:30:42,338-Speed 5192.58 samples/sec Loss 0.3669 LearningRate 0.0000 Epoch: 19 Global Step: 331270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:30:44,314-Speed 5183.84 samples/sec Loss 0.3968 LearningRate 0.0000 Epoch: 19 Global Step: 331280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:30:46,287-Speed 5192.35 samples/sec Loss 0.3992 LearningRate 0.0000 Epoch: 19 Global Step: 331290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:30:48,265-Speed 5178.01 samples/sec Loss 0.3810 LearningRate 0.0000 Epoch: 19 Global Step: 331300 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:30:50,280-Speed 5083.78 samples/sec Loss 0.4110 LearningRate 0.0000 Epoch: 19 Global Step: 331310 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:30:52,275-Speed 5134.18 samples/sec Loss 0.3710 LearningRate 0.0000 Epoch: 19 Global Step: 331320 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:30:54,289-Speed 5087.57 samples/sec Loss 0.3806 LearningRate 0.0000 Epoch: 19 Global Step: 331330 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:30:56,311-Speed 5067.20 samples/sec Loss 0.3942 LearningRate 
0.0000 Epoch: 19 Global Step: 331340 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:30:58,295-Speed 5160.42 samples/sec Loss 0.3937 LearningRate 0.0000 Epoch: 19 Global Step: 331350 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:00,286-Speed 5145.99 samples/sec Loss 0.3870 LearningRate 0.0000 Epoch: 19 Global Step: 331360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:02,255-Speed 5203.14 samples/sec Loss 0.3953 LearningRate 0.0000 Epoch: 19 Global Step: 331370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:04,227-Speed 5193.71 samples/sec Loss 0.4033 LearningRate 0.0000 Epoch: 19 Global Step: 331380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:06,195-Speed 5204.96 samples/sec Loss 0.3625 LearningRate 0.0000 Epoch: 19 Global Step: 331390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:08,176-Speed 5170.64 samples/sec Loss 0.3887 LearningRate 0.0000 Epoch: 19 Global Step: 331400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:10,162-Speed 5158.26 samples/sec Loss 0.3861 LearningRate 0.0000 Epoch: 19 Global Step: 331410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:12,145-Speed 5165.62 samples/sec Loss 0.4009 LearningRate 0.0000 Epoch: 19 Global Step: 331420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:14,128-Speed 5163.48 samples/sec Loss 0.3872 LearningRate 0.0000 Epoch: 19 Global Step: 331430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:16,122-Speed 5138.97 samples/sec Loss 0.3873 LearningRate 0.0000 Epoch: 19 Global Step: 331440 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:31:18,120-Speed 5125.69 samples/sec Loss 0.3894 LearningRate 0.0000 Epoch: 19 Global Step: 331450 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:31:20,089-Speed 5204.54 samples/sec Loss 0.3911 LearningRate 0.0000 Epoch: 19 Global Step: 331460 Fp16 
Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:31:22,107-Speed 5075.52 samples/sec Loss 0.3977 LearningRate 0.0000 Epoch: 19 Global Step: 331470 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:31:24,101-Speed 5136.00 samples/sec Loss 0.3938 LearningRate 0.0000 Epoch: 19 Global Step: 331480 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:31:26,091-Speed 5147.09 samples/sec Loss 0.4046 LearningRate 0.0000 Epoch: 19 Global Step: 331490 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:31:28,087-Speed 5133.29 samples/sec Loss 0.3741 LearningRate 0.0000 Epoch: 19 Global Step: 331500 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:31:30,079-Speed 5141.95 samples/sec Loss 0.3552 LearningRate 0.0000 Epoch: 19 Global Step: 331510 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:31:32,041-Speed 5221.18 samples/sec Loss 0.3756 LearningRate 0.0000 Epoch: 19 Global Step: 331520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:34,039-Speed 5126.10 samples/sec Loss 0.3889 LearningRate 0.0000 Epoch: 19 Global Step: 331530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:36,031-Speed 5143.40 samples/sec Loss 0.3917 LearningRate 0.0000 Epoch: 19 Global Step: 331540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:38,045-Speed 5086.50 samples/sec Loss 0.3926 LearningRate 0.0000 Epoch: 19 Global Step: 331550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:40,031-Speed 5158.49 samples/sec Loss 0.3830 LearningRate 0.0000 Epoch: 19 Global Step: 331560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:42,000-Speed 5201.60 samples/sec Loss 0.3803 LearningRate 0.0000 Epoch: 19 Global Step: 331570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:43,969-Speed 5202.37 samples/sec Loss 0.3989 LearningRate 0.0000 Epoch: 19 Global Step: 331580 Fp16 Grad Scale: 65536 Required: 0 hours 
Training: 2022-04-11 21:31:45,941-Speed 5193.51 samples/sec Loss 0.3858 LearningRate 0.0000 Epoch: 19 Global Step: 331590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:47,926-Speed 5160.26 samples/sec Loss 0.3855 LearningRate 0.0000 Epoch: 19 Global Step: 331600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:49,914-Speed 5153.19 samples/sec Loss 0.3705 LearningRate 0.0000 Epoch: 19 Global Step: 331610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:51,896-Speed 5168.55 samples/sec Loss 0.3789 LearningRate 0.0000 Epoch: 19 Global Step: 331620 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:31:53,865-Speed 5202.05 samples/sec Loss 0.3724 LearningRate 0.0000 Epoch: 19 Global Step: 331630 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:31:55,833-Speed 5205.83 samples/sec Loss 0.3907 LearningRate 0.0000 Epoch: 19 Global Step: 331640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:57,834-Speed 5117.19 samples/sec Loss 0.3871 LearningRate 0.0000 Epoch: 19 Global Step: 331650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:31:59,838-Speed 5112.89 samples/sec Loss 0.3967 LearningRate 0.0000 Epoch: 19 Global Step: 331660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:01,834-Speed 5133.04 samples/sec Loss 0.4078 LearningRate 0.0000 Epoch: 19 Global Step: 331670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:03,826-Speed 5141.05 samples/sec Loss 0.4013 LearningRate 0.0000 Epoch: 19 Global Step: 331680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:05,806-Speed 5174.03 samples/sec Loss 0.3922 LearningRate 0.0000 Epoch: 19 Global Step: 331690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:07,783-Speed 5182.69 samples/sec Loss 0.3826 LearningRate 0.0000 Epoch: 19 Global Step: 331700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:09,769-Speed 
5156.12 samples/sec Loss 0.3758 LearningRate 0.0000 Epoch: 19 Global Step: 331710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:11,764-Speed 5134.17 samples/sec Loss 0.4022 LearningRate 0.0000 Epoch: 19 Global Step: 331720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:13,770-Speed 5106.24 samples/sec Loss 0.3677 LearningRate 0.0000 Epoch: 19 Global Step: 331730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:15,771-Speed 5119.51 samples/sec Loss 0.3888 LearningRate 0.0000 Epoch: 19 Global Step: 331740 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:32:17,776-Speed 5108.44 samples/sec Loss 0.3753 LearningRate 0.0000 Epoch: 19 Global Step: 331750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:19,765-Speed 5149.38 samples/sec Loss 0.4133 LearningRate 0.0000 Epoch: 19 Global Step: 331760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:21,745-Speed 5173.25 samples/sec Loss 0.3932 LearningRate 0.0000 Epoch: 19 Global Step: 331770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:23,739-Speed 5140.37 samples/sec Loss 0.3804 LearningRate 0.0000 Epoch: 19 Global Step: 331780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:25,713-Speed 5188.61 samples/sec Loss 0.3982 LearningRate 0.0000 Epoch: 19 Global Step: 331790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:27,689-Speed 5182.60 samples/sec Loss 0.3933 LearningRate 0.0000 Epoch: 19 Global Step: 331800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:29,668-Speed 5177.41 samples/sec Loss 0.3875 LearningRate 0.0000 Epoch: 19 Global Step: 331810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:31,636-Speed 5202.70 samples/sec Loss 0.3868 LearningRate 0.0000 Epoch: 19 Global Step: 331820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:33,618-Speed 5168.19 samples/sec Loss 0.3834 
LearningRate 0.0000 Epoch: 19 Global Step: 331830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:35,600-Speed 5170.06 samples/sec Loss 0.3835 LearningRate 0.0000 Epoch: 19 Global Step: 331840 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:37,584-Speed 5161.17 samples/sec Loss 0.3702 LearningRate 0.0000 Epoch: 19 Global Step: 331850 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:32:39,582-Speed 5128.11 samples/sec Loss 0.3836 LearningRate 0.0000 Epoch: 19 Global Step: 331860 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:32:41,554-Speed 5194.14 samples/sec Loss 0.3880 LearningRate 0.0000 Epoch: 19 Global Step: 331870 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:43,525-Speed 5196.45 samples/sec Loss 0.3885 LearningRate 0.0000 Epoch: 19 Global Step: 331880 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:45,531-Speed 5108.09 samples/sec Loss 0.3883 LearningRate 0.0000 Epoch: 19 Global Step: 331890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:47,542-Speed 5092.38 samples/sec Loss 0.3866 LearningRate 0.0000 Epoch: 19 Global Step: 331900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:49,540-Speed 5127.88 samples/sec Loss 0.3941 LearningRate 0.0000 Epoch: 19 Global Step: 331910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:51,515-Speed 5187.15 samples/sec Loss 0.3928 LearningRate 0.0000 Epoch: 19 Global Step: 331920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:53,571-Speed 4980.48 samples/sec Loss 0.3837 LearningRate 0.0000 Epoch: 19 Global Step: 331930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:55,544-Speed 5193.22 samples/sec Loss 0.3876 LearningRate 0.0000 Epoch: 19 Global Step: 331940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:57,531-Speed 5155.08 samples/sec Loss 0.3636 LearningRate 0.0000 Epoch: 19 Global 
Step: 331950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:32:59,537-Speed 5104.20 samples/sec Loss 0.3771 LearningRate 0.0000 Epoch: 19 Global Step: 331960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:33:01,544-Speed 5105.34 samples/sec Loss 0.3886 LearningRate 0.0000 Epoch: 19 Global Step: 331970 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:33:03,519-Speed 5185.70 samples/sec Loss 0.3864 LearningRate 0.0000 Epoch: 19 Global Step: 331980 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:33:05,501-Speed 5169.50 samples/sec Loss 0.3772 LearningRate 0.0000 Epoch: 19 Global Step: 331990 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:33:07,471-Speed 5200.12 samples/sec Loss 0.3726 LearningRate 0.0000 Epoch: 19 Global Step: 332000 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:33:34,121-[lfw][332000]XNorm: 21.558851 Training: 2022-04-11 21:33:34,122-[lfw][332000]Accuracy-Flip: 0.99817+-0.00273 Training: 2022-04-11 21:33:34,122-[lfw][332000]Accuracy-Highest: 0.99833 Training: 2022-04-11 21:34:04,976-[cfp_fp][332000]XNorm: 22.070367 Training: 2022-04-11 21:34:04,977-[cfp_fp][332000]Accuracy-Flip: 0.99071+-0.00379 Training: 2022-04-11 21:34:04,977-[cfp_fp][332000]Accuracy-Highest: 0.99071 Training: 2022-04-11 21:34:31,711-[agedb_30][332000]XNorm: 22.697462 Training: 2022-04-11 21:34:31,711-[agedb_30][332000]Accuracy-Flip: 0.98383+-0.00654 Training: 2022-04-11 21:34:31,712-[agedb_30][332000]Accuracy-Highest: 0.98450 Training: 2022-04-11 21:34:33,697-Speed 118.76 samples/sec Loss 0.3653 LearningRate 0.0000 Epoch: 19 Global Step: 332010 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:34:35,706-Speed 5097.81 samples/sec Loss 0.3840 LearningRate 0.0000 Epoch: 19 Global Step: 332020 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:34:37,706-Speed 5122.21 samples/sec Loss 0.3950 LearningRate 0.0000 Epoch: 19 Global Step: 332030 Fp16 
Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:34:39,751-Speed 5008.83 samples/sec Loss 0.3845 LearningRate 0.0000 Epoch: 19 Global Step: 332040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:34:41,733-Speed 5167.90 samples/sec Loss 0.3820 LearningRate 0.0000 Epoch: 19 Global Step: 332050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:34:43,696-Speed 5218.80 samples/sec Loss 0.3847 LearningRate 0.0000 Epoch: 19 Global Step: 332060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:34:45,697-Speed 5149.27 samples/sec Loss 0.4006 LearningRate 0.0000 Epoch: 19 Global Step: 332070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:34:47,685-Speed 5152.92 samples/sec Loss 0.3935 LearningRate 0.0000 Epoch: 19 Global Step: 332080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:34:49,670-Speed 5160.09 samples/sec Loss 0.3702 LearningRate 0.0000 Epoch: 19 Global Step: 332090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:34:51,650-Speed 5172.88 samples/sec Loss 0.3659 LearningRate 0.0000 Epoch: 19 Global Step: 332100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:34:53,619-Speed 5203.14 samples/sec Loss 0.3755 LearningRate 0.0000 Epoch: 19 Global Step: 332110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:34:55,585-Speed 5210.64 samples/sec Loss 0.3555 LearningRate 0.0000 Epoch: 19 Global Step: 332120 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:34:57,599-Speed 5084.84 samples/sec Loss 0.4313 LearningRate 0.0000 Epoch: 19 Global Step: 332130 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:34:59,595-Speed 5132.52 samples/sec Loss 0.3719 LearningRate 0.0000 Epoch: 19 Global Step: 332140 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:35:01,591-Speed 5132.30 samples/sec Loss 0.3825 LearningRate 0.0000 Epoch: 19 Global Step: 332150 Fp16 Grad Scale: 131072 Required: 0 hours 
Training: 2022-04-11 21:35:03,564-Speed 5193.77 samples/sec Loss 0.3864 LearningRate 0.0000 Epoch: 19 Global Step: 332160 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:35:05,539-Speed 5184.67 samples/sec Loss 0.3723 LearningRate 0.0000 Epoch: 19 Global Step: 332170 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:35:07,518-Speed 5176.16 samples/sec Loss 0.3803 LearningRate 0.0000 Epoch: 19 Global Step: 332180 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:35:09,486-Speed 5206.62 samples/sec Loss 0.3919 LearningRate 0.0000 Epoch: 19 Global Step: 332190 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:35:11,479-Speed 5139.93 samples/sec Loss 0.4069 LearningRate 0.0000 Epoch: 19 Global Step: 332200 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:35:13,459-Speed 5171.09 samples/sec Loss 0.3986 LearningRate 0.0000 Epoch: 19 Global Step: 332210 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:35:15,439-Speed 5175.64 samples/sec Loss 0.3902 LearningRate 0.0000 Epoch: 19 Global Step: 332220 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:35:17,422-Speed 5165.54 samples/sec Loss 0.3941 LearningRate 0.0000 Epoch: 19 Global Step: 332230 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:35:19,389-Speed 5206.31 samples/sec Loss 0.3817 LearningRate 0.0000 Epoch: 19 Global Step: 332240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:35:21,370-Speed 5171.72 samples/sec Loss 0.3743 LearningRate 0.0000 Epoch: 19 Global Step: 332250 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:35:23,372-Speed 5116.16 samples/sec Loss 0.4083 LearningRate 0.0000 Epoch: 19 Global Step: 332260 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:35:25,360-Speed 5153.29 samples/sec Loss 0.3805 LearningRate 0.0000 Epoch: 19 Global Step: 332270 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 
21:35:27,359-Speed 5123.85 samples/sec Loss 0.3897 LearningRate 0.0000 Epoch: 19 Global Step: 332280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:35:29,373-Speed 5087.34 samples/sec Loss 0.3695 LearningRate 0.0000 Epoch: 19 Global Step: 332290 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:35:31,359-Speed 5157.10 samples/sec Loss 0.3947 LearningRate 0.0000 Epoch: 19 Global Step: 332300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:35:33,330-Speed 5196.97 samples/sec Loss 0.3645 LearningRate 0.0000 Epoch: 19 Global Step: 332310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:35:35,324-Speed 5136.88 samples/sec Loss 0.3768 LearningRate 0.0000 Epoch: 19 Global Step: 332320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:35:37,315-Speed 5143.87 samples/sec Loss 0.3990 LearningRate 0.0000 Epoch: 19 Global Step: 332330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:35:39,293-Speed 5178.73 samples/sec Loss 0.4037 LearningRate 0.0000 Epoch: 19 Global Step: 332340 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:35:41,271-Speed 5180.44 samples/sec Loss 0.3945 LearningRate 0.0000 Epoch: 19 Global Step: 332350 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:35:43,253-Speed 5166.85 samples/sec Loss 0.3845 LearningRate 0.0000 Epoch: 19 Global Step: 332360 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:35:45,234-Speed 5172.45 samples/sec Loss 0.3959 LearningRate 0.0000 Epoch: 19 Global Step: 332370 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:35:47,222-Speed 5152.96 samples/sec Loss 0.3983 LearningRate 0.0000 Epoch: 19 Global Step: 332380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:35:49,204-Speed 5167.83 samples/sec Loss 0.3784 LearningRate 0.0000 Epoch: 19 Global Step: 332390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:35:51,213-Speed 5097.55 samples/sec 
Loss 0.4063 LearningRate 0.0000 Epoch: 19 Global Step: 332400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:35:53,189-Speed 5183.80 samples/sec Loss 0.3919 LearningRate 0.0000 Epoch: 19 Global Step: 332410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:35:55,164-Speed 5186.85 samples/sec Loss 0.3805 LearningRate 0.0000 Epoch: 19 Global Step: 332420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:35:57,140-Speed 5184.03 samples/sec Loss 0.3664 LearningRate 0.0000 Epoch: 19 Global Step: 332430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:35:59,137-Speed 5128.51 samples/sec Loss 0.4098 LearningRate 0.0000 Epoch: 19 Global Step: 332440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:01,113-Speed 5183.88 samples/sec Loss 0.3820 LearningRate 0.0000 Epoch: 19 Global Step: 332450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:03,090-Speed 5181.26 samples/sec Loss 0.3970 LearningRate 0.0000 Epoch: 19 Global Step: 332460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:05,073-Speed 5166.68 samples/sec Loss 0.3630 LearningRate 0.0000 Epoch: 19 Global Step: 332470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:07,058-Speed 5159.97 samples/sec Loss 0.3811 LearningRate 0.0000 Epoch: 19 Global Step: 332480 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:09,034-Speed 5184.45 samples/sec Loss 0.3969 LearningRate 0.0000 Epoch: 19 Global Step: 332490 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:11,031-Speed 5129.62 samples/sec Loss 0.3693 LearningRate 0.0000 Epoch: 19 Global Step: 332500 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:13,016-Speed 5160.02 samples/sec Loss 0.3879 LearningRate 0.0000 Epoch: 19 Global Step: 332510 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:15,009-Speed 5139.55 samples/sec Loss 0.3902 LearningRate 0.0000 Epoch: 19 
Global Step: 332520 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:16,999-Speed 5146.96 samples/sec Loss 0.3804 LearningRate 0.0000 Epoch: 19 Global Step: 332530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:18,976-Speed 5182.82 samples/sec Loss 0.3433 LearningRate 0.0000 Epoch: 19 Global Step: 332540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:20,958-Speed 5167.12 samples/sec Loss 0.3876 LearningRate 0.0000 Epoch: 19 Global Step: 332550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:22,956-Speed 5126.59 samples/sec Loss 0.3727 LearningRate 0.0000 Epoch: 19 Global Step: 332560 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:36:24,959-Speed 5114.22 samples/sec Loss 0.3886 LearningRate 0.0000 Epoch: 19 Global Step: 332570 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:36:26,930-Speed 5197.41 samples/sec Loss 0.3675 LearningRate 0.0000 Epoch: 19 Global Step: 332580 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:36:28,902-Speed 5195.43 samples/sec Loss 0.3723 LearningRate 0.0000 Epoch: 19 Global Step: 332590 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:36:30,872-Speed 5199.93 samples/sec Loss 0.4009 LearningRate 0.0000 Epoch: 19 Global Step: 332600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:32,864-Speed 5140.96 samples/sec Loss 0.3815 LearningRate 0.0000 Epoch: 19 Global Step: 332610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:34,850-Speed 5159.38 samples/sec Loss 0.3749 LearningRate 0.0000 Epoch: 19 Global Step: 332620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:36,866-Speed 5080.77 samples/sec Loss 0.3944 LearningRate 0.0000 Epoch: 19 Global Step: 332630 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:38,840-Speed 5187.61 samples/sec Loss 0.3650 LearningRate 0.0000 Epoch: 19 Global Step: 332640 Fp16 Grad Scale: 
65536 Required: 0 hours Training: 2022-04-11 21:36:40,822-Speed 5170.06 samples/sec Loss 0.3902 LearningRate 0.0000 Epoch: 19 Global Step: 332650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:42,801-Speed 5174.29 samples/sec Loss 0.4095 LearningRate 0.0000 Epoch: 19 Global Step: 332660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:44,787-Speed 5159.82 samples/sec Loss 0.3910 LearningRate 0.0000 Epoch: 19 Global Step: 332670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:46,774-Speed 5154.64 samples/sec Loss 0.3922 LearningRate 0.0000 Epoch: 19 Global Step: 332680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:48,755-Speed 5170.49 samples/sec Loss 0.4132 LearningRate 0.0000 Epoch: 19 Global Step: 332690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:36:50,779-Speed 5060.46 samples/sec Loss 0.3806 LearningRate 0.0000 Epoch: 19 Global Step: 332700 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:36:52,757-Speed 5180.57 samples/sec Loss 0.3947 LearningRate 0.0000 Epoch: 19 Global Step: 332710 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:36:54,741-Speed 5163.04 samples/sec Loss 0.3847 LearningRate 0.0000 Epoch: 19 Global Step: 332720 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:36:56,747-Speed 5104.85 samples/sec Loss 0.3692 LearningRate 0.0000 Epoch: 19 Global Step: 332730 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:36:58,772-Speed 5059.71 samples/sec Loss 0.3777 LearningRate 0.0000 Epoch: 19 Global Step: 332740 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:00,776-Speed 5109.64 samples/sec Loss 0.3958 LearningRate 0.0000 Epoch: 19 Global Step: 332750 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:02,758-Speed 5168.43 samples/sec Loss 0.3869 LearningRate 0.0000 Epoch: 19 Global Step: 332760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 
2022-04-11 21:37:04,806-Speed 5000.69 samples/sec Loss 0.3927 LearningRate 0.0000 Epoch: 19 Global Step: 332770 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:06,802-Speed 5134.56 samples/sec Loss 0.4001 LearningRate 0.0000 Epoch: 19 Global Step: 332780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:08,797-Speed 5134.09 samples/sec Loss 0.3889 LearningRate 0.0000 Epoch: 19 Global Step: 332790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:10,803-Speed 5105.90 samples/sec Loss 0.3956 LearningRate 0.0000 Epoch: 19 Global Step: 332800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:12,797-Speed 5136.44 samples/sec Loss 0.3983 LearningRate 0.0000 Epoch: 19 Global Step: 332810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:14,772-Speed 5187.65 samples/sec Loss 0.3922 LearningRate 0.0000 Epoch: 19 Global Step: 332820 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:16,746-Speed 5188.09 samples/sec Loss 0.3831 LearningRate 0.0000 Epoch: 19 Global Step: 332830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:18,728-Speed 5167.90 samples/sec Loss 0.3749 LearningRate 0.0000 Epoch: 19 Global Step: 332840 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:37:20,712-Speed 5163.66 samples/sec Loss 0.3952 LearningRate 0.0000 Epoch: 19 Global Step: 332850 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:37:22,747-Speed 5034.10 samples/sec Loss 0.3833 LearningRate 0.0000 Epoch: 19 Global Step: 332860 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:37:24,750-Speed 5114.98 samples/sec Loss 0.3867 LearningRate 0.0000 Epoch: 19 Global Step: 332870 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:37:26,749-Speed 5123.30 samples/sec Loss 0.3796 LearningRate 0.0000 Epoch: 19 Global Step: 332880 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:37:28,758-Speed 5098.74 
samples/sec Loss 0.3585 LearningRate 0.0000 Epoch: 19 Global Step: 332890 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:30,744-Speed 5156.95 samples/sec Loss 0.4027 LearningRate 0.0000 Epoch: 19 Global Step: 332900 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:32,724-Speed 5173.53 samples/sec Loss 0.4054 LearningRate 0.0000 Epoch: 19 Global Step: 332910 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:34,720-Speed 5131.75 samples/sec Loss 0.3834 LearningRate 0.0000 Epoch: 19 Global Step: 332920 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:36,710-Speed 5149.79 samples/sec Loss 0.3926 LearningRate 0.0000 Epoch: 19 Global Step: 332930 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:38,686-Speed 5183.03 samples/sec Loss 0.3897 LearningRate 0.0000 Epoch: 19 Global Step: 332940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:40,664-Speed 5178.63 samples/sec Loss 0.3823 LearningRate 0.0000 Epoch: 19 Global Step: 332950 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:42,636-Speed 5194.61 samples/sec Loss 0.3859 LearningRate 0.0000 Epoch: 19 Global Step: 332960 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:44,608-Speed 5191.92 samples/sec Loss 0.3769 LearningRate 0.0000 Epoch: 19 Global Step: 332970 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:46,582-Speed 5191.06 samples/sec Loss 0.3946 LearningRate 0.0000 Epoch: 19 Global Step: 332980 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:37:48,613-Speed 5042.40 samples/sec Loss 0.3942 LearningRate 0.0000 Epoch: 19 Global Step: 332990 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:37:50,621-Speed 5102.95 samples/sec Loss 0.3935 LearningRate 0.0000 Epoch: 19 Global Step: 333000 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:37:52,644-Speed 5062.07 samples/sec Loss 0.3887 LearningRate 
0.0000 Epoch: 19 Global Step: 333010 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:37:54,643-Speed 5125.01 samples/sec Loss 0.3713 LearningRate 0.0000 Epoch: 19 Global Step: 333020 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:37:56,657-Speed 5086.54 samples/sec Loss 0.3894 LearningRate 0.0000 Epoch: 19 Global Step: 333030 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:37:58,658-Speed 5118.20 samples/sec Loss 0.3999 LearningRate 0.0000 Epoch: 19 Global Step: 333040 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:00,635-Speed 5181.10 samples/sec Loss 0.3988 LearningRate 0.0000 Epoch: 19 Global Step: 333050 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:02,621-Speed 5159.29 samples/sec Loss 0.3976 LearningRate 0.0000 Epoch: 19 Global Step: 333060 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:04,594-Speed 5190.75 samples/sec Loss 0.3883 LearningRate 0.0000 Epoch: 19 Global Step: 333070 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:06,569-Speed 5187.05 samples/sec Loss 0.4051 LearningRate 0.0000 Epoch: 19 Global Step: 333080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:08,546-Speed 5182.06 samples/sec Loss 0.3763 LearningRate 0.0000 Epoch: 19 Global Step: 333090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:10,563-Speed 5078.07 samples/sec Loss 0.4002 LearningRate 0.0000 Epoch: 19 Global Step: 333100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:12,579-Speed 5081.50 samples/sec Loss 0.3823 LearningRate 0.0000 Epoch: 19 Global Step: 333110 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:14,567-Speed 5151.40 samples/sec Loss 0.3833 LearningRate 0.0000 Epoch: 19 Global Step: 333120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:16,545-Speed 5178.95 samples/sec Loss 0.3888 LearningRate 0.0000 Epoch: 19 Global Step: 333130 
Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:18,541-Speed 5130.78 samples/sec Loss 0.3785 LearningRate 0.0000 Epoch: 19 Global Step: 333140 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:38:20,523-Speed 5170.01 samples/sec Loss 0.3674 LearningRate 0.0000 Epoch: 19 Global Step: 333150 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:38:22,505-Speed 5168.71 samples/sec Loss 0.4155 LearningRate 0.0000 Epoch: 19 Global Step: 333160 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:24,481-Speed 5184.77 samples/sec Loss 0.3998 LearningRate 0.0000 Epoch: 19 Global Step: 333170 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:26,451-Speed 5197.72 samples/sec Loss 0.3875 LearningRate 0.0000 Epoch: 19 Global Step: 333180 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:28,424-Speed 5194.15 samples/sec Loss 0.3897 LearningRate 0.0000 Epoch: 19 Global Step: 333190 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:30,400-Speed 5183.05 samples/sec Loss 0.3896 LearningRate 0.0000 Epoch: 19 Global Step: 333200 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:32,386-Speed 5156.55 samples/sec Loss 0.4010 LearningRate 0.0000 Epoch: 19 Global Step: 333210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:34,361-Speed 5186.17 samples/sec Loss 0.4034 LearningRate 0.0000 Epoch: 19 Global Step: 333220 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:36,333-Speed 5196.17 samples/sec Loss 0.4130 LearningRate 0.0000 Epoch: 19 Global Step: 333230 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:38,324-Speed 5144.22 samples/sec Loss 0.3956 LearningRate 0.0000 Epoch: 19 Global Step: 333240 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:38:40,295-Speed 5196.39 samples/sec Loss 0.3868 LearningRate 0.0000 Epoch: 19 Global Step: 333250 Fp16 Grad Scale: 65536 Required: 0 hours 
Training: 2022-04-11 21:38:42,280-Speed 5160.43 samples/sec Loss 0.3823 LearningRate 0.0000 Epoch: 19 Global Step: 333260 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:38:44,251-Speed 5198.04 samples/sec Loss 0.3793 LearningRate 0.0000 Epoch: 19 Global Step: 333270 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:38:46,232-Speed 5170.32 samples/sec Loss 0.4014 LearningRate 0.0000 Epoch: 19 Global Step: 333280 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:38:48,225-Speed 5138.84 samples/sec Loss 0.3639 LearningRate 0.0000 Epoch: 19 Global Step: 333290 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:38:50,233-Speed 5101.82 samples/sec Loss 0.3949 LearningRate 0.0000 Epoch: 19 Global Step: 333300 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:38:52,224-Speed 5146.01 samples/sec Loss 0.3934 LearningRate 0.0000 Epoch: 19 Global Step: 333310 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:38:54,202-Speed 5177.80 samples/sec Loss 0.3931 LearningRate 0.0000 Epoch: 19 Global Step: 333320 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:38:56,185-Speed 5166.20 samples/sec Loss 0.4027 LearningRate 0.0000 Epoch: 19 Global Step: 333330 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:38:58,208-Speed 5063.94 samples/sec Loss 0.3849 LearningRate 0.0000 Epoch: 19 Global Step: 333340 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:39:00,182-Speed 5187.40 samples/sec Loss 0.3811 LearningRate 0.0000 Epoch: 19 Global Step: 333350 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:39:02,191-Speed 5099.38 samples/sec Loss 0.3789 LearningRate 0.0000 Epoch: 19 Global Step: 333360 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:39:04,201-Speed 5095.29 samples/sec Loss 0.3668 LearningRate 0.0000 Epoch: 19 Global Step: 333370 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 
21:39:06,211-Speed 5097.10 samples/sec Loss 0.3732 LearningRate 0.0000 Epoch: 19 Global Step: 333380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:08,185-Speed 5190.59 samples/sec Loss 0.3759 LearningRate 0.0000 Epoch: 19 Global Step: 333390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:10,173-Speed 5152.43 samples/sec Loss 0.3853 LearningRate 0.0000 Epoch: 19 Global Step: 333400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:12,184-Speed 5092.90 samples/sec Loss 0.3900 LearningRate 0.0000 Epoch: 19 Global Step: 333410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:14,207-Speed 5062.90 samples/sec Loss 0.3948 LearningRate 0.0000 Epoch: 19 Global Step: 333420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:16,205-Speed 5127.87 samples/sec Loss 0.3962 LearningRate 0.0000 Epoch: 19 Global Step: 333430 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:18,178-Speed 5191.51 samples/sec Loss 0.3960 LearningRate 0.0000 Epoch: 19 Global Step: 333440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:20,149-Speed 5198.03 samples/sec Loss 0.3743 LearningRate 0.0000 Epoch: 19 Global Step: 333450 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:22,179-Speed 5044.24 samples/sec Loss 0.3899 LearningRate 0.0000 Epoch: 19 Global Step: 333460 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:24,193-Speed 5085.46 samples/sec Loss 0.3860 LearningRate 0.0000 Epoch: 19 Global Step: 333470 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:26,196-Speed 5114.70 samples/sec Loss 0.3965 LearningRate 0.0000 Epoch: 19 Global Step: 333480 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:39:28,202-Speed 5107.12 samples/sec Loss 0.3719 LearningRate 0.0000 Epoch: 19 Global Step: 333490 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:39:30,184-Speed 5169.34 samples/sec 
Loss 0.3755 LearningRate 0.0000 Epoch: 19 Global Step: 333500 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:39:32,155-Speed 5196.22 samples/sec Loss 0.3848 LearningRate 0.0000 Epoch: 19 Global Step: 333510 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:39:34,137-Speed 5169.48 samples/sec Loss 0.4061 LearningRate 0.0000 Epoch: 19 Global Step: 333520 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:39:36,102-Speed 5210.83 samples/sec Loss 0.4214 LearningRate 0.0000 Epoch: 19 Global Step: 333530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:38,074-Speed 5195.14 samples/sec Loss 0.3874 LearningRate 0.0000 Epoch: 19 Global Step: 333540 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:40,044-Speed 5199.99 samples/sec Loss 0.3925 LearningRate 0.0000 Epoch: 19 Global Step: 333550 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:42,020-Speed 5183.09 samples/sec Loss 0.3893 LearningRate 0.0000 Epoch: 19 Global Step: 333560 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:44,006-Speed 5158.01 samples/sec Loss 0.4021 LearningRate 0.0000 Epoch: 19 Global Step: 333570 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:46,002-Speed 5131.14 samples/sec Loss 0.3717 LearningRate 0.0000 Epoch: 19 Global Step: 333580 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:48,000-Speed 5127.18 samples/sec Loss 0.3870 LearningRate 0.0000 Epoch: 19 Global Step: 333590 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:50,011-Speed 5094.33 samples/sec Loss 0.3830 LearningRate 0.0000 Epoch: 19 Global Step: 333600 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:51,994-Speed 5166.16 samples/sec Loss 0.4053 LearningRate 0.0000 Epoch: 19 Global Step: 333610 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:53,967-Speed 5192.56 samples/sec Loss 0.4003 LearningRate 0.0000 Epoch: 
19 Global Step: 333620 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:39:55,963-Speed 5130.09 samples/sec Loss 0.3981 LearningRate 0.0000 Epoch: 19 Global Step: 333630 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:39:57,934-Speed 5197.24 samples/sec Loss 0.3989 LearningRate 0.0000 Epoch: 19 Global Step: 333640 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:39:59,911-Speed 5181.17 samples/sec Loss 0.3929 LearningRate 0.0000 Epoch: 19 Global Step: 333650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:40:01,883-Speed 5194.35 samples/sec Loss 0.3780 LearningRate 0.0000 Epoch: 19 Global Step: 333660 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:40:03,861-Speed 5180.65 samples/sec Loss 0.3946 LearningRate 0.0000 Epoch: 19 Global Step: 333670 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:40:05,852-Speed 5142.55 samples/sec Loss 0.3813 LearningRate 0.0000 Epoch: 19 Global Step: 333680 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:40:07,842-Speed 5147.15 samples/sec Loss 0.3798 LearningRate 0.0000 Epoch: 19 Global Step: 333690 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:40:09,834-Speed 5144.52 samples/sec Loss 0.3885 LearningRate 0.0000 Epoch: 19 Global Step: 333700 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:40:11,910-Speed 4934.33 samples/sec Loss 0.3783 LearningRate 0.0000 Epoch: 19 Global Step: 333710 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:40:13,953-Speed 5014.66 samples/sec Loss 0.3873 LearningRate 0.0000 Epoch: 19 Global Step: 333720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:40:15,962-Speed 5098.89 samples/sec Loss 0.3959 LearningRate 0.0000 Epoch: 19 Global Step: 333730 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:40:17,939-Speed 5180.17 samples/sec Loss 0.3872 LearningRate 0.0000 Epoch: 19 Global Step: 333740 Fp16 Grad Scale: 
65536 Required: 0 hours Training: 2022-04-11 21:40:19,910-Speed 5197.12 samples/sec Loss 0.3654 LearningRate 0.0000 Epoch: 19 Global Step: 333750 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:40:21,889-Speed 5176.78 samples/sec Loss 0.3581 LearningRate 0.0000 Epoch: 19 Global Step: 333760 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:40:23,869-Speed 5172.67 samples/sec Loss 0.3919 LearningRate 0.0000 Epoch: 19 Global Step: 333770 Fp16 Grad Scale: 131072 Required: 0 hours Training: 2022-04-11 21:40:25,848-Speed 5174.63 samples/sec Loss 0.3861 LearningRate 0.0000 Epoch: 19 Global Step: 333780 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:40:27,817-Speed 5202.04 samples/sec Loss 0.3760 LearningRate 0.0000 Epoch: 19 Global Step: 333790 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:40:29,807-Speed 5147.78 samples/sec Loss 0.4021 LearningRate 0.0000 Epoch: 19 Global Step: 333800 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:40:32,049-Speed 4569.35 samples/sec Loss 0.3876 LearningRate 0.0000 Epoch: 19 Global Step: 333810 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-11 21:40:34,053-Speed 5111.70 samples/sec Loss 0.3918 LearningRate 0.0000 Epoch: 19 Global Step: 333820 Fp16 Grad Scale: 65536 Required: -0 hours